CN115170836A - Cross-domain re-identification method based on shallow texture extraction and related equipment - Google Patents

Cross-domain re-identification method based on shallow texture extraction and related equipment

Info

Publication number
CN115170836A
Authority
CN
China
Prior art keywords
picture
domain
generated
target domain
source domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210905641.6A
Other languages
Chinese (zh)
Inventor
徐颖
陈晓清
蔡大森
汤俊杰
陈明伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202210905641.6A
Publication of CN115170836A
Legal status: Pending

Classifications

    • G (Physics); G06 (Computing; calculating or counting); G06V (Image or video recognition or understanding)
    • G06V 10/62: Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/443: Local feature extraction by matching or filtering
    • G06V 10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451: Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a cross-domain re-identification method based on shallow texture extraction and related equipment. A re-identification model is added on the basis of a cycle generation countermeasure network (CycleGAN): the re-identification model extracts shallow feature maps of the generated picture and of the target domain picture, and a texture loss is used for supervision so that the picture produced by the generator is closer to the target domain picture at the shallow texture level. On a metric branch, the re-identification model extracts deep feature codes of pictures from the source domain, the generated domain and the target domain; a metric learning loss function preserves the identity information of the generated pictures while the re-identification model distinguishes the generated pictures from the target domain pictures. The method can therefore generate pictures that are closer to the target domain style and better suited to the re-identification task. In the re-identification task, the multi-granularity feature extraction model is further improved with an adaptive local feature segmentation method and a dynamic local information matching method, which alleviates the pose mismatch problem to a certain extent.

Description

Cross-domain re-identification method based on shallow texture extraction and related equipment
Technical Field
The invention relates to the technical field of machine vision, in particular to a cross-domain re-identification method, a system, a terminal and a computer readable storage medium based on shallow texture extraction.
Background
With the rapid development of the economy and of science and technology, urbanization in China is advancing steadily, the number of surveillance cameras in every scene keeps increasing, and this greatly favors the practical deployment of intelligent surveillance technology. Auxiliary feature representation learning methods have accordingly been emphasized, in which additional annotation information or training samples are added to enhance the features learned by the neural network model.
In recent years, many methods have been proposed. For example, predicted semantic attribute information has been combined into a deep attribute learning framework to strengthen the generalization and robustness of feature representations under semi-supervised learning; viewpoint information has been used to enhance features; each camera has been treated as a separate domain, with a multi-camera consistent matching constraint introduced to obtain a globally optimal representation within a deep learning framework; and GANs have been applied to re-identification to improve supervised feature representation learning on the basis of generated pedestrian images. None of these methods, however, considers texture information. Data sets acquired in different scenes have different styles, and in order to increase the number of samples and improve the generalization performance of the model, the styles of different data sets need to be migrated so as to generate more training samples. Style is closely related to the texture features of a sample, so constraining the texture features can improve the style migration result. In addition, in the re-identification task, if the camera angle used to acquire a sample is not ideal, the effective sensing area and position of the target in the image vary, and the background area occupies too large a proportion of some detection results.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The main object of the present invention is to provide a cross-domain re-identification method, system, terminal and computer-readable storage medium based on shallow texture extraction, aiming to solve the problem in the prior art that, in the re-identification task, when the camera angle used to acquire a sample is not ideal, the effective sensing area and position of the target in the image vary and the background area occupies too large a proportion of some detection results.
In order to achieve the above object, the present invention provides a cross-domain re-identification method based on shallow texture extraction, wherein the cross-domain re-identification method based on shallow texture extraction comprises the following steps:
acquiring a source domain data set and a target domain data set, and inputting the source domain data set and the target domain data set into a cyclic generation countermeasure network;
sampling a source domain picture and a target domain picture from the source domain data set and the target domain data set respectively, the cycle generation countermeasure network converting the source domain picture, according to the style of the target domain picture, into a generated picture close to the style of the target domain picture, and inputting the generated picture, the source domain picture and the target domain picture into a re-identification model;
respectively extracting the shallow feature map and the deep feature map which are obtained after the generated picture, the source domain picture and the target domain picture are input into the re-recognition model, and performing loss calculation;
according to the loss calculation gradient, updating parameters of the circularly generated countermeasure network and the re-identification model;
obtaining an updated cycle generation countermeasure network, and converting the source domain data set into the style of the target domain data set by using the updated cycle generation countermeasure network;
and sending the pictures generated by the updated cycle generation countermeasure network into a multi-granularity feature extraction model for training to obtain a re-identification model with improved cross-domain recognition performance.
In addition, to achieve the above object, the present invention further provides a cross-domain re-identification system based on shallow texture extraction, wherein the cross-domain re-identification system based on shallow texture extraction includes:
the data acquisition module is used for acquiring a source domain data set and a target domain data set and inputting the source domain data set and the target domain data set into the circularly generated countermeasure network;
the picture generation module is used for respectively sampling the source domain data set and the target domain data set to obtain a source domain picture and a target domain picture, circularly generating a generated picture which is similar to the style of the target domain picture by the countermeasure network according to the style of the target domain picture, and inputting the generated picture, the source domain picture and the target domain picture into a re-identification model;
the feature extraction module is used for respectively extracting the shallow feature map and the deep feature map which are obtained after the generated picture, the source domain picture and the target domain picture are input into the re-recognition model, and performing loss calculation;
the parameter updating module is used for updating the parameters of the circularly generated countermeasure network and the re-recognition model according to the loss calculation gradient;
the data conversion module is used for obtaining an updated cycle generation countermeasure network and converting the source domain data set into the style of the target domain data set by using the updated cycle generation countermeasure network;
and the model training module is used for sending the updated pictures generated by the circularly generated countermeasure network into the multi-granularity feature extraction model for training so as to obtain the re-recognition model with improved cross-domain recognition performance.
In addition, to achieve the above object, the present invention further provides a terminal, wherein the terminal includes: a memory, a processor, and a cross-domain re-identification program based on shallow texture extraction that is stored in the memory and executable on the processor; when executed by the processor, the cross-domain re-identification program based on shallow texture extraction implements the steps of the cross-domain re-identification method based on shallow texture extraction as described above.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, wherein the computer readable storage medium stores a cross-domain re-identification program based on shallow texture extraction, and when executed by a processor, the cross-domain re-identification program based on shallow texture extraction implements the steps of the cross-domain re-identification method based on shallow texture extraction as described above.
According to the method, a re-identification model is added on the basis of the cycle generation countermeasure network to constrain the generated picture to be closer to the shallow texture information of the target picture; with this improvement the style of the generated picture is closer to the style of the target picture than with other methods, which improves the generalization ability of the model in the re-identification task, and the pose mismatch problem is alleviated by adaptively segmenting local features.
Drawings
FIG. 1 is a flow chart of a cross-domain re-identification method based on shallow texture extraction according to a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of a training framework of a supervised style migration model of a re-recognition model in a preferred embodiment of the cross-domain re-recognition method based on shallow texture extraction of the present invention;
FIG. 3 is a schematic diagram of a multi-granularity feature extraction network structure in an embodiment of the present invention based on the cross-domain re-recognition method of shallow texture extraction;
FIG. 4 is a diagram illustrating adaptive local feature segmentation using cumulative distribution according to a preferred embodiment of the present invention based on the cross-domain re-identification method of shallow texture extraction;
FIG. 5 is a diagram of a dynamic local information matching method in an embodiment of the present invention based on the cross-domain re-identification method of shallow texture extraction;
FIG. 6 is a schematic diagram of the dynamic alignment of local branch features of a multi-granularity feature extraction network in the preferred embodiment of the cross-domain re-identification method based on shallow texture extraction according to the present invention;
FIG. 7 is a schematic diagram illustrating a cross-domain re-recognition system based on shallow texture extraction according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating an operating environment of a terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The cross-domain re-recognition method based on shallow texture extraction in the preferred embodiment of the invention aims to generate a picture which is closer to the style of a target domain and can retain original identity information, and simultaneously relieves the problem of posture mismatch in a re-recognition task to a certain extent. As shown in fig. 1, the cross-domain re-identification method based on shallow texture extraction includes the following steps:
s10, acquiring a source domain data set and a target domain data set, and inputting the source domain data set and the target domain data set into a circularly generated countermeasure network.
Specifically, the source domain data set and the target domain data set are acquired, and k samples are randomly sampled from the source domain data set and the target domain data set.
As shown in the style migration model box in Fig. 2, the source domain picture A and the target domain picture B are sent to the style generator, i.e. the cycle generation countermeasure network (CycleGAN); that is, the collected k samples are input to the cycle generation countermeasure network, and the generated adversarial losses are respectively:
L_adv(G, D_T) = E_{B~p_data(B)}[log D_T(B)] + E_{A~p_data(A)}[log(1 - D_T(G(A)))];
L_adv(F, D_S) = E_{A~p_data(A)}[log D_S(A)] + E_{B~p_data(B)}[log(1 - D_S(F(B)))];
where G denotes the target domain generator of the cycle generation countermeasure network; F denotes the source domain generator; D_T denotes the target domain discriminator; D_S denotes the source domain discriminator; A denotes a source domain picture; B denotes a target domain picture; L_adv(G, D_T) denotes the adversarial loss of the target domain generator G; L_adv(F, D_S) denotes the adversarial loss of the source domain generator F.
The expectation E_{B~p_data(B)} indicates that the target domain picture B used in the first expression comes from the target domain, i.e. it obeys the data distribution of the target domain pictures; D_T(B) denotes the judgment of the target domain discriminator D_T on whether the target domain picture B belongs to the target domain.
The expectation E_{A~p_data(A)} indicates that the source domain picture A used in the second expression comes from the source domain, i.e. it obeys the data distribution of the source domain pictures; G(A) is the picture generated when the source domain picture A is fed into the target domain generator G of the cycle generation countermeasure network; D_S(A) denotes the judgment of the source domain discriminator D_S on whether the source domain picture A belongs to the source domain, the two results being represented by 1 and 0 respectively; D_S(F(B)) denotes the judgment of the source domain discriminator D_S on whether the picture generated by feeding the target domain picture B into the source domain generator F of the cycle generation countermeasure network belongs to the source domain, the results likewise being represented by 1 and 0.
The cycle consistent loss is expressed as:
L_rec(G, F) = E_{A~p_data(A)}[||F(G(A)) - A||_1] + E_{B~p_data(B)}[||G(F(B)) - B||_1];
wherein, F (G (A)) represents a picture generated after a picture G (A) generated after a source domain picture A is sent to a target domain generator G of the loop generation countermeasure network is sent to a source domain generator F of the loop generation countermeasure network; g (F (B)) represents a picture generated by sending the target domain map B to the source domain generator F of the loop generation countermeasure network, and then sending the picture F (B) to the target domain generator G of the loop generation countermeasure network.
The loss of identity within a domain is expressed as:
L_idt(G, F) = E_{A~p_data(A)}[||F(A) - A||_1] + E_{B~p_data(B)}[||G(B) - B||_1];
wherein F(A) denotes the picture generated when the source domain picture A is fed into the source domain generator F of the cycle generation countermeasure network, and G(B) denotes the picture generated when the target domain picture B is fed into the target domain generator G of the cycle generation countermeasure network.
The overall loss function is expressed as:
L_cycleGAN(G, F, D_S, D_T) = L_adv(G, D_T) + L_adv(F, D_S) + λ_1·L_rec(G, F) + λ_2·L_idt(G, F);
where λ_1 and λ_2 are weights that can be adjusted according to the actual situation and are generally set to 1.
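For concreteness, the following PyTorch-style sketch computes the loss terms above for one batch. It is only an illustrative sketch: the generator and discriminator modules G, F, D_S, D_T are assumed to be defined elsewhere and to return logits, the log adversarial terms are realised with binary cross-entropy, and L1 norms are assumed for the reconstruction and identity terms.

```python
import torch
import torch.nn.functional as F_nn  # the generator keeps the name F below


def cyclegan_losses(G, F, D_S, D_T, A, B, lam1=1.0, lam2=1.0):
    """One-batch CycleGAN loss L_cycleGAN(G, F, D_S, D_T) (sketch)."""
    fake_B = G(A)  # generated picture G(A) in the target-domain style
    fake_A = F(B)  # generated picture F(B) in the source-domain style

    def adv(D, real, fake):
        # log adversarial terms realised as binary cross-entropy on discriminator logits
        real_score, fake_score = D(real), D(fake)
        return (F_nn.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
                + F_nn.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))

    L_adv_G = adv(D_T, B, fake_B)  # L_adv(G, D_T)
    L_adv_F = adv(D_S, A, fake_A)  # L_adv(F, D_S)

    # Cycle-consistency loss L_rec: F(G(A)) should reconstruct A, G(F(B)) should reconstruct B.
    L_rec = F_nn.l1_loss(F(fake_B), A) + F_nn.l1_loss(G(fake_A), B)

    # Identity loss L_idt: F(A) should stay close to A, G(B) close to B.
    L_idt = F_nn.l1_loss(F(A), A) + F_nn.l1_loss(G(B), B)

    return L_adv_G + L_adv_F + lam1 * L_rec + lam2 * L_idt
```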
Step S20, sampling a source domain picture and a target domain picture from the source domain data set and the target domain data set respectively, the cycle generation countermeasure network converting the source domain picture, according to the style of the target domain picture, into a generated picture close to the style of the target domain picture, and inputting the generated picture, the source domain picture and the target domain picture into a re-identification model.
Specifically, as shown in a re-recognition model box in fig. 2, a source domain picture a, a generation picture a' and a target domain picture B are input to the re-recognition model; the re-recognition model utilizes Resnet50 as a feature extraction model backbone network M (·); the features of the source domain picture A, the generation picture A 'and the target domain picture B which are extracted by the re-recognition model are respectively expressed as M (A), M (A') and M (B).
And S30, respectively extracting the shallow feature map and the deep feature map which are obtained after the generated picture, the source domain picture and the target domain picture are input into the re-recognition model, and performing loss calculation.
Specifically, the deep feature encoding vectors that the re-identification model extracts from the generated picture A', the source domain picture A and the target domain picture B are M(A'), M(A) and M(B) respectively.
Using the Triplet loss, the loss constructed on the deep feature codes is:
L_triplet(A, B) = max{ ||M(G(A)) - M(A)||_2 - ||M(G(A)) - M(B)||_2 + m, 0 };
where M(G(A)) denotes the feature extracted by the re-identification model from the picture G(A) generated after the source domain picture A is fed into the target domain generator G of the cycle generation countermeasure network; m denotes a margin, i.e. the distance between the features M(G(A)) and M(A) should be at least m smaller than the distance between M(G(A)) and M(B); this value is adjusted according to the actual situation.
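A minimal sketch of this deep-feature triplet constraint is given below, assuming M is the ResNet-50 backbone described above, that features come in (N, d) batches, and that the per-sample losses are averaged over the batch; the margin value is an assumed default.

```python
import torch


def deep_feature_triplet_loss(M, G, A, B, margin=0.3):
    """L_triplet(A, B): keep M(G(A)) close to M(A) and at least `margin` farther from M(B)."""
    f_gen = M(G(A))            # deep feature code of the generated picture G(A)
    f_src, f_tgt = M(A), M(B)  # deep feature codes of the source and target pictures
    d_pos = torch.norm(f_gen - f_src, p=2, dim=1)  # ||M(G(A)) - M(A)||_2
    d_neg = torch.norm(f_gen - f_tgt, p=2, dim=1)  # ||M(G(A)) - M(B)||_2
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```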
The shallow texture features that the re-identification model extracts from the generated picture A' and the target domain picture B are M_j(A') and M_j(B) respectively, where M_j(·) denotes the feature map output by the j-th convolutional layer of the re-identification backbone network.
The calculated shallow style texture loss is expressed as:
L_texture(A, B) = (1 / (C_j·H_j·W_j)) · ||M_j(G(A)) - M_j(B)||_2^2;
where C_j·H_j·W_j is the length obtained after the feature maps M_j(G(A)) and M_j(B) are flattened along their width, height and channel dimensions, analogous to the total number of pixels of a picture over its width, height and RGB channels; M_j(G(A)) denotes the feature extracted by the j-th convolutional layer of the re-identification model from the picture G(A) generated after the source domain picture A is fed into the target domain generator G of the cycle generation countermeasure network.
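The following sketch realises the texture loss in this squared-L2, per-element-normalised form (that exact form is an assumption consistent with the normalisation just described); M_j is assumed to be a callable returning the j-th convolutional layer's feature map.

```python
import torch


def texture_loss(M_j, G, A, B):
    """L_texture(A, B): match the layer-j feature maps of the generated and target pictures."""
    feat_gen = M_j(G(A))  # shallow feature map M_j(G(A)), shape (N, C_j, H_j, W_j)
    feat_tgt = M_j(B)     # shallow feature map M_j(B)
    n, c, h, w = feat_gen.shape
    diff = (feat_gen - feat_tgt).reshape(n, -1)  # flatten the C_j * H_j * W_j dimensions
    return (diff.pow(2).sum(dim=1) / (c * h * w)).mean()
```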
And S40, calculating a gradient according to the loss, and updating the parameters of the circularly generated countermeasure network and the re-identification model.
Specifically, the resulting loss is expressed as:
L_Proposed(G, F, D_S, D_T) = L_adv(G, D_T) + L_adv(F, D_S) + λ_1·L_rec(G, F) + λ_2·L_idt(G, F) + λ_3·L_triplet(G, F, M) + λ_4·L_texture(G, F, M);
where λ_3 and λ_4 are weights that can be adjusted according to the actual situation and are generally set to 1; L_triplet(G, F, M) denotes L_triplet(A, B); L_texture(G, F, M) denotes L_texture(A, B).
The partial derivatives are then computed from this loss and the model parameters are updated.
Gradient descent is carried out iteratively along the negative gradient direction according to the partial derivatives, updating the parameters of the cycle generation countermeasure network and of the re-identification model. Whether the iteration has finished is then judged (during training the loss value decreases, rises slightly at times, and decreases overall; iteration ends once the loss value has dropped and levelled off, i.e. the partial derivatives obtained from the loss function tend to 0); if the iteration has not finished, the procedure returns to step S10.
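Putting the pieces together, a hedged sketch of one generator-side update on L_Proposed is shown below; it reuses the loss sketches given earlier. The `layer_j` handle for the j-th convolutional layer is hypothetical, and the alternating discriminator updates that a full GAN training loop also performs are omitted for brevity.

```python
def generator_update(G, F, D_S, D_T, M, layer_j, A, B, optimizer, lam=(1.0, 1.0, 1.0, 1.0)):
    """One optimisation step on L_Proposed (sketch); discriminator updates are done separately."""
    lam1, lam2, lam3, lam4 = lam
    loss = (cyclegan_losses(G, F, D_S, D_T, A, B, lam1, lam2)
            + lam3 * deep_feature_triplet_loss(M, G, A, B)
            + lam4 * texture_loss(layer_j, G, A, B))  # layer_j: hypothetical handle to M_j(.)
    optimizer.zero_grad()
    loss.backward()   # partial derivatives of L_Proposed
    optimizer.step()  # step along the negative gradient direction
    return loss.item()
```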
And S50, obtaining an updated cycle generation countermeasure network, and converting the source domain data set into the style of the target domain data set by using the updated cycle generation countermeasure network.
Specifically, the trained cycle generation countermeasure network is invoked and the source domain data set is input into it; the generator of the trained cycle generation countermeasure network converts the source domain data set into a generated data set having the style of the target domain data set.
And S60, sending the pictures generated by the updated cycle generation countermeasure network into a multi-granularity feature extraction model for training to obtain a re-identification model with improved cross-domain recognition performance.
Specifically, the generated data set is input to a multi-granular feature extraction model, as shown in FIG. 3.
The multi-granularity feature extraction network also uses ResNet-50 as its backbone network; the part whose structure is identical to ResNet-50 is regarded as the Backbone (the part shared by all branches, i.e. the model trunk), and the structures after the respective global average pooling layers are regarded as Heads (branches).
For the Backbone part, the original ResNet-50 backbone is decomposed into two groups by stage: the first group consists of stages 1, 2 and 3, whose convolution kernel parameters are shared by all branches; the second group is stage 4, so that the Heads of the different branches each hold a stage-4 block with the same structure but with unshared parameters. For the Head part, everything after the original global average pooling layer is removed. On the extracted features, a BNNeck structure formed by a 1×1 convolution layer and a BatchNorm layer is added to the Head of each branch to reduce the feature dimension, cutting the number of feature channels of each branch from 2048 down to 256 and thereby lowering the complexity of computing distance matrices during training and retrieval.
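A minimal sketch of one such branch head is shown below, assuming the BNNeck reduction is applied to the stage-4 feature map before pooling; the module and parameter names are illustrative only.

```python
import torch.nn as nn


class BNNeckHead(nn.Module):
    """Branch head: 1x1 convolution + BatchNorm reducing 2048-channel features to 256 channels."""

    def __init__(self, in_channels=2048, out_channels=256):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        # x: (N, 2048, H, W) feature map from this branch's unshared stage-4 block
        return self.bn(self.reduce(x))
```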
The features extracted by the multi-granularity feature extraction model are supervised with a classification loss and a metric loss respectively. For the classification loss, Softmax Loss is used to supervise the extracted local and global features; for the i-th extracted local or global feature vector x_i, the classification loss is expressed as:
L_classification = -(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T x_i) / Σ_{k=1}^{C} exp(W_k^T x_i) );
where W_k denotes the weight vector of the k-th category; W_k^T denotes the transpose of the weight vector of the k-th category; W_{y_i}^T denotes the transpose of the weight vector of the category y_i to which sample i belongs; N denotes the number of training samples in one batch iteration; and C denotes the total number of categories in the training set.
for metric Loss, the traditional Triplet Loss function is used for optimization, and the optimization is expressed as:
L_triplet = Σ_{i=1}^{K} Σ_{a=1}^{P} max( m + max_{p=1,…,P} ||x_a^(i) - x_p^(i)||_2 - min_{j≠i, n=1,…,P} ||x_a^(i) - x_n^(j)||_2 , 0 );
where x_a^(i), x_p^(i) and x_n^(j) denote the feature vectors of the anchor, the positive sample and the negative sample of the i-th identity's triplets in a batch of training samples; K denotes the number of categories contained in one training batch; P denotes the number of samples contained in each category; and m is the margin.
The loss function ultimately used to optimize the re-identification network is expressed as:
Loss = L_classification + L_triplet
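The sketch below illustrates this joint supervision for one K×P batch, assuming batch-hard mining for the triplet term and an assumed margin; `logits` are the classifier outputs W_k^T x_i and `labels` the identity labels.

```python
import torch
import torch.nn.functional as F_nn


def reid_losses(features, logits, labels, margin=0.3):
    """Loss = L_classification + L_triplet for a batch of K identities x P samples (sketch)."""
    # Softmax (cross-entropy) classification loss on the classifier outputs.
    l_cls = F_nn.cross_entropy(logits, labels)

    # Batch-hard triplet loss: hardest positive and hardest negative for every anchor.
    dist = torch.cdist(features, features, p=2)        # (N, N) pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # mask of same-identity pairs
    hardest_pos = (dist * same.float()).max(dim=1).values
    hardest_neg = (dist + same.float() * 1e9).min(dim=1).values
    l_tri = torch.clamp(hardest_pos - hardest_neg + margin, min=0).mean()

    return l_cls + l_tri
```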
For partitioning the local features in the local branches, an adaptive local feature segmentation method is adopted; its schematic diagram is shown in Fig. 4. The feature activation map x extracted by the feature extraction model has size C × H × W, where C is the number of channels of the feature map, H its height and W its width; the activation value of the feature map at the c-th channel and spatial coordinate (h, w) is denoted x_{h,w,c}, where h and w denote the height and width positions respectively.
An indicator function I_x(h, w, c) is defined to indicate whether the output of the neuron at position (h, w) within channel plane c is the maximum activation value of that channel; the indicator function is defined as follows:
I_x(h, w, c) = 1 if x_{h,w,c} = max_{h',w'} x_{h',w',c}, and 0 otherwise.
To describe how the heights of the maximum activation values are distributed over the channels, the indicator function is used to count, for every height, how often the maximum activation value occurs at that height; the distribution function D_x(h) of these frequencies is expressed as:
D_x(h) = Σ_{c=1}^{C} Σ_{w=1}^{W} I_x(h, w, c);
where the distribution function D_x(h) is a function of the height h, and its output represents on how many channels the height of the maximum activation value of the feature map is exactly h.
To make each local feature block occupy the same number of channels, the cumulative distribution function H_x(h) of the distribution function D_x(h) is first computed:
H_x(h) = Σ_{h'=1}^{h} D_x(h');
where the output of the cumulative distribution function H_x(h) represents on how many channels of the feature map the height of the maximum activation value is less than or equal to h, with value range H_x(h) ∈ [0, C]. The inverse of the cumulative distribution function is defined as H_x^{-1}(c), whose argument is c ∈ [0, C] and whose value range is H_x^{-1}(c) ∈ [1, H].
The segmentation heights are obtained by setting sampling points at equal intervals on the inverse cumulative distribution H_x^{-1}; for n_s local blocks, each division height point h_k and the cumulative distribution function H_x(h) satisfy the following relationship:
H_x(h_k) ≤ k·C/n_s < H_x(h_k + 1),  k = 1, 2, …, n_s;
where H_x(h_k) is the number of channels whose maximum activation value lies at a height no greater than the k-th height division point h_k, so that the number of channels whose maximum activation value falls between the (k-1)-th height division point h_{k-1} and the k-th height division point h_k is approximately C/n_s, while H_x(h_k + 1) already exceeds k·C/n_s.
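A small sketch of this adaptive segmentation is given below; the handling of ties and of the exact boundary convention is an assumption, and the function simply samples the inverse cumulative distribution at the equally spaced points k·C/n_s.

```python
import torch


def adaptive_split_heights(x, n_s):
    """Adaptive local feature segmentation (sketch).

    x: feature activation map of shape (C, H, W). Returns n_s height division points
    h_1, ..., h_{n_s} so that each local block covers roughly C/n_s channels'
    maximum-activation heights.
    """
    C, H, W = x.shape
    flat_argmax = x.reshape(C, -1).argmax(dim=1)           # position of the maximum per channel
    max_heights = flat_argmax // W                         # height of the maximum activation, per channel
    D = torch.bincount(max_heights, minlength=H).float()   # D_x(h): channels whose maximum lies at height h
    Hx = torch.cumsum(D, dim=0)                            # H_x(h): cumulative distribution
    boundaries = []
    for k in range(1, n_s + 1):
        target = k * C / n_s                               # equally spaced sample of the inverse CDF
        h_k = int(torch.searchsorted(Hx, torch.tensor(target - 1e-6)).item())
        boundaries.append(min(h_k, H - 1))
    return boundaries
```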
After the local features have been obtained by adaptive segmentation according to the channel information of the feature map, a dynamic local information matching method is introduced to further improve the robustness of matching between feature sequences; the effect of combining the adaptively segmented local features with the dynamic local information matching algorithm is shown in Fig. 5.
The global feature activation maps x_A and x_B extracted from pedestrian image A and pedestrian image B in Fig. 5 are each partitioned by adaptive pooling, and n_s C-dimensional local features are obtained from each image. The local feature sequences of the two images are denoted L_A = {l_A^1, …, l_A^{n_s}} and L_B = {l_B^1, …, l_B^{n_s}}, where l_A^{n_s} denotes the n_s-th C-dimensional local feature of pedestrian image A and l_B^{n_s} denotes the n_s-th C-dimensional local feature of pedestrian image B.
To remove the deviation caused by feature scale, a normalization is applied when computing the distance matrix between the local feature sequences L_A and L_B, mapping the values into the interval [0, 1); the calculation formula is as follows:
d_{i,j} = (e^{||l_A^i - l_B^j||_2} - 1) / (e^{||l_A^i - l_B^j||_2} + 1);
where d_{i,j} denotes the normalized distance between the i-th local block of pedestrian image A and the j-th local block of pedestrian image B; l_A^i denotes the i-th C-dimensional local feature of pedestrian picture A, and l_B^j denotes the j-th C-dimensional local feature of pedestrian picture B; D denotes the distance matrix that records the distances between each local block of one image and every local block of the other, its entry at position (i, j) being d_{i,j}. To obtain the distance after the local feature sequences are aligned, a dynamic programming algorithm is used to search the distance matrix D for the shortest path from (1, 1) to (n_s, n_s), and the length of this shortest path is the aligned distance.
The search procedure for the shortest path is expressed as:
S_{i,j} = d_{i,j},  if i = 1 and j = 1;
S_{i,j} = S_{i,j-1} + d_{i,j},  if i = 1 and j ≠ 1;
S_{i,j} = S_{i-1,j} + d_{i,j},  if i ≠ 1 and j = 1;
S_{i,j} = min(S_{i-1,j}, S_{i,j-1}) + d_{i,j},  if i ≠ 1 and j ≠ 1;
where S_{i,j} denotes the shortest path distance from (1, 1) to (i, j) in the distance matrix D; S_{i-1,j} denotes the shortest path distance from (1, 1) to (i-1, j) in the distance matrix D; and S_{i,j-1} denotes the shortest path distance from (1, 1) to (i, j-1) in the distance matrix D.
Finally, after this feature alignment correction, the distance d_l(A, B) between the local feature sequences of the two images is given by:
d_l(A, B) = S_{n_s, n_s};
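The sketch below puts the normalized distance matrix and the shortest-path alignment together; the exponential normalization follows the formula assumed above, and the O(n_s²) dynamic program follows the recursion just given.

```python
import torch


def aligned_local_distance(L_A, L_B):
    """Dynamic local information matching (sketch): returns d_l(A, B) = S_{n_s, n_s}."""
    n_s = L_A.shape[0]
    eucl = torch.cdist(L_A, L_B, p=2)                      # ||l_A^i - l_B^j||_2 for all i, j
    Dmat = (torch.exp(eucl) - 1) / (torch.exp(eucl) + 1)   # normalize distances into [0, 1)

    S = torch.zeros_like(Dmat)                             # S[i, j]: shortest path from (1,1) to (i+1, j+1)
    for i in range(n_s):
        for j in range(n_s):
            if i == 0 and j == 0:
                S[i, j] = Dmat[i, j]
            elif i == 0:
                S[i, j] = S[i, j - 1] + Dmat[i, j]
            elif j == 0:
                S[i, j] = S[i - 1, j] + Dmat[i, j]
            else:
                S[i, j] = torch.minimum(S[i - 1, j], S[i, j - 1]) + Dmat[i, j]
    return S[n_s - 1, n_s - 1]
```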
For the processing of global and local features, the flow of the network framework in the local branch of the multi-granularity feature extraction structure is shown in Fig. 6. On the basis of the global features obtained by the global branch, hard samples are first mined from the global features (since the global features are fast to compute), and the aligned local-feature distance is then computed only for those hard samples. The global features extracted from pedestrian image A and pedestrian image B are denoted f_A and f_B respectively, and the distance between the global features is expressed as:
d_g(A, B) = ||f_A - f_B||_2;
In the re-identification inference stage, the global-feature distance and the local-feature distance are combined with a weighting so that both distances jointly influence the ranking of a sample; the distance used for retrieval is therefore:
d(A, B) = d_g(A, B) + λ·d_l(A, B);
where λ is a hyper-parameter balancing the two distances and can be set to λ = 1, and d_l(A, B) denotes the distance between the local features.
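A one-line sketch of this fused retrieval distance, reusing the local alignment function above, might look as follows.

```python
import torch


def retrieval_distance(f_A, f_B, L_A, L_B, lam=1.0):
    """d(A, B) = d_g(A, B) + lambda * d_l(A, B) used at inference time (sketch)."""
    d_g = torch.norm(f_A - f_B, p=2)                 # global-feature distance
    return d_g + lam * aligned_local_distance(L_A, L_B)
```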
Finally, the partial derivatives are computed from the loss and the model parameters are updated: gradient descent is carried out iteratively along the negative gradient direction according to the partial derivatives, and the parameters of the multi-granularity feature extraction network are updated. Whether the iteration has finished is then judged; if not, the procedure returns to the step of feeding the generated data set into the multi-granularity feature extraction model.
In order to verify the proposed learning framework for supervising style migration by using the re-recognition model, a cross-domain re-recognition experiment is performed from two aspects of deep feature constraint and shallow feature constraint respectively.
First, Table 1 shows the influence of the re-identification deep feature constraint on the style migration of the cycle generation countermeasure network. By introducing identity supervision from the re-identification deep features into the CycleGAN-based style migration framework used in the cross-domain re-identification task, cross-domain performance improves by 2.5% mAP (mean average precision) and 3.0% rank-1 accuracy respectively; in the experiments with DukeMTMC-reID as the source data domain, cross-domain performance improves by 4.8% mAP and 2.6% rank-1 accuracy respectively. This verifies the effectiveness of introducing re-identification deep features to supervise the style migration model.
Table 1: Experimental results of the re-identification model deep feature constraint on CycleGAN style migration
Furthermore, to explore how style losses computed on shallow feature layers at different depths of the re-identification backbone network affect the style migration model of the cycle generation countermeasure network, shallow feature constraints are added at different stages of the re-identification backbone network to constrain the style across data domains. The results are shown in Table 2.
Table 2: Influence of shallow feature constraints at different stages of the re-identification model on CycleGAN style migration
To verify the effectiveness of the proposed adaptive local feature segmentation method in improving the dynamic local information matching method, the related experimental results are shown in Table 3.
Table 3: Experimental results of the adaptive local feature segmentation method improving the feature extraction model
According to the method, a re-identification model is added on the basis of the cycle generation countermeasure network to constrain the generated picture to be closer to the shallow texture information of the target picture; with this improvement the style of the generated picture is closer to the style of the target picture than with other methods, which improves the generalization ability of the model in the re-identification task, and the pose mismatch problem is alleviated by adaptively segmenting local features.
Further, as shown in fig. 7, based on the above cross-domain re-identification method based on shallow texture extraction, the present invention also provides a cross-domain re-identification system based on shallow texture extraction, wherein the cross-domain re-identification system based on shallow texture extraction includes:
a data obtaining module 51, configured to obtain a source domain data set and a target domain data set, and input the source domain data set and the target domain data set to a cyclic generation countermeasure network;
a picture generation module 52, configured to sample a source domain picture and a target domain picture from the source domain data set and the target domain data set, respectively, the cyclic generation countermeasure network converts the source domain picture into a generated picture having a style similar to that of the target domain picture according to the style of the target domain picture, and inputs the generated picture, the source domain picture, and the target domain picture into a re-recognition model;
a feature extraction module 53, configured to extract a shallow feature map and a deep feature map obtained after the generated picture, the source domain picture, and the target domain picture are input into the re-recognition model, and perform loss calculation;
a parameter updating module 54, configured to update parameters of the loop-generated countermeasure network and the re-recognition model according to the loss calculation gradient;
a data conversion module 55, configured to obtain an updated cycle generation countermeasure network, and convert the source domain data set into a style of the target domain data set using the updated cycle generation countermeasure network;
and the model training module 56 is configured to send the updated images generated by the circularly generated countermeasure network into the multi-granularity feature extraction model for training, so as to obtain a re-recognition model with improved cross-domain recognition performance.
Further, as shown in fig. 8, based on the above cross-domain re-identification method and system based on shallow texture extraction, the present invention also provides a terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 8 shows only some of the components of the terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may also be an external storage device of the terminal in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various types of data, such as program codes of the installation terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In an embodiment, the memory 20 stores a cross-domain re-identification program 40 based on shallow texture extraction, and the cross-domain re-identification program 40 based on shallow texture extraction can be executed by the processor 10, so as to implement the cross-domain re-identification method based on shallow texture extraction in the present application.
The processor 10 may be a Central Processing Unit (CPU), microprocessor or other data Processing chip in some embodiments, and is used for running program codes stored in the memory 20 or Processing data, such as executing the cross-domain re-identification method based on shallow texture extraction.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.
In one embodiment, the steps of the above-described cross-domain re-identification method based on shallow texture extraction are implemented when the processor 10 executes the cross-domain re-identification program 40 based on shallow texture extraction in the memory 20.
The present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a cross-domain re-identification program based on shallow texture extraction, and when being executed by a processor, the cross-domain re-identification program based on shallow texture extraction implements the steps of the cross-domain re-identification method based on shallow texture extraction as described above.
In summary, the present invention provides a cross-domain re-identification method and related device based on shallow texture extraction, the method includes: respectively sampling the source domain data set and the target domain data set to obtain a source domain picture and a target domain picture, circularly generating a generated picture by the countermeasure network according to the style of the target domain picture, converting the source domain picture into a generated picture close to the style of the target domain picture, and inputting the generated picture, the source domain picture and the target domain picture into a re-identification model; respectively extracting the shallow feature map and the deep feature map which are obtained after the generated picture, the source domain picture and the target domain picture are input into the re-recognition model, and performing loss calculation; according to the loss calculation gradient, updating parameters of the circularly generated countermeasure network and the re-identification model; obtaining an updated cycle generation countermeasure network, and converting the source domain data set into the style of the target domain data set by using the updated cycle generation countermeasure network; and sending the updated pictures generated by the circularly generated countermeasure network into a multi-granularity feature extraction model for training to obtain a re-recognition model with improved cross-domain recognition performance. According to the method, the re-recognition model is added on the basis of circularly generating the countermeasure network to restrict the generated picture to be closer to the shallow texture information of the target picture, and the style of the generated picture is closer to the style of the target picture than other methods through the improvement, so that the generalization capability of the model in the re-recognition task is improved, and the posture mismatch problem is relieved by a mode of adaptively segmenting local features.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or terminal that comprises the element.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware (such as a processor, a controller, etc.) through a computer program, and the program can be stored in a computer readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It will be understood that the invention is not limited to the examples described above, but that modifications and variations will occur to those skilled in the art in light of the above teachings, and that all such modifications and variations are considered to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A cross-domain re-identification method based on shallow texture extraction is characterized by comprising the following steps:
acquiring a source domain data set and a target domain data set, and inputting the source domain data set and the target domain data set into a cyclic generation countermeasure network;
respectively sampling the source domain data set and the target domain data set to obtain a source domain picture and a target domain picture, circularly generating a generated picture by the countermeasure network according to the style of the target domain picture, converting the source domain picture into a generated picture close to the style of the target domain picture, and inputting the generated picture, the source domain picture and the target domain picture into a re-identification model;
respectively extracting the shallow feature map and the deep feature map which are obtained after the generated picture, the source domain picture and the target domain picture are input into the re-recognition model, and performing loss calculation;
according to the loss calculation gradient, updating parameters of the circularly generated countermeasure network and the re-recognition model;
obtaining an updated cycle generation countermeasure network, and converting the source domain data set into the style of the target domain data set by using the updated cycle generation countermeasure network;
and sending the updated pictures generated by the circularly generated countermeasure network into a multi-granularity feature extraction model for training to obtain a re-recognition model with improved cross-domain recognition performance.
2. The method according to claim 1, wherein the obtaining a source domain data set and a target domain data set, and inputting the source domain data set and the target domain data set to a cyclic generation countermeasure network, specifically comprises:
acquiring the source domain data set and the target domain data set, and randomly sampling k samples from the source domain data set and the target domain data set;
inputting the collected k samples into the cyclic generation countermeasure network, and generating countermeasure losses respectively as follows:
L_adv(G, D_T) = E_{B~p_data(B)}[log D_T(B)] + E_{A~p_data(A)}[log(1 - D_T(G(A)))];
L_adv(F, D_S) = E_{A~p_data(A)}[log D_S(A)] + E_{B~p_data(B)}[log(1 - D_S(F(B)))];
wherein G denotes the target domain generator of the cycle generation countermeasure network; F denotes the source domain generator; D_T denotes the target domain discriminator; D_S denotes the source domain discriminator; A denotes a source domain picture; B denotes a target domain picture; L_adv(G, D_T) denotes the adversarial loss of the target domain generator G of the cycle generation countermeasure network; L_adv(F, D_S) denotes the adversarial loss of the source domain generator F of the cycle generation countermeasure network;
E_{B~p_data(B)} denotes the data distribution obeyed by the target domain picture B; D_T(B) denotes the judgment of the target domain discriminator D_T on whether the target domain picture B belongs to the target domain;
E_{A~p_data(A)} denotes the data distribution obeyed by the source domain picture A; G(A) is the picture generated when the source domain picture A is fed into the target domain generator G of the cycle generation countermeasure network; D_S(A) denotes the judgment of the source domain discriminator D_S on whether the source domain picture A belongs to the source domain; D_S(F(B)) denotes the judgment of the source domain discriminator D_S on whether the picture generated after the target domain picture B is fed into the source domain generator F of the cycle generation countermeasure network belongs to the source domain;
the cycle consistent loss is expressed as:
L_rec(G, F) = E_{A~p_data(A)}[||F(G(A)) - A||_1] + E_{B~p_data(B)}[||G(F(B)) - B||_1];
wherein, F (G (A)) represents a picture generated after a picture G (A) generated after a source domain picture A is sent to a target domain generator G of the loop generation countermeasure network is sent to a source domain generator F of the loop generation countermeasure network; g (F (B)) represents a picture generated after the target domain graph B is sent to a source domain generator F of the loop generation countermeasure network, and a picture generated after the picture F (B) is sent to a target domain generator G of the loop generation countermeasure network;
the loss of identity within a domain is expressed as:
L_idt(G, F) = E_{A~p_data(A)}[||F(A) - A||_1] + E_{B~p_data(B)}[||G(B) - B||_1];
wherein F(A) denotes the picture generated after the source domain picture A is fed into the source domain generator F of the cycle generation countermeasure network; G(B) denotes the picture generated after the target domain picture B is fed into the target domain generator G of the cycle generation countermeasure network;
the overall loss function is expressed as:
L_CycleGAN(G, F, D_S, D_T) = L_adv(G, D_T) + L_adv(F, D_S) + λ_1·L_rec(G, F) + λ_2·L_idt(G, F);
wherein λ_1 and λ_2 denote weights.
3. The cross-domain re-recognition method based on shallow texture extraction as claimed in claim 2, wherein the inputting the generated picture, the source domain picture and the target domain picture into a re-recognition model specifically comprises:
inputting the source domain picture A, the generated picture A' and the target domain picture B into a re-identification model;
the features of the source domain picture A, the generation picture A 'and the target domain picture B which are extracted by the re-recognition model are respectively expressed as M (A), M (A') and M (B).
4. The cross-domain re-recognition method based on shallow texture extraction according to claim 3, wherein the extracting shallow feature maps and deep feature maps obtained after the generated picture, the source domain picture and the target domain picture are input into the re-recognition model respectively, and performing loss calculation specifically includes:
deep feature coding vectors of the generated picture A ', the source domain picture A and the target domain picture B are respectively M (A'), M (A) and M (B) through the re-recognition model extraction;
using the Triplet loss, the loss constructed on the deep feature codes is:
L_triplet(A, B) = max{ ||M(G(A)) - M(A)||_2 - ||M(G(A)) - M(B)||_2 + m, 0 };
wherein, M (G (A)) represents the characteristic that the picture G (A) generated after the source domain picture A is sent to a target domain generator G of the loop generation countermeasure network is subjected to re-recognition model extraction; m represents a threshold value;
the shallow texture features that the re-identification model extracts from the generated picture A' and the target domain picture B are M_j(A') and M_j(B) respectively, wherein M_j(·) denotes the feature map output by the j-th convolutional layer of the backbone network of the re-identification model;
the shallow style texture penalty is computed as:
L_texture(A, B) = (1 / (C_j·H_j·W_j)) · ||M_j(G(A)) - M_j(B)||_2^2;
wherein C_j·H_j·W_j denotes the length obtained after the feature maps M_j(G(A)) and M_j(B) are flattened along their width, height and channel dimensions; M_j(G(A)) denotes the feature extracted by the j-th convolutional layer of the re-identification model from the picture G(A) generated after the source domain picture A is fed into the target domain generator G of the cycle generation countermeasure network.
5. The method according to claim 4, wherein the updating the parameters of the loop-generated countermeasure network and the re-recognition model according to the gradient of the loss calculation specifically comprises:
the resulting loss is expressed as:
L_Proposed(G, F, D_S, D_T) = L_adv(G, D_T) + L_adv(F, D_S) + λ_1·L_rec(G, F) + λ_2·L_idt(G, F) + λ_3·L_triplet(G, F, M) + λ_4·L_texture(G, F, M);
wherein λ_3 and λ_4 denote weights; L_triplet(G, F, M) denotes L_triplet(A, B); L_texture(G, F, M) denotes L_texture(A, B);
Calculating partial derivatives of the characteristics according to the loss;
and respectively carrying out gradient descent method iterative learning on the negative direction of the gradient direction according to the partial derivatives, and updating the parameters of the loop generation countermeasure network and the re-identification model.
6. The method according to claim 5, wherein the obtaining an updated cycle generation countermeasure network, and converting the source domain data set into the style of the target domain data set using the updated cycle generation countermeasure network, specifically comprises:
invoking the trained cycle generation countermeasure network, and inputting the source domain data set into the trained cycle generation countermeasure network;
a generator of the cycle generation countermeasure network converting the source domain data set into a generated data set having the style of the target domain data set.
7. The cross-domain re-recognition method based on shallow texture extraction as claimed in claim 6, wherein the sending the updated image generated by the cyclic generation countermeasure network into the multi-granularity feature extraction model for training to obtain the re-recognition model with improved cross-domain recognition performance specifically comprises:
inputting the generated data set into a multi-granularity feature extraction model;
respectively supervising the features extracted by the multi-granularity feature extraction model with a classification loss and a metric loss, wherein for the i-th extracted local or global feature vector x_i the classification loss is expressed as:
Figure FDA0003772323090000051
wherein, W k A weight vector representing a kth class;
Figure FDA0003772323090000052
a transpose of the weight vector representing the kth class;
Figure FDA0003772323090000061
transpose of weight vectors representing the yi category; n represents the number of training samples in one batch iteration; c represents the total number of categories in the training set;
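A possible PyTorch form of the classification term, assuming bias-free logits computed from the class weight vectors W_k; all names are illustrative:

```python
import torch
import torch.nn.functional as F

def classification_loss(features, labels, classifier_weight):
    """Softmax cross-entropy over identity classes. `classifier_weight` is the
    (C x d) matrix whose rows are the W_k vectors."""
    logits = features @ classifier_weight.t()   # W_k^T x_i for every class k
    return F.cross_entropy(logits, labels)      # -(1/N) sum_i log softmax(y_i)
```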
for the metric loss, the traditional Triplet Loss function is used for optimization, expressed as:
L_triplet = Σ_{i=1}^{K·P} max{ m + ||f_a^(i) − f_p^(i)||_2 − ||f_a^(i) − f_n^(i)||_2 , 0 };
wherein f_a^(i), f_p^(i) and f_n^(i) represent the feature vectors of the anchor, the positive sample and the negative sample of the i-th triplet in a batch of training samples; m represents the margin; K represents the number of classes contained in one training batch; P represents the number of samples contained in each class;
the loss function ultimately used to optimize the re-identification network is expressed as:
Loss = L_classification + L_triplet;
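A sketch of the metric term and the combined loss, assuming a K x P sampled batch; the batch-hard choice of positive and negative is an assumption, since the claim only states the traditional triplet form:

```python
import torch
import torch.nn.functional as F

def batch_triplet_loss(features, labels, margin=0.3):
    """Triplet loss over a K x P batch (K identities, P images each), with
    batch-hard mining of the positive and negative samples."""
    dist = torch.cdist(features, features, p=2)                       # pairwise L2 distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same_id.float()).max(dim=1).values          # farthest same-identity sample
    hardest_neg = dist.masked_fill(same_id, float("inf")).min(dim=1).values
    return F.relu(margin + hardest_pos - hardest_neg).mean()

# total loss used to optimise the re-identification network
# loss = classification_loss(feats, labels, W) + batch_triplet_loss(feats, labels)
```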
for the partition of local features in the local branch, an adaptive local feature segmentation method is adopted; the feature activation map x extracted by the feature extraction model has size C × H × W, wherein C is the number of channels of the feature map, H is the height of the feature map and W is the width of the feature map; the activation value of the feature map at the c-th channel and spatial coordinate (h, w) is denoted x_{h,w,c};
an indicator function I_x(h, w, c) is defined to indicate whether the output of the neuron at position (h, w) within channel c is the maximum activation value of that channel; the indicator function is defined as follows:
I_x(h, w, c) = 1 if x_{h,w,c} = max_{h',w'} x_{h',w',c}, and I_x(h, w, c) = 0 otherwise;
the indicator function is used to count how often each height carries the maximum activation, and the distribution of maximum-activation heights is expressed by the distribution function D_x(h):
D_x(h) = Σ_{c=1}^{C} Σ_{w=1}^{W} I_x(h, w, c);
wherein the distribution function D_x(h) is a function of the height h, and its output represents for how many channels of the feature map the height of the maximum activation value is exactly h;
making each local feature block occupy the same number of channels; first, the cumulative distribution function H_x(h) of the distribution function D_x(h) is obtained:
H_x(h) = Σ_{h'=1}^{h} D_x(h');
wherein the output of the cumulative distribution function H_x(h) represents for how many channels of the feature map the height of the maximum activation value is less than or equal to h, with value range H_x(h) ∈ [0, C]; the inverse of the cumulative distribution function is defined as H_x^{-1}(c), whose argument c ∈ [0, C] and whose value range is [0, H];
sampling points are set at equal intervals on the inverse cumulative distribution function H_x^{-1}(c) to obtain the segmentation heights; for n_s local blocks, each segmentation height point h_k satisfies the following relationship with the cumulative distribution function H_x(h):
H_x(h_k) ≤ k·C/n_s < H_x(h_k + 1), k = 1, …, n_s − 1;
wherein H_x(h_k) indicates the number of channels whose maximum activation value lies at a height not exceeding the k-th height division point h_k, which does not exceed the k-th equal-interval sampling value k·C/n_s, while H_x(h_k + 1) already exceeds that sampling value;
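A sketch of the adaptive segmentation described above, assuming a single C × H × W activation map; tie-breaking at the maximum and the exact side of the inequality are assumptions:

```python
import torch

def adaptive_split_heights(x, n_s):
    """Each channel votes for the height of its maximum activation; the cumulative
    histogram of those votes is sampled at equal intervals so every local block
    covers roughly C / n_s channels."""
    C, H, W = x.shape
    flat_idx = x.reshape(C, -1).argmax(dim=1)                     # argmax per channel (I_x)
    max_height = torch.div(flat_idx, W, rounding_mode="floor")    # height of each channel's maximum
    D = torch.bincount(max_height, minlength=H).double()          # D_x(h): votes per height
    Hcdf = torch.cumsum(D, dim=0)                                 # H_x(h): cumulative distribution
    targets = torch.arange(1, n_s, dtype=torch.float64) * C / n_s # equal-interval sampling values
    cuts = torch.searchsorted(Hcdf, targets)                      # h_k = H_x^{-1}(k*C/n_s)
    return cuts.tolist()                                          # split heights for the n_s blocks
```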
adaptive pooling segmentation is applied to the global feature activation maps x_A and x_B extracted from pedestrian image A and pedestrian image B respectively, so that n_s C-dimensional local feature sequences are obtained from each image; the local feature sequences of the two images are recorded as L_A = {l_1^A, …, l_{n_s}^A} and L_B = {l_1^B, …, l_{n_s}^B}, wherein l_{n_s}^A represents the n_s-th C-dimensional local feature of pedestrian image A and l_{n_s}^B represents the n_s-th C-dimensional local feature of pedestrian image B;
neglecting the deviation caused by feature scale, the distance matrix between the local feature sequences L_A and L_B is calculated with a normalization method that maps the values into the [0, 1) interval, using the following formula:
d_{i,j} = ( exp(||l_i^A − l_j^B||_2) − 1 ) / ( exp(||l_i^A − l_j^B||_2) + 1 );
wherein d_{i,j} represents the normalized distance between the i-th local block of pedestrian image A and the j-th local block of pedestrian image B; l_i^A represents the i-th C-dimensional local feature of pedestrian picture A; l_j^B represents the j-th C-dimensional local feature of pedestrian picture B; D represents the distance matrix, whose value at position (i, j) is d_{i,j}; in the distance matrix D, a dynamic programming algorithm is used to search the shortest path from (1, 1) to (n_s, n_s), and the length of this shortest path is the distance after alignment;
the search procedure for the shortest path is expressed as:
S_{i,j} = min( S_{i−1,j}, S_{i,j−1} ) + d_{i,j};
wherein S_{i,j} represents the shortest path distance from (1, 1) to (i, j) in the distance matrix D; S_{i−1,j} represents the shortest path distance from (1, 1) to (i−1, j) in the distance matrix D; S_{i,j−1} represents the shortest path distance from (1, 1) to (i, j−1) in the distance matrix D;
after the feature alignment correction, the distance d_l(A, B) between the local feature sequences of the two images is given by the total shortest path distance:
d_l(A, B) = S_{n_s, n_s};
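A sketch of the aligned local distance, combining the assumed (e^d − 1)/(e^d + 1) normalization with the dynamic-programming recursion of the claim; names and shapes are illustrative:

```python
import torch

def aligned_local_distance(local_A, local_B):
    """Normalise the pairwise distances of two local feature sequences into [0, 1),
    then find the shortest monotone path through the distance matrix."""
    d = torch.cdist(local_A, local_B, p=2)          # (n_s, n_s) raw L2 distances
    d = (d.exp() - 1) / (d.exp() + 1)               # normalised d_{i,j} in [0, 1)
    n = d.shape[0]
    S = torch.full_like(d, float("inf"))
    S[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best_prev = min(S[i - 1, j] if i > 0 else float("inf"),
                            S[i, j - 1] if j > 0 else float("inf"))
            S[i, j] = best_prev + d[i, j]           # S_{i,j} = min(S_{i-1,j}, S_{i,j-1}) + d_{i,j}
    return S[n - 1, n - 1]                          # d_l(A, B)
```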
on the global features obtained from the global branch, hard samples are mined, and the aligned local feature distance is calculated only for these hard samples; the global features extracted from pedestrian image A and pedestrian image B are recorded as f_A and f_B respectively, and the distance between the global features is expressed as:
d_g(A, B) = ||f_A − f_B||_2;
in the re-identification inference stage, the global feature distance and the local feature distance are combined by weighting, so that the influence of both distances on a sample is considered jointly; the distance used during retrieval is:
d(A, B) = d_g(A, B) + λ·d_l(A, B);
where λ is a hyper-parameter for balancing the two distances; d_l(A, B) represents the distance between the local features.
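Finally, a small sketch of the retrieval distance, reusing the aligned_local_distance sketch above; the value of λ is an assumption:

```python
import torch

def retrieval_distance(f_A, f_B, local_A, local_B, lam=0.5):
    """Global L2 distance plus the weighted aligned local distance."""
    d_g = torch.norm(f_A - f_B, p=2)
    return d_g + lam * aligned_local_distance(local_A, local_B)
```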
8. A cross-domain re-identification system based on shallow texture extraction is characterized in that the cross-domain re-identification system based on shallow texture extraction comprises:
the data acquisition module is used for acquiring a source domain data set and a target domain data set and inputting them into the cycle generation adversarial network;
the picture generation module is used for sampling the source domain data set and the target domain data set respectively to obtain a source domain picture and a target domain picture, generating, with the cycle generation adversarial network and according to the style of the target domain picture, a generated picture similar in style to the target domain picture, and inputting the generated picture, the source domain picture and the target domain picture into the re-identification model;
the feature extraction module is used for extracting the shallow feature maps and the deep feature maps obtained after the generated picture, the source domain picture and the target domain picture are input into the re-identification model, and performing loss calculation;
the parameter updating module is used for updating the parameters of the cycle generation adversarial network and the re-identification model according to the gradients of the computed losses;
the data conversion module is used for obtaining the updated cycle generation adversarial network and converting the source domain data set into the style of the target domain data set using the updated cycle generation adversarial network;
and the model training module is used for sending the pictures generated by the updated cycle generation adversarial network into the multi-granularity feature extraction model for training, so as to obtain the re-identification model with improved cross-domain recognition performance.
9. A terminal, characterized in that the terminal comprises: a memory, a processor and a shallow texture extraction based cross-domain re-identification program stored on the memory and executable on the processor, the shallow texture extraction based cross-domain re-identification program when executed by the processor implementing the steps of the shallow texture extraction based cross-domain re-identification method as claimed in any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a cross-domain re-identification program based on shallow texture extraction, and when the cross-domain re-identification program based on shallow texture extraction is executed by a processor, the steps of the cross-domain re-identification method based on shallow texture extraction as claimed in any one of claims 1-7 are implemented.
CN202210905641.6A 2022-07-29 2022-07-29 Cross-domain re-identification method based on shallow texture extraction and related equipment Pending CN115170836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210905641.6A CN115170836A (en) 2022-07-29 2022-07-29 Cross-domain re-identification method based on shallow texture extraction and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210905641.6A CN115170836A (en) 2022-07-29 2022-07-29 Cross-domain re-identification method based on shallow texture extraction and related equipment

Publications (1)

Publication Number Publication Date
CN115170836A true CN115170836A (en) 2022-10-11

Family

ID=83477892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210905641.6A Pending CN115170836A (en) 2022-07-29 2022-07-29 Cross-domain re-identification method based on shallow texture extraction and related equipment

Country Status (1)

Country Link
CN (1) CN115170836A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546848A (en) * 2022-10-26 2022-12-30 南京航空航天大学 Confrontation generation network training method, cross-device palmprint recognition method and system
CN115546848B (en) * 2022-10-26 2024-02-02 南京航空航天大学 Challenge generation network training method, cross-equipment palmprint recognition method and system
CN117351522A (en) * 2023-12-06 2024-01-05 云南联合视觉科技有限公司 Pedestrian re-recognition method based on style injection and cross-view difficult sample mining

Similar Documents

Publication Publication Date Title
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN108460356B (en) Face image automatic processing system based on monitoring system
WO2020228525A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN115170836A (en) Cross-domain re-identification method based on shallow texture extraction and related equipment
CN109977912B (en) Video human body key point detection method and device, computer equipment and storage medium
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
US11410327B2 (en) Location determination apparatus, location determination method and computer program
CN114694185B (en) Cross-modal target re-identification method, device, equipment and medium
CN113920472B (en) Attention mechanism-based unsupervised target re-identification method and system
CN111507222A (en) Three-dimensional object detection framework based on multi-source data knowledge migration
CN111091129B (en) Image salient region extraction method based on manifold ordering of multiple color features
CN111125397B (en) Cloth image retrieval method based on convolutional neural network
CN112883850A (en) Multi-view aerospace remote sensing image matching method based on convolutional neural network
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
JP2015036939A (en) Feature extraction program and information processing apparatus
Alsanad et al. Real-time fuel truck detection algorithm based on deep convolutional neural network
CN116703996A (en) Monocular three-dimensional target detection algorithm based on instance-level self-adaptive depth estimation
Zhang et al. Weighted smallest deformation similarity for NN-based template matching
CN117765363A (en) Image anomaly detection method and system based on lightweight memory bank
CN113255604A (en) Pedestrian re-identification method, device, equipment and medium based on deep learning network
JP7196058B2 (en) OBJECT SEARCH DEVICE AND OBJECT SEARCH METHOD
CN116824330A (en) Small sample cross-domain target detection method based on deep learning
CN114549969B (en) Saliency detection method and system based on image information fusion
CN113780066B (en) Pedestrian re-recognition method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination