CN113869193A

CN113869193A - Training method of pedestrian re-identification model, and pedestrian re-identification method and system

Info

Publication number: CN113869193A
Application number: CN202111131114.6A
Authority: CN
Inventors: 倪子凡; 王健宗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-09-26
Filing date: 2021-09-26
Publication date: 2021-12-31
Anticipated expiration: 2041-09-26
Also published as: CN113869193B

Abstract

The invention relates to the technical field of image recognition and provides a training method of a pedestrian re-recognition model, a pedestrian re-recognition method and a system. Original feature vectors of the source domain and the target domain of a training sample are extracted respectively and decomposed by the pedestrian re-recognition model into domain-invariant identity features and domain-specific enhancement features; the original feature vectors, the domain-invariant identity features and the domain-specific enhancement features are recombined to obtain a reconstructed feature vector group; the reconstructed feature vector group is input into a cross-domain face recognition loss function and a domain classification loss function; these steps are iterated until all training samples have been trained, and the model with the minimum sum of cross-domain face recognition loss and domain classification loss is selected as the trained pedestrian re-recognition model. The reconstructed feature vector group increases the diversity of the samples used in training, inherits the reliable identity labels of the source domain, and represents the data distribution of the source domain and the target domain well, so that a pedestrian re-recognition model with efficient recognition can be trained with fewer samples.

Description

Training method of pedestrian re-identification model, and pedestrian re-identification method and system

Technical Field

The invention relates to the technical field of image recognition, in particular to a training method of a pedestrian re-recognition model, a pedestrian re-recognition method and a system.

Background

Pedestrian re-identification (ReID) is a technique that uses computer vision to determine whether a particular pedestrian is present in an image or video sequence; for example, given an image of a pedestrian, it retrieves images of that pedestrian from the surveillance videos of multiple cameras.

Unsupervised domain adaptation (UDA) transfers knowledge from a labeled source domain to an unlabeled target domain so that a model performs better in the new environment, and it is widely applied in pedestrian re-identification scenarios.

Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present invention is to provide a training method for a pedestrian re-recognition model, a pedestrian re-recognition method and a system thereof, which are used to solve the problems of high training complexity and poor recognition effect in the prior art.

A first aspect of the present invention provides a training method of a pedestrian re-recognition model, including:

respectively extracting original feature vectors of character images of a source domain and a target domain of a training sample, and decomposing the original feature vectors through a pedestrian re-recognition model to obtain domain invariant identity features and domain specific enhancement features;

reconstructing the original characteristic vector, the domain invariant identity characteristic and the domain specific enhancement characteristic to obtain a reconstructed characteristic vector group;

inputting the reconstructed feature vector group into a cross-domain face recognition loss function and a domain classification loss function to calculate corresponding cross-domain face recognition loss and domain classification loss;

and circularly iterating the steps until the training of all the training samples is completed, and selecting the model with the minimum sum of cross-domain face recognition loss and domain classification loss as the trained pedestrian re-recognition model.

In an embodiment of the present invention, the step of respectively extracting original feature vectors of the human images of a source domain and a target domain of the training sample, and obtaining the domain-invariant identity features and the domain-specific enhanced features from the original feature vectors through a pedestrian re-recognition model includes:

respectively extracting original feature vectors of the character images of the source domain and the target domain in a training sample;

obtaining the domain-invariant identity features and the domain-specific enhancement features through decomposition by the omni-scale network (OSNet):

$$B = (1 - O(F)) \odot F,$$

$$E = O(F) \odot F,$$

wherein $F$ is the original feature vector; $B$ is the domain-invariant identity feature; $E$ is the domain-specific enhancement feature; $\odot$ denotes element-by-element multiplication; $O(\cdot)$ is the response of the OSNet network, and

$$O(F) = \sum_{t=1}^{T} G(F^t) \odot F^t,$$

wherein $T = 4$, and $G(F^t)$ is a vector whose length spans the entire channel dimension of the input $F^t$.

In an embodiment of the present invention, the step of reconstructing the original feature vector, the domain-invariant identity feature, and the domain-specific enhancement feature to obtain a reconstructed feature vector set includes:

recombining the domain-invariant identity features and the domain-specific enhancement features of the person images of the source domain and the target domain to obtain a first reconstructed feature vector and a second reconstructed feature vector;

and rearranging and combining the original characteristic vector of the character image of the source domain, the original characteristic vector of the character image of the target domain, the first reconstruction characteristic vector and the second reconstruction characteristic vector according to different orders to obtain a reconstruction characteristic vector group.

In an embodiment of the present invention, the step of recombining the domain-invariant identity features and the domain-specific enhanced features of the human images of the source domain and the target domain to obtain the first reconstructed feature vector and the second reconstructed feature vector includes:

recombining the domain-invariant identity features of the person image of the source domain and the domain-specific enhancement features of the person image of the target domain to obtain the first reconstructed feature vector;

recombining the domain-specific enhanced features of the person image of the source domain and the domain-invariant identity features of the person image of the target domain to obtain the second reconstructed feature vector.

In an embodiment of the present invention, the cross-domain face recognition loss function is:

wherein m is the index of an element in the input reconstructed feature vector group; the similarity terms are the (cosine) similarity of the corresponding positive pair and the (cosine) similarity of the n-th corresponding negative pair; and τ represents a trainable temperature value initialized to 1.

In an embodiment of the present invention, the domain classification loss function is:

where p(·) represents the probability that the trained domain classifier classifies its input as belonging to the source domain.

The second aspect of the present invention also provides a pedestrian re-identification method, including:

acquiring a figure image to be identified;

inputting a character image to be recognized into the pedestrian re-recognition model in any one of the first aspect, extracting a feature vector in the character image to be recognized, calculating the similarity between the feature vector in the character image to be recognized and the feature vector of the character image in the sample library, comparing the similarity with a set threshold, if the similarity is greater than the threshold, judging that the face image is the same person, otherwise, judging that the face image is not the same person, and obtaining the recognition result of the pedestrian re-recognition model.
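As a minimal illustration of the comparison step above, the following sketch matches one query feature vector against a sample library. It assumes cosine similarity as the similarity measure; the function names, the toy feature vectors and the threshold value 0.8 are hypothetical and are not taken from the embodiment.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def match_gallery(query, gallery, threshold):
    """Return the library IDs whose similarity to the query exceeds the threshold."""
    return [pid for pid, feat in gallery.items()
            if cosine_similarity(query, feat) > threshold]

# Toy sample library: ID -> feature vector (hypothetical values).
gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
matches = match_gallery([0.9, 0.1, 0.0], gallery, threshold=0.8)
```

In practice the threshold would be tuned on held-out data, since it trades missed matches against false matches.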

The third aspect of the present invention also provides a pedestrian re-recognition system including:

the processing module is used for respectively extracting original feature vectors of character images of a source domain and a target domain of a training sample, and decomposing the original feature vectors through a pedestrian re-recognition model to obtain domain invariant identity features and domain specific enhancement features;

the reconstruction module is used for reconstructing the original characteristic vector, the domain invariant identity characteristic and the domain specific enhancement characteristic to obtain a reconstructed characteristic vector group;

the calculation module is used for inputting the reconstruction feature group into a cross-domain face recognition loss function and a domain classification loss function to calculate corresponding cross-domain face recognition loss and domain classification loss;

and the control training module is used for controlling all training samples to carry out circular iterative training, and selecting the model with the minimum sum of cross-domain face recognition loss and domain classification loss as the trained pedestrian re-recognition model.

The fourth aspect of the present invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the training method of the pedestrian re-recognition model according to any one of the first aspect when executing the computer program, or implements the pedestrian re-recognition method according to the second aspect when executing the computer program.

The fifth aspect of the present invention also provides a computer-readable storage medium storing a computer program, wherein the computer program is configured to implement a training method of a pedestrian re-recognition model according to any one of the first aspect when executed by a processor, or to implement a pedestrian re-recognition method according to the second aspect when executed by a processor.

As described above, the training method, the pedestrian re-recognition method and the system of the pedestrian re-recognition model according to the present invention have the following advantages:

according to the method, the characteristic vectors of the character images from the source domain and the target domain are extracted, each characteristic vector is decomposed into a domain-invariant identity characteristic and a domain-specific enhancement characteristic, cross-domain characteristic recombination is performed, the obtained reconstruction characteristic group not only increases the diversity of samples used in training, but also inherits the reliable identity label in the source domain, and the data distribution of the source domain and the target domain can be well represented; and a pedestrian re-recognition model with high recognition efficiency can be trained under the condition of less samples by combining the supervision of a target loss function (a cross-domain face recognition loss function and a domain classification loss function).

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flow chart illustrating a training method of a pedestrian re-identification model according to an embodiment of the present invention;

FIG. 2 is a schematic sub-flow chart of a training method for a pedestrian re-identification model according to an embodiment of the present invention;

FIG. 3 is a schematic sub-flow chart of a training method for a pedestrian re-identification model according to an embodiment of the present invention;

FIG. 4 is a schematic sub-flow chart of a training method for a pedestrian re-identification model according to an embodiment of the present invention;

FIG. 5 is a flow chart illustrating a method for pedestrian re-identification according to an embodiment of the present invention;

FIG. 6 shows a schematic block diagram of a training system for a pedestrian re-identification model provided for an embodiment of the present invention;

FIG. 7 shows a schematic block diagram of a computer apparatus provided for an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the drawings only show the components related to the present invention rather than the number, shape and size of the components in practical implementation, and the type, quantity and proportion of the components in practical implementation can be changed freely, and the layout of the components can be more complicated.

The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Referring to fig. 1, a first embodiment of the present invention relates to a training method of a pedestrian re-recognition model. The pedestrian re-recognition model is used to extract a feature vector from a person image to be recognized, calculate the similarity between that feature vector and the feature vector of a person image in a sample library, and compare the similarity with a set threshold; if the similarity is greater than the threshold, the two images are determined to show the same person.

As shown in fig. 1, the training method of the pedestrian re-identification model of the embodiment includes:

Step 100, respectively extracting original feature vectors of the person images of the source domain and the target domain of a training sample, and decomposing the original feature vectors through a pedestrian re-recognition model to obtain domain-invariant identity features and domain-specific enhancement features.

In particular, as shown in figure 2,

step 110, preprocessing the character images of the source domain and the target domain of a training sample, extracting an original characteristic vector:

the source domain data is labeled data, where each label records the pre-annotated classification result of the source domain data; the target domain data is unlabeled data, and the source domain data and the target domain data share certain commonalities while differing in certain respects. In this embodiment, the source domain data and the target domain data are both person images.

The person images are a group of images continuously collected by a camera device. Before use, they are preprocessed to obtain preprocessed person images; the preprocessing comprises illumination adjustment, histogram equalization and normalization. Illumination adjustment should reduce the brightness of highlight areas, raise the brightness of shadow areas, and keep the brightness of transition areas unchanged. Histogram equalization applies a gray-level transformation to the person image so that the system can operate smoothly. Normalization is applied to the pixel values of the person image, finally yielding standard images of the same form. In addition, the person image may be obtained through a camera or through the internet. For example, it may be an image captured by an electronic device through the camera of a smartphone, a tablet computer, an electronic eye or the like; alternatively, it may be an image acquired by the electronic device through the internet, for example an image captured at random from the internet, or an image sent by another device and received through a social application installed on the electronic device. The source of the person image is not limited here.
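The histogram equalization and normalization steps can be sketched as follows. This is a minimal illustration on a grayscale image stored as rows of integers, assuming 256 gray levels; the illumination adjustment step is omitted, and the function names and the toy image are hypothetical.

```python
def equalize_histogram(gray, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in [0, levels - 1]."""
    pixels = [p for row in gray for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return [row[:] for row in gray]  # constant image: nothing to spread out
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in gray]

def normalize(gray, levels=256):
    """Map integer gray levels to floats in [0, 1]."""
    return [[p / (levels - 1) for p in row] for row in gray]

# Toy low-contrast image (hypothetical values).
image = [[50, 50, 52], [52, 54, 54]]
equalized = equalize_histogram(image)
standard = normalize(equalized)
```

After equalization the darkest gray level in the toy image maps to 0 and the brightest to 255, so the normalized image spans the full range from 0 to 1.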

It should be understood that the tagged personal images in the source domain may be self-marked by the user as needed, or may be obtained from an existing personal image library.

After the person images are preprocessed, the feature vectors of the source domain and the target domain, namely the original feature vectors, are extracted. Feature extraction can be performed in various ways: for example, with a residual network (ResNet) of a given depth, such as ResNet-34, ResNet-50 or ResNet-152, or another deep residual network; with a VGG deep convolutional neural network; or with a densely connected convolutional network (DenseNet) or a neural architecture search network (NASNet). The omni-scale network of this embodiment may also be used to extract the original feature vectors. Since extracting feature vectors from an image is a conventional technique in the art, it is not described further here.

Step 120, decomposing the feature vector through a pedestrian re-identification model to obtain a domain invariant identity feature and a domain specific enhancement feature:

it should be understood that a domain-invariant identity feature is a feature that is independent of the domain to which the training data belongs and does not vary with domain differences. Taking pedestrian re-identification as an example, the identity information of a pedestrian does not change with external changes such as the pedestrian's clothing, posture and hairstyle; since the target object of the detection task is the pedestrian in the person image, the pedestrian's identity information is the domain-invariant identity feature to be extracted. In this learning scenario, the aim is to accurately extract the identity information of pedestrians from the acquired person images.

The domain-specific enhancement features characterize the domain to which the training data belongs; they are specific to that domain and change with domain differences. For example, in pedestrian re-identification, the background behind a pedestrian is irrelevant to the pedestrian's identity information; it is not needed for recognizing the pedestrian, and it varies with domain differences.

The domain-invariant features and the domain-specific enhancement features together characterize the data distribution of the source domain and the target domain, and the domain-invariant identity features of different domains are exchangeable between domains without disrupting the distribution of each domain.

Further, in this embodiment, the pedestrian re-recognition model adopts an omni-scale network (Omni-Scale Network, OSNet) to extract the pedestrian features.

It should be appreciated that pedestrian re-identification (ReID) relies on discriminative features that not only capture different spatial scales but also encapsulate arbitrary combinations of multiple scales; these features of both homogeneous and heterogeneous scales are referred to as omni-scale features. The OSNet network is designed for omni-scale feature learning in ReID. It is built from residual blocks composed of multiple convolutional feature streams, each of which detects features at a certain scale. Importantly, the OSNet network also introduces a unified aggregation gate that dynamically fuses the multi-scale features using input-dependent channel-wise weights. To learn spatial-channel correlations efficiently and avoid overfitting, the building block uses both pointwise and depthwise convolutions. By stacking these blocks layer by layer, the OSNet network is very lightweight and can be trained from scratch on existing ReID benchmarks.

Specifically, pedestrian features are extracted through the OSNet network as follows: for given person images of the source domain and the target domain, an original feature vector $F \in \mathbb{R}^{C \times H \times W}$, with $C$ channels and spatial resolution $H \times W$, is extracted from each person image, and the original feature vector $F$ is decomposed into a domain-invariant identity feature $B$ and a domain-specific enhancement feature $E$, expressed as:

$$F = B + E;$$

the domain-invariant identity feature $B$ is the basic feature of a person's identity and is dominant in identifying the person; the domain-specific enhancement feature $E$ is complementary to it.

The decomposition is calculated as follows:

$$B = (1 - O(F)) \odot F,$$

$$E = O(F) \odot F,$$

wherein $F$ is the original feature vector; $B$ is the domain-invariant identity feature; $E$ is the domain-specific enhancement feature; $\odot$ denotes element-by-element multiplication; $O(\cdot)$ is the response of the OSNet network, and

$$O(F) = \sum_{t=1}^{T} G(F^t) \odot F^t,$$

wherein $G(F^t)$ is a vector whose length spans the entire channel dimension of the input $F^t$, and the index $t$ denotes the feature scale.

G is implemented as a mini-network consisting of a non-parametric global average pooling layer and a multi-layer perceptron (MLP) with one ReLU-activated hidden layer, followed by a sigmoid activation.
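A minimal sketch of the gate G and of the decomposition into B and E is given below. It assumes that the response O(F) is the gated sum of the multi-scale streams F^t with the gate weights shared across scales, that bias terms are omitted, and that a feature map is stored as a list of channels; the weight matrices, the toy values and all function names are hypothetical.

```python
import math

def gate(feature, w_hidden, w_out):
    """Channel gate G: global average pooling -> ReLU hidden layer -> sigmoid.

    feature: list of C channels, each a flat list of H*W activations.
    w_hidden: (hidden x C) weights; w_out: (C x hidden) weights.
    """
    pooled = [sum(ch) / len(ch) for ch in feature]
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def aggregate(streams, w_hidden, w_out):
    """Sum over scales t of G(F^t) ⊙ F^t (assumed form of the response O(F))."""
    channels, size = len(streams[0]), len(streams[0][0])
    out = [[0.0] * size for _ in range(channels)]
    for feature in streams:
        weights = gate(feature, w_hidden, w_out)
        for c in range(channels):
            for k in range(size):
                out[c][k] += weights[c] * feature[c][k]
    return out

def decompose(feature, response):
    """Split F into B = (1 - O(F)) ⊙ F and E = O(F) ⊙ F, element by element."""
    base = [[(1.0 - o) * f for o, f in zip(o_ch, f_ch)]
            for o_ch, f_ch in zip(response, feature)]
    enh = [[o * f for o, f in zip(o_ch, f_ch)]
           for o_ch, f_ch in zip(response, feature)]
    return base, enh

# Toy example: 2 channels, 2 spatial positions, 2 scales (hypothetical values).
F = [[1.0, 2.0], [3.0, 4.0]]
streams = [[[0.2, 0.4], [0.1, 0.3]], [[0.3, 0.1], [0.2, 0.2]]]
w_hidden = [[0.1, 0.2]]
w_out = [[0.3], [-0.4]]
response = aggregate(streams, w_hidden, w_out)
B, E = decompose(F, response)
```

Whatever values the response takes, B and E add back to F element by element, which is the property the recombination step relies on.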

For a person image $i$ from the source domain $D_s$, the original feature vector $F_i^s$ is extracted and decomposed by the OSNet network into a domain-invariant identity feature $B_i$ and a domain-specific enhancement feature $E_i^s$, so that after decomposition

$$F_i^s = B_i + E_i^s.$$

For a person image $j$ from the target domain $D_t$, the original feature vector $F_j^t$ is extracted and decomposed by the OSNet network into a domain-invariant identity feature $B_j$ and a domain-specific enhancement feature $E_j^t$, so that after decomposition

$$F_j^t = B_j + E_j^t.$$

Step 200, reconstructing the original feature vector, the domain-invariant identity feature and the domain-specific enhancement feature to obtain a reconstructed feature vector group.

In particular, since the domain-invariant identity features of different domains are exchangeable between domains without disrupting the distribution of each domain, cross-domain feature recombination may be performed in order to increase the diversity of training samples.

As shown in fig. 3, the step 200 includes:

step 210, recombining the domain-invariant identity features and the domain-specific enhancement features of the person images of the source domain and the target domain to obtain a first reconstruction feature vector and a second reconstruction feature vector;

that is, the domain-invariant identity feature and the domain-specific enhancement feature of the person image of the source domain are recombined with those of the person image of the target domain to obtain a first reconstructed feature vector and a second reconstructed feature vector. It is noted that the first and second reconstructed feature vectors each comprise a domain-invariant identity feature and a domain-specific enhancement feature.

As shown in fig. 4, specifically:

step 211, recombining the domain-invariant identity features of the person image in the source domain and the domain-specific enhancement features of the person image in the target domain to obtain the first reconstructed feature vector;

the domain-invariant identity feature $B_i$ of the source domain is combined with the domain-specific enhancement feature $E_j^t$ of the target domain to obtain the first reconstructed feature vector, expressed as $B_i + E_j^t$;

step 212, recombining the domain-specific enhancement features of the person image of the source domain and the domain-invariant identity features of the person image of the target domain to obtain the second reconstructed feature vector;

the domain-specific enhancement feature $E_i^s$ of the source domain is combined with the domain-invariant identity feature $B_j$ of the target domain to obtain the second reconstructed feature vector, expressed as $B_j + E_i^s$.

For the first reconstructed feature vector or the second reconstructed feature vector, the identity information is inherited from the domain-invariant identity feature B, while the domain information is inherited from the domain-specific enhancement feature E.
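The cross-domain recombination of steps 211 and 212 can be sketched as follows. This is a minimal illustration that assumes recombination is element-wise addition, in line with the decomposition F = B + E; the function names and toy vectors are hypothetical.

```python
def add(a, b):
    """Element-wise sum of two equal-length feature vectors."""
    return [x + y for x, y in zip(a, b)]

def recombine(base_src, enh_src, base_tgt, enh_tgt):
    """Swap domain-specific parts between a source image i and a target image j."""
    first = add(base_src, enh_tgt)   # identity of source image i, domain of target
    second = add(base_tgt, enh_src)  # identity of target image j, domain of source
    return first, second

# Toy decomposed features (hypothetical values).
B_i, E_i_s = [1.0, 2.0], [0.5, 0.5]
B_j, E_j_t = [3.0, 4.0], [0.25, 0.75]
first_rec, second_rec = recombine(B_i, E_i_s, B_j, E_j_t)
```

Because the first reconstructed vector keeps the source identity feature, it can reuse the identity label of source image i even though it carries target-domain information.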

Step 220, rearranging and combining the original feature vector of the character image of the source domain, the original feature vector of the character image of the target domain, the first reconstruction feature vector and the second reconstruction feature vector according to different sequences to obtain a reconstruction feature vector group.

Through step 210, for two person images of two persons, one from the source domain and one from the target domain, four feature vectors are obtained: the original feature vector $F_i^s$ of the source domain, the original feature vector $F_j^t$ of the target domain, the first reconstructed feature vector and the second reconstructed feature vector. The four feature vectors are arranged and combined in different orders to obtain the reconstructed feature vector groups. The reconstructed feature vector groups increase the diversity of training samples, inherit reliable identity labels, and represent the data distribution of the target and source domains well; after training under the loss functions, the recombined features support reliable identity inheritance and approximate the actual distribution.

It should be noted that in this embodiment, the number of reconstructed feature vector groups obtained by recombination in different permutation and combination manners is 24. However, in actual use, the permutation and combination method may be set according to actual needs, and the number of reconstructed feature vector groups varies accordingly. For example: one reconstructed feature vector set in this embodiment is

Another reconstructed set of feature vectors is

Step 300, inputting the reconstructed feature vector group into a cross-domain face recognition loss function and a domain classification loss function to calculate the corresponding cross-domain face recognition loss and domain classification loss.

Specifically, when training a deep neural network, the output of the network should be as close as possible to the value that is actually expected. The weight vectors of each layer can therefore be updated according to the difference between the current predicted value and the desired target value (an initialization step usually precedes the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value is too high, the weight vectors are adjusted to lower it, and the adjustment continues until the network predicts the desired target value or a value very close to it. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured; this is the role of the loss function. The higher the output value (loss) of the loss function, the larger the difference, so training a deep neural network becomes a process of reducing the loss as much as possible.

In this embodiment, the target loss function comprises a cross-domain face recognition loss function $L_{CID}$ and a domain classification loss function $L_{Domain}$; the two are used together to supervise the training of the pedestrian re-recognition model in terms of person similarity and prediction probability, respectively.

The number of the reconstructed feature vector groups obtained for one training sample is 24, and the reconstructed feature vector group input to the target loss function may be all 24 reconstructed feature vector groups or may be some of the 24 reconstructed feature vector groups extracted at random. Further, since the objective function includes a cross-domain face recognition loss function and a domain classification loss function, the reconstructed feature vector groups input to the cross-domain face recognition loss function and the domain classification loss function may be the same or different.
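The count of 24 corresponds to the 4! orderings of the four feature vectors. The following minimal sketch enumerates them and draws a random subset; the vector names, the subset size 6 and the fixed seed are hypothetical choices made only for illustration.

```python
import random
from itertools import permutations

# Placeholder names for the two original and two reconstructed feature vectors.
names = ("F_src", "F_tgt", "F_rec1", "F_rec2")

# Every ordering of the four vectors is one reconstructed feature vector group.
groups = list(permutations(names))

# Optionally feed only a random subset of the groups to a loss function.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
subset = rng.sample(groups, 6)
```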

Further, the cross-domain face recognition loss function is used for measuring the similarity of the people in the two images, and the expression is as follows:

where m is the index of an element in the reconstructed feature vector group input into the cross-domain face recognition loss function; the similarity terms are the (cosine) similarity of the corresponding positive pair and the (cosine) similarity of the n-th corresponding negative pair; and τ represents a trainable temperature value initialized to 1.

A reconstructed feature vector group is input into the cross-domain face recognition loss function. Taking each element in the group as an anchor in turn, features of the same identity are pulled closer and features of different identities are pushed apart; a first loss value is calculated, and a first parameter of the pedestrian re-recognition model to be trained is adjusted according to the first loss value using a back-propagation algorithm.

It should be noted that the anchor image is a labeled person image in the source domain; the positive sample image corresponding to the anchor image is a training image carrying the same pedestrian identity information as the anchor image, and the negative sample image is a training image whose pedestrian identity information differs from that of the anchor image. Thus, among the combinations in a reconstructed feature vector group, one positive pair and two negative pairs are related to the identity of the person.
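The anchor-based pull and push behavior can be illustrated with a generic contrastive loss built from cosine similarity and a temperature τ. The sketch below is an InfoNCE-style stand-in written under stated assumptions, not the expression of L_CID itself: the function names, the toy two-dimensional features and the identity labels `"i"` and `"j"` are hypothetical, and each vector is assumed to inherit the identity of its domain-invariant component.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anchor_loss(anchor, positive, negatives, tau=1.0):
    """Contrastive loss for one anchor: -log softmax score of the positive pair."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def group_loss(features, identities, tau=1.0):
    """Take each element in turn as the anchor and average the pair losses."""
    losses = []
    for m, anchor in enumerate(features):
        positives = [f for k, f in enumerate(features)
                     if k != m and identities[k] == identities[m]]
        negatives = [f for k, f in enumerate(features)
                     if identities[k] != identities[m]]
        for p in positives:
            losses.append(anchor_loss(anchor, p, negatives, tau))
    return sum(losses) / len(losses)


# Toy group of four vectors: with two identities, every anchor has
# one positive pair and two negative pairs.
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
identities = ["i", "j", "i", "j"]
loss = group_loss(features, identities, tau=1.0)
print(round(loss, 4))
```

Minimizing such a loss raises the similarity of same-identity pairs relative to different-identity pairs, which is the pull and push effect described above.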

The domain classification loss function L_Domain is a cross-entropy-based loss used to calculate the probability of predicting a target domain sample as a source domain sample.

In this embodiment, another reconstructed feature vector group is input into the domain classification loss function, a second loss value is calculated, and a second parameter of the pedestrian re-recognition model to be trained is adjusted according to the second loss value using a back propagation algorithm.
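A cross-entropy-based domain classification loss can be sketched as a binary cross-entropy over domain predictions. This is a minimal illustration, not the expression of L_Domain: the label convention (1 for source domain, 0 for target domain), the logit inputs and the function names are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def domain_loss(logits, domain_labels):
    """Mean binary cross-entropy; label 1 = source domain, 0 = target domain."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, domain_labels):
        p = sigmoid(z)  # predicted probability of "source domain"
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


# Hypothetical classifier outputs for one source and one target feature.
print(domain_loss([2.0, -2.0], [1, 0]))
```

An undecided classifier (logit 0 for every sample) yields a loss of ln 2, which is a convenient reference point when monitoring training.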

Step 400, iterating the above steps until training on all training samples is complete, and selecting the model with the minimum sum of the cross-domain face recognition loss and the domain classification loss as the trained pedestrian re-recognition model;

All training samples are trained iteratively according to steps 100 to 300 until every training sample has been iterated over, and the weight values corresponding to the minimum sum of the first loss value and the second loss value across the training iterations are taken as the weight values of the trained pedestrian re-recognition model.
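The selection rule of step 400 amounts to keeping the weights for which the sum of the two loss values is smallest. The sketch below assumes a hypothetical record format of (weights, first loss value, second loss value) per iteration; the dictionary weights and loss numbers are placeholders for illustration only.

```python
import copy


def select_best(training_steps):
    """Keep the weights whose summed loss (first + second) is smallest.

    training_steps yields (weights, cid_loss, domain_loss) tuples.
    """
    best_weights, best_total = None, float("inf")
    for weights, cid_loss, dom_loss in training_steps:
        total = cid_loss + dom_loss
        if total < best_total:
            best_total = total
            # Copy so that later in-place updates do not alter the snapshot.
            best_weights = copy.deepcopy(weights)
    return best_weights, best_total


# Placeholder history of three iterations.
history = [
    ({"w": 1}, 0.9, 0.7),
    ({"w": 2}, 0.4, 0.5),
    ({"w": 3}, 0.5, 0.6),
]
best_weights, best_total = select_best(history)
print(best_weights, best_total)
```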

During optimization, the cross-domain face recognition loss function L_CID and the domain classification loss function L_Domain constrain each other, which suppresses trivial solutions for re-identification and domain classification. Meanwhile, joint learning with L_CID and L_Domain drives B_i and B_j to learn domain-shared base features, and drives the enhancement features of the source domain and of the target domain to learn domain-specific enhancement features.

It can be seen that, in the embodiment, by extracting feature vectors of character images from a source domain and a target domain, decomposing each feature vector into a domain-invariant identity feature and a domain-specific enhancement feature, and performing cross-domain feature recombination, an obtained reconstructed feature vector group not only increases the diversity of samples used in training, but also inherits reliable identity labels in the source domain, and can well represent the data distribution of the source domain and the target domain; through the decomposition and combination of the target loss function, a pedestrian re-recognition model with high recognition efficiency can be trained under the condition of fewer samples.

Referring to fig. 5, a second embodiment of the present invention relates to a pedestrian re-identification method, including:

step 501, obtaining a person image to be identified.

Specifically, the person image is one of a group of person images continuously acquired by a camera device. Before use, the person image is preprocessed to obtain a preprocessed person image; the preprocessing comprises illumination adjustment, histogram equalization and normalization. The illumination adjustment should reduce the brightness of highlight areas, raise the brightness of shadow areas, and keep the brightness of transition areas unchanged; histogram equalization applies a gray-level transformation to the person image to facilitate smooth operation of the system; and normalization of the pixel values yields standard images of uniform form. In addition, the person image may be obtained through a camera or from the internet. For example, the image to be detected may be captured by an electronic device through the camera of a smartphone, a tablet computer, an electronic eye or the like; alternatively, it may be acquired by the electronic device through the internet, for example an image grabbed at random from the internet, or an image sent by another device and received through a social application installed on the electronic device. The source of the person image is not limited here.
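Two of the preprocessing steps, histogram equalization and normalization, can be sketched in plain Python on a grayscale image stored as rows of integers. This is a textbook formulation given for illustration; the 256 gray levels, the [0, 1] target range and the function names are assumptions, and the illumination adjustment is not shown.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in [0, levels)."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in image]


def normalize(image, levels=256):
    """Scale pixel values to the range [0, 1]."""
    return [[p / (levels - 1) for p in row] for row in image]


# Tiny 2x2 example image with a narrow range of gray levels.
raw = [[50, 50], [100, 200]]
equalized = equalize_histogram(raw)
standard = normalize(equalized)
print(equalized)
print(standard)
```

Equalization stretches the occupied gray levels over the full range while preserving their order, so images captured under different conditions become more comparable before feature extraction.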

Step 502, inputting the person image to be recognized into the pedestrian re-recognition model, extracting the feature vector of the person image to be recognized, calculating the similarity between this feature vector and the feature vectors of the person images in the sample library, and comparing the similarity with a set threshold; if the similarity is greater than the threshold, the two images are judged to show the same person, otherwise they are judged not to show the same person, thereby obtaining the recognition result of the pedestrian re-recognition model.

Specifically, the pedestrian re-identification model is obtained by pre-training, wherein the training comprises the following steps:

The original feature vector F_i^s of a person image from the source domain is decomposed by an OSNet network into a domain-invariant identity feature B_i and a domain-specific enhancement feature, and the original feature vector of a person image from the target domain is decomposed by the OSNet network into a domain-invariant identity feature B_j and a domain-specific enhancement feature.

The domain-invariant identity feature B_i and the domain-specific enhancement feature of the source domain are combined with the domain-invariant identity feature B_j and the domain-specific enhancement feature of the target domain to obtain two reconstructed feature vectors.

The original feature vectors and the reconstructed feature vectors are arranged and combined in different orders to obtain reconstructed feature vector groups.

One reconstructed feature vector group is input into the cross-domain face recognition loss function, a first loss value is calculated, and a first parameter of the pedestrian re-recognition model to be trained is adjusted according to the first loss value using a back propagation algorithm. In the expression of the cross-domain face recognition loss function, m is the index of an element in the reconstructed feature vector group, and the similarity term is the cosine similarity of the corresponding pair.

Another reconstructed feature vector group is input into the domain classification loss function, a second loss value is calculated, and a second parameter of the pedestrian re-recognition model to be trained is adjusted according to the second loss value using a back propagation algorithm.

The above steps are repeated in iterative training until the set number of iterations is reached, and the model corresponding to the minimum sum of the first loss value and the second loss value across the training iterations is taken as the trained pedestrian re-recognition model.

The preprocessed person image to be recognized is input into the pedestrian re-recognition model, the similarity between its feature vector and the feature vectors of the person images in the sample library is calculated, and the similarity is compared with a set threshold: if the similarity is greater than the threshold, the two images are judged to show the same person; otherwise, they are judged not to show the same person. The recognition result of the pedestrian re-recognition model is thereby obtained. In this embodiment, the sample library may be the source domain, and the person image to be recognized may come from the target domain. During testing, a target image j is input into the trained pedestrian re-recognition model, and person matching is performed using features composed of the domain-shared base information and the domain-specific enhancement information to obtain the matching result.
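The threshold comparison against the sample library can be sketched as follows. Cosine similarity is assumed as the similarity measure, and the threshold of 0.5, the gallery contents and the function names are hypothetical choices for illustration rather than values given in the embodiment.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def re_identify(query, gallery, threshold=0.5):
    """Return (identity, similarity) of the best match above the threshold.

    gallery maps an identity label to its feature vector. If no entry
    exceeds the threshold, the identity is None.
    """
    best_id, best_sim = None, -1.0
    for identity, feature in gallery.items():
        sim = cosine(query, feature)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    if best_sim > threshold:
        return best_id, best_sim
    return None, best_sim


# Hypothetical sample library with two known persons.
gallery = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}
match, score = re_identify([0.9, 0.1], gallery)
print(match, round(score, 4))
```

In practice the threshold trades off false matches against missed matches and would be tuned on validation data.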

Therefore, in the embodiment, the pedestrian re-recognition result is obtained by inputting the acquired figure image to be recognized into the trained pedestrian re-recognition model; the pedestrian re-recognition model decomposes each feature vector into a domain-invariant identity feature and a domain-specific enhancement feature by extracting the feature vectors of the figure images from the source domain and the target domain, and performs cross-domain feature recombination, so that the obtained reconstructed feature vector group not only increases the diversity of samples used in training, but also inherits reliable identity labels in the source domain, and can well represent the data distribution of the source domain and the target domain; through the decomposition and combination of the objective loss function, fewer samples are required and the efficiency of identification is high.

Referring to fig. 6, a third embodiment of the present invention relates to a training system of a pedestrian re-identification model, including:

the processing module 601 is configured to extract original feature vectors of character images of a source domain and a target domain of a training sample, and decompose the original feature vectors through a pedestrian re-recognition model to obtain a domain invariant identity feature and a domain specific enhancement feature;

the character images comprise character images with labels in the source domain and character images without labels in the target domain, and the character images with the labels in the source domain can be marked by a user according to needs and can also be obtained from an existing character image library.

Further, the person image is one of a group of person images continuously acquired by a camera device. Before use, the person image is preprocessed to obtain a preprocessed person image; the preprocessing comprises illumination adjustment, histogram equalization and normalization. The illumination adjustment should reduce the brightness of highlight areas, raise the brightness of shadow areas, and keep the brightness of transition areas unchanged; histogram equalization applies a gray-level transformation to the person image to facilitate smooth operation of the system; and normalization of the pixel values yields standard images of uniform form. In addition, the person image may be obtained through a camera or from the internet. For example, the image to be detected may be captured by an electronic device through the camera of a smartphone, a tablet computer, an electronic eye or the like; alternatively, it may be acquired by the electronic device through the internet, for example an image grabbed at random from the internet, or an image sent by another device and received through a social application installed on the electronic device. The source of the person image is not limited here.

Further, for a person image i from the source domain D_s, the original feature vector F_i^s is extracted and decomposed by an OSNet network into a domain-invariant identity feature B_i and a domain-specific enhancement feature. Likewise, the original feature vector of a person image j from the target domain is decomposed by the OSNet network into a domain-invariant identity feature B_j and a domain-specific enhancement feature.

A reconstructing module 602, configured to reconstruct the original feature vector, the domain-invariant identity feature, and the domain-specific enhancement feature to obtain a reconstructed feature vector group;

combining the domain invariant identity feature and the domain specific enhancement feature of the source domain with the domain invariant identity feature and the domain specific enhancement feature of the target domain to obtain a plurality of reconstructed feature vectors; wherein each reconstructed feature vector comprises a domain-invariant identity feature and a domain-specific enhancement feature.
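Claim 2 expresses the decomposition as B = (1 − O(F)) ⊙ F and E = O(F) ⊙ F, so that B and E sum back to F element by element. The sketch below illustrates that decomposition and one possible cross-domain recombination. The gate values standing in for O(F), the toy feature vectors and the use of an element-wise sum to recombine a base feature with an enhancement feature are assumptions for illustration; in the model the gate is produced by the network.

```python
def decompose(feature, gate):
    """Split F into B = (1 - O(F)) * F and E = O(F) * F, element-wise."""
    base = [(1.0 - o) * f for o, f in zip(gate, feature)]
    enhancement = [o * f for o, f in zip(gate, feature)]
    return base, enhancement


def recombine(base, enhancement):
    """Assumed recombination: element-wise sum of one B and one E."""
    return [b + e for b, e in zip(base, enhancement)]


# Hypothetical source and target features with hypothetical gate values.
F_src, gate_src = [1.0, 2.0, 3.0], [0.2, 0.5, 0.9]
F_tgt, gate_tgt = [4.0, 5.0, 6.0], [0.1, 0.5, 0.7]

B_i, E_i = decompose(F_src, gate_src)  # source identity / enhancement
B_j, E_j = decompose(F_tgt, gate_tgt)  # target identity / enhancement

# Each reconstructed vector pairs one identity feature with one
# enhancement feature taken from the other domain.
R_1 = recombine(B_i, E_j)  # source identity, target enhancement
R_2 = recombine(B_j, E_i)  # target identity, source enhancement
print(R_1, R_2)
```

Because R_1 keeps the source identity feature B_i, it can reuse the identity label of the source image, which is how the reconstructed vectors inherit reliable labels.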

Further, the domain-invariant identity feature B_i and the domain-specific enhancement feature of the source domain are combined with the domain-invariant identity feature and the domain-specific enhancement feature of the target domain to obtain two reconstructed feature vectors.

The original feature vectors and the reconstructed feature vectors are then arranged and combined in different orders to obtain a plurality of different reconstructed feature vector groups;

a calculating module 603, configured to input the reconstructed feature group into a cross-domain face recognition loss function and a domain classification loss function to calculate a corresponding cross-domain face recognition loss and a corresponding domain classification loss;

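In this embodiment, one reconstructed feature vector group is input into the cross-domain face recognition loss function to calculate the cross-domain face recognition loss; in the expression of that loss function, m is the index of an element in the reconstructed feature vector group, and the similarity term is the cosine similarity of the corresponding pair. Another reconstructed feature vector group is input into the domain classification loss function to calculate the domain classification loss.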

a control training module 604, configured to control the loop-iteration training over all training samples, and to select the model with the minimum sum of the cross-domain face recognition loss and the domain classification loss as the trained pedestrian re-recognition model.

The above steps are repeated in iterative training until all training samples have been iterated over, and the model corresponding to the minimum sum of the first loss value and the second loss value across the training iterations is taken as the trained pedestrian re-recognition model.

As can be seen, in the present embodiment, the person image is acquired by the acquisition module; extracting feature vectors in the character image through a processing module, and decomposing the original feature vectors through a pedestrian re-identification model to obtain domain invariant identity features and domain specific enhancement features; performing feature recombination through a reconstruction module to obtain a reconstructed feature vector group; and finally, training to obtain a pedestrian re-recognition model through a calculation module and a control training module. The pedestrian re-recognition model decomposes each feature vector into a domain-invariant identity feature and a domain-specific enhancement feature by extracting the feature vectors of the figure images from the source domain and the target domain, and performs cross-domain feature recombination, so that the obtained reconstructed feature vector group not only increases the diversity of samples used in training, but also inherits reliable identity labels in the source domain, and can well represent the data distribution of the source domain and the target domain; through the decomposition and combination of the objective loss function, fewer samples are required and the efficiency of identification is high.

Referring to fig. 7, a fourth embodiment of the present invention relates to a computer device, which includes a memory 701, a processor 702, and a computer program stored in the memory 701 and executable on the processor 702, wherein the processor 702 implements the method for training the pedestrian re-recognition model according to any one of the first embodiment when executing the computer program, or the processor 702 implements the method for recognizing the pedestrian according to the second embodiment when executing the computer program.

The memory 701 and the processor 702 are coupled by a bus, which may comprise any number of interconnecting buses and bridges that couple one or more of the various circuits of the processor 702 and the memory 701 together. The bus may also connect various other circuits such as peripheral devices 703, voltage regulators 704, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 702 may be transmitted over a wireless medium through an antenna, which may receive the data and transmit the data to the processor 702.

The processor 702 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 701 may be used for storing data used by processor 702 in performing operations.

Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

A fifth embodiment of the present invention relates to a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a training method for a pedestrian re-recognition model as described in any one of the first embodiments above, or which, when executed by a processor, implements a pedestrian re-recognition method as described in the second embodiments above.

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In summary, the training method, the pedestrian re-recognition method and the system for the pedestrian re-recognition model of the present invention extract the feature vectors of the character images from the source domain and the target domain, decompose each feature vector into a domain-invariant identity feature and a domain-specific enhancement feature, and perform cross-domain feature reorganization to increase the diversity of the samples used in the training; the recombined characteristics inherit the reliable identity label and can well represent the data distribution of the source domain and the target domain; in addition, the reorganized features are decomposed and combined under the constraint of cross-domain face recognition loss and domain classification loss, and the recognition efficiency is further improved. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A training method of a pedestrian re-identification model is characterized by comprising the following steps:

2. The training method of the pedestrian re-recognition model according to claim 1, characterized in that: the step of respectively extracting the original feature vectors of the character images of one source domain and one target domain of the training sample, and obtaining the domain-invariant identity features and the domain-specific enhancement features from the original feature vectors through a pedestrian re-recognition model comprises the following steps:

B = (1 − O(F)) ⊙ F,

E = O(F) ⊙ F,

wherein T = 4, and g(F^t) is a vector whose length spans the entire channel dimension of the input F^t.

3. The training method of the pedestrian re-identification model according to claim 1, wherein the step of reconstructing the original feature vectors, the domain-invariant identity features and the domain-specific enhancement features to obtain the reconstructed feature vector set comprises:

4. The method for training a pedestrian re-recognition model according to claim 3, wherein the step of recombining the domain-invariant identity features and the domain-specific enhancement features of the human images of the source domain and the target domain to obtain the first reconstructed feature vector and the second reconstructed feature vector comprises:

5. The training method of the pedestrian re-recognition model according to claim 1, wherein the cross-domain face recognition loss function is:

wherein m is an element index number in the reconstructed feature vector group;

representing the (cosine) similarity of the corresponding alignment;

6. The training method of the pedestrian re-identification model according to claim 1, wherein the domain classification loss function is:

7. A pedestrian re-identification method is characterized in that: the method comprises the following steps:

acquiring a figure image to be identified;

inputting a character image to be recognized into the pedestrian re-recognition model according to any one of claims 1 to 6, extracting a feature vector in the character image to be recognized, calculating the similarity between the feature vector in the character image to be recognized and the feature vector of the character image in the sample library, comparing the similarity with a set threshold, if the similarity is larger than the threshold, judging that the face image is the same person, otherwise, judging that the face image is not the same person, and obtaining the recognition result of the pedestrian re-recognition model.

8. A training system for a pedestrian re-recognition model, comprising:

the calculation module is used for inputting the reconstruction characteristic vector group into a cross-domain face recognition loss function and a domain classification loss function to calculate corresponding cross-domain face recognition loss and domain classification loss;

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements a training method of a pedestrian re-recognition model according to any one of claims 1 to 6 when executing the computer program or implements a pedestrian re-recognition method according to claim 7 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a method for training a pedestrian re-recognition model according to any one of claims 1 to 6, or which, when being executed by a processor, carries out a method for pedestrian re-recognition according to claim 7.