CN114782979A - Training method and device for pedestrian re-recognition model, storage medium and terminal - Google Patents
Training method and device for pedestrian re-recognition model, storage medium and terminal
- Publication number
- CN114782979A (application number CN202210204501.6A)
- Authority
- CN
- China
- Prior art keywords
- loss function
- pedestrian
- human body
- identification model
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The application relates to a training method and device for a pedestrian re-recognition model, a storage medium and a terminal. The method comprises the following steps: collecting a human body data sample set; inputting the human body data sample set into a hierarchical self-attention network model in a pedestrian re-identification model, and outputting human body global features of the pedestrian re-identification model; acquiring the human body local characteristics of the pedestrian re-identification model according to the human body global characteristics; obtaining a local loss function and a global loss function of the pedestrian re-identification model according to the global human body feature, the local human body feature and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model; and performing iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation mode. The pedestrian re-identification model comprises a hierarchical self-attention network model, and can identify human global features and human local features, so that people can be identified.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a training method, a training device, a storage medium and a terminal for a pedestrian re-identification model.
Background
Pedestrian re-identification (Person re-identification), also known as person re-ID, is a technique that uses computer vision to determine whether a particular pedestrian is present in an image or video sequence. It is widely regarded as a sub-problem of image retrieval: given a monitored pedestrian image, the same pedestrian is retrieved across devices. The technique overcomes the visual limitations of a fixed camera, can be combined with pedestrian detection/pedestrian tracking technologies, and can be widely applied in fields such as intelligent video surveillance and intelligent security.
The application provides a training method, a device, a storage medium and a terminal for a pedestrian re-identification model, wherein a hierarchical self-attention network model is used as the backbone model, so that the global characteristics of a human body can be directly obtained; iterative training can be carried out on the pedestrian re-identification model according to the obtained local loss function and the obtained global loss function, so that the pedestrian re-identification model can recognize the global features and the local features of the human body, and then the pedestrian can be recognized.
Disclosure of Invention
The embodiment of the application provides a training method and device for a pedestrian re-recognition model, a storage medium and a terminal. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides a training method for a pedestrian re-recognition model, where the method includes:
collecting a human body data sample set of a pedestrian re-identification model;
inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting human body global features of the pedestrian re-identification model;
acquiring the human body local features of the pedestrian re-identification model according to the human body global features;
obtaining a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model;
and performing iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation mode to obtain the trained pedestrian re-recognition model.
Optionally, the acquiring a human body data sample set of the pedestrian re-identification model includes:
acquiring an RGB image containing human body dimensions and width and height characteristics;
and carrying out image enhancement operation on the RGB image to obtain a human body data sample set of the pedestrian re-identification model.
Optionally, the obtaining of the human body local feature of the pedestrian re-identification model according to the human body global feature includes:
inputting the human body global features into an adaptive global average pooling layer of the pedestrian re-identification model;
and cutting the human body global features output by the self-adaptive global average pooling layer into the human body local features along the depth direction.
Optionally, the classifier includes a local classifier and an angle classifier;
the obtaining of the local loss function and the global loss function of the pedestrian re-identification model according to the human global feature, the human local feature, and the classifier, the triple loss function, the class center loss function, the batch normalization algorithm and the loss function based on the angle interval in the pedestrian re-identification model includes:
inputting the human body local features into the local classifier corresponding to the human body local features, and outputting the local feature class probability of the pedestrian re-identification model;
and calculating the local loss function of the pedestrian re-identification model according to the local feature class probability and the real label corresponding to the local feature class probability.
Optionally, the obtaining the local loss function and the global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and the classifier, the triple loss function, the class center loss function, the batch normalization algorithm and the loss function based on the angle interval in the pedestrian re-identification model includes:
acquiring a free Euclidean space representation loss function of the pedestrian re-identification model according to the human body global feature, the triple loss function and the class center loss function;
acquiring a hypersphere characterization loss function of the pedestrian re-identification model according to the human body global feature, the batch normalization algorithm, the angle classifier and the loss function based on the angle interval;
and acquiring the global loss function of the pedestrian re-identification model according to the local loss function, the free Euclidean space characterization loss function and the hypersphere characterization loss function.
Optionally, the obtaining a free euclidean space characterization loss function of the pedestrian re-identification model according to the human global feature, the triplet loss function, and the class center loss function includes:
extracting an anchor sample of the pedestrian re-recognition model according to the human body global features;
obtaining a class loss function of the pedestrian re-identification model according to the anchor sample, the homogeneous sample of the anchor sample, the heterogeneous sample of the anchor sample and the triple loss function;
obtaining a distance loss function of the pedestrian re-identification model according to the anchor sample, the class center of the class to which the anchor sample belongs and the class center loss function;
and acquiring a free Euclidean space representation loss function of the pedestrian re-identification model according to the category loss function and the distance loss function.
Optionally, the obtaining a hypersphere characterization loss function of the pedestrian re-identification model according to the human body global feature, the batch normalization algorithm, the angle classifier, and the loss function based on the angle interval includes:
normalizing the dimension distribution of each characteristic channel of the human body global characteristic through the batch normalization algorithm;
inputting the human body global features after normalization into the angle classifier, and obtaining a final sample representation of the pedestrian re-identification model;
and calculating a hypersphere characterization loss function of the pedestrian re-identification model according to the final sample characterization and the loss function based on the angle interval.
In a second aspect, an embodiment of the present application provides a training apparatus for a pedestrian re-recognition model, where the apparatus includes:
the data sample acquisition module is used for acquiring a human body data sample set of the pedestrian re-identification model;
the global feature acquisition module is used for inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model and outputting human body global features of the pedestrian re-identification model;
the local characteristic acquisition module is used for acquiring the human body local characteristics of the pedestrian re-identification model according to the human body global characteristics;
a loss function obtaining module, configured to obtain a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model;
and the model training module is used for performing iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation mode to obtain the trained pedestrian re-recognition model.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiment of the application, the training method of the pedestrian re-identification model comprises the steps of firstly collecting a human body data sample set of the pedestrian re-identification model; inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting the human body global features of the pedestrian re-identification model; then according to the human body global features, obtaining human body local features of the pedestrian re-identification model; secondly, according to the human body global features, the human body local features, a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on angle intervals in the pedestrian re-identification model, obtaining a local loss function and a global loss function of the pedestrian re-identification model; and finally, performing iterative training on the pedestrian re-recognition model in a back propagation mode according to the local loss function and the global loss function to obtain the trained pedestrian re-recognition model. The method of the embodiment of the application adopts a technical means different from the prior art, and uses a hierarchical self-attention network model as a backbone model, so that the problem that the global characteristics of a human body cannot be directly obtained in the prior art can be solved; iterative training can be carried out on the pedestrian re-identification model according to the obtained local loss function and the global loss function, so that the pedestrian re-identification model can identify the human global features and the human local features, and then the pedestrian can be identified.
In the embodiment of the application, the training method of the pedestrian re-identification model not only introduces the attribute loss function included in the local loss function, but also introduces the free Euclidean space characterization loss function and the hypersphere characterization loss function. These two loss functions can directly perform metric learning on the human body global features obtained from the hierarchical self-attention network model, and during back propagation the gradient can act directly on the parameters of the last layer of the hierarchical self-attention network model, so that the pedestrian re-identification model can be trained more efficiently, obtains better characterization capability, and achieves a better identification effect.
In the embodiment of the application, the training method of the pedestrian re-identification model introduces the appearance (attribute) characteristic information included in the local characteristics of the human body, and various characteristic information can be fully fused and utilized through the PAC classifier corresponding to the appearance (attribute) characteristic information and the characteristic fusion device between the local blocks, so that the abundance of the characteristic information is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart illustrating a training method of a pedestrian re-identification model according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a training method for a pedestrian re-identification model according to an embodiment of the present application;
FIG. 3 is another schematic structural diagram of a training method for a pedestrian re-identification model according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of another training method for a pedestrian re-identification model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an apparatus for training a pedestrian re-identification model according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of systems and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood in a specific case to those of ordinary skill in the art. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Pedestrian re-identification is called ReID for short. In surveillance video, a high-quality face picture often cannot be obtained because of camera resolution and shooting angle, and cross-camera tracking of a specific target cannot be completed when face recognition fails; in this case, the ReID technique becomes a very important alternative.
With the development of deep learning in recent years, ReID technology has made huge breakthroughs. From a technical perspective, ReID methods can be mainly classified into characterization-based learning, metric-based learning, local-feature-based learning and the like. As for model structure, ReID mostly adopts the traditional, mature CNN technology; CNNs rely on the local sliding-window filtering operation of conventional digital image processing, and although local features can be extracted effectively, the receptive field is relatively limited: global features with a large receptive field are lacking. In order to expand the receptive field of a CNN and obtain features with a large receptive field, the mainstream method is to build a local-to-global feature pyramid through a multilayer convolution-pooling stacking structure. In addition, CNN technology is only good at analysing image information and is not good at processing information of other modalities, especially sequential information (such as text, speech, and long video); ReID built on a CNN is therefore largely limited to spatial information and image-modality data, which is not conducive to the use of multi-modal data.
The invention provides a training method and device for a pedestrian re-recognition model, a storage medium and a terminal. The technical means different from the prior art is adopted, and the hierarchical self-attention network model is used as the backbone model, so that the problem that the global characteristics of the human body cannot be directly obtained in the prior art can be solved; iterative training can be carried out on the pedestrian re-identification model according to the obtained local loss function and the global loss function, so that the pedestrian re-identification model can identify the human global features and the human local features, and then the pedestrian can be identified.
A method for training a pedestrian re-identification model according to an embodiment of the present application will be described in detail below with reference to fig. 1 to 4.
Referring to fig. 1-3, a flow chart of a training method of a pedestrian re-identification model is provided for the embodiment of the present application. As shown in fig. 1-3, the method of the embodiments of the present application may include the steps of:
and S110, collecting a human body data sample set of the pedestrian re-identification model.
The S110 includes: and acquiring an RGB image containing human body dimensions and width and height characteristics.
In the embodiment of the present application, according to the human scale and aspect ratio, and on the premise of ensuring the accuracy of the hierarchical self-attention network model, the RGB image input to the hierarchical self-attention network model can be designed as one of two shapes, [B, 3, 224, 224] and [B, 3, 224, 112]; both follow the input-shape convention [B, C, H, W], where B denotes the batch size of the RGB images, C the number of channels, H the height and W the width. The aspect ratios of these two input shapes are 1:1 and 1:2, respectively. With an aspect ratio of 1:1 the RGB image can be input directly into the hierarchical self-attention network model, the image is better adapted to the model, and the model can extract the human body local features in the RGB image more fully. However, resizing the RGB image to 1:1 can damage the body-shape characteristics of the pedestrian (i.e. the body size and aspect-ratio characteristics), and in practice the body aspect ratio of a pedestrian is generally not higher than 1:2; therefore, considering the body scale of the pedestrian, the input shape [B, 3, 224, 112] can be selected for the hierarchical self-attention network model.
And carrying out image enhancement operation on the RGB image to obtain a human body data sample set of the pedestrian re-identification model. The image enhancement operation may be Zero Padding, i.e. padding with zeros.
In the embodiment of the present application, in order to adapt to the hierarchical self-attention network model, after a small batch of RGB images I is resized to [B, 3, 224, 112], an image enhancement (zero-padding) operation may be applied to the left and right sides of the images along the row direction, yielding a small batch of input RGB images I′ of shape [B, 3, 224, 224], where the effective region (the original image content) of each new RGB image I′ is centred in I′. The image enhancement operation can be written as:

I′ = ZeroPad(I, padding), where padding = (pad_left, pad_right, pad_top, pad_down)

in which ZeroPad denotes a padding operation with zeros, width denotes the width W of the RGB image, pad_left denotes the amount padded to the left, pad_right the amount padded to the right, pad_top the amount padded upward, and pad_down the amount padded downward.
Several new RGB images I' may be formed into a human body data sample set.
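As an illustrative sketch only (not the patent's code), the preprocessing described above can be realised in PyTorch roughly as follows, assuming the [B, 3, 224, 112] image is zero-padded left and right to a 224 × 224 canvas with the original content centred:

```python
# Minimal sketch of the zero-padding image-enhancement step (assumed output size 224x224).
import torch
import torch.nn.functional as F

def pad_to_square(batch: torch.Tensor, target_w: int = 224) -> torch.Tensor:
    """batch: RGB tensor of shape [B, 3, 224, 112]; returns [B, 3, 224, target_w]."""
    _, _, _, w = batch.shape
    pad_left = (target_w - w) // 2
    pad_right = target_w - w - pad_left
    # F.pad takes (pad_left, pad_right, pad_top, pad_down); zero values keep the
    # original content centred in the new image I'.
    return F.pad(batch, (pad_left, pad_right, 0, 0), mode="constant", value=0.0)

i_prime = pad_to_square(torch.rand(8, 3, 224, 112))   # -> [8, 3, 224, 224]
```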
And S120, inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting human body global features of the pedestrian re-identification model.
In the embodiment of the application, a hierarchical self-attention network model is adopted as the backbone model of the pedestrian re-identification model; the hierarchical self-attention network model is a Swin Transformer network obtained by improving the Transformer network. The hierarchical self-attention network model consists of 1 local-block embedding vectorization stage and s stages, where each stage consists of 1 local-block feature fusion and t Transformer modules. Here, Transformer denotes a network structure based on the self-attention mechanism; compared with CNN (convolutional neural network) technology, Transformer technology has the capability of directly extracting long-range human body global features, can extract human body global features without a multi-layer stacking scheme, and can solve the problem that CNN technology cannot directly obtain human body global features; the multi-head self-attention mechanism (multi-head attention) in the Transformer network allows it to attend to the features of several important regions. Meanwhile, the Transformer network has better multi-modal fusion capability and is very good at processing sequential information, which is beneficial for extending ReID technology to multi-modal data, and fusing other modal data can further advance ReID technology. Swin Transformer denotes a Transformer model designed for the computer vision field, and the Swin Transformer model constructed on the Transformer has the capability of extracting human global features over a long range, thereby improving the quality of the human global features and making characterization learning more effective.
The local-block embedding vectorization is Patch Embedding, denoted PE; the process of local-block embedding vectorization PE can be expressed as:

PE(I′, psize, edim) = LayerNorm(Conv2D(I′, 3, edim, psize, psize))

Conv2D(x, input_channel, output_channel, ksize, stride) is a 2D convolution operation

where psize denotes the size of the cut patch, edim denotes the size of the output feature channel, Conv2D denotes a 2D convolution, and LayerNorm denotes a normalization layer; input_channel denotes the depth dimension of the input tensor, output_channel denotes the depth dimension of the output tensor, ksize denotes the size of the convolution kernel, and stride denotes the downsampling rate of the convolution operation; 3 denotes that the number of channels is 3; x denotes the input of the function and broadly refers to any input — in the embodiment of the present application, x may be the input RGB image I′.
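For concreteness, a minimal PyTorch sketch of such a patch-embedding step is given below; the module name PatchEmbedding and the flatten/transpose into a token sequence are assumptions, not taken from the patent:

```python
# Sketch of PE(I', psize, edim) = LayerNorm(Conv2D(I', 3, edim, psize, psize)).
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, psize: int = 4, edim: int = 96):
        super().__init__()
        # Conv2D(x, input_channel=3, output_channel=edim, ksize=psize, stride=psize)
        self.proj = nn.Conv2d(3, edim, kernel_size=psize, stride=psize)
        self.norm = nn.LayerNorm(edim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                      # [B, edim, H/psize, W/psize]
        x = x.flatten(2).transpose(1, 2)      # [B, (H/psize)*(W/psize), edim]
        return self.norm(x)                   # LayerNorm over the edim channel

pe = PatchEmbedding()
tokens = pe(torch.rand(2, 3, 224, 224))       # -> [2, 56*56, 96]
```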
The local-block feature fusion is Patch Merging, denoted PM; the Transformer module is Transformer Block, denoted TB, and denotes a network module based on the Transformer structure. Given an arbitrary feature tensor f as the input of the functions, the local-block feature fusion, the Transformer module and the sub-modules that make up the Transformer module can be expressed as:

PM(f) = Linear(LayerNorm(Downsample(f, 2)), 4*input_dim, 2*input_dim)

Linear(x, input_channel, output_channel) is a fully connected layer

TB(f) = SWMSABlock(WMSABlock(f))

SWMSABlock(f) = MLP(LayerNorm(GSWMSA(f))) + GSWMSA(f)

WMSABlock(f) = MLP(LayerNorm(GWMSA(f))) + GWMSA(f)

GWMSA(f) = WMSA(LayerNorm(f)) + f

GSWMSA(f) = SWMSA(LayerNorm(f)) + f

where Downsample(f, 2) denotes downsampling f with a stride of 2, 4*input_dim denotes 4 times the input dimension, 2*input_dim denotes 2 times the input dimension, Linear denotes a fully connected layer, MLP denotes a fully connected layer with an activation function and Dropout, WMSABlock denotes a window multi-head self-attention module, SWMSABlock denotes a moving-window multi-head self-attention module, WMSA denotes a window multi-head self-attention network, and SWMSA denotes a moving-window multi-head self-attention network.
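The patch-merging expression PM(f) can be sketched in PyTorch as below; realising Downsample(f, 2) by gathering the four phases of a 2 × 2 neighbourhood follows the common Swin Transformer implementation and is an assumption here, not a detail stated in the patent:

```python
# Sketch of PM(f) = Linear(LayerNorm(Downsample(f, 2)), 4*input_dim, 2*input_dim).
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    def __init__(self, input_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(4 * input_dim)
        self.reduction = nn.Linear(4 * input_dim, 2 * input_dim, bias=False)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        """f: [B, H, W, C] feature map with C == input_dim; returns [B, H/2, W/2, 2C]."""
        f0 = f[:, 0::2, 0::2, :]                 # stride-2 downsampling, four phases
        f1 = f[:, 1::2, 0::2, :]
        f2 = f[:, 0::2, 1::2, :]
        f3 = f[:, 1::2, 1::2, :]
        f = torch.cat([f0, f1, f2, f3], dim=-1)  # [B, H/2, W/2, 4C]
        return self.reduction(self.norm(f))      # [B, H/2, W/2, 2C]
```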
In the embodiment of the application, a new RGB image I' (i.e. an RGB image in a human body data sample set) obtained by performing a series of image enhancement operations on a given small batch of RGB images I is input to a hierarchical self-attention network model, and a normalization layer is required to be connected after a last Stage of the hierarchical self-attention network model to obtain a final human body global feature U; the calculation process after the human body data samples are input into the hierarchical self-attention network model can be expressed by the following mathematical formula:
where E denotes the output of PE, R denotes the set of real numbers, and B denotes the batch size of the image data; TB×2 indicates that 2 TB modules are composed in a nested fashion in that stage, and TB×6 indicates that 6 TB modules are composed in a nested fashion in that stage; D1 denotes the output of the first stage of the hierarchical self-attention network model, D2 the output of the second stage, D3 the output of the third stage, and D4 the output of the fourth stage; U denotes the human body global feature output by the normalization layer after the output of the fourth stage is fed into it. The channel sizes 96, 192, 384 and 768 change with the structure of the hierarchical self-attention network model: since the patch size psize of the hierarchical self-attention network model is fixed to 4, each patch covers 4 × 4 = 16 pixels, and since the RGB image has 3 channels, stacking these 16 pixels along the feature channel gives 16 × 3 = 48 channels and a tensor of shape (B, H/4, W/4, 48); the convolution then doubles the number of channels to 96. These operations can be performed together by setting the stride of the Conv2D in the PE module, so these values are likewise fixed. In the embodiment of the application, cutting too few or too many patches would reduce computational efficiency.
In the embodiment of the present application, for the sake of real-time performance the input of the hierarchical self-attention network model is a new RGB image I′ of fixed size I_shape = [B, 3, 224, 224], obtained from the RGB images by the resizing and image-enhancement operations. On the premise of ensuring the accuracy of the hierarchical self-attention network model, the patch size of the local-block embedding vectorization PE is psize = 4, the output dimension of PE is edim = 96, the number of stages is s = 4 (i.e. 4 Stages in total), and the numbers of Transformer modules TB are T = [2, 2, 6, 2] (one entry per Stage); the human body global feature U output under this configuration has 768 feature channels.
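For reference, one possible way (not stated in the patent) to instantiate a backbone with exactly these settings — patch size 4, embedding dimension 96, depths (2, 2, 6, 2) and a 768-channel output — is the Swin-T variant shipped with the timm library; treat this as an assumption about a convenient substitute, not as the claimed implementation:

```python
# Illustrative backbone with psize=4, edim=96, T=[2,2,6,2] (Swin-T configuration).
import timm
import torch

backbone = timm.create_model(
    "swin_tiny_patch4_window7_224",  # matches the patch size / depths given above
    pretrained=False,
    num_classes=0,                   # drop the classification head, keep features
)
u = backbone(torch.rand(2, 3, 224, 224))   # pooled global feature with 768 channels
```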
And S130, acquiring the human body local characteristics of the pedestrian re-identification model according to the human body global characteristics. Specifically, S130 includes:
And inputting the human body global features into an adaptive global average pooling layer of the pedestrian re-identification model. In the embodiment of the application, in the process of training the pedestrian re-identification model, after the human body global feature U is acquired from the hierarchical self-attention network model, U is unfolded into U_2d, and U_2d is input into an adaptive global average pooling layer (AdaptiveAvgPool) with window size (6, 1) to obtain the human body global feature U′ output by the adaptive global average pooling layer; at this point U′ ∈ R^(B, 768, 6, 1). The human body global feature U′ is subsequently used for cutting the human body local features and for calculating the local loss function of the pedestrian re-identification model.
In the embodiment of the application, after the human body global feature U is obtained from the hierarchical self-attention network model, it can also be input directly into an adaptive global average pooling layer with window size (1) to obtain the human body global feature U′; this U′ is then input separately into the free Euclidean space characterization learning module and the hypersphere characterization learning module to calculate the characterization losses in the different spaces.
And cutting the human body global features output by the self-adaptive global average pooling layer into the human body local features along the depth direction.
In the embodiment of the application, the human body global feature U′ output by the adaptive global average pooling layer is cut into six blocks along the depth direction, so that six human body local features are obtained, each human body local feature being U′_i ∈ R^(B, 768); the six human body local features may comprise features related to the head, shoulders, abdomen, thighs, calves, shoes and other parts of the human body.
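A minimal sketch of this pooling-and-splitting step is shown below; the 7 × 7 spatial size of the input feature map is illustrative, and the use of torch.chunk to realise the depth-direction cut is an assumption:

```python
# Sketch: reduce the global feature map to a (6, 1) grid and cut it into six 768-d parts.
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((6, 1))

def split_local_features(u2d: torch.Tensor):
    """u2d: [B, 768, h, w] unfolded global feature; returns a list of six [B, 768] tensors."""
    u_prime = pool(u2d)                        # [B, 768, 6, 1]
    parts = torch.chunk(u_prime, 6, dim=2)     # six slices along the depth direction
    return [p.flatten(1) for p in parts]       # each slice flattened to [B, 768]

locals_ = split_local_features(torch.rand(4, 768, 7, 7))
print(len(locals_), locals_[0].shape)          # 6, torch.Size([4, 768])
```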
From the appearance/posture labels corresponding to the local areas, the six human body local features may also comprise appearance feature information such as backpack color, hat color, jacket style, jacket color, lower-garment style and lower-garment color; the human body local features may also carry labels such as pedestrian behaviors. Through these attributes, more effective supervision information can be provided for feature extraction of the local details of pedestrian appearance. For the human body local features, the backpack color may have 5 categories, the hat color 5 categories, the jacket style 4 categories, the jacket color 12 categories, the lower-garment style 4 categories, and the lower-garment color 12 categories.
S140, obtaining a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model; the classifier includes a local classifier and an angle classifier.
Specifically, the S140 includes:
inputting the human body local features into the local classifier corresponding to the human body local features, and outputting the local feature class probability of the pedestrian re-identification model; the local classifier includes: a human body local feature classifier and a human body local attribute classifier.
In the embodiment of the application, when the six local human body features are related features of the head, the shoulders, the abdomen, the thighs, the shanks, the shoes and the like of the human body, the six local human body features can be respectively input into corresponding human body local feature classifiers to obtain the local feature class probability of each local human body feature. As shown in fig. 2, the six human body local feature classifiers are a PFC classifier 1, a PFC classifier 2, … …, and a PFC classifier 6; the PFC classifier is a Part-fed classifier, and the following is a related definition of the PFC classifier designed according to the local features of the human body:
PFC(f) = Linear(FeatureEmbed(f), 512, class_num)

FeatureEmbed(x) = DropOut(LeakReLU(BN(Linear(x, input_channel, 512))))

where BN denotes a batch normalization layer (Batch Norm); LeakReLU denotes the activation function; DropOut denotes the neuron random-dropout layer; FeatureEmbed denotes a feature fuser for local-block features; class_num denotes the number of classes of the corresponding PFC classifier; in the PFC(f) equation, 512 denotes the input dimension, and in the FeatureEmbed(x) expression, 512 denotes the output dimension.
Inputting six local human body features into six PFC classifiers respectively to obtain predicted local feature class probabilities of the six local human body features, and performing weighted fusion on the six local feature class probabilities to obtain a final predicted local feature class probability O, which is defined as follows:
where i denotes the index of the classifier, i = 1, 2, …, 6; PFC_i denotes the i-th PFC classifier, U′_i denotes the human body local feature input to the i-th classifier, o_i denotes the local feature class probability output by the i-th classifier for that human body local feature, and 1 denotes the output dimension; softmax denotes the activation function that maps the outputs of a plurality of neurons into the (0, 1) interval such that the mapped outputs of all neurons sum to 1.
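As an illustrative sketch of a PFC head and of the fusion of the six predicted probabilities: the ID count of 751 is a placeholder, and equal-weight averaging stands in for the weighted fusion mentioned above; none of these values are taken from the patent.

```python
# Sketch of one Part Feature Classifier (PFC) and an equal-weight fusion of six heads.
import torch
import torch.nn as nn

class FeatureEmbed(nn.Module):
    def __init__(self, input_channel: int = 768, embed_dim: int = 512, p: float = 0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(input_channel, embed_dim),
            nn.BatchNorm1d(embed_dim),
            nn.LeakyReLU(inplace=True),
            nn.Dropout(p),
        )

    def forward(self, f):
        return self.block(f)

class PFC(nn.Module):
    def __init__(self, class_num: int, input_channel: int = 768):
        super().__init__()
        self.embed = FeatureEmbed(input_channel, 512)
        self.fc = nn.Linear(512, class_num)

    def forward(self, f):
        return self.fc(self.embed(f))          # [B, class_num] logits o_i

pfcs = nn.ModuleList([PFC(class_num=751) for _ in range(6)])   # 751 IDs is illustrative

def fuse(local_feats):                         # local_feats: six [B, 768] tensors
    probs = [torch.softmax(pfc(f), dim=1) for pfc, f in zip(pfcs, local_feats)]
    return torch.stack(probs, dim=0).mean(dim=0)               # fused probability O
```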
In the embodiment of the application, when the six local human body features are appearance attribute information of human bodies, such as backpack color, hat color, jacket style, jacket color, lower clothing style, lower clothing color and the like, the six local human body features can be respectively input into corresponding local human body attribute classifiers to obtain the local feature class probability of each local human body feature. As shown in fig. 3, the six human body local attribute classifiers are PAC classifier 1, PAC classifier 2, … …, PAC classifier 6; the PAC classifier is a Part-associated classifier, and the following is related definition of the PAC classifier designed according to local attributes of a human body:
PAC1(V)=Linear(FeatureEmbed(V),512,5)
PAC2(V)=Linear(FeatureEmbed(V),512,5)
PAC3(V)=Linear(FeatureEmbed(V),512,4)
PAC4(V)=Linear(FeatureEmbed(V),512,12)
PAC5(V)=Linear(FeatureEmbed(V),512,4)
PAC6(V)=Linear(FeatureEmbed(V),512,12)
where FeatureEmbed(V) = DropOut(LeakReLU(BN(Linear(V, input_channel, 512)))), and V is a combination of n local-block features.

In the PAC1(V) formula, PAC1 denotes the backpack-color classifier and 5 indicates that the backpack color has 5 categories; in the PAC2(V) formula, PAC2 denotes the hat-color classifier and 5 indicates that the hat color has 5 categories; in the PAC3(V) formula, PAC3 denotes the jacket-style classifier and 4 indicates that the jacket style has 4 categories; in the PAC4(V) formula, PAC4 denotes the jacket-color classifier and 12 indicates that the jacket color has 12 categories; in the PAC5(V) formula, PAC5 denotes the lower-garment-style classifier and 4 indicates that the lower-garment style has 4 categories; in the PAC6(V) formula, PAC6 denotes the lower-garment-color classifier and 12 indicates that the lower-garment color has 12 categories.
Inputting each human body local feature into a corresponding PAC classifier, the local feature class probability of the human body local feature can be obtained, and the local feature class probability is defined as follows:
where i denotes the index of the classifier, PAC_i denotes the i-th PAC classifier, U′_i denotes the human body local feature input to the i-th classifier, and o_i denotes the local feature class probability output by the i-th classifier for that human body local feature.
In the embodiment of the present application, FeatureEmbed denotes a specially designed feature fuser for fusing a plurality of local-block features; the local-block features are combined and fused according to the probability that a given human body local attribute appears in each local block. For example, if the backpack appearance information appears with high probability in the first three local blocks, V is composed of the features of the first three local blocks, fused through FeatureEmbed, and then sent to the fully connected layer Linear for classification.
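The sketch below illustrates the six PAC attribute heads; the per-attribute class counts follow the numbers given above, while the assumption that each head consumes a fixed concatenation of three local-block features (and the 0.5 dropout rate) is made only for illustration:

```python
# Sketch of the six Part-Attribute Classifier (PAC) heads.
import torch
import torch.nn as nn

ATTR_CLASSES = [5, 5, 4, 12, 4, 12]   # backpack colour, hat colour, jacket style/colour,
                                      # lower-garment style/colour (counts from the text)

class PAC(nn.Module):
    def __init__(self, class_num: int, input_channel: int = 768 * 3):
        super().__init__()
        # V is assumed here to concatenate the 3 local-block features the attribute
        # is most likely to appear in (e.g. the first three blocks for a backpack).
        self.embed = nn.Sequential(
            nn.Linear(input_channel, 512),
            nn.BatchNorm1d(512),
            nn.LeakyReLU(inplace=True),
            nn.Dropout(0.5),
        )
        self.fc = nn.Linear(512, class_num)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.fc(self.embed(v))          # attribute logits o_i

pacs = nn.ModuleList([PAC(c) for c in ATTR_CLASSES])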
And calculating the local loss function of the pedestrian re-identification model according to the local feature class probability and the real label corresponding to the local feature class probability.
In the embodiment of the application, in the process of training the pedestrian re-recognition model, when the six human body local features are features related to the head, shoulders, abdomen, thighs, calves, shoes and other parts of the human body, the local loss function of the human body local features can be calculated from the local feature class probability o_i predicted and output by each PFC classifier and the corresponding real label K. The local loss function is defined as follows:

L_total = CrossEntropy(softmax(o_1), K) + CrossEntropy(softmax(o_2), K) + … + CrossEntropy(softmax(o_6), K)

where L_total is the local loss function, CrossEntropy denotes the cross-entropy loss function, and the real label K denotes a certain pedestrian ID; an image of a pedestrian is bound to a certain pedestrian ID.
In the embodiment of the application, in the process of training the pedestrian re-identification model, when the six human body local features are appearance feature information of the human body such as backpack color, hat color, jacket style, jacket color, lower-garment style and lower-garment color, the local loss function of the human body local features can be calculated from the local feature class probability o_i predicted and output by each PAC classifier and the corresponding real label Y_i. The local loss function is defined as follows:

L_attribute = CrossEntropy(o_1, Y_1) + CrossEntropy(o_2, Y_2) + … + CrossEntropy(o_6, Y_6)

In this case the local loss function is also called the attribute loss function, where L_attribute is the attribute loss function, Y_1 denotes the backpack-color ID, Y_2 the hat-color ID, Y_3 the jacket-style ID, Y_4 the jacket-color ID, Y_5 the lower-garment-style ID, and Y_6 the lower-garment-color ID.
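A minimal sketch of this attribute loss, assuming the o_i are logits and the Y_i are integer class labels:

```python
# Sketch: one cross-entropy term per PAC head, summed into L_attribute.
import torch
import torch.nn.functional as F

def attribute_loss(outputs, labels):
    """outputs: list of six [B, C_i] logit tensors o_i; labels: list of six [B] tensors Y_i."""
    return sum(F.cross_entropy(o, y) for o, y in zip(outputs, labels))
```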
In the embodiment of the application, in order to address the characterization-learning problem of pedestrian re-identification, the RGB images and the ID label information are fully utilized, appearance attribute information is introduced on the basis of the ID label information, and a PAC classifier and an inter-block feature fuser are designed for the appearance attribute information, thereby improving the richness of the characterization information.
In the embodiment of the application, for the human body global features, an ID representation learning module is designed (ID Embedding Module, IDEM for short). The ID representation learning module comprises a free Euclidean space representation learning module and a hypersphere representation learning module; by computing the representation losses of the human body global features in these two different spaces, this combined design makes better use of the two loss functions, so that the features can be better supervised and learned. The free Euclidean space representation learning module calculates the free Euclidean space representation loss function of the human body global features in free Euclidean space; the hypersphere representation learning module applies a normalization constraint to the human body global features through the batch normalization algorithm, maps them onto a hypersphere, and obtains the hypersphere representation loss function through the angle classifier and the angular-interval-based loss function with its angle penalty mechanism. The design of the ID representation learning module enables the pedestrian re-identification model to separate the human body global features of different pedestrians, giving the pedestrian re-identification model better performance.
In an embodiment of the present application, the S140 includes:
In the free Euclidean space representation learning module, a free Euclidean space representation loss function of the pedestrian re-identification model is acquired according to the human body global feature, the triple loss function and the class center loss function; the free-Euclidean-space characterization learning module is composed of a triple loss function (i.e., Triplet Loss) and a class-center loss function (i.e., Center Loss). The method specifically comprises the following steps:
and extracting an anchor sample of the pedestrian re-recognition model according to the human body global features. In the embodiment of the application, before calculating the class loss function of the pedestrian re-identification model, the anchor sample a needs to be selected from the human body global features of the small batch of training data according to the real ID label1。
And obtaining a category loss function of the pedestrian re-identification model according to the anchor sample, the homogeneous sample of the anchor sample, the heterogeneous sample of the anchor sample and the triple loss function. In the embodiment of the present application, the anchor sample a may be1Anchor specimen a1Of the same kind of sample a2And anchor sample a1Heterogeneous sample a of3Form a triplet (a)1,a2,a3) (ii) a According to said triplet (a)1,a2,a3) And triple loss function acquisitionClass loss function L of pedestrian re-identification modeltriplet(a1,a2,a3)。
And obtaining a distance loss function of the pedestrian re-identification model according to the anchor sample, the class center of the class to which the anchor sample belongs and the class center loss function. In the embodiment of the application, the anchor sample a is calculated according to the class center loss functioniWith anchor specimen aiObtaining the distance loss function of the pedestrian re-identification model according to the distance of the class center C of the class: l is a radical of an alcoholcenter(a1,C)。
And acquiring a free Euclidean space characterization loss function of the pedestrian re-identification model according to the ternary loss function and the distance loss function. In the embodiment of the present application, the class loss function (a)1,a2,a3) With said distance loss function Lcenter(a1And C) adding to obtain the free Euclidean space characterization loss function, wherein the expression is as follows:
LFreeEuc=Ltriplet(aa,a2,a3)+Lcenter(a1,C)。
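As an illustrative sketch of this combined objective (the triplet margin of 0.3, the 751-class gallery size and the squared-distance form of the centre term are assumptions, not values from the patent):

```python
# Sketch of L_FreeEuc = L_triplet(a1, a2, a3) + L_center(a1, C).
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.3)     # margin value is illustrative

class CenterLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 768):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # squared distance between each anchor a_1 and the centre C of its class
        return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

center = CenterLoss(num_classes=751)           # 751 identities is illustrative

def free_euclidean_loss(a1, a2, a3, labels):
    """a1: anchors, a2: positives, a3: negatives, each [B, 768]; labels: [B] ID labels."""
    return triplet(a1, a2, a3) + center(a1, labels)
```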
in a hypersphere characterization learning module, acquiring a hypersphere characterization loss function of the pedestrian re-identification model according to the human body global features, the batch normalization algorithm, the angle classifier and the loss function based on the angle interval; the hypersphere characterization learning module consists of a batch normalization algorithm, an angle classifier and a loss function based on angle intervals. The method specifically comprises the following steps:
normalizing the dimension distribution of each characteristic channel of the human body global characteristic through the batch normalization algorithm; inputting the human body global features after normalization into the angle classifier, and obtaining a final sample representation of the pedestrian re-identification model; and calculating a hypersphere characterization loss function of the pedestrian re-identification model according to the final sample characterization and the loss function based on the angle interval.
In the embodiment of the application, the human body global feature U′ is input into the batch normalization algorithm, and the dimension distribution of each feature channel of U′ is normalized by the batch normalization algorithm; after normalization, U′ is input into the angle classifier, whose weight W has been constrained to the hypersphere (i.e., constrained so that ||W|| = 1), and the output of the angle classifier is then evaluated with the angular-interval-based loss function, yielding the hypersphere characterization loss function. The computation in the hypersphere representation learning module can be expressed as:

L_Sphere = L_angular(AC(BN(U′)))

where L_Sphere denotes the hypersphere characterization loss function, BN denotes the batch normalization algorithm BatchNorm, AC denotes the Angular Classifier, and L_angular denotes the angular-interval-based loss function (Angular Margin Loss); the ArcFace loss function is selected here.
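A minimal sketch of this hypersphere branch is given below: BatchNorm on U′, an angle classifier whose weights are normalised onto the hypersphere, and an ArcFace-style angular-margin loss on the resulting cosine logits. The scale s = 30 and margin m = 0.5, as well as the 751-class size, are illustrative assumptions.

```python
# Sketch of L_Sphere = L_angular(AC(BN(U'))) with an ArcFace-style angular margin.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngleClassifier(nn.Module):
    def __init__(self, feat_dim: int = 768, num_classes: int = 751):  # 751 is illustrative
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # normalising both weight and feature places the representation on a hypersphere
        return F.linear(F.normalize(x, dim=1), F.normalize(self.weight, dim=1))

def arcface_loss(cosine, labels, s: float = 30.0, m: float = 0.5):
    """cosine: [B, num_classes] from AngleClassifier; s and m are illustrative values."""
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    target = torch.cos(theta + m)                          # add the angular margin
    one_hot = F.one_hot(labels, cosine.size(1)).float()
    logits = s * (one_hot * target + (1.0 - one_hot) * cosine)
    return F.cross_entropy(logits, labels)

bn = nn.BatchNorm1d(768)
ac = AngleClassifier()

def sphere_loss(u_prime, labels):
    return arcface_loss(ac(bn(u_prime)), labels)
```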
And acquiring the global loss function of the pedestrian re-identification model according to the local loss function, the free Euclidean space characterization loss function and the hypersphere characterization loss function.
In the embodiment of the present application, when the local loss function is the attribute loss function, the global loss function L_total is obtained according to the attribute loss function L_attribute, the free Euclidean space characterization loss function L_FreeEuc and the hypersphere characterization loss function L_Sphere; its expression is as follows:

L_total = L_attribute + L_FreeEuc + L_Sphere
and S150, performing iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation mode to obtain the trained pedestrian re-recognition model. In the embodiment of the application, when the six local features of the human body are related features of the head, the shoulders, the abdomen, the thigh, the shank, the shoes and the like of the human body, the pedestrian weight recognition model may be composed of a hierarchical self-attention network model, a self-adaptive global average pooling layer and 6 PFC classifiers; the pedestrian re-recognition model can be iteratively trained according to a local loss function until the numerical value of the local loss function is not reduced any more, and then the training of the pedestrian re-recognition model can be stopped; when the six human body local features are appearance feature information of human bodies, such as backpack color, hat color, jacket style, jacket color, clothing style, clothing color and the like, the pedestrian re-identification model can be composed of a hierarchical self-attention network model, a self-adaptive global average pooling layer, 6 PAC classifiers and an ID representation learning module; the pedestrian re-recognition model can be iteratively trained according to the global loss function until the numerical value of the global loss function does not decrease any more, and then the training of the pedestrian re-recognition model can be stopped.
In the embodiment of the application, the introduced free Euclidean space characterization loss function and the hypersphere characterization loss function can directly perform measurement learning on the human body global features obtained from the hierarchical self-attention network model, so that a better global loss function is obtained, and when the gradient is reversely propagated, the gradient can directly act on the parameters of the last layer of the hierarchical self-attention network model (namely the last stage of the hierarchical self-attention network model), so that the pedestrian re-identification model is more efficiently trained, and the pedestrian re-identification model has better characterization capability.
In the embodiment of the application, the pedestrian re-recognition model after training can directly recognize the pedestrian. The pedestrian re-recognition model after training can also obtain the human body global features of the pedestrians through the hierarchical self-attention network model, obtain the human body local features through the human body global features, and perform feature matching on the human body global features and the human body local features and the pedestrian image registry to obtain images and/or pedestrian IDs with similar features.
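An illustrative sketch of such feature matching against a registered gallery is shown below; `model`, the gallery tensors and cosine-similarity ranking are placeholders and assumptions, not details claimed by the patent:

```python
# Sketch: retrieve the most similar registered pedestrians for a query image.
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(model, query_img, gallery_feats, gallery_ids, top_k: int = 5):
    """query_img: [1, 3, H, W]; gallery_feats: [N, D]; gallery_ids: list of N pedestrian IDs."""
    q = F.normalize(model(query_img), dim=1)     # [1, D] global feature of the query
    g = F.normalize(gallery_feats, dim=1)        # [N, D] registered gallery features
    sims = (q @ g.t()).squeeze(0)                # cosine similarities
    scores, idx = sims.topk(top_k)
    return [(gallery_ids[i], s.item()) for i, s in zip(idx.tolist(), scores)]
```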
In the embodiment of the application, the training method of the pedestrian re-identification model comprises the steps of firstly collecting a human body data sample set of the pedestrian re-identification model; inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting human body global features of the pedestrian re-identification model; then according to the human body global features, acquiring human body local features of the pedestrian re-identification model; secondly, acquiring a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model; and finally, performing iterative training on the pedestrian re-recognition model in a back propagation mode according to the local loss function and the global loss function to obtain the trained pedestrian re-recognition model. The method of the embodiment of the application adopts a technical means different from the prior art, and uses a hierarchical self-attention network model as a backbone model, so that the problem that the global characteristics of a human body cannot be directly obtained in the prior art can be solved; iterative training can be carried out on the pedestrian re-identification model according to the obtained local loss function and the obtained global loss function, so that the pedestrian re-identification model can recognize the global features and the local features of the human body, and then the pedestrian can be recognized.
Fig. 4 is a schematic flow chart of a training method for a pedestrian re-identification model according to an embodiment of the present application. As shown in fig. 4, the method of the embodiment of the present application may include the following steps:
S210, acquiring an RGB image containing human body dimensions and width and height characteristics;
S211, carrying out image enhancement operation on the RGB image to obtain a human body data sample set of the pedestrian re-identification model;
S212, inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting human body global features of the pedestrian re-identification model;
S213, inputting the human body global features into an adaptive global average pooling layer of the pedestrian re-identification model;
S214, cutting the human body global features output by the adaptive global average pooling layer into the human body local features along the depth direction;
S215, inputting the human body local features into the local classifier corresponding to the human body local features, and outputting the local feature class probability of the pedestrian re-identification model;
S216, calculating the local loss function of the pedestrian re-identification model according to the local feature class probability and the real label corresponding to the local feature class probability;
S217, extracting an anchor sample of the pedestrian re-identification model according to the human body global features;
S218, obtaining a category loss function of the pedestrian re-identification model according to the anchor sample, the homogeneous sample of the anchor sample, the heterogeneous sample of the anchor sample and the triple loss function;
S219, obtaining a distance loss function of the pedestrian re-identification model according to the anchor sample, the class center of the class to which the anchor sample belongs and the class center loss function;
S220, acquiring a free Euclidean space characterization loss function of the pedestrian re-identification model according to the category loss function and the distance loss function;
S221, normalizing the dimension distribution of each feature channel of the human body global features through the batch normalization algorithm;
S222, inputting the human body global features after normalization into the angle classifier, and obtaining a final sample representation of the pedestrian re-identification model;
S223, calculating a hypersphere characterization loss function of the pedestrian re-identification model according to the final sample representation and the loss function based on the angle interval;
S224, obtaining the global loss function of the pedestrian re-identification model according to the local loss function, the free Euclidean space characterization loss function and the hypersphere characterization loss function;
S225, performing iterative training on the pedestrian re-identification model according to the local loss function and the global loss function in a back propagation mode to obtain the trained pedestrian re-identification model.
In the embodiment of the application, the training method of the pedestrian re-identification model comprises: first, collecting a human body data sample set of the pedestrian re-identification model; inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting the human body global features of the pedestrian re-identification model; then, according to the human body global features, acquiring the human body local features of the pedestrian re-identification model; next, acquiring a local loss function and a global loss function of the pedestrian re-identification model according to the human body global features, the human body local features, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model; and finally, performing iterative training on the pedestrian re-identification model in a back propagation mode according to the local loss function and the global loss function to obtain the trained pedestrian re-identification model. The method of the embodiment of the application adopts a technical means different from the prior art and uses the hierarchical self-attention network model as the backbone model, which solves the problem that the human body global features cannot be directly obtained in the prior art; iterative training can be carried out on the pedestrian re-identification model according to the obtained local loss function and global loss function, so that the pedestrian re-identification model can recognize both the global features and the local features of the human body and thereby recognize the pedestrian.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Fig. 5 is a schematic structural diagram of a training apparatus for a pedestrian re-identification model according to an exemplary embodiment of the present invention. The device 1 comprises: a data sample acquisition module 10, a global feature acquisition module 20, a local feature acquisition module 30, a loss function acquisition module 40, and a model training module 50.
The data sample acquisition module 10 is used for acquiring a human body data sample set of the pedestrian re-identification model;
a global feature obtaining module 20, configured to input the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and output a human body global feature of the pedestrian re-identification model;
a local feature obtaining module 30, configured to obtain a human local feature of the pedestrian re-identification model according to the human global feature;
a loss function obtaining module 40, configured to obtain a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm, and a loss function based on an angle interval in the pedestrian re-identification model;
and the model training module 50 is configured to perform iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation manner, so as to obtain the trained pedestrian re-recognition model.
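For orientation only, the skeletal Python sketch below shows how the five modules listed above could be wired together in code; the constructor arguments are placeholders standing in for the module internals and are not interfaces defined by the disclosure.

```python
class ReIDTrainingApparatus:
    """Skeleton mirroring the five modules of the training apparatus of Fig. 5."""

    def __init__(self, acquire_samples, backbone, split_local, build_losses, train_model):
        self.acquire_samples = acquire_samples   # data sample acquisition module 10
        self.backbone = backbone                 # global feature acquisition module 20
        self.split_local = split_local           # local feature acquisition module 30
        self.build_losses = build_losses         # loss function acquisition module 40
        self.train_model = train_model           # model training module 50

    def run(self):
        samples = self.acquire_samples()
        global_feats = self.backbone(samples)
        local_feats = self.split_local(global_feats)
        local_loss, global_loss = self.build_losses(global_feats, local_feats)
        return self.train_model(local_loss, global_loss)
```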
It should be noted that, when the training device for a pedestrian re-recognition model provided in the foregoing embodiment executes the training method for a pedestrian re-recognition model, the division into the above functional modules is only an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the training device of the pedestrian re-recognition model provided by the above embodiment and the embodiments of the training method of the pedestrian re-recognition model belong to the same concept; the implementation process is detailed in the method embodiments and is not described herein again.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, the training device of the pedestrian re-identification model first collects a human body data sample set of the pedestrian re-identification model; inputs the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputs the human body global features of the pedestrian re-identification model; then acquires the human body local features of the pedestrian re-identification model according to the human body global features; next acquires a local loss function and a global loss function of the pedestrian re-identification model according to the human body global features, the human body local features, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model; and finally performs iterative training on the pedestrian re-identification model in a back propagation mode according to the local loss function and the global loss function to obtain the trained pedestrian re-identification model. The method of the embodiment of the application adopts a technical means different from the prior art and uses a hierarchical self-attention network model as the backbone model, which solves the problem that the human body global features cannot be directly obtained in the prior art; iterative training can be carried out on the pedestrian re-identification model according to the obtained local loss function and global loss function, so that the pedestrian re-identification model can recognize both the global features and the local features of the human body and thereby recognize the pedestrian.
The present invention also provides a computer readable medium, on which program instructions are stored, and the program instructions, when executed by a processor, implement a training method for a pedestrian re-identification model provided by the above-mentioned method embodiments.
The present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform a method of training a pedestrian re-identification model of the above-described method embodiments.
Please refer to fig. 6, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 6, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
The communication bus 1002 is used to implement connection and communication among these components.
The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The Memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, a set of codes, or a set of instructions. The memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the stored data area may store the data referred to in the above method embodiments. The memory 1005 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 6, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a training application of a pedestrian re-identification model.
In the terminal 1000 shown in fig. 6, the user interface 1003 is mainly used to provide an input interface for the user and to acquire data input by the user; and the processor 1001 may be configured to invoke the training application of the pedestrian re-identification model stored in the memory 1005 and specifically perform the following operations:
collecting a human body data sample set of a pedestrian re-identification model;
inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting human body global features of the pedestrian re-identification model;
acquiring the human body local features of the pedestrian re-identification model according to the human body global features;
obtaining a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model; the classifier comprises a local classifier and an angle classifier;
and performing iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation mode to obtain the trained pedestrian re-recognition model.
In one embodiment, when executing the step of acquiring the human body data sample set of the pedestrian re-identification model, the processor 1001 specifically performs the following operations:
acquiring an RGB image containing human body dimensions and width and height characteristics;
and carrying out image enhancement operation on the RGB image to obtain a human body data sample set of the pedestrian re-identification model.
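The disclosure refers to an "image enhancement operation" without naming specific operations. The Python (torchvision) sketch below uses augmentations commonly applied to pedestrian crops (resize, horizontal flip, pad-and-crop, random erasing) purely as an illustrative stand-in; the input resolution and normalisation statistics are likewise assumptions.

```python
from torchvision import transforms

def build_train_transform(height=256, width=128):
    """Illustrative image-enhancement pipeline for RGB pedestrian images."""
    return transforms.Compose([
        transforms.Resize((height, width)),              # fix the human-body scale (width/height)
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.Pad(10),
        transforms.RandomCrop((height, width)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
        transforms.RandomErasing(p=0.5),                 # applied to the tensor image
    ])
```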
In one embodiment, when the processor 1001 executes the obtaining of the human body local features of the pedestrian re-identification model according to the human body global features, the following operations are specifically executed:
inputting the human body global features into an adaptive global average pooling layer of the pedestrian re-identification model;
and cutting the human body global features output by the self-adaptive global average pooling layer into the human body local features along the depth direction.
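The cutting of the pooled global feature into local features "along the depth direction" is described only at this level of generality. One plausible reading — chunking the channel (depth) axis of the pooled feature into equal slices — is sketched below in Python (PyTorch); the number of parts and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def split_into_local_features(backbone_feat_map, num_parts=6):
    """Pool a backbone feature map and cut the result into part features.

    backbone_feat_map: (B, C, H, W) output of the hierarchical self-attention backbone.
    Returns the pooled global feature (B, C) and `num_parts` slices of it taken
    along the channel (depth) axis — one reading of the 'depth direction' cut.
    """
    pooled = F.adaptive_avg_pool2d(backbone_feat_map, 1).flatten(1)   # adaptive global average pooling
    local_feats = torch.chunk(pooled, num_parts, dim=1)               # num_parts x (B, C // num_parts)
    return pooled, local_feats
```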
In one embodiment, when the processor 1001 executes the obtaining of the local loss function and the global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and the classifier, the triplet loss function, the class-center loss function, the batch normalization algorithm and the loss function based on the angle interval in the pedestrian re-identification model, the following operations are specifically executed:
inputting the human body local features into the local classifier corresponding to the human body local features, and outputting the local feature class probability of the pedestrian re-identification model;
and calculating the local loss function of the pedestrian re-identification model according to the local feature class probability and the real label corresponding to the local feature class probability.
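As a minimal sketch of the local loss computation, assuming each "local classifier" is a plain linear softmax classifier trained with cross-entropy against the real label of its local feature (the disclosure does not specify the classifier form):

```python
import torch.nn as nn
import torch.nn.functional as F

class LocalHead(nn.Module):
    """One classifier per local feature; the local loss is the summed cross-entropy."""

    def __init__(self, part_dim, num_classes_per_part):
        super().__init__()
        self.classifiers = nn.ModuleList(
            [nn.Linear(part_dim, n) for n in num_classes_per_part]
        )

    def forward(self, local_feats, labels):
        # local_feats: list of (B, part_dim) tensors; labels: (B, num_parts) integer targets
        loss = 0.0
        for k, (clf, feat) in enumerate(zip(self.classifiers, local_feats)):
            logits = clf(feat)                                   # local feature class scores
            loss = loss + F.cross_entropy(logits, labels[:, k])  # compare with the real label
        return loss
```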
In one embodiment, when the processor 1001 executes the obtaining of the local loss function and the global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and the classifier, the triplet loss function, the class-center loss function, the batch normalization algorithm and the loss function based on the angle interval in the pedestrian re-identification model, the following operations are specifically executed:
acquiring a free Euclidean space representation loss function of the pedestrian re-identification model according to the human body global feature, the triple loss function and the class center loss function;
acquiring a hypersphere characterization loss function of the pedestrian re-identification model according to the human body global feature, the batch normalization algorithm, the angle classifier and the loss function based on the angle interval;
and acquiring the global loss function of the pedestrian re-identification model according to the local loss function, the free Euclidean space characterization loss function and the hypersphere characterization loss function.
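The disclosure states that the global loss function is obtained from the local loss function, the free Euclidean space characterization loss function and the hypersphere characterization loss function, without giving the combination rule. A plain weighted sum, with placeholder weights, is one straightforward possibility:

```python
def global_loss(local_loss, fes_loss, hypersphere_loss,
                w_local=1.0, w_fes=1.0, w_hyper=1.0):
    """Combine the three terms into the global loss (weights are illustrative)."""
    return w_local * local_loss + w_fes * fes_loss + w_hyper * hypersphere_loss
```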
In one embodiment, when the processor 1001 executes the obtaining of the free Euclidean space characterization loss function of the pedestrian re-identification model according to the human body global features, the triplet loss function and the class center loss function, the following operations are specifically executed:
extracting an anchor sample of the pedestrian re-recognition model according to the human body global features;
obtaining a category loss function of the pedestrian re-identification model according to the anchor sample, the homogeneous sample of the anchor sample, the heterogeneous sample of the anchor sample and the triple loss function;
obtaining a distance loss function of the pedestrian re-identification model according to the anchor sample, the class center of the class to which the anchor sample belongs and the class center loss function;
and acquiring a free Euclidean space characterization loss function of the pedestrian re-identification model according to the category loss function and the distance loss function.
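A minimal sketch of the free Euclidean space characterization loss, assuming the category loss is a standard triplet margin loss over the anchor, a same-identity sample and a different-identity sample, and the distance loss is a squared-Euclidean class-center loss. The margin and the center-loss weight are illustrative values, not taken from the disclosure.

```python
import torch.nn.functional as F

def free_euclidean_loss(anchor, positive, negative, centers, labels,
                        margin=0.3, w_center=5e-4):
    """Triplet (category) loss plus class-center (distance) loss on global features.

    anchor / positive / negative: (B, D) global features of the anchor sample,
    a homogeneous sample and a heterogeneous sample; centers: (num_ids, D)
    learnable class centers; labels: (B,) identity labels of the anchors.
    """
    category_loss = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    distance_loss = ((anchor - centers[labels]) ** 2).sum(dim=1).mean()
    return category_loss + w_center * distance_loss
```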
In one embodiment, when the processor 1001 executes the obtaining of the hypersphere characterization loss function of the pedestrian re-identification model according to the human body global features, the batch normalization algorithm, the angle classifier and the loss function based on the angle interval, the following operations are specifically executed:
normalizing the dimension distribution of each characteristic channel of the human body global characteristic through the batch normalization algorithm;
inputting the human body global features after normalization into the angle classifier, and obtaining a final sample representation of the pedestrian re-identification model;
and calculating a hypersphere characterization loss function of the pedestrian re-identification model according to the final sample characterization and the loss function based on the angle interval.
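A minimal sketch of the hypersphere characterization loss, assuming the angle classifier is a cosine classifier and the "loss function based on an angle interval" is an additive angular-margin softmax (ArcFace-style); the scale and margin values are illustrative assumptions rather than parameters given in the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphereHead(nn.Module):
    """Batch-normalise the global feature, then score it with an angular-margin cosine classifier."""

    def __init__(self, feat_dim, num_ids, scale=30.0, margin=0.30):
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)                    # normalise each feature channel
        self.weight = nn.Parameter(torch.randn(num_ids, feat_dim))
        self.scale, self.margin = scale, margin

    def forward(self, global_feat, labels):
        feat = self.bn(global_feat)                           # final sample representation
        cosine = F.linear(F.normalize(feat), F.normalize(self.weight))
        cosine = cosine.clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cosine)                            # angle between feature and class weight
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)   # angle-interval-based loss
```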
In the embodiment of the application, the training method of the pedestrian re-identification model comprises: first, collecting a human body data sample set of the pedestrian re-identification model; inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting the human body global features of the pedestrian re-identification model; then, according to the human body global features, acquiring the human body local features of the pedestrian re-identification model; next, acquiring a local loss function and a global loss function of the pedestrian re-identification model according to the human body global features, the human body local features, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model; and finally, performing iterative training on the pedestrian re-identification model in a back propagation mode according to the local loss function and the global loss function to obtain the trained pedestrian re-identification model. The method of the embodiment of the application adopts a technical means different from the prior art and uses the hierarchical self-attention network model as the backbone model, which solves the problem that the human body global features cannot be directly obtained in the prior art; iterative training can be carried out on the pedestrian re-identification model according to the obtained local loss function and global loss function, so that the pedestrian re-identification model can recognize both the global features and the local features of the human body and thereby recognize the pedestrian.
It can be understood by those skilled in the art that all or part of the processes in the methods of the embodiments described above can be implemented by a computer program instructing related hardware; the computer program can be stored in a computer readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure is only a preferred embodiment of the present application and is not intended to limit the scope of the present application; equivalent changes made according to the appended claims therefore still fall within the scope covered by the present application.
Claims (10)
1. A training method of a pedestrian re-recognition model is characterized by comprising the following steps:
collecting a human body data sample set of a pedestrian re-identification model;
inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model, and outputting human body global features of the pedestrian re-identification model;
acquiring the human body local features of the pedestrian re-identification model according to the human body global features;
acquiring a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model;
and performing iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation mode to obtain the trained pedestrian re-recognition model.
2. The training method of claim 1, wherein the collecting a human body data sample set of a pedestrian re-identification model comprises:
acquiring an RGB image containing human body scale and width and height characteristics;
and carrying out image enhancement operation on the RGB image to obtain a human body data sample set of the pedestrian re-identification model.
3. The training method according to claim 1, wherein the obtaining of the human body local features of the pedestrian re-recognition model according to the human body global features comprises:
inputting the human body global features into an adaptive global average pooling layer of the pedestrian re-identification model;
and cutting the human body global features output by the self-adaptive global average pooling layer into the human body local features along the depth direction.
4. The training method of claim 1, wherein the classifier comprises a local classifier and an angle classifier;
the obtaining of the local loss function and the global loss function of the pedestrian re-identification model according to the human global feature, the human local feature, and the classifier, the triple loss function, the class center loss function, the batch normalization algorithm and the loss function based on the angle interval in the pedestrian re-identification model includes:
inputting the human body local features into the local classifier corresponding to the human body local features, and outputting the local feature class probability of the pedestrian re-identification model;
and calculating the local loss function of the pedestrian re-identification model according to the local feature class probability and the real label corresponding to the local feature class probability.
5. The training method according to claim 4, wherein the obtaining of the local loss function and the global loss function of the pedestrian re-identification model according to the human body global features, the human body local features, and the classifier, the triple loss function, the class-center loss function, the batch normalization algorithm and the angle-interval-based loss function in the pedestrian re-identification model comprises:
acquiring a free Euclidean space representation loss function of the pedestrian re-identification model according to the human body global feature, the triple loss function and the class center loss function;
acquiring a hypersphere characterization loss function of the pedestrian re-identification model according to the human body global feature, the batch normalization algorithm, the angle classifier and the loss function based on the angle interval;
and acquiring the global loss function of the pedestrian re-identification model according to the local loss function, the free Euclidean space characterization loss function and the hypersphere characterization loss function.
6. The training method according to claim 5, wherein the obtaining of the free Euclidean spatial characterization loss function of the pedestrian re-identification model according to the human body global features, the triple loss function and the class-center loss function comprises:
extracting an anchor sample of the pedestrian re-identification model according to the human body global features;
obtaining a class loss function of the pedestrian re-identification model according to the anchor sample, the homogeneous sample of the anchor sample, the heterogeneous sample of the anchor sample and the triple loss function;
obtaining a distance loss function of the pedestrian re-identification model according to the anchor sample, the class center of the class to which the anchor sample belongs and the class center loss function;
and acquiring a free Euclidean space characterization loss function of the pedestrian re-identification model according to the category loss function and the distance loss function.
7. The training method according to claim 5, wherein the obtaining of the hypersphere characterization loss function of the pedestrian re-identification model according to the human global features, the batch normalization algorithm, the angle classifier and the loss function based on the angle interval comprises:
normalizing the dimension distribution of each characteristic channel of the human body global characteristic through the batch normalization algorithm;
inputting the human body global features after normalization into the angle classifier, and obtaining a final sample representation of the pedestrian re-identification model;
and calculating a hypersphere characterization loss function of the pedestrian re-identification model according to the final sample characterization and the loss function based on the angle interval.
8. A training device for a pedestrian re-identification model is characterized by comprising:
the data sample acquisition module is used for acquiring a human body data sample set of the pedestrian re-identification model;
the global feature acquisition module is used for inputting the human body data sample set into a hierarchical self-attention network model in the pedestrian re-identification model and outputting the human body global features of the pedestrian re-identification model;
the local characteristic acquisition module is used for acquiring the human body local characteristics of the pedestrian re-identification model according to the human body global characteristics;
a loss function obtaining module, configured to obtain a local loss function and a global loss function of the pedestrian re-identification model according to the human body global feature, the human body local feature, and a classifier, a triple loss function, a class center loss function, a batch normalization algorithm and a loss function based on an angle interval in the pedestrian re-identification model;
and the model training module is used for performing iterative training on the pedestrian re-recognition model according to the local loss function and the global loss function in a back propagation mode to obtain the trained pedestrian re-recognition model.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any one of claims 1 to 7.
10. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210204501.6A | 2022-03-02 | 2022-03-02 | Training method and device for pedestrian re-recognition model, storage medium and terminal |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114782979A (en) | 2022-07-22 |
Family
ID=82423806
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210204501.6A (Pending) | Training method and device for pedestrian re-recognition model, storage medium and terminal | 2022-03-02 | 2022-03-02 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114782979A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115661909A (en) * | 2022-12-14 | 2023-01-31 | Shenzhen University | Face image processing method, device and computer readable storage medium |
| CN116503914A (en) * | 2023-06-27 | 2023-07-28 | East China Jiaotong University | Pedestrian re-recognition method, system, readable storage medium and computer equipment |
| CN116503914B (en) * | 2023-06-27 | 2023-09-01 | East China Jiaotong University | Pedestrian re-recognition method, system, readable storage medium and computer equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |