CN115393953B - Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction - Google Patents


Info

Publication number
CN115393953B
CN115393953B
Authority
CN
China
Prior art keywords
pedestrian
recognition
heterogeneous
branch
model
Prior art date
Legal status
Active
Application number
CN202210897792.1A
Other languages
Chinese (zh)
Other versions
CN115393953A (en)
Inventor
连国云
李焱超
张文宇
杨金锋
Current Assignee
Shenzhen Polytechnic
Original Assignee
Shenzhen Polytechnic
Priority date
Filing date
Publication date
Application filed by Shenzhen Polytechnic
Priority: CN202210897792.1A (published as CN115393953B)
Priority: PCT/CN2022/121269 (published as WO2024021283A1)
Publication of CN115393953A
Application granted
Publication of CN115393953B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06V40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction, belonging to the technical field of image processing. The method comprises the following steps: designing a pedestrian re-recognition initial model based on the heterogeneous network features of a convolutional neural network and a vision transformer; calculating a loss value of the pedestrian re-recognition initial model based on a dual loss, determining from the loss value that the initial model has converged, stopping training, and obtaining a pedestrian re-recognition model; and re-recognizing a target pedestrian image based on the pedestrian re-recognition model. Because the pedestrian re-recognition model is built on the heterogeneous network features of a convolutional neural network and a vision transformer, and shallow features are fused with deep features, both the basic features and the global features of the image can be exploited, a large number of image features can be obtained, and the recognition result is more accurate.

Description

Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction.
Background
As a biometric recognition technology, pedestrian re-identification (ReID) differs from techniques that rely on unique traits, such as face, iris and fingerprint recognition: it depends mainly on the apparent features of pedestrians and is closely tied to appearance attributes such as clothing and posture. It therefore has broad prospects in applications such as security investigation and pedestrian behaviour analysis.
Because pedestrians move freely within a camera's field of view, image capture is not constrained by any condition. Captured pedestrian images are strongly affected by differences in viewing angle, illumination changes, object occlusion, cluttered backgrounds and other environmental factors, so the appearance (resolution, posture, etc.) of the same pedestrian differs greatly across images; under occlusion, a complete pedestrian image may not be captured at all. In addition, different individuals are often visually similar in clothing colour, style and so on, which adds further difficulty to recognition.
Current pedestrian re-identification techniques mainly feed an image directly into a convolutional neural network to obtain global pedestrian features and use those global features for recognition. However, owing to noise, an insufficient number of features and other factors, good recognition results are often not obtained. Although improved methods exist, in practice it is difficult to obtain enough pose-annotated person images and robust pose estimates, and such methods are easily limited by the additional auxiliary model they require. Coarse partitioning schemes cannot guarantee that the resulting blocks are effective, nor can they filter out ineffective blocks, so judgements about outliers are highly error-prone. Moreover, attention obtained through convolution operations ignores the global information and implicit relationships in an image, which limits the model's ability to learn correlations between features.
Disclosure of Invention
The invention provides a pedestrian re-recognition method, device, equipment and storage medium based on heterogeneous network feature interaction, and aims to solve the technical problem of poor effect caused by pedestrian re-recognition by using global features and improve the accuracy of pedestrian re-recognition.
In order to achieve the above purpose, the present invention provides a pedestrian re-recognition method based on heterogeneous network feature interaction, the method comprising:
designing a pedestrian re-identification initial model based on heterogeneous network characteristics of a convolutional neural network and a visual transformer;
calculating a loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stopping training based on the loss value, and obtaining a pedestrian re-recognition model;
re-identifying the target pedestrian image based on the pedestrian re-identification model.
Optionally, the step of designing the pedestrian re-recognition initial model based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer includes:
constructing a convolutional neural network branch of the pedestrian re-recognition initial model; and
constructing a visual transformer branch of the pedestrian re-identification initial model;
and fusing the shallow heterogeneous characteristics of the convolutional neural network branch and the deep heterogeneous characteristics of the visual transformer branch to obtain the pedestrian re-identification initial model.
Optionally, the constructing the visual transformer branch of the pedestrian re-recognition initial model includes:
representing an input pedestrian image as an image block sequence including a plurality of image blocks;
performing linear mapping on the image block sequence to obtain a plurality of D-dimensional embedded representations of the image blocks;
concatenating a class token with a plurality of said D-dimensional embedded representations and adding a position code and a camera code for each of said image blocks to produce a sequence of embedded image blocks;
and sequentially processing the embedded image block sequence through normalization, a multi-head attention mechanism and a multi-layer perceptron to obtain the visual transformer branch.
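The four steps above can be sketched as follows. This is a NumPy stand-in, not the patent's implementation: the function name `embed_patches`, the random initialisation of the learnable parameters and the camera count are illustrative assumptions.

```python
import numpy as np

def embed_patches(image, patch, D, n_cameras=6, cam_id=0, rng=None):
    """Sketch: split an H x W x C image into p x p blocks, linearly map each
    block to a D-dim embedding, prepend a class token, and add a position
    encoding and a camera encoding to form the embedded block sequence."""
    rng = np.random.default_rng(0) if rng is None else rng
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    # flatten each p x p x C block into a (p*p*C)-vector
    blocks = [
        image[i:i + patch, j:j + patch].reshape(-1)
        for i in range(0, H, patch)
        for j in range(0, W, patch)
    ]
    X = np.stack(blocks)                               # (N, p*p*C)
    N = X.shape[0]
    E = rng.standard_normal((patch * patch * C, D))    # learnable embedding matrix
    x_cls = rng.standard_normal((1, D))                # learnable class token
    tokens = np.vstack([x_cls, X @ E])                 # (N+1, D)
    pos = rng.standard_normal((N + 1, D))              # position encoding
    cam = rng.standard_normal((n_cameras, D))[cam_id]  # camera encoding (per camera)
    return tokens + pos + cam

# 8x4 toy image with 2x2 patches -> N = 8 blocks, plus one class token
z0 = embed_patches(np.zeros((8, 4, 3)), patch=2, D=16)
```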
Optionally, the fusing the shallow heterogeneous feature of the convolutional neural network branch and the deep heterogeneous feature of the vision transformer branch includes:
and transforming the three-dimensional shallow heterogeneous features of the convolutional neural network branch into two dimensions through a 1×1 convolution, carrying out a global average pooling operation on the shallow heterogeneous features of the convolutional neural network branch to retain the focal features, and flowing the focal features into the vision transformer branch.
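This CNN-to-ViT interaction can be sketched as below. Treating a 1×1 convolution as a per-pixel matrix multiply is standard; the function name and the concrete sizes are illustrative assumptions.

```python
import numpy as np

def cnn_to_vit(feat, W1x1):
    """Sketch of the CNN -> ViT interaction: a 1x1 convolution over an
    H x W x C feature map is a per-pixel linear map that aligns the channel
    dimension with the transformer width D, and global average pooling
    condenses the map into a single focal feature vector."""
    H, W, C = feat.shape
    mapped = feat.reshape(H * W, C) @ W1x1   # 1x1 conv == per-pixel linear map
    pooled = mapped.mean(axis=0)             # global average pooling -> (D,)
    return pooled

rng = np.random.default_rng(0)
shallow = rng.standard_normal((16, 8, 64))   # shallow CNN feature map (H, W, C)
W1x1 = rng.standard_normal((64, 128))        # 1x1 conv kernel, C -> D
token = cnn_to_vit(shallow, W1x1)            # (128,) vector handed to the ViT branch
```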
Optionally, the fusing the shallow heterogeneous feature of the convolutional neural network branch and the deep heterogeneous feature of the vision transformer branch further includes:
carrying out dimension alignment on the deep heterogeneous features of the vision transformer branch through a 1×1 convolution to obtain three-dimensional deep heterogeneous features, normalizing the three-dimensional deep heterogeneous features, carrying out feature-resolution alignment based on interpolation to obtain the features to be exchanged, and flowing the features to be exchanged into the convolutional neural network branch;
and splicing the global feature vectors obtained by the convolutional neural network branch and the visual transformer branch to obtain the pedestrian re-identification feature vector.
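A minimal sketch of this reverse interaction and of the final splicing step, assuming nearest-neighbour upsampling in place of the unspecified interpolation scheme; the names, grid and vector sizes are illustrative.

```python
import numpy as np

def vit_to_cnn(tokens, grid_hw, W1x1, target_hw):
    """Sketch of the ViT -> CNN interaction: drop the class token, reshape the
    patch tokens back to a small feature map, align channels with a 1x1 conv,
    normalize, and upsample (nearest-neighbour here, standing in for
    interpolation) to the CNN branch's feature resolution."""
    h, w = grid_hw
    patches = tokens[1:]                                  # discard the class token
    fmap = patches.reshape(h, w, -1) @ W1x1               # (h, w, C_cnn)
    fmap = (fmap - fmap.mean()) / (fmap.std() + 1e-6)     # normalization
    H, W = target_hw
    fmap = fmap.repeat(H // h, axis=0).repeat(W // w, axis=1)
    return fmap

rng = np.random.default_rng(0)
tokens = rng.standard_normal((1 + 4 * 2, 128))    # class token + 4x2 patch grid
W1x1 = rng.standard_normal((128, 64))             # D -> C_cnn channel alignment
aligned = vit_to_cnn(tokens, (4, 2), W1x1, (16, 8))

# final descriptor: splice the two branches' global feature vectors
f_cnn, f_vit = rng.standard_normal(768), rng.standard_normal(768)
descriptor = np.concatenate([f_cnn, f_vit])       # spliced re-ID feature vector
```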
Optionally, the calculating the loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stops training based on the loss value includes:
setting a first classifier for calculating a branch loss function of the convolutional neural network, and setting a second classifier for calculating a branch loss function of the visual transformer;
and determining that the pedestrian re-recognition initial model converges and stopping training based on the sum of the first loss function calculated by the first classifier and the second loss function obtained by the second classifier.
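The dual-loss objective can be sketched as the sum of the two branch losses. The patent does not fix the per-branch loss function, so a softmax cross-entropy identity loss is assumed here; the function names and toy logits are illustrative.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Numerically stable softmax cross-entropy over a batch of logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def dual_loss(cnn_logits, vit_logits, labels):
    """Each branch has its own classifier head; the training objective is the
    sum of the two branch losses, constraining both branches' features."""
    return cross_entropy(cnn_logits, labels) + cross_entropy(vit_logits, labels)

labels = np.array([0, 1])
cnn_logits = np.array([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])  # first classifier output
vit_logits = np.array([[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])  # second classifier output
loss = dual_loss(cnn_logits, vit_logits, labels)
```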
Optionally, the re-identifying the target pedestrian image based on the pedestrian re-identification model includes:
performing similarity measurement on the characteristics of the target pedestrian image and the plurality of candidate pedestrian images based on the pedestrian re-recognition model to obtain a recognition distance matrix;
the candidate pedestrian whose image corresponds to the minimum value in the recognition distance matrix is determined to be the target pedestrian.
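A sketch of this retrieval step, assuming Euclidean distance as the similarity metric (the patent does not name the metric); the function and toy feature vectors are illustrative.

```python
import numpy as np

def reidentify(query, gallery):
    """Measure the distance between the query feature and every candidate
    feature, then return the index of the closest candidate."""
    dists = np.linalg.norm(gallery - query, axis=1)  # recognition distances
    return int(dists.argmin()), dists

query = np.array([1.0, 0.0])                      # target pedestrian feature
gallery = np.array([[0.0, 1.0],                   # candidate pedestrian features
                    [0.9, 0.1],
                    [-1.0, 0.0]])
best, dists = reidentify(query, gallery)          # best is the matched candidate
```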
Optionally, after calculating the loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stops training based on the loss value, and obtaining the pedestrian re-recognition model, the method further includes:
testing the pedestrian re-identification model to obtain an evaluation index;
and carrying out a comparison experiment on the network structure parameters of the pedestrian re-recognition model based on the evaluation index, and determining target network structure parameters so as to optimize the pedestrian re-recognition model based on the target network structure parameters.
The embodiment of the invention also provides a pedestrian re-identification device based on heterogeneous network feature interaction, which comprises:
the model construction module is used for designing a pedestrian re-identification initial model based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer;
the calculation module is used for calculating the loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stopping training based on the loss value, and obtaining a pedestrian re-recognition model;
and the re-recognition module is used for re-recognizing the target pedestrian image based on the pedestrian re-recognition model.
The embodiment of the invention also provides pedestrian re-recognition equipment based on the heterogeneous network characteristic interaction, which comprises a memory, a processor and a pedestrian re-recognition program based on the heterogeneous network characteristic interaction stored in the memory, wherein the pedestrian re-recognition program based on the heterogeneous network characteristic interaction realizes the steps of the pedestrian re-recognition method based on the heterogeneous network characteristic interaction when being run by the processor.
Compared with the prior art, the pedestrian re-recognition method, device, equipment and storage medium based on heterogeneous network feature interaction provided by the invention design a pedestrian re-recognition initial model based on the heterogeneous network features of a convolutional neural network and a vision transformer; calculate a loss value of the initial model based on a dual loss, determine from the loss value that the initial model has converged, stop training, and obtain a pedestrian re-recognition model; and re-recognize the target pedestrian image based on that model. Because the pedestrian re-recognition model is built on the heterogeneous network features of a convolutional neural network and a vision transformer, and shallow features are fused with deep features, both the basic features and the global features of the image can be exploited, a large number of image features can be obtained, and the recognition result is more accurate.
Drawings
FIG. 1 is a schematic hardware architecture of a pedestrian re-recognition device based on heterogeneous network feature interaction according to embodiments of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of a pedestrian re-recognition method based on heterogeneous network feature interaction of the present invention;
FIG. 3 is a flow chart of a second embodiment of the pedestrian re-recognition method based on heterogeneous network feature interaction of the present invention;
FIG. 4 is a schematic diagram of a pedestrian re-recognition model involved in the pedestrian re-recognition method based on heterogeneous network feature interaction of the invention;
FIG. 5 is a schematic diagram of feature interactions involved in the pedestrian re-recognition method based on heterogeneous network feature interactions of the present invention;
FIG. 6 is a flow chart of a third embodiment of a pedestrian re-recognition method based on heterogeneous network feature interaction of the present invention;
FIG. 7 is a flow chart of a fourth embodiment of a pedestrian re-recognition method based on heterogeneous network feature interaction of the present invention;
FIG. 8 is a flowchart of a fifth embodiment of a pedestrian re-recognition method based on heterogeneous network feature interaction of the present invention;
fig. 9 is a schematic functional block diagram of a first embodiment of the pedestrian re-recognition device based on heterogeneous network feature interaction according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The pedestrian re-identification device based on heterogeneous network feature interaction involved in the embodiments of the present invention refers to equipment capable of network connection; it may be a server, a cloud platform, or the like.
Referring to fig. 1, fig. 1 is a schematic hardware structure of a pedestrian re-recognition device based on heterogeneous network feature interaction according to embodiments of the present invention. In an embodiment of the present invention, the pedestrian re-recognition device based on heterogeneous network feature interaction may include a processor 1001 (e.g., a Central Processing Unit, CPU), a communication bus 1002, an input port 1003, an output port 1004, and a memory 1005. The communication bus 1002 enables communication among these components; the input port 1003 is used for data input; the output port 1004 is used for data output; and the memory 1005 may be a high-speed RAM memory or a stable non-volatile memory, such as a disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration shown in fig. 1 does not limit the invention, which may include more or fewer components than shown, combine certain components, or arrange components differently.
With continued reference to FIG. 1, the memory 1005 of FIG. 1, which is a readable storage medium, may include an operating system, a network communication module, an application module, and a pedestrian re-recognition program based on heterogeneous network feature interactions. In fig. 1, the network communication module is mainly used for connecting with a server and performing data communication with the server; and the processor 1001 is configured to invoke the pedestrian re-recognition program based on heterogeneous network feature interaction stored in the memory 1005, and perform the following operations:
designing a pedestrian re-identification initial model based on heterogeneous network characteristics of a convolutional neural network and a visual transformer;
calculating a loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stopping training based on the loss value, and obtaining a pedestrian re-recognition model;
and re-identifying the target pedestrian image based on the pedestrian re-identification model.
The pedestrian re-recognition device based on the heterogeneous network feature interaction provides a first embodiment of the pedestrian re-recognition method based on the heterogeneous network feature interaction. Referring to fig. 2, fig. 2 is a flowchart of a first embodiment of a pedestrian re-recognition method based on heterogeneous network feature interaction according to the present invention.
As shown in fig. 2, a first embodiment of the present invention proposes a pedestrian re-recognition method based on heterogeneous network feature interaction, where the method is applied to a pedestrian re-recognition device based on heterogeneous network feature interaction, and the method includes:
step S101, designing a pedestrian re-identification initial model based on heterogeneous network characteristics of a convolutional neural network and a visual transformer;
deep learning represented by convolutional neural networks (Convolutional Neural Network, CNN) has been largely successful in the field of computer vision. The shallow layer of the convolutional neural network is good at extracting basic features in the image, such as the edge contour and color features of pedestrians. With the stacking of network layers, the deep layers of the network gradually extract abstract semantic information. Shallow feature maps are critical to the generation of deep feature maps, which rely on shallow feature maps. If some fine granularity information is ignored in the shallow network, the characteristics of the last layer of the network can not fully express pedestrian characteristics, and the model is easy to encounter bottlenecks. Such network models tend to focus only on the portion of the image that contributes most to recognition performance, without considering all the information of pedestrians; importantly, the ignored portions often also have identification value.
The vision transformer (Vision Transformer, ViT) achieves good performance on a variety of vision tasks; by stacking self-attention modules it can build a flexible, dynamic receptive field and focus attention on a target area. Multi-head self-attention can adaptively attend to various kinds of information in different regions from a global perspective, providing richer feature information for the re-identification task. However, the Vision Transformer is prone to losing detail when partitioning the image into patches and cannot capture fine-grained information, which may limit its ability to discriminate pedestrians with similar attributes. In addition, the Vision Transformer lacks inductive biases such as locality and translation equivariance.
Based on the respective advantages and disadvantages of CNN and ViT, the pedestrian re-recognition initial model is designed from the heterogeneous network features of a convolutional neural network and a vision transformer. The initial model comprises a convolutional neural network branch and a vision transformer branch; the two branches are interconnected yet run independently, each extracting the relevant features of the image, which are then fused. Specifically, the training image set is input into the convolutional neural network module and the vision transformer network module respectively, the relevant features are extracted, and training is performed with preset initial parameters to obtain the pedestrian re-recognition initial model.
Step S102, calculating a loss value of the pedestrian re-recognition initial model based on double loss, determining that the pedestrian re-recognition initial model converges and stopping training based on the loss value, and obtaining a pedestrian re-recognition model;
the present embodiment determines whether the model converges based on the loss function. Because the embodiment comprises the convolutional neural network branch and the visual transformer branch, and the characteristic preference extracted by each branch is different, the invention respectively constrains the characteristics of the two branches in order to ensure the training effect of the heterogeneous network.
Specifically, the loss functions of the convolutional neural network branch and the visual transformer branch are calculated respectively, the final loss function is the sum of the loss functions of the two branches, when the final loss function is minimum, model convergence is determined to stop training, corresponding model parameters are stored, and a pedestrian re-identification model is obtained.
And step S103, re-identifying the target pedestrian image based on the pedestrian re-identification model.
After the pedestrian re-recognition model is obtained, the target pedestrian image and the candidate pedestrian image are input into the pedestrian re-recognition model, the feature extraction is carried out by the pedestrian re-recognition model, and then the candidate pedestrian image closest to the target pedestrian feature is determined from the candidate pedestrian images, namely, the pedestrian re-recognition result is obtained.
Through the scheme, the pedestrian re-recognition initial model is designed based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer; calculating a loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stopping training based on the loss value, and obtaining a pedestrian re-recognition model; and re-identifying the target pedestrian image based on the pedestrian re-identification model. The pedestrian re-recognition model is built based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer, and the shallow characteristic characteristics and the deep characteristic characteristics are fused, so that the basic characteristics of the image can be utilized, the global characteristics of the image can be utilized, a large number of image characteristics can be obtained, and the recognition result is more accurate.
As shown in fig. 3, a second embodiment of the present invention provides a pedestrian re-recognition method based on heterogeneous network feature interaction. Based on the first embodiment shown in fig. 2, the step of designing a pedestrian re-recognition initial model based on the heterogeneous network features of the convolutional neural network and the vision transformer includes:
step S1011, constructing a convolution neural network branch of the pedestrian re-recognition initial model; and
the CNN branch adopts a characteristic pyramid structure to give a pedestrian imageWhere W, H and C represent the width, height and number of channels of the image, respectively. Feature resolution decreases with increasing network depth while the number of channels C increases.
Dividing the branch of the convolutional neural network into three stages, and respectively representing the three stages as a first Stage 1 Stage two 2 And third Stage 3 As shown in fig. 4, fig. 4 is a schematic diagram of a pedestrian re-recognition model related to the pedestrian re-recognition method based on heterogeneous network feature interaction. Each Stage is composed of a different number of residual blocks, stage in this embodiment 1 To Stage 3 The residual blocks of (1) are set to 7, 8, respectively, in this order. Except Stage 1 The first residual block in (a), the latter residual blocks are all present in pairs. The feature resolution drops 2-fold from stage to stage while the channel number rises 2-fold from stage to stage. The outputs of the same stage residual blocks have the same feature resolution. Stage branching on CNN 3 The feature map output in the stage is subjected to convolution, normalization and global average pooling operation to generate a group of 768-dimensional global feature vectors.
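The halving/doubling rule can be illustrated with a small helper. The input size 64×32×192 below is an assumption, chosen only so that the third stage ends at 768 channels, matching the 768-dimensional global vector described above.

```python
def stage_shapes(H, W, C, n_stages=3):
    """Per-stage feature shapes in the CNN branch: resolution halves and the
    channel count doubles from one stage to the next (sizes illustrative)."""
    shapes = []
    for s in range(n_stages):
        shapes.append((H >> s, W >> s, C << s))  # (H/2^s, W/2^s, C*2^s)
    return shapes

shapes = stage_shapes(64, 32, 192)
```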
Step S1012, constructing a visual transformer branch of the pedestrian re-identification initial model;
Specifically, an input pedestrian image is represented as an image block sequence including a plurality of image blocks.
The vision transformer branch consists of L Transformer modules, each identical in structure. The input pedestrian image is represented as a sequence of N image blocks {x_i | i = 1, 2, …, N}, with each block x_i ∈ ℝ^{p×p×C}, where p represents the spatial size of each block.
Performing linear mapping on the image block sequence to obtain a plurality of D-dimensional embedded representations of the image blocks;
In this embodiment, a learnable embedding matrix E ∈ ℝ^{(p²·C)×D} is used to project each image block into a D-dimensional embedded representation.
Concatenating a class token with a plurality of said D-dimensional embedded representations and adding a position code and a camera code for each of said image blocks to produce a sequence of embedded image blocks;
specifically, a learnable class token x_cls serves as the discriminative representation and is concatenated with the D-dimensional embedded representations of the N image blocks. Spatial information and camera information are additionally learned by adding a position embedding (Position Embedding, PE) and a camera embedding (Camera Embedding, CE) to each image block, forming the final embedded image block sequence z_0:

z_0 = [x_cls; E x_1; E x_2; …; E x_N] + PE + CE    (1)

where, in formula (1), E is the learnable embedding matrix, PE the position embedding and CE the camera embedding.
And sequentially processing the embedded image block sequence through normalization, a multi-head attention mechanism and a multi-layer perceptron to obtain the visual transformer branch.
The embedded image block sequence z_0 is fed sequentially into the L Transformer modules, each of which processes its input through layer normalization (LayerNorm, LN), a multi-head attention mechanism (MSA) and a multi-layer perceptron (Multilayer Perceptron, MLP), as shown in equations (2) and (3). The input and output of the multi-head attention layer are connected by a residual connection, followed by normalization. The MLP has two layers, with the GELU activation function.

z′_l = MSA(LN(z_{l-1})) + z_{l-1},  l = 1, …, L    (2)

z_l = MLP(LN(z′_l)) + z′_l,  l = 1, …, L    (3)
Multi-headed attention can be expressed as:
MSA(z) = Concat(H_1, H_2, …, H_h) W_O    (4)

In equation (4), W_O is a learnable output projection matrix, h is the number of heads, and Concat(·) denotes stacking over the embedded-representation dimension of the image blocks. Each head H_i (i = 1, 2, …, h) can be expressed as:
H_i = Attention(Q, K, V)    (5)

Attention(Q, K, V) = softmax(QK^T / √d_k) V    (6)

In equation (5), Q, K and V are obtained from the input z through three linear mappings with learnable matrices W_Q, W_K and W_V. These are three entirely different matrices that project the input into different spaces, so the expressive power is much higher. Attention(·) denotes the attention operation, a function used to calculate the relevance and importance of image blocks. In equation (6), d_k = d_v = D/h, and the scaling factor 1/√d_k is a normalization that ensures numerical stability.
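The multi-head computation of equations (4)–(6) can be sketched in NumPy as follows. This is an illustrative implementation, not the patent's code: the weights are random stand-ins and the per-head projections are taken as slices of single Q/K/V projections, a common simplification.

```python
import numpy as np

def softmax(a):
    a = a - a.max(-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(-1, keepdims=True)

def msa(z, Wq, Wk, Wv, Wo, h):
    """Multi-head self-attention per equations (4)-(6)."""
    N, D = z.shape
    d_k = D // h
    Q, K, V = z @ Wq, z @ Wk, z @ Wv                      # three different linear maps
    heads = []
    for i in range(h):
        sl = slice(i * d_k, (i + 1) * d_k)
        A = softmax(Q[:, sl] @ K[:, sl].T / np.sqrt(d_k))  # equation (6)
        heads.append(A @ V[:, sl])                         # H_i, equation (5)
    return np.concatenate(heads, axis=-1) @ Wo             # Concat + W_O, equation (4)

rng = np.random.default_rng(2)
N, D, h = 6, 12, 4                                        # so d_k = D / h = 3
z = rng.normal(size=(N, D))
Wq, Wk, Wv, Wo = [rng.normal(size=(D, D)) * 0.1 for _ in range(4)]
out = msa(z, Wq, Wk, Wv, Wo, h)
```

Each head attends in its own d_k-dimensional subspace; concatenating the heads restores the full embedding dimension D.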
Step S1013, fusing the shallow heterogeneous characteristic of the convolutional neural network branch and the deep heterogeneous characteristic of the vision transformer branch to obtain the pedestrian re-recognition initial model.
Features generated by the CNN branch and the ViT branch are heterogeneous, both in feature dimension and semantically. To fuse the two heterogeneous features, the present invention does not fuse every layer; instead, the heterogeneous feature fusion module shown in fig. 4 is applied only to the shallow features (Stage_1) and the deep features (Stage_3).
Specifically, three-dimensional shallow heterogeneous characteristics of a convolutional neural network branch are transformed into two dimensions through convolution of 1×1, global average pooling operation is carried out on the shallow heterogeneous characteristics of the convolutional neural network branch, focal characteristics are reserved, and the focal characteristics are flowed into the vision transformer branch;
the features extracted by the CNN branch are dense and three-dimensional, whereas the embedded representations of the ViT branch are two-dimensional. When CNN branch features flow into the ViT branch, the channel dimension is first transformed with a 1 × 1 convolution to obtain features of the same dimension. The significance of the middle-layer features of the convolutional neural network along the channel dimension can be understood as the response of the input image to different patterns. Because the learning strategies differ, there is a semantic gap between the features extracted by the ViT and CNN branches: ViT attends globally from the very first layer. A global average pooling operation therefore retains the focal features of the CNN branch that best express the image; after layer normalization, these are merged into the ViT branch to supplement the fine-grained features of key regions.
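A minimal NumPy sketch of this shallow CNN→ViT flow is given below, under stated assumptions: the 1 × 1 convolution is expressed as a per-pixel channel mixing, and the pooled focal feature is appended to the ViT token sequence as one extra token (the patent does not fix the exact merge operation, so this is illustrative).

```python
import numpy as np

def cnn_to_vit(feat, W1x1, eps=1e-6):
    """feat: CNN feature map (C, H, W). A 1x1 conv mixes channels per pixel,
    global average pooling keeps one focal token, and layer normalization
    prepares it to join the ViT token sequence."""
    mixed = np.einsum('chw,cd->dhw', feat, W1x1)     # 1x1 convolution: C -> D channels
    token = mixed.mean(axis=(1, 2))                  # global average pooling -> (D,)
    token = (token - token.mean()) / np.sqrt(token.var() + eps)  # layer norm
    return token

rng = np.random.default_rng(3)
feat = rng.normal(size=(64, 64, 32))    # Stage_1-like shallow feature map (assumed sizes)
W = rng.normal(size=(64, 768)) * 0.05   # 1x1 conv weights, C_in -> D (random stand-in)
tok = cnn_to_vit(feat, W)
vit_seq = rng.normal(size=(129, 768))   # class token + 128 patch tokens
fused = np.vstack([vit_seq, tok])       # focal feature flows into the ViT branch
```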
Carrying out dimension alignment on deep heterogeneous features of the visual transformer branch through 1 multiplied by 1 convolution to obtain three-dimensional deep heterogeneous features, carrying out normalization processing on the three-dimensional deep heterogeneous features, carrying out feature resolution alignment on the basis of interpolation to obtain features to be exchanged, and flowing the features to be exchanged into the convolution neural network branch;
when the representation in the ViT branch is fed back to the CNN branch, channel alignment is first achieved with a 1 × 1 convolution; after batch normalization, the feature spatial resolution is aligned with the CNN features by interpolation, and the result is added to the feature map of the CNN branch. Through the feature interaction modules, the ViT representations and the CNN features interact so that, on the basis of obtaining the complete features of the target pedestrian, the high-response features of the target pedestrian are located more accurately; the final features thus carry more information and generalize better. Through the shallow and deep feature coupling modules, diverse discriminative pedestrian features are aggregated, enriching the final pedestrian feature representation.
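The reverse, deep ViT→CNN flow can be sketched in NumPy as follows. This is an illustrative sketch: the 1 × 1 convolution is a matrix multiply over the token dimension, and nearest-neighbour indexing stands in for the interpolation (the patent does not specify the interpolation mode); all weights and sizes are assumptions.

```python
import numpy as np

def vit_to_cnn(tokens, grid_hw, W1x1, cnn_map, eps=1e-5):
    """tokens: (N, D) patch tokens (class token removed). Channel-align with
    a 1x1 conv, batch-normalize, reshape to the token grid, interpolate up to
    the CNN resolution, then add to the CNN feature map."""
    h, w = grid_hw
    C, Hc, Wc = cnn_map.shape
    x = tokens @ W1x1                               # (N, C): 1x1 conv channel alignment
    x = (x - x.mean(0)) / np.sqrt(x.var(0) + eps)   # batch normalization over tokens
    x = x.T.reshape(C, h, w)                        # back to a 3-D feature map
    rows = np.arange(Hc) * h // Hc                  # nearest-neighbour interpolation
    cols = np.arange(Wc) * w // Wc
    x_up = x[:, rows][:, :, cols]                   # (C, Hc, Wc)
    return cnn_map + x_up                           # added into the CNN feature map

rng = np.random.default_rng(4)
tokens = rng.normal(size=(128, 768))      # ViT patch tokens on a 16 x 8 grid
W = rng.normal(size=(768, 256)) * 0.02    # 1x1 conv: D -> C (random stand-in)
cnn_map = rng.normal(size=(256, 32, 16))  # Stage-like CNN feature map (assumed sizes)
out = vit_to_cnn(tokens, (16, 8), W, cnn_map)
```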
Fig. 5 is a schematic diagram of feature interaction related to the pedestrian re-recognition method based on heterogeneous network feature interaction. Fig. 5 shows the flow of feature interactions in its entirety: the shallow heterogeneous characteristics of the CNN branch are converged into a ViT branch through a shallow characteristic interaction module; and the deep heterogeneous features of the ViT branch are converged into the CNN branch through a deep feature interaction module.
The global feature vectors obtained by the convolutional neural network branch and the visual transformer branch are spliced to obtain the pedestrian re-identification feature vector; since each branch contributes a 768-dimensional vector, a 1536-dimensional pedestrian re-recognition feature vector is finally obtained.
According to the scheme, the pedestrian re-recognition initial model is designed based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer, so that characteristic interaction is realized, more pedestrian characteristic representations are obtained, and the pedestrian re-recognition accuracy is improved.
Fig. 6 is a schematic flow chart of a third embodiment of the pedestrian re-recognition method based on heterogeneous network feature interaction, and as shown in fig. 6, the third embodiment of the invention provides a pedestrian re-recognition method based on heterogeneous network feature interaction, wherein training the initial model of pedestrian re-recognition based on double loss includes:
step S1021, setting a first classifier for calculating a branch loss function of the convolutional neural network, and setting a second classifier for calculating a branch loss function of the visual transformer;
in the training phase, a classifier is set for each branch individually, i.e. the classifier parameters are not shared. The losses calculated by the two classifiers are summed to jointly optimize the entire network.
Step S1022, determining that the pedestrian re-recognition initial model converges and stopping training based on the sum of the first loss function calculated by the first classifier and the second loss function obtained by the second classifier.
For each branch feature, the training loss function is composed of two parts, as shown in equations (7) and (8):

L_ID = −(1/N) Σ_{i=1}^{N} y_i log(p_i)    (7)

L_tri = log[1 + exp(d_pos − d_neg)]    (8)

where L_ID and L_tri denote the cross-entropy loss function and the triplet loss function respectively, y_i denotes the label vector, p_i denotes the probability value output by the fully connected layer of the network, N is the number of images per batch input to the network, and d_pos and d_neg denote the distances of the positive and negative sample pairs respectively.
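The two per-branch losses of equations (7) and (8) can be sketched in NumPy as below; this is a hedged illustration (the small epsilon for log stability and the one-hot label convention are assumptions, not from the patent).

```python
import numpy as np

def id_loss(p, y):
    """Cross-entropy ID loss, equation (7): p holds softmax outputs (N, K),
    y the one-hot label vectors (N, K); averaged over the batch."""
    return float(-(y * np.log(p + 1e-12)).sum(axis=1).mean())

def triplet_loss(d_pos, d_neg):
    """Soft-margin triplet loss, equation (8)."""
    return float(np.log1p(np.exp(d_pos - d_neg)))

y = np.array([[1.0, 0.0], [0.0, 1.0]])  # two samples, two identities
p_good = y                               # perfect predictions -> near-zero ID loss
```

Note that with d_pos = d_neg the triplet loss is log 2 rather than 0, so the loss keeps pushing positive pairs closer than negative pairs even at equality.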
The final loss function is obtained by summing the loss functions of the two branches, as shown in equation (9):

L = L^(1) + λ L^(2),  with  L^(k) = L_ID^(k) + α L_tri^(k)    (9)

where α is used to trade off the two losses within a branch and λ is used to trade off the two branch losses. Here α is set to 1 and, according to experiments, λ is set to 0.5. The network model is optimized by minimizing the combined loss function: when equation (9) reaches its minimum, the pedestrian re-recognition model is determined to have converged and is considered to give the best recognition effect, so training of the model can be stopped.
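The weighting described above can be expressed as a one-line combination. The patent does not state which branch carries the λ weight; this sketch arbitrarily places it on the second branch, so treat the assignment as an assumption.

```python
def combined_loss(losses_branch1, losses_branch2, alpha=1.0, lam=0.5):
    """Per-branch loss L_ID + alpha * L_tri; branches combined with weight
    lambda (alpha = 1 and lambda = 0.5 in the text)."""
    id1, tri1 = losses_branch1
    id2, tri2 = losses_branch2
    return (id1 + alpha * tri1) + lam * (id2 + alpha * tri2)
```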
According to the embodiment, through the scheme, model convergence is determined based on the sum of loss functions of the convolutional neural network branches and the vision transformer branches, and the final pedestrian re-recognition effect is ensured.
As shown in fig. 7, a fourth embodiment of the present invention proposes a pedestrian re-recognition method based on heterogeneous network feature interaction, where the re-recognition of a target pedestrian image based on the pedestrian re-recognition model includes:
step S1031: performing similarity measurement on the characteristics of the target pedestrian image and the plurality of candidate pedestrian images based on the pedestrian re-recognition model to obtain a recognition distance matrix;
and extracting the characteristics of the target pedestrian image and the candidate pedestrian images in the candidate set by using the trained pedestrian re-recognition model to carry out similarity measurement, and obtaining a recognition distance matrix by carrying out similarity comparison.
Step S1032: the candidate pedestrian in the candidate pedestrian image corresponding to the minimum recognition distance matrix is determined as the target pedestrian.
The candidate pedestrian image with the smallest distance to the target pedestrian is determined to be the most likely candidate. The Euclidean distance between two samples in the embedding space is typically used as the similarity; it is the simplest and most easily understood distance.
Let x_p and x_g denote the target pedestrian image (probe) and a candidate pedestrian image in the candidate set Γ respectively, and let f(x_p) and f(x_g) be their output features, cascaded from the two branches of the TCCNet network. The Euclidean distance D_pg between the features of x_p and x_g is calculated as:

D_pg = ‖f(x_p) − f(x_g)‖_2
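The pairwise distance computation behind the recognition distance matrix can be sketched in NumPy as follows; the toy 2-D features are illustrative, not model outputs.

```python
import numpy as np

def distance_matrix(query_feats, gallery_feats):
    """Pairwise Euclidean distances D_pg between query and gallery features."""
    diff = query_feats[:, None, :] - gallery_feats[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

q = np.array([[0.0, 0.0], [1.0, 1.0]])   # two query features (toy values)
g = np.array([[3.0, 4.0], [1.0, 1.0]])   # two gallery features
D = distance_matrix(q, g)                 # (n_query, n_gallery) distance matrix
best = D.argmin(axis=1)                   # most likely candidate per query
```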
thus, the target pedestrian is locked based on the Euclidean distance, and the pedestrian re-recognition result is obtained.
As shown in fig. 8, a fifth embodiment of the present invention proposes a pedestrian re-recognition method based on heterogeneous network feature interaction, where after calculating a loss value of the initial pedestrian re-recognition model based on double loss, determining that the initial pedestrian re-recognition model converges and stops training based on the loss value, and obtaining a pedestrian re-recognition model, the method further includes:
step S104, testing the pedestrian re-identification model to obtain an evaluation index;
step S105, performing a comparison experiment on the network structure parameters of the pedestrian re-recognition model based on the evaluation index, and determining target network structure parameters for optimizing the pedestrian re-recognition model based on the target network structure parameters.
The proposed method is tested on the large public dataset MSMT17. MSMT17 was captured by 12 outdoor cameras and 3 indoor cameras and contains a total of 126,441 images of 4,101 pedestrians. 32,621 images of 1,041 pedestrians are used as the training set and 93,820 images of 3,060 pedestrians as the test set; 11,659 images are randomly selected as queries and the remaining 82,161 images serve as the gallery (candidate set). It is a large dataset close to real scenes and a challenging one for the pedestrian re-recognition task.
In a pedestrian re-recognition task, a target pedestrian image (query) is generally given during testing; the similarity between it and the candidate images in the candidate set (Γ) is then calculated based on the pedestrian re-recognition model, and the candidate images are ranked by similarity from high to low, so that images closer to the query image come first. To evaluate the performance of pedestrian re-recognition algorithms, the current practice is to compute the corresponding indexes on public datasets and then compare them with other models. The CMC curve (Cumulative Matching Characteristics) and mAP (mean Average Precision) are the two most commonly used evaluation criteria.
In this embodiment, the most commonly used rank-1 and rank-5 points of the CMC curve and the mAP index are selected. rank-k refers to the probability that a correct result appears among the top k (highest-confidence) images in the search results. mAP is effectively an average level over the whole ranking list: a higher mAP indicates that images of the same identity as the query are ranked higher overall, which indicates a better model.
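A small NumPy sketch of these two metrics, computed from a distance matrix and identity labels, is given below; this is one common formulation of rank-k and mAP, not code from the patent.

```python
import numpy as np

def rank_k(dist, q_ids, g_ids, k):
    """Fraction of queries with a correct identity among the k nearest gallery images."""
    order = dist.argsort(axis=1)
    hits = [(g_ids[order[i, :k]] == q_ids[i]).any() for i in range(len(q_ids))]
    return float(np.mean(hits))

def mean_ap(dist, q_ids, g_ids):
    """mAP: average precision of each query's ranking list, averaged over queries."""
    order = dist.argsort(axis=1)
    aps = []
    for i in range(len(q_ids)):
        good = g_ids[order[i]] == q_ids[i]
        if not good.any():
            continue
        ranks = np.where(good)[0] + 1                       # 1-based ranks of correct hits
        precision = np.arange(1, len(ranks) + 1) / ranks    # precision at each hit
        aps.append(precision.mean())
    return float(np.mean(aps))

# toy example: one query, three gallery images, correct hits at ranks 1 and 3
dist = np.array([[0.1, 0.2, 0.3]])
q_ids = np.array([0])
g_ids = np.array([0, 1, 0])
```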
The specific network structure parameters are as follows. The resolution of the input image is set to 256 × 128 and the batch size to 64, comprising 16 different pedestrians with 4 images each. Data enhancement is performed with random horizontal flipping, padding, random cropping, random erasing and normalization strategies. The total number of training epochs is set to 150. The model is optimized using SGD, with the weight decay factor set to 1e-4 and the momentum to 0.9. The learning rate is preheated with a warmup strategy, the initial learning rate is 0.009, and cosine decay is used to keep the loss decreasing stably. The optimal target network structure parameters are determined from the experimental results.
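The warmup-then-cosine schedule described above can be sketched as follows. The base learning rate (0.009) and epoch count (150) come from the text; the warmup length of 10 epochs is an assumed value, since the patent does not state it.

```python
import numpy as np

def learning_rate(epoch, base_lr=0.009, warmup_epochs=10, total_epochs=150):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs        # warmup phase
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + np.cos(np.pi * t))          # cosine decay phase
```

Warmup avoids large, destabilizing updates while the randomly initialized heads settle; the cosine tail lets the loss descend smoothly late in training.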
Based on the above evaluation indexes and experimental details, tests were performed on the MSMT17 dataset, giving the experimental results shown in Table 1. As the results in Table 1 show, mAP reaches 56.7% using ResNet50 alone as the baseline network and 61.8% using Vision Transformer alone as the baseline network. Combining the two baselines into a parallel network and adding the shallow and deep feature interaction modules raises mAP to 66.6%, which indicates that the heterogeneous network structure extracts comprehensive and highly salient features and reduces the influence of background noise on the model, thereby improving model performance. Compared with the two baseline models, the invention improves performance significantly and reaches a competitive level.
TABLE 1 Influence of different modules on the MSMT17 dataset
The recognition results show that the ResNet50-based and Vision Transformer-based baseline networks retrieve more erroneous images than the model provided by the invention. Furthermore, among the images retrieved by the invention, the first few images (especially at rank-5) are more accurate than those of the two baseline networks. This means the proposed pedestrian re-recognition model attends to more effective information in the human-body region and therefore retrieves more correct target images, i.e. the heterogeneous network feature interaction model is more effective.
According to the embodiment, the pedestrian re-recognition model is tested, so that the model is further optimized, and a better pedestrian re-recognition effect is achieved.
Further, to achieve the above objective, the present invention further provides a pedestrian re-recognition device based on heterogeneous network feature interaction, specifically referring to fig. 9, fig. 9 is a schematic functional block diagram of a first embodiment of the pedestrian re-recognition device based on heterogeneous network feature interaction, where the device includes:
the model construction module 10 is used for designing a pedestrian re-identification initial model based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer;
the calculation module 20 is configured to calculate a loss value of the pedestrian re-recognition initial model based on the double loss, determine that the pedestrian re-recognition initial model converges and stops training based on the loss value, and obtain a pedestrian re-recognition model;
and the re-recognition module 30 is used for re-recognizing the target pedestrian image based on the pedestrian re-recognition model.
In addition, the invention further provides a computer readable storage medium, the computer readable storage medium stores a pedestrian re-recognition program based on heterogeneous network feature interaction, and the steps of the pedestrian re-recognition method based on heterogeneous network feature interaction are realized when the pedestrian re-recognition program based on heterogeneous network feature interaction is run by a processor, and are not repeated herein.
Compared with the prior art, the pedestrian re-recognition method, device, equipment and storage medium based on heterogeneous network feature interaction provided by the invention design a pedestrian re-recognition initial model based on the heterogeneous network features of the convolutional neural network and the visual transformer; calculate a loss value of the pedestrian re-recognition initial model based on the double loss, determine from the loss value that the initial model has converged, stop training, and obtain the pedestrian re-recognition model; and re-identify the target pedestrian image based on the pedestrian re-recognition model. Because the pedestrian re-recognition model is built on the heterogeneous network features of the convolutional neural network and the visual transformer and fuses shallow and deep features, both the basic features and the global features of the image are exploited; a large number of image features are obtained, and the recognition result is more accurate.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or modifications in the structures or processes described in the specification and drawings, or the direct or indirect application of the present invention to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. The pedestrian re-identification method based on heterogeneous network feature interaction is characterized by comprising the following steps of:
the method comprises the steps of designing a pedestrian re-recognition initial model based on heterogeneous network characteristics of a convolutional neural network and a visual transformer, namely respectively inputting a training image set into the convolutional neural network module and the visual transformer network module, extracting relevant characteristics and training based on preset initial parameters to obtain the pedestrian re-recognition initial model;
calculating a loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stopping training based on the loss value, and obtaining a pedestrian re-recognition model;
re-identifying the target pedestrian image based on the pedestrian re-identification model;
the step of designing the pedestrian re-identification initial model based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer comprises the following steps of: constructing a convolutional neural network branch of the pedestrian re-recognition initial model; and constructing a visual transformer branch of the pedestrian re-identification initial model; fusing the shallow heterogeneous characteristics of the convolutional neural network branch with the deep heterogeneous characteristics of the visual transformer branch to obtain the pedestrian re-recognition initial model;
the fusing the shallow heterogeneous characteristics of the convolutional neural network branch and the deep heterogeneous characteristics of the vision transformer branch comprises the following steps: transforming the three-dimensional shallow heterogeneous characteristics of the convolutional neural network branch into two dimensions through convolution of 1 multiplied by 1, carrying out global average pooling operation on the shallow heterogeneous characteristics of the convolutional neural network branch to reserve focal characteristics, and flowing the focal characteristics into the vision transformer branch;
the fusing the shallow heterogeneous characteristics of the convolutional neural network branch and the deep heterogeneous characteristics of the vision transformer branch further comprises: carrying out dimension alignment on deep heterogeneous features of the visual transformer branch through 1 multiplied by 1 convolution to obtain three-dimensional deep heterogeneous features, carrying out normalization processing on the three-dimensional deep heterogeneous features, carrying out feature resolution alignment on the basis of interpolation to obtain features to be exchanged, and flowing the features to be exchanged into the convolution neural network branch; splicing the global feature vectors obtained by the convolutional neural network branch and the visual transformer branch to obtain a pedestrian re-identification feature vector;
wherein the calculating the loss value of the initial model for pedestrian re-recognition based on the double loss, and the determining the initial model for pedestrian re-recognition based on the loss value to converge and stop training comprises: setting a first classifier for calculating a branch loss function of the convolutional neural network, and setting a second classifier for calculating a branch loss function of the visual transformer; and determining that the pedestrian re-recognition initial model converges and stopping training based on the sum of the first loss function calculated by the first classifier and the second loss function obtained by the second classifier.
2. The method of claim 1, wherein said constructing a visual transformer branch of the pedestrian re-recognition initial model comprises:
representing an input pedestrian image as an image block sequence including a plurality of image blocks;
performing linear mapping on the image block sequence to obtain a plurality of D-dimensional embedded representations of the image blocks;
concatenating a class token with a plurality of said D-dimensional embedded representations and adding a position code and a camera code for each of said image blocks to produce a sequence of embedded image blocks;
and sequentially processing the embedded image block sequence through normalization, a multi-head attention mechanism and a multi-layer perceptron to obtain the visual transformer branch.
3. The method of claim 1, wherein the re-identifying the target pedestrian image based on the pedestrian re-identification model comprises:
performing similarity measurement on the characteristics of the target pedestrian image and the plurality of candidate pedestrian images based on the pedestrian re-recognition model to obtain a recognition distance matrix;
the candidate pedestrian in the candidate pedestrian image corresponding to the minimum recognition distance matrix is determined as the target pedestrian.
4. The method according to claim 1, further comprising, after the calculating of the loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stops training based on the loss value, obtaining a pedestrian re-recognition model:
testing the pedestrian re-identification model to obtain an evaluation index;
and carrying out a comparison experiment on the network structure parameters of the pedestrian re-recognition model based on the evaluation index, and determining target network structure parameters so as to optimize the pedestrian re-recognition model based on the target network structure parameters.
5. The pedestrian re-recognition device based on heterogeneous network feature interaction is characterized by adopting the pedestrian re-recognition method based on heterogeneous network feature interaction as set forth in any one of claims 1-4, and comprising:
the model construction module is used for designing a pedestrian re-identification initial model based on the heterogeneous network characteristics of the convolutional neural network and the visual transformer;
the calculation module is used for calculating the loss value of the pedestrian re-recognition initial model based on the double loss, determining that the pedestrian re-recognition initial model converges and stopping training based on the loss value, and obtaining a pedestrian re-recognition model;
and the re-recognition module is used for re-recognizing the target pedestrian image based on the pedestrian re-recognition model.
6. A pedestrian re-recognition device based on heterogeneous network feature interactions, comprising a memory, a processor, and a pedestrian re-recognition program based on heterogeneous network feature interactions stored on the memory, wherein the pedestrian re-recognition program based on heterogeneous network feature interactions, when executed by the processor, implements the steps of the pedestrian re-recognition method based on heterogeneous network feature interactions of any one of claims 1-4.
CN202210897792.1A 2022-07-28 2022-07-28 Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction Active CN115393953B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210897792.1A CN115393953B (en) 2022-07-28 2022-07-28 Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction
PCT/CN2022/121269 WO2024021283A1 (en) 2022-07-28 2022-09-26 Person re-identification method, apparatus, and device based on heterogeneous network feature interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210897792.1A CN115393953B (en) 2022-07-28 2022-07-28 Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction

Publications (2)

Publication Number Publication Date
CN115393953A CN115393953A (en) 2022-11-25
CN115393953B true CN115393953B (en) 2023-08-08

Family

ID=84116572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210897792.1A Active CN115393953B (en) 2022-07-28 2022-07-28 Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction

Country Status (2)

Country Link
CN (1) CN115393953B (en)
WO (1) WO2024021283A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881780A (en) * 2020-07-08 2020-11-03 上海蠡图信息科技有限公司 Pedestrian re-identification method based on multi-layer fusion and alignment division
CN114202740A (en) * 2021-12-07 2022-03-18 大连理工大学宁波研究院 Pedestrian re-identification method based on multi-scale feature fusion
CN114299542A (en) * 2021-12-29 2022-04-08 北京航空航天大学 Video pedestrian re-identification method based on multi-scale feature fusion
CN114445641A (en) * 2022-01-29 2022-05-06 新疆爱华盈通信息技术有限公司 Training method, training device and training network of image recognition model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562147B2 (en) * 2020-01-23 2023-01-24 Salesforce.Com, Inc. Unified vision and dialogue transformer with BERT
CN113591692A (en) * 2021-07-29 2021-11-02 赢识科技(杭州)有限公司 Multi-view identity recognition method
CN113344003B (en) * 2021-08-05 2021-11-02 北京亮亮视野科技有限公司 Target detection method and device, electronic equipment and storage medium
WO2022104293A1 (en) * 2021-10-26 2022-05-19 Innopeak Technology, Inc. Multi-modal video transformer (mm-vit) for compressed video action recognition
CN114445681A (en) * 2022-01-28 2022-05-06 上海商汤智能科技有限公司 Model training and image recognition method and device, equipment and storage medium
CN114663685B (en) * 2022-02-25 2023-07-04 江南大学 Pedestrian re-recognition model training method, device and equipment
CN114677687A (en) * 2022-04-14 2022-06-28 大连大学 ViT and convolutional neural network fused writing brush font type rapid identification method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881780A (en) * 2020-07-08 2020-11-03 上海蠡图信息科技有限公司 Pedestrian re-identification method based on multi-layer fusion and alignment division
CN114202740A (en) * 2021-12-07 2022-03-18 大连理工大学宁波研究院 Pedestrian re-identification method based on multi-scale feature fusion
CN114299542A (en) * 2021-12-29 2022-04-08 北京航空航天大学 Video pedestrian re-identification method based on multi-scale feature fusion
CN114445641A (en) * 2022-01-29 2022-05-06 新疆爱华盈通信息技术有限公司 Training method, training device and training network of image recognition model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Research on Person Re-identification Methods with Multi-type Feature Fusion"; Kuang Cheng et al.; CNKI (China National Knowledge Infrastructure); main text pp. 33-42 *

Also Published As

Publication number Publication date
WO2024021283A1 (en) 2024-02-01
CN115393953A (en) 2022-11-25

Similar Documents

Publication Publication Date Title
JP7210085B2 (en) Point cloud segmentation method, computer program and computer equipment
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN107577990B (en) Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval
Yu et al. Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition
CN110472531B (en) Video processing method, device, electronic equipment and storage medium
CN108596329B (en) Three-dimensional model classification method based on end-to-end deep ensemble learning network
CN110969087B (en) Gait recognition method and system
Murillo et al. Localization in urban environments using a panoramic gist descriptor
CN1669052B (en) Image matching system using 3-dimensional object model, image matching method
CN109753875A (en) Face identification method, device and electronic equipment based on face character perception loss
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN110933518B (en) Method for generating query-oriented video abstract by using convolutional multi-layer attention network mechanism
CN110163117B (en) Pedestrian re-identification method based on self-excitation discriminant feature learning
WO2024021394A1 (en) Person re-identification method and apparatus for fusing global features with ladder-shaped local features
CN113743544A (en) Cross-modal neural network construction method, pedestrian retrieval method and system
Wang et al. A novel multiface recognition method with short training time and lightweight based on ABASNet and H-softmax
JP3998628B2 (en) Pattern recognition apparatus and method
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN113033507A (en) Scene recognition method and device, computer equipment and storage medium
CN115393953B (en) Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction
CN116343135A (en) Feature post-fusion vehicle re-identification method based on pure vision
Ma et al. Deep regression forest with soft-attention for head pose estimation
Zhang et al. Deep meta-relation network for visual few-shot learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant