CN116644788B - Local refinement and global reinforcement network for vehicle re-identification - Google Patents

Publication number
CN116644788B
CN116644788B
Authority
CN
China
Prior art keywords
global, matrix, vehicle, module, pixels
Prior art date
Legal status
Active
Application number
CN202310926540.1A
Other languages
Chinese (zh)
Other versions
CN116644788A (en)
Inventor
郑美凤
王成
张峰
孙珂
李曦
周厚仁
庞希愚
周晓颖
田佳琛
Current Assignee
Shandong Jiaotong University
Original Assignee
Shandong Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shandong Jiaotong University filed Critical Shandong Jiaotong University
Priority to CN202310926540.1A
Publication of CN116644788A
Application granted
Publication of CN116644788B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of vehicle re-identification, and in particular to a local refinement and global reinforcement network for vehicle re-identification. The network has three branches and learns the discriminative local and global features of a vehicle through a local refinement module and a global reinforcement module. The local refinement module learns a refined local representation, capturing the rich correlation information between neighboring pixels through the interaction of each target pixel with its nearest pixels. The global reinforcement module learns a strengthened global representation: it first disperses the attention of each target pixel into individual windows to emphasize the important long-range dependencies within each region, and then aggregates globally meaningful long-range connections by cross-window interaction. Working together, the local refinement module and the global reinforcement module effectively extract the discriminative local and overall information of the vehicle.

Description

Local refinement and global reinforcement network for vehicle re-identification
Technical Field
The invention relates to the technical field of vehicle re-identification, in particular to a local refinement and global reinforcement network for vehicle re-identification.
Background
Vehicle re-identification aims at retrieving images of the same vehicle as a query identity from an image library. At present, the task mainly faces two challenges: large intra-class differences and small inter-class differences. Learning the discriminative local and global features of a vehicle is critical to addressing both. The self-attention mechanism, which mainly takes two forms, full self-attention and local self-attention, has shown great potential in the field of computer vision. However, the long-range connections modeled by full self-attention over the global context are typically weak, which limits the learning of the overall information of the vehicle, while the window partitioning of local self-attention prevents adequate learning of the local detail information of the vehicle.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by designing a local refinement and global reinforcement network for vehicle re-identification.
The technical scheme adopted for solving the technical problems is as follows:
A local refinement and global reinforcement network for vehicle re-identification employs the residual blocks of ResNet-50 up to res_conv4_2 as the backbone for feature extraction; the part after the res_conv4_1 residual block is divided into three branches, GL Branch, GS Branch, and LR Branch, and the downsampling operation of the res_conv5_1 residual block is removed in all three branches to provide a larger spatial view;
GL Branch, which contains no attention module, learns the general overall information of the vehicle;
a global reinforcement module is added after the res_conv5 layer of GS Branch to learn a strengthened global representation of the vehicle;
a local refinement module is applied after the res_conv5 layer of LR Branch to learn a refined local representation of the vehicle;
The local refinement module aims at capturing the discriminative local information of the vehicle; its structure is as follows:

A feature map $x \in \mathbb{R}^{C \times H \times W}$ is the input to the module, where $C$, $H$, $W$ denote the number of channels, the height, and the width of the feature map. A 1×1 convolution with $3C$ output channels is applied to $x$ to obtain the query tensor $x_q$, the key tensor $x_k$, and the value tensor $x_v$, each of size $C \times H \times W$.

Let the query of the $i$-th pixel in $x$ be $q_i \in \mathbb{R}^{1 \times C}$, the feature vector of $x_q$ at position $i$. The set of keys in the $k \times k$ neighborhood of the $i$-th pixel is denoted $k_i \in \mathbb{R}^{k^2 \times C}$, the feature vectors of the $k^2$ positions in $x_k$ closest to position $i$.

To realize the interaction between the $i$-th pixel and its $k^2$ nearest pixels, the matrix product of $q_i$ and $k_i^{\top}$ is computed and $softmax$ normalization is applied to obtain the attention weight vector $A_i \in \mathbb{R}^{1 \times k^2}$:

$$A_i = softmax(q_i \otimes k_i^{\top})$$

where $\otimes$ denotes matrix multiplication; the $j$-th element of the attention weight vector represents the pairwise affinity between the $i$-th pixel and the $j$-th pixel in its $k \times k$ neighborhood. Then, the feature vectors in the $k \times k$ neighborhood of position $i$ are extracted from $x_v$ and denoted $v_i \in \mathbb{R}^{k^2 \times C}$, the values of the $k^2$ nearest neighbors of the $i$-th pixel. Finally, $v_i$ is aggregated according to the attention score $A_i$ to capture the local context of the $i$-th pixel and reconstruct its representation $y_i \in \mathbb{R}^{1 \times C}$:

$$y_i = A_i \otimes v_i$$
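The per-pixel computation described above can be illustrated with a minimal NumPy sketch (not part of the patent; the shapes $C=4$, $k=3$ and the random tensors are purely illustrative assumptions):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a single row vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative shapes: C channels, a k*k neighborhood around pixel i.
C, k = 4, 3
rng = np.random.default_rng(0)

q_i = rng.standard_normal((1, C))        # query of pixel i (row of x_q)
k_i = rng.standard_normal((k * k, C))    # keys of its k^2 nearest pixels
v_i = rng.standard_normal((k * k, C))    # values of the same neighbors

A_i = softmax(q_i @ k_i.T)               # 1 x k^2 attention weight vector
y_i = A_i @ v_i                          # reconstructed 1 x C representation

assert A_i.shape == (1, k * k)
assert np.isclose(A_i.sum(), 1.0)        # softmax weights sum to 1
assert y_i.shape == (1, C)
```

Each element of `A_i` is the pairwise affinity between pixel $i$ and one neighbor, and `y_i` is the affinity-weighted aggregation of the neighborhood values.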
The global reinforcement module aims at capturing the discriminative overall information of the vehicle; its structure is as follows:

A feature map $x \in \mathbb{R}^{C \times H \times W}$ is the input to the global reinforcement module, where $C$, $H$, $W$ denote the number of channels, the height, and the width of the feature map. A reshaping operation and a fully connected layer are applied to obtain the query matrix $Q \in \mathbb{R}^{HW \times C}$ of $x$.

The $i$-th row $Q_i \in \mathbb{R}^{1 \times C}$ of the matrix represents the query vector of the $i$-th pixel. To disperse the attention scores of a target pixel into multiple windows, $x$ is evenly divided along the spatial dimensions into $M = \frac{H}{h} \times \frac{W}{w}$ windows, where $h$ and $w$ are the height and width of a window. A reshaping operation and a fully connected layer are applied to the feature map of each window to obtain the key matrices $K_1, K_2, \ldots, K_M$ of the $M$ windows, where the key matrix of the $j$-th window is $K_j \in \mathbb{R}^{N \times C}$, $N = h \times w$ is the size of a window, and the linear transformation operations of all windows share the same weights; each row of $K_j$ is a key vector of the $j$-th window.

$Q_i$ is multiplied with $K_j^{\top}$ to obtain the pairwise affinity vector $r_i^{\,j} \in \mathbb{R}^{1 \times N}$ between the target pixel $i$ and the pixels within the $j$-th window, i.e.

$$r_i^{\,j} = Q_i \otimes K_j^{\top}$$

where $\otimes$ denotes matrix multiplication. The pairwise affinity matrix $R_j \in \mathbb{R}^{HW \times N}$ of the $j$-th window with respect to all target pixels is obtained by multiplying $Q$ with $K_j^{\top}$:

$$R_j = Q \otimes K_j^{\top}$$

Each row of $R_j$ holds the pairwise affinities between one target pixel and the pixels within the $j$-th window. Then a $softmax$ normalization is performed along the last dimension of $R_j$ to obtain the attention scores of the pixels of that window at each target pixel:

$$A_j = softmax(R_j)$$

Each row of the attention matrix $A_j \in \mathbb{R}^{HW \times N}$ of the $j$-th window represents the dependency of one target pixel on all pixels in the $j$-th window.

By computing the attention scores of each window at each target pixel, the attention matrices $A_1, A_2, \ldots, A_M$ of the $M$ windows are obtained; these $M$ matrices are computed simultaneously as:

$$[A_1, A_2, \ldots, A_M] = softmax([R_1, R_2, \ldots, R_M])$$

where the $softmax$ operation is performed in the last dimension. To capture the globally meaningful long-range connections of each target pixel, the $M$ attention matrices are concatenated along the column axis into a matrix $A \in \mathbb{R}^{HW \times HW}$, and $L1\_norm$ normalization is executed on it to obtain the attention matrix $A'$ with strengthened long-range dependencies:

$$A' = L1\_norm([A_1, A_2, \ldots, A_M])$$

$L1\_norm$ aggregates the strengthened long-range connections over the global receptive field. Similar to the computation of the key matrices, a reshaping operation and a fully connected layer are applied to the feature map of each window of $x$ to obtain the value matrices $V_1, V_2, \ldots, V_M \in \mathbb{R}^{N \times C}$ of the $M$ windows, where the parameters of the linear transformation operation are shared by all windows. The value matrices of the $M$ windows are concatenated into a value matrix $V \in \mathbb{R}^{HW \times C}$; then, the matrix $A'$ is used to perform a weighted summation over the matrix $V$ to reconstruct the feature representation:

$$S = A' \otimes V$$

The global context captured by the reconstructed feature $S \in \mathbb{R}^{HW \times C}$ strengthens meaningful long-range dependencies that would otherwise receive low relevance.

Finally, the matrix $S$ is reshaped into a tensor of size $C \times H \times W$ and added to the input feature map to compute the output feature map $F'$ of the global reinforcement module:

$$F' = GELU(BN(x + reshape(S)))$$

where $GELU$ denotes the Gaussian Error Linear Unit and $BN$ denotes a batch normalization operation. The module disperses attention into the individual windows and constructs a strengthened global context representation through cross-window interaction, thereby improving the ability of the network to learn the overall information of the vehicle.
Further, in the local refinement module, the computation of the pairwise affinities between each pixel and its $k^2$ nearest pixels, and the reconstruction of all pixels, can be accomplished by $unfold$ operations and matrix multiplications of tensors. First, $x_q$ is reshaped to obtain the query tensor $Q \in \mathbb{R}^{HW \times 1 \times C}$; this tensor holds $HW$ queries, each of size $1 \times C$. Meanwhile, an $unfold$ operation with kernel size $k \times k$ and stride 1 is applied to $x_k$ to extract the $k^2$ keys around each pixel, which are reshaped into the key tensor $K \in \mathbb{R}^{HW \times k^2 \times C}$, where the keys corresponding to the nearest neighbors of each pixel are stored in a $k^2 \times C$ matrix. The attention weight tensor $A \in \mathbb{R}^{HW \times 1 \times k^2}$, in which each entry represents the pairwise affinity between a pixel and one of its $k^2$ nearest pixels, is obtained by the matrix multiplication of $Q$ and $K^{\top}$ followed by a $softmax$ normalization:

$$A = softmax(Q \otimes K^{\top})$$

where the pairwise affinities between a pixel and the pixels in its $k \times k$ neighborhood are represented by a $1 \times k^2$ vector. Next, an $unfold$ operation with kernel size $k \times k$ and stride 1 is applied to $x_v$ to extract the values corresponding to the $k^2$ nearest neighbors of each pixel, which are reshaped into the value tensor $V \in \mathbb{R}^{HW \times k^2 \times C}$, where the nearest-neighbor values of each pixel are stored in a $k^2 \times C$ matrix. Finally, the weight vector of each pixel is used to compute a weighted sum of the values of its $k^2$ surrounding pixels, yielding all reconstructed pixels $\tilde{x} \in \mathbb{R}^{HW \times 1 \times C}$:

$$\tilde{x} = A \otimes V$$

This computation realizes the interaction between each pixel and its nearest neighbors and captures rich detail information.

The tensor $\tilde{x}$ is reshaped into $\mathbb{R}^{C \times H \times W}$ and added to the original feature map; $BN$ and $GELU$ operations are then executed on the summed feature map to obtain the final output feature map $F'$:

$$F' = GELU(BN(x + reshape(\tilde{x})))$$

The local refinement module captures the context of each target pixel with respect to its nearest neighbors. Because its weights are generated through the interaction of the target pixel with its nearest neighbors, it can fully exploit the rich correlation information between pixels and adapt to the different visual patterns at different spatial positions.
Further, each of the three branches employs a global average pooling operation and a dimension reduction module to generate a feature representation of the input vehicle image.
Further, for any feature map output by a branch, a global average pooling operation is used to obtain a 2048-dimensional feature vector, and a dimension reduction module consisting of a 1×1 convolution, BN, and a ReLU activation function then compresses it to 256 dimensions.
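The pooling and dimension reduction steps can be sketched in NumPy (not the patent's implementation; note that a 1×1 convolution applied after global average pooling reduces to a matrix multiply over channels, and the batch-norm stand-in and random weights here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
C_in, C_out, H, W = 2048, 256, 16, 16

x = rng.standard_normal((C_in, H, W))   # feature map output by one branch

# Global average pooling over the spatial dimensions -> 2048-d vector
g = x.mean(axis=(1, 2))

# A 1x1 convolution on a pooled vector is a channel-wise matrix multiply
Wc = rng.standard_normal((C_out, C_in)) * 0.01
y = Wc @ g                              # compressed to 256 dimensions

# Batch-norm stand-in (per-vector standardization) followed by ReLU
y = (y - y.mean()) / (y.std() + 1e-5)
f = np.maximum(y, 0.0)                  # final 256-d embedding

assert g.shape == (2048,)
assert f.shape == (256,)
```

The resulting 256-dimensional vector `f` plays the role of the branch embedding used by the losses described next.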
Further, the 256-dimensional feature vector is used for the computation of the triplet loss, and is fed into a fully connected layer, whose number of output neurons equals the number of vehicle identities in the training set, for the computation of the cross-entropy loss.
Further, the cross-entropy loss is calculated as follows:

$$L_{ce} = -\sum_{i=1}^{N} \mathbb{1}[i = y] \log p_i = -\log p_y$$

where $N$ denotes the number of vehicle identities in the training set, $y$ denotes the real identity label of the image input to the network, and $p_i$ is the probability that the input image belongs to the $i$-th vehicle.
Further, the triplet loss is calculated as follows:

$$L_{tri} = \Big[\, \alpha + \max_i \big\| f_a - f_p^{(i)} \big\|_2 - \min_j \big\| f_a - f_n^{(j)} \big\|_2 \,\Big]_+$$

where $\alpha$ is a margin hyperparameter controlling the difference between the anchor-positive and anchor-negative distances, and $f_a$, $f_p^{(i)}$, $f_n^{(j)}$ are the features extracted from the anchor, the positive samples, and the negative samples, respectively.
Furthermore, the cross-entropy losses and triplet losses of the three branches are added to obtain the final loss; the total loss is calculated as:

$$L = \sum_{n=1}^{N} \big( L_{ce}^{\,n} + L_{tri}^{\,n} \big)$$

where $N$ denotes the number of branches.
The invention has the technical effects that:
Compared with the prior art, the local refinement and global reinforcement network for vehicle re-identification uses a local refinement module and a global reinforcement module to learn the discriminative local and global features of the vehicle, so as to cope with the challenges of vehicle re-identification. The local refinement module learns a refined local representation that captures the rich correlation information between neighboring pixels through the interaction of each target pixel with its nearest pixels. The global reinforcement module learns a strengthened global representation: it first disperses the attention of each target pixel into individual windows to emphasize the important long-range dependencies within each region, and then aggregates globally meaningful long-range connections by cross-window interaction.
Drawings
FIG. 1 is a diagram of a local refinement and global reinforcement network architecture for vehicle re-identification in accordance with the present invention;
FIG. 2 is a block diagram of a locally refined module of the present invention;
FIG. 3 is a block diagram of a global enhancement module of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings of the specification.
Example 1:
As shown in FIG. 1, the local refinement and global reinforcement network for vehicle re-identification of this embodiment employs the residual blocks of ResNet-50 up to res_conv4_2 as the backbone for feature extraction; the part after the res_conv4_1 residual block is divided into three branches, GL Branch, GS Branch, and LR Branch, and the downsampling operation of the res_conv5_1 residual block is removed in all three branches to provide a larger spatial view;
GL Branch, which contains no attention module, learns the general overall information of the vehicle;
a global reinforcement module is added after the res_conv5 layer of GS Branch to learn a strengthened global representation of the vehicle;
a local refinement module is applied after the res_conv5 layer of LR Branch to learn a refined local representation of the vehicle, and the output feature map of the module is divided into two parts along the horizontal direction to facilitate further learning;
The local refinement module aims at capturing the discriminative local information of the vehicle; it refines the local representation by exploiting the rich correlation information contained between adjacent pixels through the interaction of each target pixel with its nearest pixels. Its structure is shown in FIG. 2. A feature map $x \in \mathbb{R}^{C \times H \times W}$ is the input to the module, where $C$, $H$, $W$ denote the number of channels, the height, and the width of the feature map. A 1×1 convolution with $3C$ output channels is applied to $x$ to obtain the query tensor $x_q$, the key tensor $x_k$, and the value tensor $x_v$, each of size $C \times H \times W$.

Let the query of the $i$-th pixel in $x$ be $q_i \in \mathbb{R}^{1 \times C}$, the feature vector of $x_q$ at position $i$. The set of keys in the $k \times k$ neighborhood of the $i$-th pixel is denoted $k_i \in \mathbb{R}^{k^2 \times C}$, the feature vectors of the $k^2$ positions in $x_k$ closest to position $i$.

To realize the interaction between the $i$-th pixel and its $k^2$ nearest pixels, the matrix product of $q_i$ and $k_i^{\top}$ is computed and $softmax$ normalization is applied to obtain the attention weight vector $A_i \in \mathbb{R}^{1 \times k^2}$:

$$A_i = softmax(q_i \otimes k_i^{\top})$$

where $\otimes$ denotes matrix multiplication; the $j$-th element of the attention weight vector represents the pairwise affinity between the $i$-th pixel and the $j$-th pixel in its $k \times k$ neighborhood. Then, the feature vectors in the $k \times k$ neighborhood of position $i$ are extracted from $x_v$ and denoted $v_i \in \mathbb{R}^{k^2 \times C}$, the values of the $k^2$ nearest neighbors of the $i$-th pixel. Finally, $v_i$ is aggregated according to the attention score $A_i$ to capture the local context of the $i$-th pixel and reconstruct its representation $y_i \in \mathbb{R}^{1 \times C}$:

$$y_i = A_i \otimes v_i$$
the computation process gathers rich relevant information between the target pixel and its nearest neighbors, which captures a refined local context compared to local self-attention.
The computation of the pairwise affinities between each pixel and its $k^2$ nearest pixels, and the reconstruction of all pixels, can be accomplished by $unfold$ operations and matrix multiplications of tensors. First, $x_q$ is reshaped to obtain the query tensor $Q \in \mathbb{R}^{HW \times 1 \times C}$; this tensor holds $HW$ queries, each of size $1 \times C$. Meanwhile, an $unfold$ operation with kernel size $k \times k$ and stride 1 is applied to $x_k$ to extract the $k^2$ keys around each pixel, which are reshaped into the key tensor $K \in \mathbb{R}^{HW \times k^2 \times C}$, where the keys corresponding to the nearest neighbors of each pixel are stored in a $k^2 \times C$ matrix. The attention weight tensor $A \in \mathbb{R}^{HW \times 1 \times k^2}$, in which each entry represents the pairwise affinity between a pixel and one of its $k^2$ nearest pixels, is obtained by the matrix multiplication of $Q$ and $K^{\top}$ followed by a $softmax$ normalization:

$$A = softmax(Q \otimes K^{\top})$$

where the pairwise affinities between a pixel and the pixels in its $k \times k$ neighborhood are represented by a $1 \times k^2$ vector. Next, an $unfold$ operation with kernel size $k \times k$ and stride 1 is applied to $x_v$ to extract the values corresponding to the $k^2$ nearest neighbors of each pixel, which are reshaped into the value tensor $V \in \mathbb{R}^{HW \times k^2 \times C}$, where the nearest-neighbor values of each pixel are stored in a $k^2 \times C$ matrix. Finally, the weight vector of each pixel is used to compute a weighted sum of the values of its $k^2$ surrounding pixels, yielding all reconstructed pixels $\tilde{x} \in \mathbb{R}^{HW \times 1 \times C}$:

$$\tilde{x} = A \otimes V$$
the calculation process realizes the interaction between each pixel and the nearest neighbor pixel, and captures rich detail information.
The tensor $\tilde{x}$ is reshaped into $\mathbb{R}^{C \times H \times W}$ and added to the original feature map; $BN$ and $GELU$ operations are then executed on the summed feature map to obtain the final output feature map $F'$:

$$F' = GELU(BN(x + reshape(\tilde{x})))$$
the local refinement modules are similar to a normal convolution in that they both capture the context of the target pixel with respect to its nearest neighbors. But the weights of the normal convolution are static and lack adaptability. The weight of the local refinement module is generated through interaction between the target pixel and the nearest neighbor of the target pixel, so that rich correlation information among pixels can be fully utilized, and the local refinement module is dynamic and can adapt to different visual modes of different spatial positions;
the global enhancement module aims at capturing the discriminative overall information of the vehicle, and the structure is as shown in fig. 3, wherein important remote dependence in a window is emphasized by window segmentation of a key vector and a value vector, and then global significant remote connection is obtained by cross-window interaction, so that global representation is enhanced.
With characteristic diagramsIs an input to the global enhancement module, wherein,C、H、Wthe number, the height and the width of the channels respectively represent the characteristic diagram; obtained by a deforming operation and a fully-connected layer (FC)xQuery matrix->
The first of the matrixiRow of linesRepresent the firstiA query vector of individual pixels; in order to disperse the attention score at a target pixel into multiple windows, the present invention distributes the attention score along the spatial dimensionxEvenly divided into->A window, wherein,handwrespectively are provided withIs the height and width of a window. Applying a deforming operation and a full connection layer to the feature map of each window to obtainMKey matrix of individual windows->
Wherein, the firstjThe key matrix of each window isN=h*wFor the size of the window, the linear transformation operations of all windows share the same weight;K j each column of (a) is a firstjA key vector in a window;
will beQ i And (3) withK T j Matrix multiplication is performed to obtain a target pixeliAnd the firstjPaired affinity vectors between pixels within a windowI.e.
Wherein, representing a matrix multiplication; first, thejPaired affinity matrix for each window with respect to all target pixelsBy means ofQAnd (3) withK T j Matrix multiplication is performed to obtain:
wherein, R j each line of the image is a target imageElement and the firstjPair-wise affinities between pixels within the windows; then, the invention is thatR j Is performed in the column direction of (a)softmaxNormalization operates to obtain the attention score of the pixels of the window at each target pixel, which is formulated as:
first, thejAttention matrix of individual windowsEach row of (a) represents a target pixel and the first rowjDependency of all pixels in the window; the independent calculation of the attention score for the pixels within each window can emphasize significant distance dependence compared to full self-attention.
By calculation ofMThe attention score of each window at each target pixel results inMAttention matrix of individual windowsThe method comprises the steps of carrying out a first treatment on the surface of the This isMThe matrices are simultaneously calculated as:
wherein, softmaxthe operation is performed in the last dimension; to capture a globally significant remote connection of a target pixel, one wouldMThe attention matrixes are spliced into a matrix along a column axisAnd execute thereonL1_normNormalization to obtain a distance dependent intensified attention matrix +.>The calculation formula is as follows:
L1_normaggregating the enhanced remote connections from the global receptive field; similar to the calculation of key matrix, the invention is applied toxIs obtained by applying a deforming operation and a full connection layer to the feature map of each windowMValue matrix of individual windows
Wherein, the parameters of the linear transformation operation of all windows are shared; at the futureMThe value matrices of the windows are spliced together to form a value matrixThen, use matrixA '' Pair matrixVWeighted summation is performed to reconstruct a representation of the features:
reconstructed featuresSThe captured global context strengthens some meaningful remote dependencies with low relevance;
finally, the invention will matrixDeformation into tensor->And adds it to the input feature map to calculate the output feature map of the global augmentation moduleF The calculation process is as follows:
wherein, GELUthe units of the gaussian error line are indicated,BNrepresenting a batch normalization operation; the module distracts the individual windows and builds an enhanced global aspect using cross-window interactionsThe following shows that the capability of the network to learn the overall information of the vehicle is improved.
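The window-wise softmax followed by cross-window $L1$ normalization can be sketched in NumPy (an illustration only; the identity query projection, tiny shapes $C=4$, $H=4$, $W=6$, $h=2$, $w=3$, and random inputs are assumptions, and the learned FC layers are omitted):

```python
import numpy as np

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
C, H, W, h, w = 4, 4, 6, 2, 3
M, N = (H // h) * (W // w), h * w             # M windows of N pixels each

x = rng.standard_normal((H * W, C))           # flattened feature map
Q = x.copy()                                  # identity FC for the sketch

# Partition the HW pixel indices into M non-overlapping h*w windows
win_idx = (np.arange(H * W).reshape(H, W)
             .reshape(H // h, h, W // w, w)
             .transpose(0, 2, 1, 3).reshape(M, N))

# Per-window affinities and window-wise softmax over each window's N pixels
R = np.stack([Q @ x[idx].T for idx in win_idx])   # (M, HW, N)
A_w = softmax(R, axis=-1)

# Cross-window interaction: concatenate along columns, then L1-normalize rows
A = A_w.transpose(1, 0, 2).reshape(H * W, M * N)  # (HW, HW)
A1 = A / np.abs(A).sum(axis=-1, keepdims=True)

S = A1 @ x                                        # reconstructed (HW, C) features

assert A.shape == (H * W, H * W)
assert np.allclose(A.sum(axis=-1), M)   # each window's softmax sums to 1
assert np.allclose(A1.sum(axis=-1), 1.0)
```

Because each row of `A` contains $M$ independent softmax distributions, the $L1$ normalization is what turns them into a single global distribution over all $HW$ pixels.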
Each of the three branches employs a global average pooling operation and a dimension reduction module to generate a feature representation of the input vehicle image. For any feature map (or sub-feature map) output by a branch, a global average pooling operation is used to obtain a 2048-dimensional feature vector, which a dimension reduction module consisting of a 1×1 convolution, BN, and a ReLU activation function then compresses to 256 dimensions. The 256-dimensional feature vector is used for the computation of the triplet loss, and is fed into a fully connected layer, whose number of output neurons equals the number of vehicle identities in the training set, for the computation of the cross-entropy loss. In the test stage, the 256-dimensional feature vectors output by all branches are concatenated together as the feature embedding of the input image.
To prevent model overfitting and improve the recognition capability of the network, the invention uses the cross-entropy loss and the triplet loss, which are widely used in re-identification tasks, as its loss functions: the cross-entropy loss is used for classification, and the triplet loss is used for metric learning in the training stage.
The cross-entropy loss is commonly used for classification problems; it measures the difference between the true probability distribution and the predicted probability distribution, and training reduces its value as far as possible to improve the predictions of the model. Cross entropy is typically combined with $softmax$: $softmax$ maps the network output into a probability distribution over the classes, so that the predicted probabilities sum to 1, and the cross entropy is then used to compute the loss:

$$L_{ce} = -\sum_{i=1}^{N} \mathbb{1}[i = y] \log p_i = -\log p_y$$

where $N$ denotes the number of vehicle identities in the training set, $y$ denotes the real identity label of the image input to the network, and $p_i$ is the probability that the input image belongs to the $i$-th vehicle.
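As a small check of the formula, a NumPy sketch (illustrative logits only; not the patent's training code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, y):
    """-log p_y, where p = softmax(logits) and y is the true identity index."""
    p = softmax(logits)
    return -np.log(p[y])

logits = np.array([2.0, 0.5, -1.0])   # scores over N = 3 identities

# The loss is small when the true class has the largest score
assert cross_entropy(logits, 0) < cross_entropy(logits, 2)

# Uniform logits over two classes give exactly log(2)
assert np.isclose(cross_entropy(np.array([0.0, 0.0]), 0), np.log(2.0))
```

Minimizing this quantity pushes the predicted probability of the true identity toward 1.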
The triplet loss operates on three inputs: an anchor, a positive sample, and a negative sample. Its goal is to minimize the distance between the anchor and positive samples sharing the same identity, and to maximize the distance between the anchor and negative samples with different identities. When two inputs are very similar, it can learn a better representation of the small differences between the two input vectors and thus distinguish fine details. Through continuous learning, vehicles with the same ID are eventually gathered together in the feature space, completing the task of vehicle re-identification. The triplet loss is calculated as:

$$L_{tri} = \Big[\, \alpha + \max_i \big\| f_a - f_p^{(i)} \big\|_2 - \min_j \big\| f_a - f_n^{(j)} \big\|_2 \,\Big]_+$$

where $\alpha$ is a margin hyperparameter controlling the difference between the anchor-positive and anchor-negative distances, and $f_a$, $f_p^{(i)}$, $f_n^{(j)}$ are the features extracted from the anchor, the positive samples, and the negative samples, respectively. The $\max$ and $\min$ functions select the hardest positive and negative pairs, i.e. the farthest positive pair and the closest negative pair.
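The batch-hard selection can be sketched as follows (a minimal NumPy illustration with made-up 2-D embeddings; not the patent's training code):

```python
import numpy as np

def triplet_loss(anchor, positives, negatives, alpha=0.3):
    """Batch-hard triplet loss: hardest (farthest) positive vs.
    hardest (closest) negative, clamped at zero with margin alpha."""
    d_pos = max(np.linalg.norm(anchor - p) for p in positives)
    d_neg = min(np.linalg.norm(anchor - n) for n in negatives)
    return max(d_pos - d_neg + alpha, 0.0)

a = np.array([0.0, 0.0])
pos = [np.array([0.1, 0.0]), np.array([0.2, 0.0])]   # same identity
neg = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]   # different identities

loss = triplet_loss(a, pos, neg, alpha=0.3)
assert np.isclose(loss, 0.0)   # 0.2 - 1.0 + 0.3 < 0, clamped to 0
```

When the closest negative moves inside the margin of the farthest positive, the loss becomes positive and gradients push the embeddings apart.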
The invention adds the cross entropy losses and the triplet losses of the three branches to obtain the final loss; the total loss formula is written as follows:
L = ∑_{i=1}^{N} (L_id + L_triplet)

wherein N represents the number of branches.
The invention provides a local refinement and global reinforcement network for vehicle re-identification, which learns discriminative local and global features of a vehicle through a local refinement module and a global reinforcement module. The local refinement module captures the rich related information contained between adjacent pixels through the interaction of each target pixel with its nearest neighbors, thereby refining the local representation. The global reinforcement module first distributes the attention scores of each target pixel into the individual windows to emphasize the important long-range dependencies within each region, and then aggregates globally meaningful long-range connections through cross-window interaction, thereby learning a strengthened global representation. The local refinement module and the global reinforcement module cooperate with each other, so that the discriminative local information and the discriminative global information of the vehicle can be effectively extracted.
The above embodiments are merely examples of the present invention, and the scope of the present invention is not limited to the above embodiments, and any suitable changes or modifications made by those skilled in the art, which are consistent with the claims of the present invention, shall fall within the scope of the present invention.

Claims (8)

1. A local refinement and global reinforcement network for vehicle re-identification, characterized by: a vehicle image is taken as input, the residual blocks before res_conv4_2 of ResNet-50 are used as the backbone for feature extraction, and the part subsequent to the res_conv4_1 residual block is divided into three branches: GL Branch, GS Branch and LR Branch; the downsampling operation of the res_conv5_1 residual block is removed in all three branches;
GL Branch, which contains no attention module, is used to learn the general information of the vehicle as a whole;
a global reinforcement module is added after the res_conv5 layer of GS Branch to learn a strengthened global representation of the vehicle;
a local refinement module is applied after the res_conv5 layer of LR Branch to learn a refined local representation of the vehicle;
the structure of the local refinement module is as follows:
a feature map x ∈ R^(C×H×W) is the input to the module, wherein C, H and W respectively denote the number of channels, the height and the width of the feature map; a 1*1 convolution with 3C output channels is applied to x to obtain a query tensor x_q ∈ R^(C×H×W), a key tensor x_k ∈ R^(C×H×W) and a value tensor x_v ∈ R^(C×H×W);
let the query of the i-th pixel in x be q_i ∈ R^(1×C), i.e. the feature vector of x_q at position i; the set of keys in the k×k neighborhood of the i-th pixel is denoted k_i ∈ R^(k²×C), i.e. the k² feature vectors of x_k at the positions closest to position i;
the matrix product of q_i and the transpose of k_i is calculated and softmax normalization is performed to obtain the attention weight vector A_i ∈ R^(1×k²), with the formula:

A_i = softmax(q_i ⊗ k_i^T)

wherein ⊗ represents matrix multiplication; the j-th element of the attention weight vector represents the pairwise affinity between the i-th pixel and the j-th pixel in its k×k neighborhood; then, the feature vectors of x_v in the k×k neighborhood of position i are extracted and denoted v_i ∈ R^(k²×C), representing the values of the k² nearest neighbors of the i-th pixel; finally, v_i is aggregated according to the attention score A_i to capture the local context of the i-th pixel and reconstruct its characterization, yielding x'_i ∈ R^(1×C); the calculation process is expressed as follows:

x'_i = A_i ⊗ v_i
obtaining a local refinement module output feature map of the vehicle image;
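A minimal NumPy sketch of this per-pixel neighborhood attention is given below; the random matrices stand in for the learned 1*1 convolution, zero padding handles border pixels, and the BN/GELU output stage of the module is omitted (all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_refinement(x, k=3):
    # x: (C, H, W) feature map; a single 1*1 conv with 3C output channels
    # is equivalent to three C->C linear maps applied per pixel.
    C, H, W = x.shape
    Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
    flat = x.reshape(C, -1).T                      # (H*W, C), one row per pixel
    q, kk, v = flat @ Wq, flat @ Wk, flat @ Wv
    qm = q.T.reshape(C, H, W)
    km = kk.T.reshape(C, H, W)
    vm = v.T.reshape(C, H, W)
    pad = k // 2
    kp = np.pad(km, ((0, 0), (pad, pad), (pad, pad)))
    vp = np.pad(vm, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(qm)
    for i in range(H):
        for j in range(W):
            # Keys/values of the k*k nearest neighbours of pixel (i, j): (k^2, C).
            ki = kp[:, i:i + k, j:j + k].reshape(C, -1).T
            vi = vp[:, i:i + k, j:j + k].reshape(C, -1).T
            a = qm[:, i, j] @ ki.T                 # (k^2,) pairwise affinities
            a = np.exp(a - a.max())
            a /= a.sum()                           # softmax -> attention weights A_i
            out[:, i, j] = a @ vi                  # aggregate values: x'_i = A_i v_i
    return x + out                                 # residual connection (BN/GELU omitted)
```

The double loop makes the per-pixel computation explicit; the unfold-based formulation of claim 2 performs the same computation with batched tensor operations.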
the structure of the global reinforcement module is as follows:
a feature map x ∈ R^(C×H×W) is the input to the global reinforcement module, wherein C, H and W respectively denote the number of channels, the height and the width of the feature map; a reshaping operation and a fully-connected layer are applied to obtain the query matrix Q ∈ R^(HW×C) of x;
the i-th row Q_i ∈ R^(1×C) of the matrix represents the query vector of the i-th pixel; x is evenly divided along the spatial dimension into M = (H/h)·(W/w) windows, wherein h and w are respectively the height and the width of a window; a reshaping operation and a fully-connected layer are applied to the feature map of each window to obtain the key matrices K_1, …, K_M of the M windows;
wherein the key matrix of the j-th window is K_j ∈ R^(C×N), N = h*w being the size of the window, and the linear transformation operations of all windows share the same weights; each column of K_j is a key vector in the j-th window;
Q_i is matrix-multiplied with K_j to obtain the pairwise affinity vector R_(i,j) ∈ R^(1×N) between the target pixel i and the pixels within the j-th window, i.e.

R_(i,j) = Q_i ⊗ K_j

wherein ⊗ represents matrix multiplication; the pairwise affinity matrix R_j ∈ R^(HW×N) of the j-th window for all target pixels is obtained by the matrix multiplication of Q and K_j:

R_j = Q ⊗ K_j
wherein each row of R_j is the pairwise affinities between one target pixel and the pixels within the j-th window; then, a softmax normalization operation is performed on R_j along its last dimension to obtain the attention scores of the pixels of the window at each target pixel, formulated as:

A_j = softmax(R_j)

each row of the attention matrix A_j ∈ R^(HW×N) of the j-th window represents the dependency of one target pixel on all pixels in the window;
by calculation ofMThe attention score of each window at each target pixel results inMAttention matrix of individual windowsThe method comprises the steps of carrying out a first treatment on the surface of the This isMThe matrices are simultaneously calculated as:
wherein, softmaxthe operation is performed in the last dimension; will beMThe attention matrixes are spliced into a matrix along a column axisAnd execute thereonL1_normNormalization is carried out to obtain the attention moment of remote dependence strengtheningMatrix->The calculation formula is as follows:
for a pair ofxIs obtained by applying a deforming operation and a full connection layer to the feature map of each windowMValue matrix of individual windows
Wherein, the parameters of the linear transformation operation of all windows are shared; at the futureMThe value matrices of the windows are spliced together to form a value matrixThen, use matrixA '' Pair matrixVWeighted summation is performed to reconstruct a representation of the features:
finally, matrix is formedDeformation into tensor->And adds it to the input feature map to calculate the output feature map of the global augmentation moduleF The calculation process is as follows:
wherein, GELUthe units of the gaussian error line are indicated,BNrepresenting a batch normalization operation; and obtaining a global strengthening module output characteristic diagram of the vehicle image.
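The window-attention computation of claim 1 can be sketched in NumPy as follows; random matrices stand in for the learned fully-connected layers, and the final BN/GELU stage is omitted (the function name and default window size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def global_strengthen(x, h=2, w=2):
    # x: (C, H, W); H and W must be divisible by the window size h, w.
    C, H, W = x.shape
    Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
    flat = x.reshape(C, H * W).T                   # (H*W, C)
    Q = flat @ Wq                                  # one query per target pixel
    # Partition the spatial grid into M = (H/h)*(W/w) non-overlapping windows.
    wins = (x.reshape(C, H // h, h, W // w, w)
              .transpose(1, 3, 2, 4, 0)
              .reshape(-1, h * w, C))              # (M, N, C) with N = h*w
    A_blocks, V_blocks = [], []
    for win in wins:                               # shared weights across windows
        Kj = win @ Wk                              # (N, C) keys of window j
        Rj = Q @ Kj.T                              # (H*W, N) pairwise affinities
        e = np.exp(Rj - Rj.max(axis=-1, keepdims=True))
        A_blocks.append(e / e.sum(axis=-1, keepdims=True))  # per-window softmax
        V_blocks.append(win @ Wv)
    A = np.concatenate(A_blocks, axis=1)           # (H*W, H*W): spliced attention
    A = A / A.sum(axis=1, keepdims=True)           # L1 normalisation across windows
    V = np.concatenate(V_blocks, axis=0)           # (H*W, C) spliced values
    out = (A @ V).T.reshape(C, H, W)               # reconstruct and reshape
    return x + out                                 # residual (BN/GELU omitted)
```

Each per-window softmax emphasizes the dominant dependencies inside that region, and the L1 normalization across the concatenated windows then balances those regional scores into one global attention distribution per target pixel.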
2. The local refinement and global reinforcement network for vehicle re-identification of claim 1, wherein: in the local refinement module, the computation of the pairwise affinities between each pixel and its k² nearest pixels and the reconstruction of all pixels are carried out through unfold operations and matrix multiplications of tensors; first, x_q is reshaped to obtain a query tensor Q ∈ R^(HW×1×C); this tensor contains HW queries, each of size 1×C; at the same time, an unfold operation with kernel size k*k and stride 1 is performed on x_k to extract the k² keys around each pixel, which are reshaped to obtain the key tensor K ∈ R^(HW×k²×C), wherein the keys corresponding to the nearest neighbors of each pixel are stored in one k²×C matrix; the attention weight tensor A ∈ R^(HW×1×k²), representing the pairwise affinities between each pixel and its k² nearest pixels, is obtained by the matrix multiplication of Q and K^T followed by a softmax normalization operation:

A = softmax(Q ⊗ K^T)

wherein the pairwise affinities between a certain pixel and the pixels in its k×k neighborhood are represented as a 1×k² vector; next, an unfold operation with kernel size k*k and stride 1 is performed on x_v to extract the k² values corresponding to the nearest neighbors of each pixel, which are reshaped to obtain the value tensor V ∈ R^(HW×k²×C), wherein the values of the nearest neighbors of each pixel are stored in one k²×C matrix; finally, the weight vector of each pixel is used to perform a weighted summation over the values corresponding to its surrounding k² pixels, obtaining all the reconstructed pixels X′ ∈ R^(HW×1×C); the calculation process is expressed as follows:

X′ = A ⊗ V

the tensor X′ is reshaped into x′ ∈ R^(C×H×W) and added to the original feature map; BN and GELU operations are performed on the summed feature map to obtain the final output feature map F′, namely:

F′ = GELU(BN(x + x′))
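The unfold operation referenced in claim 2 can be emulated in NumPy as below: a k*k, stride-1 sliding-window extraction with zero padding (this helper is an illustrative stand-in for a framework's built-in unfold, not part of the claim):

```python
import numpy as np

def unfold(x, k=3):
    # Mimics a k*k, stride-1 unfold with zero padding: for every pixel it
    # extracts the k^2 feature vectors of its nearest neighbours.
    C, H, W = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    patches = np.empty((H * W, k * k, C))
    for i in range(H):
        for j in range(W):
            # One k^2 x C matrix per pixel, as described in the claim.
            patches[i * W + j] = xp[:, i:i + k, j:j + k].reshape(C, -1).T
    return patches  # (H*W, k^2, C)
```

Applied to x_k and x_v, this produces exactly the key tensor K and value tensor V of the claim, after which the attention and reconstruction reduce to batched matrix multiplications.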
3. The local refinement and global reinforcement network for vehicle re-identification according to claim 1 or 2, characterized in that: the three branches each employ a global average pooling operation and a dimension reduction module to generate the feature representation of the input vehicle image.
4. A local refinement and global reinforcement network for vehicle re-identification as claimed in claim 3, wherein: for any feature map output by a branch, a global average pooling operation is used to obtain a 2048-dimensional feature vector, which is then further compressed to 256 dimensions by a dimension reduction module consisting of a 1*1 convolution, BN and a ReLU activation function.
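A NumPy sketch of the pooling and dimension reduction described in claims 3 and 4; the random projection stands in for the learned 1*1 convolution, and a simple standardization stands in for BN (both are illustrative assumptions, not the trained module):

```python
import numpy as np

rng = np.random.default_rng(0)

def gap_and_reduce(feat, out_dim=256):
    # feat: (C, H, W) feature map output by a branch (C = 2048 here).
    C = feat.shape[0]
    v = feat.mean(axis=(1, 2))                     # global average pooling -> (C,)
    Wr = rng.standard_normal((C, out_dim)) * 0.02  # a 1*1 conv on a pooled vector == linear map
    z = v @ Wr                                     # compress 2048 -> 256 dimensions
    z = (z - z.mean()) / (z.std() + 1e-5)          # stand-in for BN (learned affine at inference)
    return np.maximum(z, 0.0)                      # ReLU activation

vec = gap_and_reduce(rng.standard_normal((2048, 4, 4)))
```

The resulting 256-dimensional vector is the per-branch feature representation that the losses of claims 5-8 operate on.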
5. The local refinement and global reinforcement network for vehicle re-identification of claim 4, wherein: the 256-dimensional feature vector is used for the computation of the triplet loss, and is passed through a fully-connected layer, whose number of output neurons equals the number of vehicles in the training set, for the computation of the cross entropy loss.
6. The local refinement and global reinforcement network for vehicle re-identification of claim 5, wherein: the cross entropy loss calculation formula is as follows:

L_id = -∑_{i=1}^{N} q_i · log(p_i), where q_i = 1 if i = y and q_i = 0 otherwise

wherein N denotes the number of vehicles in the training set, y denotes the real identity label of the image input to the network, and p_i is the probability that the input image belongs to the i-th vehicle.
7. The local refinement and global reinforcement network for vehicle re-identification of claim 5, wherein: the triplet loss calculation formula is as follows:

L_triplet = max(0, α + max ‖f_a(i) − f_p(i)‖_2 − min ‖f_a(i) − f_n(j)‖_2)

wherein α is a hyperparameter controlling the margin between the anchor-positive distance and the anchor-negative distance, and f_a(i), f_p(i) and f_n(j) are the features extracted from the anchor, the positive sample and the negative sample, respectively.
8. The local refinement and global reinforcement network for vehicle re-identification of claim 5, wherein: the cross entropy losses and the triplet losses of the three branches are added to obtain the final loss, and the total loss is calculated as follows:

L = ∑_{i=1}^{N} (L_id + L_triplet)

wherein N represents the number of branches, L_id represents the cross entropy loss, and L_triplet represents the triplet loss.
CN202310926540.1A 2023-07-27 2023-07-27 Local refinement and global reinforcement network for vehicle re-identification Active CN116644788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310926540.1A CN116644788B (en) 2023-07-27 2023-07-27 Local refinement and global reinforcement network for vehicle re-identification


Publications (2)

Publication Number Publication Date
CN116644788A CN116644788A (en) 2023-08-25
CN116644788B true CN116644788B (en) 2023-10-03

Family

ID=87640396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310926540.1A Active CN116644788B (en) 2023-07-27 2023-07-27 Local refinement and global reinforcement network for vehicle re-identification

Country Status (1)

Country Link
CN (1) CN116644788B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018050259A (en) * 2016-09-23 2018-03-29 富士通株式会社 Device, method and program for noise reduction
CN111460914A (en) * 2020-03-13 2020-07-28 华南理工大学 Pedestrian re-identification method based on global and local fine-grained features
WO2020257812A2 (en) * 2020-09-16 2020-12-24 Google Llc Modeling dependencies with global self-attention neural networks
CN112766353A (en) * 2021-01-13 2021-05-07 南京信息工程大学 Double-branch vehicle re-identification method for enhancing local attention
CN113408492A (en) * 2021-07-23 2021-09-17 四川大学 Pedestrian re-identification method based on global-local feature dynamic alignment
CN114119975A (en) * 2021-11-25 2022-03-01 中国人民公安大学 Language-guided cross-modal instance segmentation method
CN114821249A (en) * 2022-07-04 2022-07-29 山东交通学院 Vehicle weight recognition method based on grouping aggregation attention and local relation
CA3166088A1 (en) * 2021-06-29 2022-12-29 10353744 Canada Ltd. Training method and pedestrian re-identification method of multi-task classification network
DE102022128465A1 (en) * 2021-11-05 2023-05-11 Nvidia Corporation NOVEL PROCEDURE FOR TRAINING A NEURAL NETWORK

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10290085B2 (en) * 2016-12-14 2019-05-14 Adobe Inc. Image hole filling that accounts for global structure and local texture
US20220415027A1 (en) * 2021-06-29 2022-12-29 Shandong Jianzhu University Method for re-recognizing object image based on multi-feature information capture and correlation analysis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Vehicle Re-Identification Based on Global Relational Attention and Multi-Granularity Feature Learning; Xin Tian; IEEE Access; full text *
Research Progress of Residual Neural Network Optimization Algorithms for Disease Diagnosis in Medical Imaging; Zhou Tao; Huo Bingqiang; Lu Huiling; Shi Hongbin; Journal of Image and Graphics (No. 10), 2020; full text *

Also Published As

Publication number Publication date
CN116644788A (en) 2023-08-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant