CN110543841A - Pedestrian re-identification method, system, electronic device and medium

Info

Publication number
CN110543841A
CN110543841A
Authority
CN
China
Prior art keywords
pedestrian
feature
segmentation result
semantic segmentation
map
Prior art date
Legal status
Pending
Application number
CN201910776479.0A
Other languages
Chinese (zh)
Inventor
赵朝阳
Current Assignee
Sino-Tech Visual Language (Beijing) Technology Co Ltd
Original Assignee
Sino-Tech Visual Language (Beijing) Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sino-Tech Visual Language (Beijing) Technology Co Ltd
Priority to CN201910776479.0A
Publication of CN110543841A
Legal status: Pending

Classifications

    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods (neural networks)
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands

Abstract

The present disclosure provides a pedestrian re-identification method, including: extracting global features of a pedestrian image; generating, from the pedestrian image, semantic segmentation result maps corresponding respectively to the pedestrian foreground and to different body parts of the pedestrian; operating on the semantic segmentation result maps and the global features to obtain first local feature maps corresponding respectively to the pedestrian foreground and the different body parts of the pedestrian; performing pixel-level attention generation and feature extraction on the first local feature maps to obtain second local feature maps corresponding respectively to the foreground of the pedestrian image and the different body parts of the pedestrian; fusing the global features of the pedestrian image with the second local feature maps to generate a final feature map; and re-identifying the pedestrian in the pedestrian image according to the final feature map. The present disclosure also provides a pedestrian re-identification system, an electronic device, and a computer-readable storage medium.

Description

Pedestrian re-identification method, system, electronic device and medium
Technical Field
The present disclosure relates to the field of computer vision and pattern recognition, and in particular to a pedestrian re-identification method, system, electronic device, and medium.
Background
Pedestrian re-identification is a sub-problem in the field of image retrieval. Given a pedestrian image, the pedestrian re-identification task aims to find images of the same pedestrian in other scenes. Owing to differences in lighting, viewing angle, and pose, the same pedestrian may look very different, while different pedestrians may look very similar. It is therefore important to learn a pedestrian feature that can effectively distinguish similar pedestrians while remaining sufficiently robust to pose and environmental changes.
Existing pedestrian re-identification techniques fall roughly into three categories: methods based on supervised learning, on semi-supervised learning, and on unsupervised learning. Among these, component-based supervised learning approaches are particularly popular because of their robustness to pose and viewpoint changes, fast training, and relatively high performance. Recently, some researchers have trained human semantic segmentation networks end to end to obtain local pedestrian features, but pixel-level noise inside each region still needs to be suppressed. Solving this problem requires a pixel-level attention selection module, whereas general attention models typically generate an attention map from the global image and consider only the spatial correlation between local features.
Disclosure of Invention
(I) Technical problem to be solved
In view of the foregoing technical problems, the present disclosure provides a pedestrian re-identification method, system, electronic device and medium, which are used to at least partially solve the above technical problems.
(II) Technical solution
One aspect of the present disclosure provides a pedestrian re-identification method, including: extracting global features of a pedestrian image; generating, from the pedestrian image, semantic segmentation result maps corresponding respectively to the pedestrian foreground and to different body parts of the pedestrian; operating on the semantic segmentation result maps and the global features to obtain first local feature maps corresponding respectively to the foreground of the pedestrian and the different body parts of the pedestrian; performing pixel-level attention generation and feature extraction on the first local feature maps to obtain second local feature maps corresponding respectively to the foreground of the pedestrian image and the different body parts of the pedestrian; fusing the global features of the pedestrian image with the second local feature maps to generate a final feature map; and re-identifying the pedestrian in the pedestrian image according to the final feature map.
Optionally, the semantic segmentation result maps corresponding respectively to the pedestrian foreground and different body parts of the pedestrian include: a semantic segmentation result map corresponding to the whole body, a semantic segmentation result map corresponding to the head, a semantic segmentation result map corresponding to the upper body, a semantic segmentation result map corresponding to the legs, and a semantic segmentation result map corresponding to the feet.
Optionally, the operating on the semantic segmentation result maps and the global features includes: performing a point-to-point (element-wise) product of the global features with each of the semantic segmentation result maps corresponding to the whole body, the head, the upper body, the legs, and the feet.
Optionally, the performing pixel-level attention generation and feature extraction on the first local feature map includes: performing pixel-level attention generation on the first local feature map to obtain a pixel-level attention map corresponding to the first local feature map; and performing point-to-point product operation on the pixel-level attention map and the first local feature map to obtain the second local feature map.
Optionally, the re-identifying the pedestrian in the pedestrian image according to the final feature map includes: calculating, according to the final feature map, the Euclidean distance between the pedestrian to be identified and each pedestrian feature in the search pedestrian library; and re-identifying the pedestrian to be identified according to the Euclidean distances.
Optionally, the re-identifying the pedestrian to be identified according to the Euclidean distance includes: sorting the Euclidean distances in ascending order; and matching the pedestrian corresponding to the top-ranked (smallest) Euclidean distance with the pedestrian to be identified.
Optionally, the fusing the global features of the pedestrian image with the second local feature maps to generate a final feature map includes: converting the global features into a first feature vector, converting the second local feature map corresponding to the foreground of the pedestrian image into a second feature vector, and converting the second local feature maps corresponding to different body parts of the pedestrian into a third feature vector; and fusing the first feature vector, the second feature vector, and the third feature vector to obtain the final feature map. The global features are converted into the first feature vector, and the second local feature map corresponding to the foreground of the pedestrian image is converted into the second feature vector, by average pooling; the second local feature maps corresponding to different body parts of the pedestrian are converted into the third feature vector by max-pooling fusion.
Another aspect of the present disclosure provides a pedestrian re-identification system, including: a local attention module, used for extracting global features of a pedestrian image, generating semantic segmentation result maps corresponding to the pedestrian image foreground and to different body parts of the pedestrian from the pedestrian image, and operating on the semantic segmentation result maps and the global features to obtain first local feature maps corresponding respectively to the pedestrian foreground and the different body parts of the pedestrian; a pixel attention module, used for performing pixel-level attention generation and feature extraction on the first local feature maps to obtain second local feature maps corresponding respectively to the foreground of the pedestrian image and different body parts of the pedestrian; a fusion module, used for fusing the global features of the pedestrian image with the second local feature maps to generate a final feature map; and an identification module, used for re-identifying the pedestrian in the pedestrian image according to the final feature map.
Another aspect of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method provided above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method provided above when executed.
Another aspect of the present disclosure provides a computer program comprising computer executable instructions for implementing the method provided above when executed.
(III) Advantageous effects
The present disclosure provides a pedestrian re-identification method, system, electronic device, and medium that first generate coarse-grained weighted feature maps for five parts, namely the foreground, head, upper body, legs, and feet, from a pedestrian image, thereby effectively filtering background noise and improving feature robustness; an attention mechanism is then used to highlight the more discriminative fine-grained information within the local features and to suppress irrelevant noise. Combining the two allows pedestrians to be re-identified more accurately.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically illustrates a system architecture diagram of a pedestrian re-identification method and system in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a pedestrian re-identification method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a feature fusion method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a pedestrian re-identification system, in accordance with an embodiment of the present disclosure; and
FIG. 5 schematically shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing system, such that the instructions, when executed via the processor, create a system that implements the functions/acts specified in the block diagrams and/or flowchart blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
The embodiments of the present disclosure provide a pedestrian re-identification method and a pedestrian re-identification system to which the method can be applied. The method includes: extracting global features of a pedestrian image; generating, from the pedestrian image, semantic segmentation result maps corresponding respectively to the foreground of the pedestrian image and to different body parts of the pedestrian; operating on the semantic segmentation result maps and the global features to obtain first local feature maps corresponding respectively to the pedestrian foreground and the different body parts of the pedestrian; performing pixel-level attention generation and feature extraction on the first local feature maps to obtain second local feature maps corresponding respectively to the foreground of the pedestrian image and the different body parts of the pedestrian; fusing the global features with the second local feature maps to obtain a final feature map; and re-identifying the pedestrian in the pedestrian image according to the final feature map.
Fig. 1 schematically illustrates a system architecture 100 of a pedestrian re-identification method and system according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a database 101, a network 102, and a server 103. Network 102 is used to provide communication links between database 101 and server 103.
Various pedestrian images may be stored in the database 101. The network 102 may include various connection types, such as wired or wireless communication links or fiber-optic cables. The server 103 may be a server that provides pedestrian re-identification. According to an embodiment of the present disclosure, the server 103 acquires the pedestrian images stored in the database 101 through the network 102 and extracts their global features. The server 103 may perform local attention generation on a pedestrian image to obtain semantic segmentation result maps for the pedestrian foreground and for each body part, operate on the global features and the semantic segmentation result maps to obtain weighted feature maps, and perform pixel-level attention generation and further feature extraction on the weighted feature maps to improve the discriminative power of each part's local features, thereby re-identifying pedestrians more accurately.
It should be noted that the pedestrian re-identification method provided by the embodiment of the present disclosure may be executed by the server 103. Accordingly, the pedestrian re-identification system provided by the embodiment of the present disclosure may be disposed in the server 103. Alternatively, the pedestrian re-identification method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 103 and is capable of communicating with the database 101 and/or the server 103. Accordingly, the pedestrian re-identification system provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 103 and capable of communicating with the database 101 and/or the server 103.
It should be understood that the number of databases, networks, and servers in fig. 1 are merely illustrative. There may be any number of databases, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a pedestrian re-identification method according to an embodiment of the present disclosure.
As shown in fig. 2, the pedestrian re-identification method of the embodiment of the present disclosure may include, for example, operations S201 to S206.
In operation S201, a global feature of a pedestrian image is extracted.
In order to generate effective pedestrian features from coarse to fine, the abstract global features of the pedestrian image must first be extracted for subsequent processing.
In operation S202, semantic segmentation result maps corresponding respectively to the pedestrian foreground and different body parts of the pedestrian are generated from the pedestrian image. The semantic segmentation result maps may include a map corresponding to the whole body, a map corresponding to the head, a map corresponding to the upper body, a map corresponding to the legs, and a map corresponding to the feet.
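As an illustration of how operation S202's five result maps might be assembled, the following sketch groups a pixel-level human-parsing label map into the five masks; the label-to-part grouping is a hypothetical example, not taken from the disclosure.

```python
import torch

# Hypothetical grouping of parsing labels into body regions; a real system
# would map the label set of its own parsing network (e.g., the 20 LIP
# labels) onto these four part regions plus the whole-body foreground.
PART_LABELS = {
    "head":       [1, 2],  # e.g., hair, face
    "upper body": [3, 4],  # e.g., upper clothes, arms
    "legs":       [5, 6],  # e.g., pants, legs
    "feet":       [7, 8],  # e.g., socks, shoes
}

def build_part_masks(parsing: torch.Tensor) -> torch.Tensor:
    """parsing: (H, W) integer label map with 0 as background.
    Returns a (5, H, W) float tensor: whole body, head, upper body, legs, feet."""
    masks = [parsing > 0]  # whole-body foreground: any non-background pixel
    for labels in PART_LABELS.values():
        part = torch.zeros_like(parsing, dtype=torch.bool)
        for lab in labels:
            part |= parsing == lab
        masks.append(part)
    return torch.stack(masks).float()
```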
In operation S203, the semantic segmentation result maps and the global features are operated on to obtain first local feature maps corresponding respectively to the foreground of the pedestrian and different body parts of the pedestrian.
Specifically, a point-to-point product operation is performed between the global features extracted in operation S201 and each of the semantic segmentation result maps corresponding to the whole body, the head, the upper body, the legs, and the feet, to obtain the first local feature maps.
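A minimal sketch of this weighting step, assuming the segmentation result maps are first resized to the resolution of the global feature map (tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def weighted_local_features(global_feat: torch.Tensor,
                            seg_maps: torch.Tensor) -> torch.Tensor:
    """global_feat: (B, C, H, W) global feature map from the backbone.
    seg_maps: (B, 5, Hs, Ws) segmentation result maps (whole body, head,
    upper body, legs, feet) with values in [0, 1].
    Returns (B, 5, C, H, W): the first local feature maps S1..S5."""
    # Resize the segmentation maps to the feature resolution (assumed step).
    seg = F.interpolate(seg_maps, size=global_feat.shape[-2:],
                        mode="bilinear", align_corners=False)
    # Point-to-point product: broadcast each part map over all C channels.
    return seg.unsqueeze(2) * global_feat.unsqueeze(1)
```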
In operation S204, pixel-level attention generation and feature extraction are performed on the first local feature map to obtain second local feature maps corresponding to the foreground of the pedestrian image and different body parts of the pedestrian, so as to further highlight the distinctive features in the local region and suppress irrelevant features.
Specifically, pixel-level attention generation: pixel-level attention generation is performed on the first local feature map to obtain a pixel-level attention map corresponding to the first local feature map.
Feature extraction: a point-to-point product operation is performed on the pixel-level attention map and the first local feature map to obtain the second local feature map.
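A minimal sketch of these two steps, with a single 1 × 1 convolution standing in for the attention generator (the actual two-branch attention design is described with the system in FIG. 4):

```python
import torch
import torch.nn as nn

class PixelAttention(nn.Module):
    """Placeholder for operation S204: generate a pixel-level attention map
    from a first local feature map and re-weight the features with it."""
    def __init__(self, channels: int):
        super().__init__()
        # Attention generation: one 1x1 conv + sigmoid (an assumed stand-in).
        self.att = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1),
                                 nn.Sigmoid())

    def forward(self, local_feat: torch.Tensor) -> torch.Tensor:
        attention = self.att(local_feat)  # (B, 1, H, W), values in [0, 1]
        return attention * local_feat     # point-to-point product
```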
In operation S205, the global feature of the pedestrian image is fused with the second local feature map, and a final feature map is generated.
Based on operation S205, the global features and the local features of the pedestrian image may be more fully considered to re-identify the pedestrian in the pedestrian image more accurately.
In operation S206, the pedestrian in the pedestrian image is re-identified according to the final feature map.
Specifically, according to the pedestrian features contained in the final feature map, the Euclidean distance between the pedestrian to be identified and each pedestrian feature in the search pedestrian library is calculated, and the pedestrian to be identified is re-identified according to these distances. In the embodiment of the present disclosure, the obtained Euclidean distances may be sorted in ascending order, and the pedestrians in the search pedestrian library corresponding to the first-ranked or top-ranked Euclidean distances are matched with the pedestrian to be identified. The Euclidean distances may instead be sorted in descending order, in which case the pedestrians corresponding to the last-ranked (smallest) distances are matched with the pedestrian to be identified; the specific ordering is not limiting. The smaller the Euclidean distance, the better the corresponding pedestrian matches the pedestrian to be identified, and the better the learned deep features of the pedestrian image serve pedestrian re-identification.
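A sketch of this ranking step (names and shapes are illustrative):

```python
import torch

def rank_gallery(query: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """query: (D,) final feature of the pedestrian to be identified.
    gallery: (N, D) final features in the search pedestrian library.
    Returns gallery indices sorted by ascending Euclidean distance,
    so index 0 is the best match (smallest distance)."""
    dists = torch.cdist(query.unsqueeze(0), gallery).squeeze(0)  # (N,)
    return torch.argsort(dists)

# Example: the identity of the top-ranked gallery entry
# best_match = rank_gallery(query_feat, gallery_feats)[0]
```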
Fig. 3 schematically illustrates a flow chart of a feature fusion method according to an embodiment of the present disclosure.
As shown in fig. 3, the feature fusion method of the embodiment of the present disclosure may include, for example, operations S301 to S302.
In operation S301, the global feature is converted into a first feature vector, a second local feature map corresponding to a foreground of the pedestrian image is converted into a second feature vector, and the second local feature maps corresponding to different body parts of the pedestrian are converted into third feature vectors.
Specifically, the global features extracted in operation S201 are compressed into the first feature vector by average pooling. The first feature vector may be, for example, a 2048-dimensional vector, although the present disclosure is not limited in this respect. This first feature vector conveys the overall abstract features well.
The second local feature map corresponding to the foreground of the pedestrian image is converted into the second feature vector by average pooling. The second feature vector may be, for example, a 2048-dimensional vector, although the present disclosure is not limited in this respect.
The second local feature maps corresponding to different body parts of the pedestrian (for example, the weighted feature maps for the head, upper body, legs, and feet after the pixel-level attention of operation S204) are fused by max pooling into the third feature vector. The third feature vector may likewise be, for example, a 2048-dimensional vector; it incorporates the most discriminative features of the four body parts.
In operation S302, the first feature vector, the second feature vector, and the third feature vector are fused to obtain the final feature map. The final feature map may be, for example, a 3 × 2048-dimensional vector, although the present disclosure is not limited in this respect. The final feature map provides a unified and effective representation of the pedestrian's features.
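A sketch of operations S301 and S302 under the shapes stated above; the 2048-dimensional width and the exact max-pooling fusion across the four parts are assumptions:

```python
import torch
import torch.nn.functional as F

def fuse_features(global_feat: torch.Tensor,
                  fg_feat: torch.Tensor,
                  part_feats: torch.Tensor) -> torch.Tensor:
    """global_feat: (B, 2048, H, W) global feature map.
    fg_feat: (B, 2048, H, W) second local feature map of the foreground.
    part_feats: (B, 4, 2048, H, W) second local feature maps of the head,
    upper body, legs, and feet.
    Returns (B, 3 * 2048): the fused final feature."""
    v1 = F.adaptive_avg_pool2d(global_feat, 1).flatten(1)  # average pooling
    v2 = F.adaptive_avg_pool2d(fg_feat, 1).flatten(1)      # average pooling
    # Max-pooling fusion: spatial max per part, then max across the four
    # parts, keeping each channel's most discriminative response.
    v3 = part_feats.amax(dim=(-2, -1)).amax(dim=1)         # (B, 2048)
    return torch.cat([v1, v2, v3], dim=1)
```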
Fig. 4 schematically illustrates a block diagram of a pedestrian re-identification system according to an embodiment of the present disclosure. The system may perform the pedestrian re-identification method described above.
As shown in fig. 4, the pedestrian re-identification system 400 of the embodiment of the present disclosure may include, for example, a local attention module 410, a pixel attention module 420, a fusion module 430, and an identification module 440.
The local attention module 410 is configured to extract global features of the pedestrian image, generate semantic segmentation result maps corresponding to the foreground of the pedestrian image and to different body parts of the pedestrian from the pedestrian image, and operate on the semantic segmentation result maps and the global features to obtain first local feature maps corresponding respectively to the pedestrian foreground and the different body parts of the pedestrian.
This embodiment implements the coarse-grained local attention module by constructing a human semantic segmentation network. To increase resolution, the stride of the last dimension-reduction module of Inception-V3 in the local attention module 410 can be reduced from 2 to 1, and the same is done for the last transition layer of DenseNet-121; this doubles the feature resolution. Because the reduced stride introduces redundant computation, the embodiments of the present disclosure can adopt dilated (hole) convolution to address this problem.
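For the DenseNet-121 case, a sketch of this modification using torchvision's layer names is shown below; the dilation rate of 2 on the last dense block's 3 × 3 convolutions is an assumed compensation, and the Inception-V3 case would be analogous.

```python
import torch.nn as nn
from torchvision.models import densenet121

def halve_last_stride(model: nn.Module) -> nn.Module:
    """Reduce the stride of the last transition pooling of a torchvision
    DenseNet-121 from 2 to 1, so the final dense block runs at roughly
    double the usual resolution, and dilate its 3x3 convolutions to keep
    the receptive field comparable."""
    model.features.transition3.pool = nn.AvgPool2d(kernel_size=2, stride=1)
    for module in model.features.denseblock4.modules():
        if isinstance(module, nn.Conv2d) and module.kernel_size == (3, 3):
            module.dilation = (2, 2)
            module.padding = (2, 2)  # keep the spatial size unchanged
    return model

backbone = halve_last_stride(densenet121())
```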
In the disclosed embodiment, the local attention module 410 employs global average pooling, and the last layer of the network may be a 1 × 1 convolutional classifier.
The local attention module 410 may output five segmentation maps corresponding respectively to the pedestrian foreground (whole body), head, upper body, legs, and feet. Specifically, the local attention module 410 includes two branches: an upper branch for global feature extraction from the pedestrian image, and a lower branch for generating the semantic segmentation result maps (pedestrian foreground, head, upper body, legs, and feet). The semantic segmentation result maps and the global features are subjected to a point-to-point product operation to form the first local feature maps (weighted feature maps) corresponding to the foreground and the different body parts of the pedestrian, denoted S1, S2, S3, S4, and S5, respectively.
The pixel attention module 420 is configured to perform pixel-level attention generation and feature extraction on the first local feature map to obtain second local feature maps corresponding to the foreground of the pedestrian image and different body parts of the pedestrian.
The pixel attention module 420 is applied to the locally weighted features: the coarse-grained first local feature maps are sent to the pixel attention module 420 for pixel-level attention generation and more discriminative feature extraction, yielding second local feature maps corresponding to the foreground of the pedestrian image and to the different body parts of the pedestrian; this further highlights the discriminative features within each local region and suppresses irrelevant features. The module includes a spatial attention branch and a channel attention branch. The spatial attention branch connects, in sequence, a global cross-channel average pooling layer, a convolutional layer, and a bilinear interpolation layer. The channel attention branch includes a global average pooling layer and two convolutional layers.
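A sketch of this two-branch module; the intermediate channel width, the down-sampling factor inside the spatial branch, and the fusion of the two branches by summation followed by a sigmoid are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchPixelAttention(nn.Module):
    """Spatial branch: global cross-channel average pooling -> convolution
    -> bilinear interpolation. Channel branch: global average pooling ->
    two 1x1 convolutions. Their sum gates the input features."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.spatial_conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.mean(dim=1, keepdim=True)          # cross-channel average pooling
        s = self.spatial_conv(s)                 # down-sampled spatial map
        s = F.interpolate(s, size=x.shape[-2:],  # bilinear interpolation back
                          mode="bilinear", align_corners=False)
        c = self.channel(x)                      # (B, C, 1, 1) channel weights
        attention = torch.sigmoid(s + c)         # broadcasts to (B, C, H, W)
        return attention * x
```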
The fusion module 430 is configured to convert the global features into a first feature vector (denoted S6), convert the second local feature map corresponding to the foreground of the pedestrian image (derived from the first local feature map S1) into a second feature vector, convert the second local feature maps corresponding to different body parts of the pedestrian (derived from the first local feature maps S2, S3, S4, and S5) into a third feature vector, and fuse the first, second, and third feature vectors into the final feature map that characterizes the pedestrian.
The identification module 440 is used for re-identifying the pedestrian in the pedestrian image according to the final feature map.
Specifically, the recognition module 440 calculates an euclidean distance between the pedestrian to be recognized and each pedestrian feature in the search pedestrian library according to the features of the pedestrian included in the final feature map, and re-recognizes the pedestrian to be recognized according to the euclidean distance.
In the embodiment of the present disclosure, the pedestrian re-identification system 400 may be regarded as a cascaded attention network. Its training phase may proceed as follows:
Each pedestrian picture is first scaled to 748 × 246. The local attention module 410 is pre-trained on the LIP dataset, a dataset dedicated to human image segmentation that contains approximately 30,000 pictures, each annotated with 20 pixel-level semantic labels. The whole pedestrian re-identification system 400 is then trained for 60,000 iterations, with 13 pictures input per iteration.
In the disclosed embodiment, a triplet loss may be employed to supervise the training of the entire pedestrian re-identification system 400. The core idea of this loss is to separate unmatched pedestrian pairs from matched pairs by a distance margin, increasing inter-class differences and reducing intra-class differences.
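In a standard batch-hard form consistent with the batch construction described next (the exact formula of the original filing is assumed to match this form), the loss can be written as

$$ L_{\mathrm{triplet}} = \sum_{a=1}^{P \times K} \Big[\, \alpha + \max_{p \in A} d(f_a, f_p) \;-\; \min_{n \in B} d(f_a, f_n) \,\Big]_+ $$

where $d(\cdot,\cdot)$ is the Euclidean distance between pedestrian features, $A$ is the set of batch pictures sharing the ID of the anchor $a$, $B$ is the set of the remaining batch pictures, and $[\cdot]_+ = \max(\cdot, 0)$.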
Specifically, as shown in the above formula, for each training batch, P different pedestrians are randomly selected, and K different pictures are selected for each, so that one batch contains P × K pictures. For each picture a in the batch, the least similar positive sample, i.e., the picture farthest away in Euclidean space, is selected and denoted p, and the most similar negative sample, i.e., the picture closest in Euclidean space, is denoted n; a, p, and n constitute a triplet. α is the distance margin, which may be set to 0.1, for example. The image set A contains all pictures with the same ID as a, and the remaining pictures form the image set B.
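A sketch of this batch-hard triplet loss over a P × K batch (function and variable names are illustrative):

```python
import torch

def batch_hard_triplet_loss(features: torch.Tensor,
                            labels: torch.Tensor,
                            alpha: float = 0.1) -> torch.Tensor:
    """features: (P*K, D) final features of one batch.
    labels: (P*K,) pedestrian IDs.
    alpha: distance margin (0.1 as suggested above)."""
    dist = torch.cdist(features, features)              # pairwise Euclidean
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    # Hardest positive p: the farthest picture with the same ID (set A).
    pos = dist.masked_fill(~same_id, float("-inf")).amax(dim=1)
    # Hardest negative n: the closest picture with a different ID (set B).
    neg = dist.masked_fill(same_id, float("inf")).amin(dim=1)
    return torch.clamp(alpha + pos - neg, min=0).mean()

# Example: P = 4 identities with K = 3 pictures each -> a batch of 12 features.
```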
When the trained pedestrian re-identification system 400 is used to re-identify pedestrians, the pedestrian pictures to be queried are input, the system 400 outputs the layer before the final classification layer as the pedestrian features, and the Euclidean distance between the pedestrian to be queried and each pedestrian feature in the search pedestrian library is calculated. Finally, the calculated distances are sorted in ascending order; the higher the Rank-1 and top-ranked matching rates, the better the learned deep features serve the pedestrian re-identification task.
It should be noted that, the embodiment of the system part is similar to that of the method part, and the achieved technical effects are also similar, which are not described herein again.
Any of the modules according to embodiments of the present disclosure, or at least part of the functionality of any of them, may be implemented in one module. Any one or more of the modules according to the embodiments of the present disclosure may be split into a plurality of modules for implementation. Any one or more of the modules according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware by any other reasonable manner of integrating or packaging a circuit, or in any one of, or any suitable combination of, the three implementations of software, hardware, and firmware. Alternatively, one or more of the modules according to embodiments of the disclosure may be implemented at least partly as computer program modules which, when executed, may perform the corresponding functions.
For example, any of the local attention module 410, the pixel attention module 420, the fusion module 430, and the identification module 440 may be combined in one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the local attention module 410, the pixel attention module 420, the fusion module 430, and the identification module 440 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the local attention module 410, the pixel attention module 420, the fusion module 430 and the identification module 440 may be at least partially implemented as a computer program module which, when executed, may perform a corresponding function.
Fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 includes a processor 510, a computer-readable storage medium 520. The electronic device 500 may perform a method according to an embodiment of the present disclosure.
In particular, processor 510 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 510 may also include on-board memory for caching purposes. Processor 510 may be a single processing unit or a plurality of processing units for performing different actions of a method flow according to embodiments of the disclosure.
Computer-readable storage media 520, for example, may be non-volatile computer-readable storage media, specific examples including, but not limited to: magnetic storage systems, such as magnetic tape or Hard Disk Drives (HDDs); optical storage systems, such as compact discs (CD-ROMs); memory such as Random Access Memory (RAM) or flash memory, etc.
The computer-readable storage medium 520 may include a computer program 521, which computer program 521 may include code/computer-executable instructions that, when executed by the processor 510, cause the processor 510 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 521 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in the computer program 521 may include one or more program modules, for example module 521A, module 521B, and so on. It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, and when these program modules are executed by the processor 510, the processor 510 may execute the method according to the embodiment of the present disclosure or any variation thereof.
At least one of the local attention module 410, the pixel attention module 420, the fusion module 430 and the identification module 440 according to embodiments of the present disclosure may be implemented as a computer program module as described with reference to fig. 5, which, when executed by the processor 510, may implement the respective operations described above.
The present disclosure also provides a computer-readable storage medium, which may be included in the device/system described in the above embodiments, or may exist separately without being assembled into the device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be understood by those skilled in the art that while the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, various changes in form and detail may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims but also by their equivalents.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. A pedestrian re-identification method, comprising:
Extracting global features of the pedestrian image;
Generating, from the pedestrian image, semantic segmentation result maps corresponding respectively to the pedestrian foreground and to different body parts of the pedestrian;
Operating on the semantic segmentation result maps and the global features to obtain first local feature maps corresponding respectively to the foreground of the pedestrian and different body parts of the pedestrian;
Performing pixel-level attention generation and feature extraction on the first local feature maps to obtain second local feature maps corresponding respectively to the foreground of the pedestrian image and different body parts of the pedestrian;
Fusing the global feature of the pedestrian image and the second local feature map to generate a final feature map;
And re-identifying the pedestrians in the pedestrian image according to the final feature map.
2. The method according to claim 1, wherein the semantic segmentation result maps respectively corresponding to the pedestrian foreground and different body parts of the pedestrian comprise:
A semantic segmentation result map corresponding to the whole body, a semantic segmentation result map corresponding to the head, a semantic segmentation result map corresponding to the upper body, a semantic segmentation result map corresponding to the legs, and a semantic segmentation result map corresponding to the feet.
3. The method of claim 2, wherein the operating on the semantic segmentation result maps and the global features comprises:
Performing a point-to-point product operation on the global features and each of the semantic segmentation result maps corresponding to the whole body, the head, the upper body, the legs, and the feet, respectively.
4. The method of claim 1, wherein the pixel-level attention generation and feature extraction of the first local feature map comprises:
performing pixel-level attention generation on the first local feature map to obtain a pixel-level attention map corresponding to the first local feature map;
And performing a point-to-point product operation on the pixel-level attention map and the first local feature map to obtain the second local feature map.
5. The method of claim 1, wherein the re-identifying the pedestrian in the pedestrian image according to the final feature map comprises:
Calculating the Euclidean distance between the pedestrian to be identified and each pedestrian feature in the search pedestrian library according to the final feature map;
And re-identifying the pedestrian to be identified according to the Euclidean distance.
6. The method of claim 5, wherein the re-identifying the pedestrian to be identified according to the Euclidean distance comprises:
Sorting the Euclidean distances in ascending order;
And matching the pedestrian corresponding to the top-ranked Euclidean distance with the pedestrian to be identified.
7. The method according to claim 1, wherein the fusing the global feature of the pedestrian image with the second local feature map to generate a final feature map comprises:
Converting the global features into first feature vectors, converting second local feature maps corresponding to the foreground of the pedestrian image into second feature vectors, and converting second local feature maps corresponding to different body parts of the pedestrian into third feature vectors;
Fusing the first feature vector, the second feature vector and the third feature vector to obtain the final feature map;
Converting the global features into first feature vectors in an average pooling mode, and converting second local feature maps corresponding to the foreground of the pedestrian images into second feature vectors;
And converting the second local feature maps corresponding to different body parts of the pedestrian into third feature vectors by max-pooling fusion.
8. A pedestrian re-identification system comprising:
The local attention module, used for extracting global features of a pedestrian image, generating semantic segmentation result maps corresponding to the pedestrian image foreground and to different body parts of the pedestrian according to the pedestrian image, and operating on the semantic segmentation result maps and the global features to obtain first local feature maps corresponding respectively to the pedestrian foreground and the different body parts of the pedestrian;
The pixel attention module is used for performing pixel-level attention generation and feature extraction on the first local feature map to obtain second local feature maps corresponding to the foreground of the pedestrian image and different body parts of the pedestrian respectively;
The fusion module is used for fusing the global feature of the pedestrian image and the second local feature map to generate a final feature map;
And the identification module is used for re-identifying the pedestrian in the pedestrian image according to the final feature map.
9. An electronic device, comprising:
One or more processors;
A memory for storing one or more programs,
Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 7 when executed.
CN201910776479.0A 2019-08-21 2019-08-21 Pedestrian re-identification method, system, electronic device and medium Pending CN110543841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910776479.0A CN110543841A (en) 2019-08-21 2019-08-21 Pedestrian re-identification method, system, electronic device and medium


Publications (1)

Publication Number Publication Date
CN110543841A (en) 2019-12-06

Family

ID=68711779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910776479.0A Pending CN110543841A (en) 2019-08-21 2019-08-21 Pedestrian re-identification method, system, electronic device and medium

Country Status (1)

Country Link
CN (1) CN110543841A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229468A (en) * 2017-06-28 2018-06-29 北京市商汤科技开发有限公司 Vehicle appearance feature recognition and vehicle retrieval method, apparatus, storage medium, electronic equipment
CN108960114A (en) * 2018-06-27 2018-12-07 腾讯科技(深圳)有限公司 Human body recognition method and device, computer readable storage medium and electronic equipment
CN109934177A (en) * 2019-03-15 2019-06-25 艾特城信息科技有限公司 Pedestrian recognition methods, system and computer readable storage medium again
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539336A (en) * 2020-04-25 2020-08-14 长治学院 Pedestrian re-identification method for improving local attention
CN113706372A (en) * 2020-06-30 2021-11-26 稿定(厦门)科技有限公司 Automatic cutout model establishing method and system
CN111680674A (en) * 2020-08-14 2020-09-18 杭州科技职业技术学院 Hall personnel monitoring method based on self-integrated attention mechanism
CN111680674B (en) * 2020-08-14 2020-12-15 杭州科技职业技术学院 Hall personnel monitoring method based on self-integrated attention mechanism
EP4137991A4 (en) * 2020-08-25 2024-04-03 Beijing Jingdong Shangke Information Technology Co Ltd Pedestrian re-identification method and device
WO2022041830A1 (en) * 2020-08-25 2022-03-03 北京京东尚科信息技术有限公司 Pedestrian re-identification method and device
CN112131961B (en) * 2020-08-28 2023-02-03 中国海洋大学 Semi-supervised pedestrian re-identification method based on single sample
CN112131961A (en) * 2020-08-28 2020-12-25 中国海洋大学 Semi-supervised pedestrian re-identification method based on single sample
CN112215092A (en) * 2020-09-23 2021-01-12 上海眼控科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN111931873B (en) * 2020-09-28 2020-12-22 支付宝(杭州)信息技术有限公司 Image recognition method and device
CN111931873A (en) * 2020-09-28 2020-11-13 支付宝(杭州)信息技术有限公司 Image recognition method and device
CN112381017A (en) * 2020-11-19 2021-02-19 华南理工大学 Vehicle heavy identification method based on sensing cascade context
CN112381017B (en) * 2020-11-19 2022-04-22 华南理工大学 Vehicle heavy identification method based on sensing cascade context
CN112800957A (en) * 2021-01-28 2021-05-14 内蒙古科技大学 Video pedestrian re-identification method and device, electronic equipment and storage medium
CN113011499A (en) * 2021-03-22 2021-06-22 安徽大学 Hyperspectral remote sensing image classification method based on double-attention machine system
CN113505736A (en) * 2021-07-26 2021-10-15 浙江大华技术股份有限公司 Object recognition method and device, storage medium and electronic device
WO2023134071A1 (en) * 2022-01-12 2023-07-20 平安科技(深圳)有限公司 Person re-identification method and apparatus, electronic device and storage medium
CN114359970A (en) * 2022-01-12 2022-04-15 平安科技(深圳)有限公司 Pedestrian re-identification method and device, electronic equipment and storage medium
CN115631509A (en) * 2022-10-24 2023-01-20 智慧眼科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN115631509B (en) * 2022-10-24 2023-05-26 智慧眼科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN116052220B (en) * 2023-02-07 2023-11-24 北京多维视通技术有限公司 Pedestrian re-identification method, device, equipment and medium
CN116052220A (en) * 2023-02-07 2023-05-02 北京多维视通技术有限公司 Pedestrian re-identification method, device, equipment and medium
CN116503932A (en) * 2023-05-24 2023-07-28 北京万里红科技有限公司 Method, system and storage medium for extracting eye periphery characteristics of weighted key areas
CN117315576A (en) * 2023-09-22 2023-12-29 中交第二公路勘察设计研究院有限公司 Method for identifying appointed person in monitoring video


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191206