CN114550091A - Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features - Google Patents

Info

Publication number
CN114550091A
Authority
CN
China
Prior art keywords
feature
local
target
overall
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210170514.6A
Other languages
Chinese (zh)
Inventor
Su Zhaoyang (苏照阳)
Li Fanping (李凡平)
Shi Zhuguo (石柱国)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ISSA Technology Co Ltd
Original Assignee
ISSA Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ISSA Technology Co Ltd filed Critical ISSA Technology Co Ltd
Priority to CN202210170514.6A priority Critical patent/CN114550091A/en
Publication of CN114550091A publication Critical patent/CN114550091A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an unsupervised pedestrian re-identification method and device based on local features, relating to the technical field of video monitoring. The method comprises the following steps: inputting at least one target pedestrian image into a pre-trained backbone network and performing feature extraction on it to obtain a corresponding overall feature map; cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the overall feature map and the local feature maps into an overall feature vector and local feature vectors, respectively; clustering the local feature vectors and the overall feature vector in their corresponding feature vector spaces, and outputting target overall features and target local features; splicing the target overall features and the target local features, and calculating the feature similarity of the spliced features; and re-identifying the target pedestrian according to the comparison of the feature similarity with a similarity threshold. This alleviates the technical problem of cumbersome sample annotation while achieving a relatively accurate pedestrian re-identification effect.

Description

Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features
Technical Field
The invention relates to the technical field of video monitoring, in particular to an unsupervised pedestrian re-identification method and device based on local features.
Background
Currently, pedestrians recorded in monitoring videos are commonly analyzed for applications in fields such as security and surveillance. Common pedestrian re-identification methods combine representation learning and metric learning. Representation learning treats the re-identification task as a classification problem and does not consider the similarity between pictures during training, while metric learning has the network learn the similarity of two pictures, so that different pictures of the same pedestrian are more similar to each other than pictures of different pedestrians.
However, such supervised pedestrian re-identification methods require pedestrian identity labels on all training pictures, and each pedestrian picture must be annotated against all the other pictures in the whole training set. The complexity is therefore very high, and labelling pictures for pedestrian re-identification is very time-consuming and labor-intensive.
Disclosure of Invention
The invention aims to provide an unsupervised pedestrian re-identification method and device based on local features, so as to alleviate the technical problem of cumbersome sample annotation while achieving a relatively accurate pedestrian re-identification effect.
In a first aspect, an embodiment of the present invention provides an unsupervised pedestrian re-identification method based on local features, including:
inputting at least one target pedestrian image into a pre-training backbone network, and performing feature extraction on the target pedestrian image to obtain an overall feature map corresponding to the target pedestrian image, wherein the target pedestrian image is label-free data;
cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the overall feature map and the local feature maps into an overall feature vector and local feature vectors, respectively;
clustering the local feature vectors and the overall feature vectors in corresponding feature vector spaces respectively, and outputting target overall features and target local features;
splicing the target overall characteristic and the target local characteristic, and calculating the characteristic similarity of the spliced characteristics;
and re-identifying the target pedestrian according to the comparison result of the feature similarity and the similarity threshold.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the step of cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the overall feature map and the local feature maps into an overall feature vector and local feature vectors, respectively, includes:
determining a preset proportion according to the target identification precision and/or the service requirement;
cutting the overall feature map into local feature maps according to the preset proportion, wherein the number of local feature maps obtained from the overall feature map is the preset number, and each preset proportion corresponds to one preset number;
and respectively converting the local feature map and the overall feature map into a local feature vector and an overall feature vector according to the global average pooling layer of the pre-training backbone network.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the step of cutting the overall feature maps into partial feature maps according to the preset proportion includes:
if the preset number is two, equally dividing the integral feature map into a top feature map and a bottom feature map by taking the height of the integral feature map as a reference;
and if the preset number is five, equally dividing the overall feature map into a first local feature map, a second local feature map, a third local feature map, a fourth local feature map and a fifth local feature map by taking the height of the overall feature map as a reference.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the step of clustering the local feature vectors and the global feature vectors in corresponding feature vector spaces, and outputting target global features and target local features includes:
clustering each local feature vector in a first feature vector space respectively corresponding to the local feature vectors according to a density-based clustering algorithm, and outputting a target local feature corresponding to each local feature vector;
and clustering the whole feature vectors in the corresponding second feature vector space according to a density-based clustering algorithm, and outputting the target whole features.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the step of splicing the target global feature and the target local feature includes:
and performing serial splicing on the target overall characteristic and the target local characteristic.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the step of calculating the feature similarity of the spliced features includes:
if the overall feature map is equally divided into a top feature map and a bottom feature map, calculating the feature similarity of the spliced features according to the following formula:
d_t = sim(f_t, f_t')
d_up = sim(f_up, f_up')
d_down = sim(f_down, f_down')
f = {f_t, f_up, f_down}
f' = {f_t', f_up', f_down'}
D(f, f') = (α + β)·d_t + α·d_up + β·d_down
wherein d_t is the global feature similarity, d_up is the top feature similarity, d_down is the bottom feature similarity, sim denotes the cosine similarity, D is the feature similarity of the spliced features, α is the top feature similarity weight, and β is the bottom feature similarity weight.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where before the step of inputting the at least one target pedestrian image into the pre-training backbone network, the method further includes:
pre-training the backbone network;
and calculating the non-parametric classification loss of the local feature vectors and the overall feature vector corresponding to each first pedestrian image in their corresponding feature vector spaces, and updating the network parameters of the backbone network as well as the local and overall feature vectors in the storage unit, until all the first pedestrian images are traversed.
In a second aspect, an embodiment of the present invention further provides an unsupervised pedestrian re-identification apparatus based on local features, where the apparatus includes:
the characteristic extraction module is used for inputting at least one target pedestrian image into a pre-training backbone network, and performing characteristic extraction on the target pedestrian image to obtain an overall characteristic diagram corresponding to the target pedestrian image, wherein the target pedestrian image is label-free data;
the cutting and conversion module is used for cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the overall feature map and the local feature maps into an overall feature vector and local feature vectors, respectively;
the clustering module is used for clustering the local feature vectors and the overall feature vectors in corresponding feature vector spaces respectively and outputting target overall features and target local features;
the splicing module is used for splicing the target overall characteristic and the target local characteristic and calculating the characteristic similarity of the spliced characteristics;
and the re-identification module is used for re-identifying the target pedestrian according to the comparison result of the feature similarity and the similarity threshold.
In a third aspect, an embodiment provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the steps of the method described in any one of the foregoing embodiments when executing the computer program.
In a fourth aspect, embodiments provide a machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to carry out the steps of the method of any preceding embodiment.
The embodiments of the invention provide an unsupervised pedestrian re-identification method and device based on local features. By using unlabeled samples for unsupervised pedestrian re-identification, the tedious operation of labelling every image is avoided; meanwhile, the global features and the local features cut from them are spliced before the similarity is calculated, which preserves the recognition quality of unsupervised pedestrian re-identification.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an unsupervised pedestrian re-identification method based on local features according to an embodiment of the present invention;
fig. 2 is a schematic application diagram of another unsupervised pedestrian re-identification method based on local features according to an embodiment of the present invention;
fig. 3 is a functional block diagram of an unsupervised pedestrian re-identification apparatus based on local features according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a hardware architecture of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
With the continuous expansion of monitoring networks, monitoring video data is growing explosively. Researching effective video analysis, organization and management technologies to meet the application demands of integrated and intelligent processing of massive video has become a key focus in the security field. In daily life, the travel tracks of pedestrians are recorded by ubiquitous monitoring cameras, generating a large amount of video data. For public security criminal investigation, security protection and intelligent monitoring, analyzing the pedestrians recorded by video monitoring is an effective and important auxiliary means.
However, analyzing and comparing the videos one by one with the human eye would require a great amount of manpower and material resources, which gives rise to the need for intelligent analysis and identification of images and videos through computer vision technology. Pedestrian re-identification searches for and matches pictures of the same pedestrian across different cameras: given a picture of the pedestrian to be retrieved, an effective pedestrian re-identification model must accurately find the pictures in the picture library in which that pedestrian appears. By retrieving the target pedestrian, the times and places where the target pedestrian appears can be extracted, providing a basis for subsequent tracking and positioning. Because the pedestrian re-identification model is learned from data by an algorithm, it can greatly reduce the workload of manual search and improve the efficiency and speed of applications such as criminal investigation and security protection.
To address the high cost of supervised pedestrian re-identification, unsupervised pedestrian re-identification methods have been proposed. However, the inventors found through research that the global features of pedestrians cannot distinguish pictures of different individuals well, so clustering on global features alone places many pictures of other pedestrians into the same category. The generated pseudo labels are then inaccurate, and inaccurate pseudo labels on the training pictures reduce the accuracy of the unsupervised re-identification model.
Based on this, the unsupervised pedestrian re-identification method and device based on local features provided by the embodiments of the invention can alleviate the technical problem of cumbersome sample annotation while achieving a relatively accurate pedestrian re-identification effect.
In order to facilitate understanding of the embodiment, a detailed description is first given of an unsupervised pedestrian re-identification method based on local features disclosed in the embodiment of the present invention.
Fig. 1 is a flowchart of an unsupervised pedestrian re-identification method based on local features according to an embodiment of the present invention.
Referring to fig. 1, the method may include the steps of:
step S102, inputting at least one target pedestrian image into a pre-training backbone network, and performing feature extraction on the target pedestrian image to obtain an overall feature map corresponding to the target pedestrian image.
Wherein the target pedestrian image is label-free data. For example, in practical applications, if a target pedestrian named Xiao Ming needs to be re-identified, one or more images of Xiao Ming are input into the pre-training backbone network. It is understood that the plurality of images may capture Xiao Ming from different angles.
And step S104, cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the overall feature map and the local feature maps into an overall feature vector and local feature vectors, respectively.
The overall feature map is cut according to a preset proportion determined by the preset target number of local feature maps.
And S106, clustering the local feature vectors and the overall feature vectors in corresponding feature vector spaces respectively, and outputting the overall features and the local features of the targets.
It should be noted that, if the local feature vectors include a plurality of local feature vectors, each local feature vector needs to be clustered in its own feature vector space; that is, if the local feature vector 1 and the local feature vector 2 are included, the local feature vector 1 is clustered in its own feature vector space 1, and the local feature vector 2 is clustered in its own feature vector space 2.
And S108, splicing the overall target characteristics and the local target characteristics, and calculating the characteristic similarity of the spliced characteristics.
Here, the spliced feature vector includes both the overall feature and the local feature, and then a relatively accurate re-recognition effect can be achieved.
And step S110, re-identifying the target pedestrian according to the comparison result of the feature similarity and the similarity threshold.
It can be understood that the image library of the pre-training backbone network stores the image features of the target pedestrian in advance, so that the pedestrian re-identification function can be realized for the target pedestrian. If the feature similarity meets the requirement of the similarity threshold, the image where the target pedestrian is located or the time frame of video monitoring where the image is located can be determined, and the time and place where the target pedestrian appears are known, so that the effect of re-identifying the target pedestrian is achieved.
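As an illustrative sketch of this thresholding step (the frame identifiers, similarity values and threshold below are made-up assumptions, not from the patent):

```python
def reidentify(sim_by_frame, threshold):
    """Return the frames whose spliced-feature similarity to the target
    pedestrian meets the similarity threshold (hypothetical frame ids)."""
    return [frame for frame, sim in sim_by_frame if sim >= threshold]

# Frames whose similarity clears the threshold are where the target appears.
hits = reidentify([("cam1_t0101", 0.91), ("cam2_t0442", 0.35),
                   ("cam3_t0907", 0.88)], threshold=0.8)
```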
In the above embodiment of practical application, unlabeled samples are used for unsupervised pedestrian re-identification, which saves the tedious operation of labelling every image; meanwhile, the overall feature and the local features cut from it are spliced before the similarity is calculated, which preserves the recognition quality of unsupervised pedestrian re-identification.
In some embodiments, in order to ensure the recognition effect of the unsupervised pedestrian re-recognition method, step S104 further includes:
step 1.1), determining a preset proportion according to the target identification precision and/or the service requirement.
For example, if two local feature maps are sufficient to meet the current service requirement, the corresponding preset proportion may be determined accordingly; or, if the current requirement on target identification accuracy is high, the overall feature map may need to be cut into five parts, and the corresponding preset proportion is determined. The preset proportion can be understood as the ratio in which the overall feature map is cut, such as a 1:1 bisection into two parts or a 1:1:1:1:1 division into five parts.
And step 1.2), cutting the overall feature map into local feature maps according to the preset proportion, wherein the number of local feature maps obtained from the overall feature map is the preset number, and each preset proportion corresponds to one preset number.
Exemplarily, if the preset number is two, equally dividing the overall feature map into a top feature map and a bottom feature map by taking the height of the overall feature map as a reference; and if the preset number is five, equally dividing the overall feature map into a first local feature map, a second local feature map, a third local feature map, a fourth local feature map and a fifth local feature map by taking the height of the overall feature map as a reference.
It should be noted that the preset number may also be three, four, etc. And equally dividing the height of the overall characteristic diagram according to a preset proportion to obtain the local characteristic diagrams meeting the preset quantity.
And step 1.3), respectively converting the local feature map and the overall feature map into a local feature vector and an overall feature vector according to a global average pooling layer of the pre-training backbone network.
Illustratively, the feature map generated from the unlabeled data by the backbone network is cut into upper and lower halves, and the global average pooling layer is then used to convert the global feature map, the top feature map and the bottom feature map into three feature vectors: a global feature vector, a top (upper-half) feature vector and a bottom (lower-half) feature vector.
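This cutting-and-pooling step can be sketched as follows; the (C, H, W) feature-map layout, the shape values and the function name are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

def split_and_pool(feature_map, num_parts=2):
    """Split a (C, H, W) feature map into horizontal stripes along the
    height axis, then global-average-pool the whole map and each stripe
    into vectors (the overall vector and the local vectors)."""
    c, h, w = feature_map.shape
    assert h % num_parts == 0, "height must divide evenly into the parts"
    global_vec = feature_map.mean(axis=(1, 2))   # overall feature vector, (C,)
    stripe_h = h // num_parts
    local_vecs = [
        feature_map[:, i * stripe_h:(i + 1) * stripe_h, :].mean(axis=(1, 2))
        for i in range(num_parts)                # top..bottom stripe vectors
    ]
    return global_vec, local_vecs

# Example: a 256-channel, 16x8 feature map cut into top/bottom halves.
fmap = np.random.rand(256, 16, 8)
g, locs = split_and_pool(fmap, num_parts=2)
```

With equal-height stripes, the overall vector is exactly the mean of the stripe vectors, which is a quick sanity check on the pooling.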
In some embodiments, the step S106 may be further implemented by the following steps, including:
and 2.1) clustering each local feature vector in the corresponding first feature vector space according to a density-based clustering algorithm DBSCAN, and outputting the target local feature corresponding to each local feature vector.
And 2.2) clustering the overall characteristic vectors in the corresponding second characteristic vector space according to a density-based clustering algorithm, and outputting the overall characteristics of the target.
For example, taking a preset number of two local features, the three feature vectors (global, top and bottom) are clustered with DBSCAN in their respective feature vector spaces; a pseudo label is assigned to each feature according to the clustering result, and a storage unit is generated.
In some embodiments, the step S108 further includes:
and 3.1) carrying out serial splicing on the target overall characteristic and the target local characteristic.
Step 3.2), if the overall characteristic diagram is equally divided into a top characteristic diagram and a bottom characteristic diagram, calculating the characteristic similarity of the spliced characteristics according to the following formula:
d_t = sim(f_t, f_t')
d_up = sim(f_up, f_up')
d_down = sim(f_down, f_down')
f = {f_t, f_up, f_down}
f' = {f_t', f_up', f_down'}
D(f, f') = (α + β)·d_t + α·d_up + β·d_down
wherein d_t is the global feature similarity, d_up is the top feature similarity, d_down is the bottom feature similarity, sim denotes the cosine similarity, D is the feature similarity of the spliced features, α is the top feature similarity weight, and β is the bottom feature similarity weight.
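The fused similarity D(f, f') can be sketched as follows; the dictionary layout and the fixed alpha/beta values are assumptions, since the formulas defining the weights appear only as images in the original filing:

```python
import numpy as np

def cosine(a, b):
    """sim(., .): cosine similarity of two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_similarity(f, g, alpha, beta):
    """D(f, f') = (alpha + beta)*d_t + alpha*d_up + beta*d_down, where each
    feature is a dict with 't' (global), 'up' (top) and 'down' (bottom)
    vectors; alpha and beta are the top/bottom similarity weights."""
    d_t = cosine(f['t'], g['t'])
    d_up = cosine(f['up'], g['up'])
    d_down = cosine(f['down'], g['down'])
    return (alpha + beta) * d_t + alpha * d_up + beta * d_down

# Identical spliced features give every d = 1, hence D = 2 * (alpha + beta).
feat = {'t': np.ones(4), 'up': np.ones(4), 'down': np.ones(4)}
score = fused_similarity(feat, feat, alpha=0.5, beta=0.5)
```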
In some embodiments, before the step S102 of inputting the at least one target pedestrian image into the pre-training backbone network, the method further includes:
and 3.1) pre-training the backbone network.
The backbone network used in the embodiment of the present invention is ResNet50, and the stride of the convolutional layer in the fourth residual module group of the network is set to 2.
And 3.2), calculating the non-parametric classification loss of the local feature vectors and the overall feature vector corresponding to each first pedestrian image in their corresponding feature vector spaces, and updating the network parameters of the backbone network as well as the local and overall feature vectors in the storage unit, until all the first pedestrian images are traversed.
P first pedestrian images are randomly sampled from the data set to form a query data set, and their features are generated through the backbone network; the non-parametric classification loss is computed in each respective feature vector space to update the network parameters, and the features in the storage unit are updated at the same time. This step is repeated until all P first pedestrian images have been traversed.
in some embodiments, the DBSCAN clustering algorithm used in the foregoing steps in embodiments of the present invention may include:
step 4.1), sample set D ═ x1,x2,···,xm}, initializing a core object set
Figure BDA0003517934330000111
Initializing cluster number k equal to 0, initializing sample set Γ equal to D, and cluster partitioning
Figure BDA0003517934330000112
Step 4.2), for j ═ 1,2, ·, m, the core object is found as follows:
a. sample x by distance metricjNeighborhood sample set Nε(xj)={xi∈D|dist(xj,xi)≤ε}。
b. If the number of the sub-sample set samples satisfies | Nε(xj) | ≧ MinPts, sample xjAdding a core object sample set.
Step 4.3) if core object set
Figure BDA0003517934330000113
Ending the algorithm, otherwise, turning to the step 4.4);
step 4.4), in the core object set omega, randomly selecting a core object o, and initializing a current cluster number core object queue omegacurInitializing a class index k +1, and initializing a current cluster sample set CkAnd f, updating the unvisited sample set f- (o).
Step 4.5), if the current cluster core object queue
Figure BDA0003517934330000114
Then the current cluster C is clusteredkAfter generation, the cluster partition C is updated to { C ═ C1,C2,···,CkAnd updating a core object set omega-CkGo to step 4.3). Otherwise, updating the core object set omega-Ck
Step 4.6) in the current cluster core object queue omegacurTaking out a core object o', finding out all epsilon neighborhood sample sets N through a neighborhood distance threshold epsilonε(o') making Δ ═ Nε(o') # Γ, updating the current cluster sampleSet Ck=CkU.DELTA.update omegacur=ΩcurU (. DELTA.andgate. OMEGA) -o' is transferred to step 4.5).
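The steps above can be sketched as a minimal DBSCAN implementation; the variable names and the toy data are illustrative assumptions:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN following steps 4.1)-4.6): core objects are samples
    whose eps-neighbourhood (including the sample itself) holds at least
    min_pts samples; clusters are grown outward from core objects.
    Returns one label per sample, -1 for noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [set(np.flatnonzero(dist[j] <= eps).tolist()) for j in range(n)]
    core = {j for j in range(n) if len(neighbours[j]) >= min_pts}  # step 4.2
    labels = [-1] * n
    unvisited = set(range(n))        # Gamma
    remaining_core = set(core)       # Omega
    k = -1
    while remaining_core:            # step 4.3
        k += 1
        o = next(iter(remaining_core))   # step 4.4: pick a core object
        unvisited.discard(o)
        cluster, queue = {o}, [o]
        while queue:                 # steps 4.5-4.6: expand the cluster
            q = queue.pop()
            delta = neighbours[q] & unvisited   # newly reachable samples
            for p in delta:
                unvisited.discard(p)
                cluster.add(p)
                if p in core:        # only core objects keep expanding
                    queue.append(p)
        remaining_core -= cluster
        for p in cluster:
            labels[p] = k
    return labels

# Two well-separated 2-D blobs of three points each form two clusters.
pts = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
labels = dbscan(pts, eps=0.5, min_pts=3)
```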
The updating of the features in the storage unit during training is specifically as follows: let the number of pedestrian categories in the data set be C. The external storage unit is in fact a feature matrix κ of size d × C, where d is the feature dimension output by the network model f(·). The storage unit features are updated with the following formula:
κ[i] ← μ·κ[i] + (1 − μ)·f(x_j)
wherein κ[i] is the stored feature of the i-th class in the storage unit κ; f(x_j) is the feature of the input picture x_j extracted by the backbone network f(·), with x_j belonging to the i-th class; and μ is the feature update rate of the storage unit.
The non-parametric classification loss function used to update the network is computed by maximizing the similarity between the input image and the class to which it belongs:
L = −log( exp(sim(f(x_j), κ[i]) / τ) / Σ_{c=1}^{C} exp(sim(f(x_j), κ[c]) / τ) )
where sim denotes the cosine similarity and τ is a temperature parameter used to scale the similarities.
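A sketch of this loss, under the assumption that it takes the standard memory-bank softmax form with cosine similarity (the exact formula appears only as an image in the original filing):

```python
import math

def nonparam_loss(feature, kappa, class_id, tau):
    """L = -log( exp(sim(f, kappa[i])/tau) / sum_c exp(sim(f, kappa[c])/tau) ),
    with sim the cosine similarity and tau the temperature."""
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)
    logits = [cos(feature, row) / tau for row in kappa]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[class_id] - log_denom)

# A feature aligned with its own class slot yields a near-zero loss;
# assigning it to the wrong class yields a large loss.
kappa = [[1.0, 0.0], [0.0, 1.0]]
good = nonparam_loss([1.0, 0.0], kappa, class_id=0, tau=0.05)
bad = nonparam_loss([1.0, 0.0], kappa, class_id=1, tau=0.05)
```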
The method divides the feature map generated by the backbone network into an overall feature map, a top feature map and a bottom feature map, as follows: the feature map generated by the backbone network is the overall feature map; it is divided equally into upper and lower halves, the upper half being the top feature map and the lower half the bottom feature map. The overall feature map represents the global features of the pedestrian, the top feature map describes the upper-body features of the pedestrian, and the bottom feature map describes the lower-body features of the pedestrian.
In the method, when the non-parametric classification loss is computed in the respective feature vector spaces to update the network parameters, it is considered that the overall feature map contains the top feature map and the bottom feature map and therefore carries more comprehensive information; accordingly, when updating the network parameters, the learning rate of the overall feature map should be higher than those of the top feature map and the bottom feature map:
lr_t = lr_up + lr_down
where lr_t is the learning rate of the overall feature map, lr_up is the learning rate of the top feature map, and lr_down is the learning rate of the bottom feature map.
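The relation above can be expressed as a per-branch optimizer configuration; the branch names and the concrete learning-rate values below are illustrative assumptions only:

```python
# Hypothetical learning-rate configuration for the three branches.
# The equal split between top and bottom branches is an assumption;
# the patent only requires lr_t = lr_up + lr_down.
lr_up, lr_down = 1.75e-4, 1.75e-4
lr_t = lr_up + lr_down   # overall branch learns at the sum of the two

param_groups = [
    {"params": "overall_branch", "lr": lr_t},     # placeholder names
    {"params": "top_branch",     "lr": lr_up},
    {"params": "bottom_branch",  "lr": lr_down},
]
```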
The method provided by the embodiment of the invention is used for the pedestrian re-identification task and trains the model in an unsupervised manner, greatly reducing the cost of manual labeling. The backbone network adopts a local attention module, divides the feature map into an overall feature map, a top feature map and a bottom feature map, clusters with DBSCAN in the three feature vector spaces, assigns a pseudo label to each pedestrian according to the clustering results, and updates the network parameters; local feature fusion improves the local feature extraction capability of the network and the accuracy of the pseudo labels. In addition, the non-parametric classification loss function used by the method reduces the number of network parameters in the training stage and improves the convergence speed of the model.
In practical application, as shown in fig. 2, feature extraction is performed on a training picture of pedestrian A through the backbone network to obtain the overall feature map of pedestrian A; the top and bottom features are cut from the overall feature map; the top feature map, the bottom feature map and the overall feature map are converted by a global average pooling layer into top, bottom and overall feature vectors respectively; DBSCAN clustering is performed in each feature vector space, a pseudo label is assigned to each clustering result, and a storage unit is generated. On this basis, P pictures of pedestrian A are extracted from the query dataset and input into the backbone network; through the steps of feature extraction, vector conversion and clustering, the non-parametric classification loss is calculated in each feature vector space, and the network parameters and the feature vectors in the storage unit are updated based on this loss until all P pictures have been traversed, at which point the loop ends and the trained backbone network is obtained.
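The clustering and pseudo-label step can be sketched as follows; scikit-learn's DBSCAN stands in for the density-based clustering, and the `eps`/`min_samples` values are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def assign_pseudo_labels(feats, eps=0.6, min_samples=2):
    """Cluster feature vectors with DBSCAN and build an initial memory
    bank with one mean feature per cluster (pseudo-class).

    feats: (N, d) array of feature vectors from one vector space.
    Returns (labels, memory); DBSCAN outliers (label -1) are dropped
    from the memory.  eps and min_samples are assumed values.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    classes = sorted(c for c in set(labels) if c != -1)
    memory = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return labels, memory
```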
The value of P can be set according to the current service requirements, so that a suitable number of pictures is selected for the update training of the backbone network. The target overall feature vector and the target local feature vectors output by the trained backbone network are spliced, and the feature similarity of the spliced vector is calculated; if the feature similarity meets the similarity threshold, the backbone network has been trained successfully.
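The spliced-feature similarity of claim 6, D(f, f') = (α+β)·d_t + α·d_up + β·d_down, can be sketched as follows; the patent defines α and β only as top/bottom similarity weights, so the default values here are assumptions:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def spliced_similarity(f, g, alpha=0.5, beta=0.5):
    """Weighted similarity of spliced features, following
    D(f, f') = (alpha + beta)*d_t + alpha*d_up + beta*d_down.

    f, g: dicts with keys 't' (overall), 'up' (top), 'down' (bottom),
    each mapping to a feature vector.  alpha/beta are assumed values.
    """
    d_t = cos_sim(f["t"], g["t"])        # overall feature similarity
    d_up = cos_sim(f["up"], g["up"])     # top feature similarity
    d_down = cos_sim(f["down"], g["down"])  # bottom feature similarity
    return (alpha + beta) * d_t + alpha * d_up + beta * d_down
```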
As shown in fig. 3, an embodiment of the present invention further provides an unsupervised pedestrian re-identification apparatus based on local features, where the apparatus includes:
the feature extraction module is used for inputting at least one target pedestrian image into a pre-trained backbone network and performing feature extraction on the target pedestrian image to obtain an overall feature map corresponding to the target pedestrian image, wherein the target pedestrian image is unlabeled data;
the cutting and conversion module is used for cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the local feature maps and the overall feature map into local feature vectors and an overall feature vector respectively;
the clustering module is used for clustering the local feature vectors and the overall feature vector in their corresponding feature vector spaces and outputting the target overall feature and the target local features;
the splicing module is used for splicing the target overall feature and the target local features and calculating the feature similarity of the spliced features;
and the re-identification module is used for re-identifying the target pedestrian according to the comparison of the feature similarity with the similarity threshold.
In this embodiment, the electronic device may be, but is not limited to, a computer device with analysis and processing capabilities, such as a personal computer (PC), a notebook computer, a monitoring device, or a server.
As an exemplary embodiment, referring to fig. 4, the electronic device 110 includes a communication interface 111, a processor 112, a memory 113, and a bus 114, wherein the processor 112, the communication interface 111, and the memory 113 are connected by the bus 114; the memory 113 is used for storing computer programs that support the processor 112 to execute the above-mentioned methods, and the processor 112 is configured to execute the programs stored in the memory 113.
A machine-readable storage medium as referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
The non-volatile medium may be non-volatile memory, flash memory, a storage drive (e.g., a hard drive), any type of storage disk (e.g., an optical disk, dvd, etc.), or similar non-volatile storage medium, or a combination thereof.
It can be understood that, for the specific operation method of each functional module in this embodiment, reference may be made to the detailed description of the corresponding step in the foregoing method embodiment, and no repeated description is provided herein.
The computer-readable storage medium provided in the embodiments of the present invention stores a computer program, and when the computer program code is executed, the method described in any of the above embodiments may be implemented, and specific implementation may refer to the method embodiment, which is not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.

Claims (10)

1. An unsupervised pedestrian re-identification method based on local features, characterized in that the method comprises the following steps:
inputting at least one target pedestrian image into a pre-training backbone network, and performing feature extraction on the target pedestrian image to obtain an overall feature map corresponding to the target pedestrian image, wherein the target pedestrian image is label-free data;
cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the local feature maps and the overall feature map into local feature vectors and an overall feature vector respectively;
clustering the local feature vectors and the overall feature vectors in corresponding feature vector spaces respectively, and outputting target overall features and target local features;
splicing the target overall characteristic and the target local characteristic, and calculating the characteristic similarity of the spliced characteristics;
and re-identifying the target pedestrian according to the comparison result of the feature similarity and the similarity threshold.
2. The method according to claim 1, wherein the step of cutting the global feature map into a preset number of local feature maps according to a preset ratio and converting the global feature map and the local feature maps into a local feature vector and a global feature vector, respectively, comprises:
determining a preset proportion according to the target identification precision and/or the service requirement;
cutting the overall characteristic graphs into local characteristic graphs according to the preset proportion, wherein the number of the local characteristic graphs corresponding to the overall characteristic graphs is a preset number, and each preset proportion corresponds to one preset number;
and respectively converting the local feature map and the overall feature map into a local feature vector and an overall feature vector according to the global average pooling layer of the pre-training backbone network.
3. The method according to claim 2, wherein the step of cutting the global feature maps into local feature maps according to the preset proportion comprises:
if the preset number is two, equally dividing the integral feature map into a top feature map and a bottom feature map by taking the height of the integral feature map as a reference;
and if the preset number is five, equally dividing the overall feature map into a first local feature map, a second local feature map, a third local feature map, a fourth local feature map and a fifth local feature map by taking the height of the overall feature map as a reference.
4. The method according to claim 1, wherein the step of clustering the local feature vector and the global feature vector in a corresponding feature vector space respectively and outputting a target global feature and a target local feature comprises:
clustering each local feature vector in a first feature vector space respectively corresponding to the local feature vectors according to a density-based clustering algorithm, and outputting a target local feature corresponding to each local feature vector;
and clustering the whole feature vectors in the corresponding second feature vector space according to a density-based clustering algorithm, and outputting the target whole features.
5. The method of claim 1, wherein the step of stitching the target global feature and the target local feature comprises:
and performing serial splicing on the target overall characteristic and the target local characteristic.
6. The method of claim 3, wherein the step of calculating the feature similarity of the stitched features comprises:
if the overall feature map is equally divided into a top feature map and a bottom feature map, calculating the feature similarity of the spliced features according to the following formula:
d_t = sim(f_t, f'_t)
d_up = sim(f_up, f'_up)
d_down = sim(f_down, f'_down)
f = {f_t, f_up, f_down}
f' = {f'_t, f'_up, f'_down}
D(f, f') = (α + β)·d_t + α·d_up + β·d_down
wherein d istAs global feature similarity, dupAs top feature similarity, ddownIs the bottom feature similarity, sim represents the cosine similarity, D is the feature similarity of the features after splicing,
Figure FDA0003517934320000031
the top feature similarity weight is represented and,
Figure FDA0003517934320000032
representing the bottom feature similarity weight.
7. The method of claim 1, wherein prior to the step of inputting the at least one target pedestrian image into the pre-trained backbone network, the method further comprises:
pre-training the backbone network;
and respectively calculating nonparametric classification loss of the local characteristic vector and the overall characteristic vector corresponding to each first pedestrian image in corresponding characteristic vector spaces, and updating the network parameters of the backbone network and the local characteristic vector and the overall characteristic vector in the storage unit until all the first pedestrian images are traversed.
8. An unsupervised pedestrian re-identification device based on local features, the device comprising:
the feature extraction module is used for inputting at least one target pedestrian image into a pre-trained backbone network and performing feature extraction on the target pedestrian image to obtain an overall feature map corresponding to the target pedestrian image, wherein the target pedestrian image is unlabeled data;
the cutting and conversion module is used for cutting the overall feature map into a preset number of local feature maps according to a preset proportion, and converting the local feature maps and the overall feature map into local feature vectors and an overall feature vector respectively;
the clustering module is used for clustering the local feature vectors and the overall feature vector in their corresponding feature vector spaces and outputting the target overall feature and the target local features;
the splicing module is used for splicing the target overall feature and the target local features and calculating the feature similarity of the spliced features;
and the re-identification module is used for re-identifying the target pedestrian according to the comparison of the feature similarity with the similarity threshold.
9. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, characterized in that a computer program is stored in the readable storage medium, which computer program, when executed, implements the method of any of claims 1-7.
CN202210170514.6A 2022-02-24 2022-02-24 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features Pending CN114550091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210170514.6A CN114550091A (en) 2022-02-24 2022-02-24 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210170514.6A CN114550091A (en) 2022-02-24 2022-02-24 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features

Publications (1)

Publication Number Publication Date
CN114550091A true CN114550091A (en) 2022-05-27

Family

ID=81677500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210170514.6A Pending CN114550091A (en) 2022-02-24 2022-02-24 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features

Country Status (1)

Country Link
CN (1) CN114550091A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836675A (en) * 2021-03-01 2021-05-25 中山大学 Unsupervised pedestrian re-identification method and system based on clustering-generated pseudo label
CN112836675B (en) * 2021-03-01 2023-06-23 中山大学 Unsupervised pedestrian re-identification method and system for generating pseudo tags based on clusters
CN115331262A (en) * 2022-09-06 2022-11-11 通号通信信息集团有限公司 Image recognition method and device

Similar Documents

Publication Publication Date Title
CN114241282B (en) Knowledge distillation-based edge equipment scene recognition method and device
CN108132968B (en) Weak supervision learning method for associated semantic elements in web texts and images
CN112036322B (en) Method, system and device for constructing cross-domain pedestrian re-identification model of multi-task network
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
Neff et al. REVAMP 2 T: real-time edge video analytics for multicamera privacy-aware pedestrian tracking
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN110781262B (en) Semantic map construction method based on visual SLAM
CN114550091A (en) Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on local features
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
CN114298122B (en) Data classification method, apparatus, device, storage medium and computer program product
CN112819065A (en) Unsupervised pedestrian sample mining method and unsupervised pedestrian sample mining system based on multi-clustering information
US20230089148A1 (en) Systems and methods for interactive image scene graph pattern search and analysis
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN114550053A (en) Traffic accident responsibility determination method, device, computer equipment and storage medium
CN113065409A (en) Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
Ji et al. A hybrid model of convolutional neural networks and deep regression forests for crowd counting
Tomoe et al. Long-term knowledge distillation of visual place classifiers
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
CN114491071A (en) Food safety knowledge graph construction method and system based on cross-media data
CN114495004A (en) Unsupervised cross-modal pedestrian re-identification method
CN112052722A (en) Pedestrian identity re-identification method and storage medium
CN116206201A (en) Monitoring target detection and identification method, device, equipment and storage medium
CN116958624A (en) Method, device, equipment, medium and program product for identifying appointed material
CN113516118B (en) Multi-mode cultural resource processing method for joint embedding of images and texts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination