CN112823356A - Pedestrian re-identification method, device and system - Google Patents

Pedestrian re-identification method, device and system

Info

Publication number
CN112823356A
CN112823356A
Authority
CN
China
Prior art keywords
target image
feature
pedestrian
modality
texture
Prior art date
Legal status
Granted
Application number
CN202080003202.5A
Other languages
Chinese (zh)
Other versions
CN112823356B (en
Inventor
赛义德·穆罕默德·阿德南
崔勇
谢丰隆
Current Assignee
Konka Group Co Ltd
Original Assignee
Konka Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Konka Group Co Ltd filed Critical Konka Group Co Ltd
Publication of CN112823356A publication Critical patent/CN112823356A/en
Application granted granted Critical
Publication of CN112823356B publication Critical patent/CN112823356B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a pedestrian re-identification method, device and system. The method comprises: collecting target images of each pedestrian under two cameras, and forming a training sample and a test sample from the collected images; segmenting the training sample by modality to obtain segmentation data for each modality; and grouping each piece of segmentation data and learning a distance metric for each modality from the grouped data. Because the distance metrics are learned on the open set itself, they can find matches for a query within the complex nonlinear modalities present in open-set data; a target person can be found quickly with the learned metrics, yielding better accuracy in the field of pedestrian re-identification.

Description

Pedestrian re-identification method, device and system
Technical Field
The present application relates to the field of video surveillance, and in particular, to a method, an apparatus, and a system for re-identifying pedestrians.
Background
Currently, methods for re-identifying pedestrians in an open set learn their distance metrics on public closed-set datasets, which differ greatly from open-set data and ignore open-set situations in which the people observed under different camera views overlap only partially or not at all. The prior art therefore fails to provide a metric suited to open-set pedestrian re-identification.
In the prior art, these metrics are learned on closed-set pedestrian re-identification datasets, which are completely different from open-set data: in a closed set there is no question of whether the n-th person appears in the m-th camera view of the network, because all m views observe the same n people. A metric learned under this closed-set constraint never sees real-world data relationships, in which, for example, the n-th person may be captured in cam 1 of an m-view network but have no instance in cam 3 or cam 5. Such a metric cannot model the situation where a query has no match in the gallery, or where the identities observed in a pair of camera views do not overlap at all.
Further, these metrics do not account for the nonlinear complex modalities present in the open set. These nonlinear modalities arise in the open-set image space from random nonlinear variations in viewpoint, background and illumination, and from crowded scenes and occlusions in images captured by large camera networks. When a learned metric can neither handle these nonlinear modalities nor model the complexity of open-set data samples, its performance drops sharply when tested on a large-scale real-world network.
Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
The technical problem to be solved by the present application is to provide a pedestrian re-identification method, device and system that overcome the defects of the prior art, in which metrics ignore the nonlinear complex modalities present in the open set and therefore perform poorly when tested on a large-scale real-world network.
The technical scheme adopted by the application is as follows:
in a first aspect, an embodiment of the present application provides a pedestrian re-identification method, which includes:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
In a second aspect, an embodiment of the present application provides a pedestrian re-identification apparatus, including:
a first acquisition module, configured to acquire a first target image in a first scene and extract a first color feature and a first texture feature corresponding to the first target image;
a modality determining module, configured to determine a first modality in which the first target image is located according to the first color feature and the first texture feature;
the measurement determining module is used for determining a first measurement corresponding to the first modality according to a preset relationship between the modality and the distance measurement;
the second obtaining module is used for obtaining all second target images in each second scene and extracting second color features and second texture features corresponding to the second target images;
a calculating module, configured to input the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, where the plurality of feature distances include feature distances between the first target image and each second target image;
and the judging module is used for selecting a third target image corresponding to the minimum value in the characteristic distances and judging that the third target image and the first target image are the same person.
In a third aspect, an embodiment of the present application provides a pedestrian re-identification system, which includes: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, is configured to perform the steps of:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
Beneficial effects: compared with the prior art, the present application provides a pedestrian re-identification method, device and system. The method comprises: collecting target images of each pedestrian under two cameras, and forming a training sample and a test sample from the collected images; segmenting the training sample by modality to obtain segmentation data for each modality; and grouping each piece of segmentation data and learning a distance metric for each modality from the grouped data. Because the distance metrics are learned on the open set itself, they can find matches for a query within the complex nonlinear modalities present in open-set data; a target person can be found quickly with the learned metrics, yielding better accuracy in the field of pedestrian re-identification.
Drawings
Fig. 1 is a flowchart of a pedestrian re-identification method provided in the present application.
Fig. 2 is a scene schematic diagram of a pedestrian re-identification method provided by the present application.
Fig. 3 is a schematic structural diagram of a pedestrian re-identification device provided by the present application.
Fig. 4 is a schematic structural diagram of a pedestrian re-identification system provided in the present application.
Detailed Description
In order to make the purpose, technical solution and effects of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the application and are not intended to limit it.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention will be further explained by the description of the embodiments with reference to the drawings.
The embodiment provides a pedestrian re-identification method, as shown in fig. 1, the method includes:
s100, acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
s200, determining a first modality of the first target image according to the first color feature and the first texture feature;
s300, determining a first measurement corresponding to a first modality according to a preset modality-distance measurement relation;
s400, acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to each second target image;
s500, inputting the first color feature and the first texture feature and second color features and second texture features corresponding to the second target images into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and the second target images;
s600, selecting a third target image corresponding to the minimum value in the plurality of characteristic distances, and judging that the third target image and the first target image are the same person.
In this embodiment, take a small network of six camera views as an example; as shown in fig. 2, each camera view captures different people. Suppose we want to track the person with ID 1 captured in cam 3 (camera 3), i.e. we want to find where ID 1 has moved within the network and in which camera view it was captured (it was in fact observed in cam 4, but we must find this using re-identification).
Further, to find a match for ID 1 of view 3 in the other views, we first extract the features of ID 1, which are color (e.g. RGB) and texture (e.g. dense SIFT) features, as follows:
f1 = {color, texture}
We now use the extracted features to determine the modality to which ID 1 belongs. For this we use the KNN algorithm:
Modality = KNN(f1)
The KNN algorithm determines which of the modalities the feature f1 is closest to. Assuming there are K = 4 modalities in the network and the KNN algorithm places the feature f1 in the 4th modality, the modality of ID 1 is 4.
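The modality-assignment step can be sketched as a plain KNN majority vote over training features whose modality labels are already known. This is a minimal illustration only; the function name and the choice of Euclidean distance and k are assumptions, not prescribed by the patent.

```python
import numpy as np

def assign_modality(f_query, train_feats, train_modalities, k=5):
    """Assign a query feature vector to a modality by majority vote
    among its k nearest training features (plain KNN).

    train_feats:      (N, D) array of concatenated color+texture features
    train_modalities: length-N list of modality labels (1..K)
    """
    dists = np.linalg.norm(train_feats - f_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                        # indices of k nearest
    votes = [train_modalities[i] for i in nearest]
    return max(set(votes), key=votes.count)                # majority modality label
```

With training features clustered into two modalities, a query near one cluster is assigned that cluster's modality.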
Next, in the other views of the network (cam 1, cam 2, cam 4, cam 5 and cam 6), we can use the metric of modality 4 (i.e., M4) to find matches for ID 1. To do this, we compute the feature distance S between the feature f1 of ID 1 and the features of the people in the other views as follows:
S = (f1 − f_other)^T × M4 × (f1 − f_other)
Now, having obtained the feature distances between ID 1 and all other people in the other views, we check which person has the smallest feature distance (note that the smallest feature distance means the greatest similarity). Suppose we use the above equation to obtain the S values between ID 1 and the others (the original gives these values in a table rendered as an image, not reproduced here); we then see that the distance of Person 8 is the smallest, so Person 8 is the match of ID 1 (for simplicity we only consider 9 people). That is, ID 1 is found in the 4th camera view.
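The quadratic-form distance S = (f1 − f_other)^T M4 (f1 − f_other) and the argmin matching step above can be sketched as follows; the function names are illustrative, and M is any learned positive semi-definite metric matrix.

```python
import numpy as np

def feature_distance(f1, f2, M):
    """S = (f1 - f2)^T  M  (f1 - f2): the modal metric distance."""
    d = f1 - f2
    return float(d @ M @ d)

def best_match(f_query, gallery_feats, M):
    """Return (index, distance) of the gallery person with the smallest
    feature distance to the query (smallest distance = greatest similarity)."""
    dists = [feature_distance(f_query, g, M) for g in gallery_feats]
    i = int(np.argmin(dists))
    return i, dists[i]
```

With M equal to the identity this reduces to squared Euclidean distance, which makes the behaviour easy to check by hand.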
Further, before acquiring the first target image in the first scene and extracting the first color feature and the first texture feature corresponding to the first target image, the method further includes:
s10, collecting target images of each pedestrian under the two cameras, and forming a training sample and a test sample according to the collected target images;
in this embodiment, an open set pedestrian re-identification network is defined first. Assume a simple network of two camera views, called view a and view b (the network later extends to m views). Further, assume that only n unique people are captured in view a and view b, while there are additional p people captured in view a; not captured in view b. Similarly, q individuals are captured in view b; but not captured in view a. Then, the set of people in view a, called U, can be given as follows:
U={n+p}
where n is the number of unique people collectively captured in views a and b, and p is captured in view a only. Similarly, the set V of the total number of unique people in view b is represented as:
V={n+q}
where n is the number of unique people captured in views a and b and q is captured in view b only. Furthermore, the cardinality of set U and set V are not equal, i.e. in the open set, all camera views may observe different people.
Now, the joint sample space of the network (view a and view b) is randomly divided into a training set and a test set. The training set contains 50% of the total samples, while the remaining 50% of the samples form the test set.
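The open-set construction above — U = {n + p}, V = {n + q}, and a random 50/50 person-level split of the joint sample space — can be sketched as below. The function name and the seed parameter are assumptions; with an odd population the split is as close to 50/50 as possible.

```python
import random

def build_open_set_split(view_a_ids, view_b_ids, seed=0):
    """Form the joint sample space of a two-view open-set network and
    randomly split it 50/50 into training and test sets.

    view_a_ids / view_b_ids: sets of person ids captured in each view.
    U = {n + p} people appear in view a, V = {n + q} in view b; the n
    shared people appear in both, p and q are view-exclusive.
    """
    n_both = view_a_ids & view_b_ids           # the n people seen in both views
    all_people = sorted(view_a_ids | view_b_ids)
    rng = random.Random(seed)
    rng.shuffle(all_people)
    half = len(all_people) // 2
    return set(all_people[:half]), set(all_people[half:]), n_both
```

Note that |U| and |V| need not be equal, which is exactly the open-set property described above.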
And S20, segmenting the training sample according to the mode to obtain segmentation data corresponding to each mode.
In this embodiment, to enhance the feature representation of a person, color and texture features are extracted from each image. The color features (RGB, YUV and HSV histograms) describe the color distribution of a person's hair, clothing and body, while the texture features (DenseSIFT, SILTP and LBP) capture the corresponding texture attributes of the face and clothing. Correspondingly, segmenting the training sample by modality to obtain segmentation data for each modality specifically includes:
s21, extracting color features and texture features of each target image in the training sample;
s22, obtaining a plurality of modes according to the color features and the texture features of each target image;
and S23, segmenting the training sample according to the mode to obtain segmentation data corresponding to each mode.
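The color part of the feature extraction in S21 can be sketched with a simple per-channel histogram; this is a stand-in for the RGB/YUV/HSV histograms named above (the texture descriptors DenseSIFT, SILTP and LBP are omitted here), and the function name and bin count are assumptions.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Concatenated per-channel color histogram, normalised per channel.
    A minimal stand-in for the RGB/YUV/HSV histogram color features.

    img: (H, W, 3) uint8 array.
    """
    feats = []
    for c in range(3):
        h, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())              # normalise each channel to sum 1
    return np.concatenate(feats)
```

The full descriptor of the patent would concatenate several such color histograms with the texture features before KNN and metric computation.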
In this embodiment, after the features of the entire training set of the open-set pedestrian re-identification network have been extracted, the multi-modal structure of the joint image space of views a and b is addressed.
And S30, grouping each piece of segmentation data, and obtaining a distance metric corresponding to each modality based on the grouped data to obtain the preset relationship between the modalities and the distance metrics.
In this embodiment, the joint image space of the pair of views a and b is divided into a limited number of modalities. Choosing the right number of modalities in the open set is important for finding the correct match for a given random query, so in this application a modality is considered valid only if it contains at least 10% of the total training sample (in the open set the network is large, so 10% of the training sample is enough to form a stable modality). We divide the joint image space into K modalities using a method similar to that explained in [6].
In addition, in open-set pedestrian re-identification the number of people observed in the two views (a and b) keeps growing. Therefore, to keep the modalities stable and, at runtime, either associate new people with existing modalities or form new modalities in the joint space, we adopt a dynamic partitioning model: the joint image space is re-partitioned for every 15% of new people entering the network. Although the dynamic partitioning model runs at runtime, its incremental cost remains linear, because a new partition is formed only when the population of the m-view network has grown by at least 15%. If 10000 people have been trained across the m views, 15% corresponds to 1500 new people; in an open set it takes a certain time interval (hours) to accumulate 1500 people, so the dynamic partitioning model does not run continuously but only after such an interval (one to several hours), and it therefore remains scalable in time. For a training set of n people, the complexity of the segmentation is about O(t × K × n), where t is the number of iterations, kept fixed (t = 3), and K is the number of modalities to be found, which grows linearly by 1, 2 or 3 after each interval of a few hours.
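The 15% re-partitioning rule described above reduces to a simple trigger; a minimal sketch follows, with the function name and the exact comparison (>= the threshold fraction) as assumptions.

```python
def should_repartition(trained_count, new_count, threshold=0.15):
    """Dynamic partitioning rule: re-divide the joint image space only
    once the population has grown by at least `threshold` (15%).
    E.g. with 10000 trained people, repartition after 1500 new arrivals.
    The re-partition itself costs about O(t * K * n) with t fixed (t = 3).
    """
    return new_count >= threshold * trained_count
```

In practice this check would run at coarse time intervals (one to several hours), matching the scalability argument above.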
In this way the joint image space is divided into K modalities, and the modality association of every person is known. We now take a single modality k from the K modalities and group the people located in modality k into three groups. This is because some people in modality k have images captured in both views a and b, some only in view a, and some only in view b. These three sets are called set1_{ab,k}, set2_{a,k} and set3_{b,k} respectively. The people in these three sets of modality k are then used to generate the training pairs and triplets from which the distance metric Mk for modality k is learned.
Further, grouping each piece of segmented data and obtaining the distance metric corresponding to each modality based on the grouped data specifically includes:
grouping each piece of segmented data into three sets, the three sets comprising set1_{ab,k}, set2_{a,k} and set3_{b,k}, wherein set1_{ab,k} contains the people viewed under both cameras, set2_{a,k} the people viewed only under camera a, and set3_{b,k} the people viewed only under camera b;
generating a positive pair for each person in set2_{a,k} and set3_{b,k}, and updating set2_{a,k} and set3_{b,k} accordingly;
and obtaining the distance metric corresponding to each modality based on the three sets.
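The grouping step above — partitioning one modality's people into set1_{ab,k}, set2_{a,k} and set3_{b,k} — can be sketched with plain set operations; the function name is an assumption.

```python
def group_modality(people_in_k, captured_a, captured_b):
    """Split the people of one modality k into the three training sets:
    seen in both camera views, only in view a, or only in view b."""
    set1_ab = {p for p in people_in_k if p in captured_a and p in captured_b}
    set2_a  = {p for p in people_in_k if p in captured_a and p not in captured_b}
    set3_b  = {p for p in people_in_k if p in captured_b and p not in captured_a}
    return set1_ab, set2_a, set3_b
```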
Further, obtaining the distance metric corresponding to each modality based on the three sets specifically includes:
generating a negative gallery sample set from the three sets;
generating an impostor pair for each pair of images in the three sets;
and learning the distance metric corresponding to each modality from the negative gallery sample set and the impostor pairs.
Specifically, to generate training pairs and triplets we must have a positive pair for every training person in modality k. However, the people of set2_{a,k} and set3_{b,k} have only a single-shot image in modality k (single-shot here means that each person in these two groups is captured in only one instance, in view a or view b, rather than in both views). It is therefore necessary to give the people in set2_{a,k} and set3_{b,k} a positive pair. To this end we devised a new approach that uses the original instances of the people in set2_{a,k} and set3_{b,k}: each original instance is randomly rotated and mirror-flipped to generate another positive instance of that person, forming its pair.
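The random rotate-and-flip positive generation can be sketched as follows. This is a simplification: rotations are restricted to 90-degree steps so array shapes stay valid, whereas the patent only says "randomly rotated"; the function name is also an assumption.

```python
import numpy as np

def synth_positive(img, rng):
    """Generate a second positive instance for a single-shot person by a
    random rotation (here: 90-degree steps for simplicity) and a random
    mirror flip of the original image."""
    out = np.rot90(img, k=int(rng.integers(0, 4)))  # random rotation
    if rng.integers(0, 2):                          # random horizontal flip
        out = out[:, ::-1]
    return out
```

The generated instance contains exactly the same pixels rearranged, so it is a plausible same-identity sample for pairing.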
Because every person in modality k now has a pair of images, we can generate the training triplets used to learn the distance metric Mk of modality k. Initially we generate a negative gallery sample for each person present in modality k. In an open set, however, the identities in view a and view b may not overlap, and likewise differ across all the camera views of the network; so if view b were always defined as the gallery view, people never seen in view a would never be used to train the modal metric Mk, and vice versa. To solve this, we use view b as the gallery view for the people in set1_{ab,k} and set2_{a,k}, and view a as the gallery view for the people in set3_{b,k}. Now, if nn' people are present in modality k, the negative gallery sample of each nn'-th person is taken from the corresponding opposite view: from view b if the nn'-th person comes from view a (i.e. is in set1_{ab,k} or set2_{a,k}), and from view a if the nn'-th person comes from view b (i.e. is in set3_{b,k}). For each person, the selected negative gallery person is neither of the person's own identity nor an impostor (i.e. does not have a similarity to the person higher than the person's actual instance, whether that instance exists in the opposite view or is a randomly generated image instance). The negative gallery sample set of all nn' people in modality k is then:

G_k = { x_b^{iid'} } ∪ { x_a^{iid'} }

where x_b^{iid'} is a negative gallery sample taken from view b (for set1_{ab,k} and set2_{a,k}), x_a^{iid'} is a negative gallery sample taken from view a (for set3_{b,k}), and iid' is the identity of the person in modality k serving as the negative gallery sample of the nn'-th person.
Now we acquire an impostor pair for the image pair of every nn'-th person in modality k. In modality k, some people have a captured sample image in both views a and b (set1_{ab,k}), while others are initially captured only in view a or view b (i.e. in set2_{a,k} or set3_{b,k}), not in both. For the people captured in both views (set1_{ab,k}), each view-specific sample is used to obtain an impostor from the corresponding opposite view; e.g. the person's view-a image is used to obtain an impostor from view b, and vice versa. Their impostor pair is therefore given as

(xo_a, xo_b)

where xo denotes an impostor, obtained from view a and view b respectively. For a person captured only in view a or only in view b (i.e. in set2_{a,k} or set3_{b,k}), we use the original captured image (from view a or view b) together with its randomly generated instance in the same view. For example, if the original instance image of the nn'-th person was captured in view a, the impostor from the opposite view b is obtained using the randomly generated instance x_r, where x_r is the image generated by random rotation and flipping, and o_b is an impostor in view b.
Thereafter, both the original image instance of the nn'-th person captured in view a and the randomly generated instance in the same view a are used to obtain the impostor xo in the same view a. In this case the impostor xo in view a is chosen to be a common impostor of the original and randomly generated image instances of the nn'-th person.
The above process of obtaining negative gallery samples and impostor pairs is then repeated for all remaining people in modality k and for all remaining K−1 modalities of the joint image space.
Further, having obtained the negative gallery samples and impostor pairs of all nn' people in modality k, we are ready to learn the metric Mk of modality k. The method employed in this embodiment discriminates the important features of every nn'-th person well, owing to its inherent ability to handle the nonlinear complexity within modality k and to its multi-kernel approach. Selecting the appropriate kernels, however, is critical to improving performance and keeping the model scalable; in our invention, to preserve both discriminative power and runtime scalability, we chose three RBF kernels and one chi-square kernel. Finally, the modal metric Mk is learned by maximizing the ratio of between-class to within-class scatter:

J(w) = (w^T S_B w) / (w^T S_W w)        (a)

where S_B and S_W are the between-class and within-class scatter matrices. In our case, however, S_B and S_W must also be learned for the many people who have no cross-view image pair, i.e. the people in set2_{a,k} or set3_{b,k}. For these people we use their randomly generated pairs to compute the S_W matrix, which provides an inherent discriminative power for the open-set case of a person in set2_{a,k} when learning the metric Mk, that is, when there is no positive match in the opposite view b. Equation (a) can now be converted into a generalized eigenvalue problem and solved to obtain the metric Mk:

S_B w = λ S_W w

The eigenvalue problem is solved, and the first 300 eigenvectors corresponding to the largest eigenvalues are kept, yielding the modal metric Mk. Continuing this process, the metrics of all remaining K−1 modalities are learned.
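The generalized eigenvalue step S_B w = λ S_W w and the selection of the top eigenvectors can be sketched as below. The reduction to an ordinary eigenproblem via inv(S_W) S_B, and the construction Mk = W W^T from the kept eigenvectors, are assumptions about the implementation (the patent only states that the first 300 eigenvectors are kept); the kernel machinery is omitted.

```python
import numpy as np

def learn_modal_metric(SB, SW, n_vec=300):
    """Solve the generalized eigenvalue problem  SB w = lambda SW w,
    keep the eigenvectors of the largest eigenvalues (the patent keeps
    the first 300), and assemble a metric matrix Mk = W W^T.
    """
    # Reduce to an ordinary eigenproblem: inv(SW) SB w = lambda w
    vals, vecs = np.linalg.eig(np.linalg.solve(SW, SB))
    order = np.argsort(vals.real)[::-1]          # largest eigenvalues first
    W = vecs.real[:, order[:n_vec]]
    return W @ W.T
```

With diagonal toy scatter matrices the top eigenvector is easy to verify by hand.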
S400, determining the modality of the test sample, and selecting the distance metric corresponding to that modality.
S500, computing the feature distances under the selected metric with the KNN algorithm to obtain the matching result.
In this embodiment, re-identification in the open set is performed by first determining the modality of a given random query x_i. This is done with the KNN method; the match is then found in the network using the corresponding modal metric Mk as follows:

j* = argmin_j (x_i^a − x_j)^T M_k (x_i^a − x_j)

where x_i^a is a randomly given query captured in view a, and x_j are the instances observed in the network, i.e. captured in view a or view b. In the open set, it must first be determined whether the query x_i^a has already been captured in the network (in view a or view b) or is not present in the network at all, and only then is matching performed. The metric Mk of our invention performs both tasks simultaneously: if the query has been captured, its match is found in the network; otherwise there is no result.
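The open-set query procedure — pick the modality's metric, find the argmin over the gallery, and return "no result" when the person is not in the network — can be sketched as follows. The explicit `accept_threshold` is an assumption added for illustration: the patent attributes the accept/reject capability to the learned metric itself rather than to a stated threshold.

```python
import numpy as np

def open_set_match(x_query, modality, metrics, gallery, accept_threshold):
    """Open-set re-identification: minimise (x_q - x_j)^T Mk (x_q - x_j)
    over the gallery under the query's modal metric, and return None when
    even the best match is too far (the person is not in the network).

    metrics: dict modality -> Mk;  gallery: list of (id, feature) pairs.
    """
    Mk = metrics[modality]
    best_id, best_d = None, np.inf
    for pid, feat in gallery:
        d = x_query - feat
        dist = float(d @ Mk @ d)
        if dist < best_d:
            best_id, best_d = pid, dist
    return best_id if best_d <= accept_threshold else None
```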
In summary, compared with the prior art, the embodiment of the present application has the following advantages:
The application discloses a pedestrian re-identification method comprising: collecting target images of each pedestrian under two cameras, and forming a training sample and a test sample from the collected images; segmenting the training sample by modality to obtain segmentation data for each modality; and grouping each piece of segmentation data and learning a distance metric for each modality from the grouped data. Because the distance metrics are learned on the open set itself, they can find matches for a query within the complex nonlinear modalities present in open-set data; a target person can be found quickly with the learned metrics, yielding better accuracy in the field of pedestrian re-identification.
Based on the above pedestrian re-identification method, the present application also provides a pedestrian re-identification device, as shown in fig. 3, the device includes:
a first obtaining module 41, configured to obtain a first target image in a first scene, and extract a first color feature and a first texture feature corresponding to the first target image;
a modality determining module 42, configured to determine a first modality in which the first target image is located according to the first color feature and the first texture feature;
a metric determining module 43, configured to determine a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
a second obtaining module 44, configured to obtain all second target images in each second scene, and extract a second color feature and a second texture feature corresponding to each second target image;
a calculating module 45, configured to input the first color feature and the first texture feature, and the second color feature and the second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, where the plurality of feature distances include feature distances between the first target image and each second target image;
and a determining module 46, configured to select a third target image corresponding to a minimum value of the feature distances, and determine that the third target image and the first target image are the same person.
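The modules 41 to 46 above form a linear inference pipeline. The following sketch assumes hypothetical histogram-based color and texture extractors and a caller-supplied modality function and metric table; none of these are the patent's exact implementations:

```python
import numpy as np

def color_feature(img):
    """Hypothetical color feature: coarse 4x4x4 RGB histogram (module 41/44)."""
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(4, 4, 4),
                             range=((0, 256),) * 3)
    return hist.ravel() / img.size

def texture_feature(img):
    """Hypothetical texture feature: gradient-magnitude histogram (module 41/44)."""
    gray = img.mean(axis=2)
    gx, gy = np.gradient(gray)
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(mag, bins=16, range=(0, 64))
    return hist / max(mag.size, 1)

def reidentify(first_img, second_imgs, modality_of, metric_of):
    """Modules 41-46 in sequence: features -> modality -> metric -> argmin."""
    f1 = np.concatenate([color_feature(first_img), texture_feature(first_img)])
    k = modality_of(f1)                        # module 42: determine modality
    M = metric_of[k]                           # module 43: look up metric for modality
    dists = []
    for img in second_imgs:                    # modules 44-45: features + distances
        f2 = np.concatenate([color_feature(img), texture_feature(img)])
        diff = f1 - f2
        dists.append(float(diff @ M @ diff))
    return int(np.argmin(dists))               # module 46: same person = minimum
```

With the 4x4x4 color histogram and 16-bin texture histogram, the concatenated feature has 80 dimensions, so `metric_of[k]` would be an 80x80 matrix in this sketch.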
It should be noted that, as will be clear to those skilled in the art, the detailed implementation of the pedestrian re-identification apparatus and of each of its modules may refer to the corresponding description in the foregoing embodiments of the pedestrian re-identification method; for convenience and brevity of description, it is not repeated here.
The pedestrian re-identification means may be implemented in the form of a computer program that can be run on a pedestrian re-identification system as shown in fig. 4.
Based on the pedestrian re-identification method, the application also provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs can be executed by one or more processors to implement the steps in the pedestrian re-identification method according to the above embodiment.
Based on the pedestrian re-identification method, the present application further provides a pedestrian re-identification system, as shown in fig. 4, which includes at least one processor 20; a display screen 21; and a memory 22, and may further include a communication interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 can communicate with one another through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory. For example, a variety of media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, may be used; transient storage media are also possible.
In addition, the specific processes by which the instructions in the storage medium and the system are loaded and executed by the processor are described in detail in the method embodiments above and are not repeated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (13)

1. A pedestrian re-identification method, comprising:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
2. The pedestrian re-identification method according to claim 1, wherein before the acquiring a first target image in a first scene and extracting a first color feature and a first texture feature corresponding to the first target image, the method further comprises:
collecting target images of each pedestrian under the two cameras, and forming a training sample and a test sample according to the collected target images;
segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode;
grouping each segmentation data, and obtaining a distance measurement corresponding to each mode based on the grouped data so as to obtain the preset relationship between the modes and the distance measurement.
3. The pedestrian re-identification method according to claim 2, wherein the method further comprises:
determining the mode of a test sample, and selecting the distance measurement corresponding to the mode according to the mode of the test sample;
and calculating the distance measurement by adopting a KNN clustering algorithm to obtain a feature set.
4. The pedestrian re-identification method according to claim 2, wherein the segmenting the training sample according to modalities to obtain segmented data corresponding to each modality specifically comprises:
extracting color features and texture features of each target image in the training sample;
obtaining a plurality of modes according to the color features and the texture features of each target image;
and segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode.
5. The pedestrian re-identification method according to claim 2, wherein the grouping for each piece of segmented data and deriving the distance metric corresponding to each modality based on the grouped data specifically comprises:
grouping each piece of segmented data to obtain three sets, the three sets comprising: set1_{ab,k}, set2_{a,k} and set3_{b,k}; wherein set1_{ab,k} contains views under both cameras, set2_{a,k} contains views only under camera a, and set3_{b,k} contains views only under camera b;
and obtaining a distance metric corresponding to each modality based on the three sets.
6. The pedestrian re-identification method of claim 5, wherein said deriving a distance metric for each modality based on the three sets further comprises:
setting set2_{a,k} and set3_{b,k} as positive sets, and updating set2_{a,k} and set3_{b,k}.
7. The pedestrian re-identification method according to claim 6, wherein said deriving a distance metric corresponding to each modality based on the three sets specifically comprises:
generating a negative gallery sample set according to the three sets;
generating a pair of imposters according to each pair of images in the three sets;
and obtaining the distance metric corresponding to each modality from the negative gallery sample set and the imposter pairs generated for each pair of images.
8. A pedestrian re-identification apparatus, wherein the apparatus comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a first target image in a first scene and extracting a first color feature and a first texture feature corresponding to the first target image;
a modality determining module, configured to determine a first modality in which the first target image is located according to the first color feature and the first texture feature;
the measurement determining module is used for determining a first measurement corresponding to the first modality according to a preset relationship between the modality and the distance measurement;
the second obtaining module is used for obtaining all second target images in each second scene and extracting second color features and second texture features corresponding to the second target images;
a calculating module, configured to input the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, where the plurality of feature distances include feature distances between the first target image and each second target image;
and the judging module is used for selecting a third target image corresponding to the minimum value in the characteristic distances and judging that the third target image and the first target image are the same person.
9. A pedestrian re-identification system, comprising: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, is configured to perform the steps of:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
10. The pedestrian re-identification system of claim 9, wherein the processor, when executing the computer readable program, is further configured to:
collecting target images of each pedestrian under the two cameras, and forming a training sample and a test sample according to the collected target images;
segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode;
grouping each segmentation data, and obtaining a distance measurement corresponding to each mode based on the grouped data so as to obtain the preset relationship between the modes and the distance measurement.
11. The pedestrian re-identification system of claim 10, wherein the processor, when executing the computer readable program, is further configured to:
determining the mode of a test sample, and selecting the distance measurement corresponding to the mode according to the mode of the test sample;
and calculating the distance measurement by adopting a KNN clustering algorithm to obtain a feature set.
12. The pedestrian re-identification system of claim 10, wherein the processor, when executing the computer readable program, is further configured to:
extracting color features and texture features of each target image in the training sample;
obtaining a plurality of modes according to the color features and the texture features of each target image;
and segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode.
13. The pedestrian re-identification system of claim 10, wherein the processor, when executing the computer readable program, is further configured to:
grouping each piece of segmented data to obtain three sets, the three sets comprising: set1_{ab,k}, set2_{a,k} and set3_{b,k}; wherein set1_{ab,k} contains views under both cameras, set2_{a,k} contains views only under camera a, and set3_{b,k} contains views only under camera b;
and obtaining a distance metric corresponding to each modality based on the three sets.
CN202080003202.5A 2020-12-04 2020-12-04 Pedestrian re-identification method, device and system Active CN112823356B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/133843 WO2022116135A1 (en) 2020-12-04 2020-12-04 Person re-identification method, apparatus and system

Publications (2)

Publication Number Publication Date
CN112823356A true CN112823356A (en) 2021-05-18
CN112823356B CN112823356B (en) 2024-05-28

Family

ID=75858129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080003202.5A Active CN112823356B (en) 2020-12-04 2020-12-04 Pedestrian re-identification method, device and system

Country Status (2)

Country Link
CN (1) CN112823356B (en)
WO (1) WO2022116135A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793702A (en) * 2014-02-28 2014-05-14 武汉大学 Pedestrian re-identifying method based on coordination scale learning
US9129148B1 (en) * 2012-11-09 2015-09-08 Orbeus Inc. System, method and apparatus for scene recognition
CN108805189A (en) * 2018-05-30 2018-11-13 大连理工大学 Various visual angles learning distance metric method based on KL divergences
CN108875448A (en) * 2017-05-09 2018-11-23 上海荆虹电子科技有限公司 A kind of pedestrian recognition methods and device again

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718882B (en) * 2016-01-19 2018-12-18 上海交通大学 A kind of resolution ratio self-adaptive feature extraction and the pedestrian's recognition methods again merged
CN106919909B (en) * 2017-02-10 2018-03-27 华中科技大学 The metric learning method and system that a kind of pedestrian identifies again
CN111539255B (en) * 2020-03-27 2023-04-18 中国矿业大学 Cross-modal pedestrian re-identification method based on multi-modal image style conversion

Also Published As

Publication number Publication date
CN112823356B (en) 2024-05-28
WO2022116135A1 (en) 2022-06-09

Similar Documents

Publication Publication Date Title
Li et al. Visual tracking via incremental log-euclidean riemannian subspace learning
Cheng et al. Person re-identification by multi-channel parts-based cnn with improved triplet loss function
Satta et al. Fast person re-identification based on dissimilarity representations
Zhang et al. Multi-observation visual recognition via joint dynamic sparse representation
Oszust et al. Polish sign language words recognition with Kinect
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
WO2021218238A1 (en) Image processing method and image processing apparatus
Wang et al. Multi-spectral dataset and its application in saliency detection
CN110516707B (en) Image labeling method and device and storage medium thereof
CN103942563A (en) Multi-mode pedestrian re-identification technology
Dixit et al. A fast technique to detect copy-move image forgery with reflection and non-affine transformation attacks
CN108921064B (en) Pedestrian re-identification method based on multi-feature fusion
CN111666976B (en) Feature fusion method, device and storage medium based on attribute information
CN113011387A (en) Network training and human face living body detection method, device, equipment and storage medium
Bąk et al. Exploiting feature correlations by Brownian statistics for people detection and recognition
Chandaliya et al. Child face age progression and regression using self-attention multi-scale patch gan
Agbo-Ajala et al. A lightweight convolutional neural network for real and apparent age estimation in unconstrained face images
Cheng et al. Person re-identification by the asymmetric triplet and identification loss function
Sokolova et al. Methods of gait recognition in video
Peng et al. Saliency-aware image-to-class distances for image classification
Di Martino et al. Rethinking shape from shading for spoofing detection
CN111626212A (en) Method and device for identifying object in picture, storage medium and electronic device
Fan et al. Siamese graph convolution network for face sketch recognition: an application using graph structure for face photo-sketch recognition
CN112823356B (en) Pedestrian re-identification method, device and system
Méndez-Llanes et al. On the use of local fixations and quality measures for deep face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant