CN112823356A - Pedestrian re-identification method, device and system - Google Patents

Pedestrian re-identification method, device and system

Info

Publication number
CN112823356A
CN112823356A
Authority
CN
China
Prior art keywords
target image
feature
pedestrian
modality
texture
Prior art date
Legal status
Granted
Application number
CN202080003202.5A
Other languages
Chinese (zh)
Other versions
CN112823356B (en
Inventor
赛义德·穆罕默德·阿德南
崔勇
谢丰隆
Current Assignee
Konka Group Co Ltd
Original Assignee
Konka Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Konka Group Co Ltd filed Critical Konka Group Co Ltd
Publication of CN112823356A publication Critical patent/CN112823356A/en
Application granted granted Critical
Publication of CN112823356B publication Critical patent/CN112823356B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a pedestrian re-identification method, device and system. The method comprises: collecting target images of each pedestrian under two cameras, and forming a training sample and a test sample from the collected images; segmenting the training sample by modality to obtain segmentation data for each modality; and grouping each piece of segmentation data and learning a distance metric for each modality from the grouped data. Because the distance metrics are learned on the open set itself, they can find matches for a query within the complex nonlinear modalities present in open-set data; a target person can be found quickly with the learned metrics, yielding better accuracy in the field of pedestrian re-identification.

Description

Pedestrian re-identification method, device and system
Technical Field
The present application relates to the field of video surveillance, and in particular, to a method, an apparatus, and a system for re-identifying pedestrians.
Background
Currently, methods for re-identifying pedestrians in an open set learn their distance metrics on public closed-set datasets, which differ greatly from open-set data and ignore open-set situations in which the people observed under different camera views overlap only partially or not at all. The prior art therefore fails to provide a metric suited to open-set pedestrian re-identification.
In the prior art, these metrics are learned on closed-set pedestrian re-identification datasets, which are completely different from open-set data: in a closed set there is no question of whether the n-th person appears in the m-th camera view of the network, because all m views observe the same n people. A metric learned under this closed-set constraint never sees real-world data relationships, in which, for example, the n-th person may be captured in cam 1 of an m-view network but have no instance in cam 3 or cam 5. Such a metric cannot model the situation where a query has no match in the gallery, or where the identities observed in a pair of camera views do not overlap at all.
Further, these metrics do not account for the nonlinear complex modalities present in the open set. These nonlinear modalities arise in the open-set image space from random nonlinear variations in viewpoint, background and illumination, and from crowded scenes and occlusions in images captured by large camera networks. When a learned metric can neither handle these nonlinear modalities nor model the complexity of open-set data samples, its performance drops sharply when tested on a large-scale real-world network.
Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
The technical problem to be solved by the present application is to provide a pedestrian re-identification method, device and system that overcome the defects of the prior art, in which metrics ignore the nonlinear complex modalities present in the open set and therefore perform poorly when tested on a large-scale real-world network.
The technical scheme adopted by the application is as follows:
in a first aspect, an embodiment of the present application provides a pedestrian re-identification method, which includes:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
In a second aspect, an embodiment of the present application provides a pedestrian re-identification apparatus, including:
a first acquisition module, configured to acquire a first target image in a first scene and extract a first color feature and a first texture feature corresponding to the first target image;
a modality determining module, configured to determine a first modality in which the first target image is located according to the first color feature and the first texture feature;
the measurement determining module is used for determining a first measurement corresponding to the first modality according to a preset relationship between the modality and the distance measurement;
the second obtaining module is used for obtaining all second target images in each second scene and extracting second color features and second texture features corresponding to the second target images;
a calculating module, configured to input the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, where the plurality of feature distances include feature distances between the first target image and each second target image;
and the judging module is used for selecting a third target image corresponding to the minimum value in the characteristic distances and judging that the third target image and the first target image are the same person.
In a third aspect, an embodiment of the present application provides a pedestrian re-identification system, which includes: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, is configured to perform the steps of:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
Beneficial effects: compared with the prior art, the present application provides a pedestrian re-identification method, device and system. The method comprises: collecting target images of each pedestrian under two cameras, and forming a training sample and a test sample from the collected images; segmenting the training sample by modality to obtain segmentation data for each modality; and grouping each piece of segmentation data and learning a distance metric for each modality from the grouped data. Because the distance metrics are learned on the open set itself, they can find matches for a query within the complex nonlinear modalities present in open-set data; a target person can be found quickly with the learned metrics, yielding better accuracy in the field of pedestrian re-identification.
Drawings
Fig. 1 is a flowchart of a pedestrian re-identification method provided in the present application.
Fig. 2 is a scene schematic diagram of a pedestrian re-identification method provided by the present application.
Fig. 3 is a schematic structural diagram of a pedestrian re-identification device provided by the present application.
Fig. 4 is a schematic structural diagram of a pedestrian re-identification system provided in the present application.
Detailed Description
In order to make the purpose, technical solution and effects of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the application and are not intended to limit it.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention will be further explained by the description of the embodiments with reference to the drawings.
The embodiment provides a pedestrian re-identification method, as shown in fig. 1, the method includes:
s100, acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
s200, determining a first modality of the first target image according to the first color feature and the first texture feature;
s300, determining a first measurement corresponding to a first modality according to a preset modality-distance measurement relation;
s400, acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to each second target image;
s500, inputting the first color feature and the first texture feature and second color features and second texture features corresponding to the second target images into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and the second target images;
s600, selecting a third target image corresponding to the minimum value in the plurality of characteristic distances, and judging that the third target image and the first target image are the same person.
In this embodiment, take a small network of six camera views as an example; as shown in fig. 2, each camera view captures different people. Suppose we want to track the person with ID 1 captured in cam 3 (camera 3), i.e. we want to find where ID 1 has moved within the network and in which camera view it was captured (it was in fact observed in cam 4, but we must find this using re-identification).
Further, to find a match for ID 1 of view 3 in the other views, we first extract the features of ID 1, which are color (e.g. RGB) and texture (e.g. dense SIFT) features, as follows:
f1 = {color, texture}
We now use the extracted features to determine the modality to which ID 1 belongs. For this we use the KNN algorithm:
Modality = KNN(f1)
The KNN algorithm determines which of the modalities the feature f1 is closest to. Assuming there are K = 4 modalities in the network and the KNN algorithm places the feature f1 in the 4th modality, the modality of ID 1 is 4.
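The modality-assignment step can be sketched as a plain KNN majority vote over training features whose modality labels are already known. This is a minimal illustration only; the function name and the choice of Euclidean distance and k are assumptions, not prescribed by the patent.

```python
import numpy as np

def assign_modality(f_query, train_feats, train_modalities, k=5):
    """Assign a query feature vector to a modality by majority vote
    among its k nearest training features (plain KNN).

    train_feats:      (N, D) array of concatenated color+texture features
    train_modalities: length-N list of modality labels (1..K)
    """
    dists = np.linalg.norm(train_feats - f_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                        # indices of k nearest
    votes = [train_modalities[i] for i in nearest]
    return max(set(votes), key=votes.count)                # majority modality label
```

With training features clustered into two modalities, a query near one cluster is assigned that cluster's modality.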
Next, in the other views of the network (cam 1, cam 2, cam 4, cam 5 and cam 6), we can use the metric of modality 4 (i.e., M4) to find matches for ID 1. To do this, we compute the feature distance S between the feature f1 of ID 1 and the features of the people in the other views as follows:
S = (f1 − f_other)^T × M4 × (f1 − f_other)
Now, having obtained the feature distances between ID 1 and all other people in the other views, we check which person has the smallest feature distance (note that the smallest feature distance means the greatest similarity). Suppose we use the above equation to obtain the S values between ID 1 and the others (the original gives these values in a table rendered as an image, not reproduced here); we then see that the distance of Person 8 is the smallest, so Person 8 is the match of ID 1 (for simplicity we only consider 9 people). That is, ID 1 is found in the 4th camera view.
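The quadratic-form distance S = (f1 − f_other)^T M4 (f1 − f_other) and the argmin matching step above can be sketched as follows; the function names are illustrative, and M is any learned positive semi-definite metric matrix.

```python
import numpy as np

def feature_distance(f1, f2, M):
    """S = (f1 - f2)^T  M  (f1 - f2): the modal metric distance."""
    d = f1 - f2
    return float(d @ M @ d)

def best_match(f_query, gallery_feats, M):
    """Return (index, distance) of the gallery person with the smallest
    feature distance to the query (smallest distance = greatest similarity)."""
    dists = [feature_distance(f_query, g, M) for g in gallery_feats]
    i = int(np.argmin(dists))
    return i, dists[i]
```

With M equal to the identity this reduces to squared Euclidean distance, which makes the behaviour easy to check by hand.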
Further, before acquiring the first target image in the first scene and extracting the first color feature and the first texture feature corresponding to the first target image, the method further includes:
s10, collecting target images of each pedestrian under the two cameras, and forming a training sample and a test sample according to the collected target images;
in this embodiment, an open set pedestrian re-identification network is defined first. Assume a simple network of two camera views, called view a and view b (the network later extends to m views). Further, assume that only n unique people are captured in view a and view b, while there are additional p people captured in view a; not captured in view b. Similarly, q individuals are captured in view b; but not captured in view a. Then, the set of people in view a, called U, can be given as follows:
U={n+p}
where n is the number of unique people collectively captured in views a and b, and p is captured in view a only. Similarly, the set V of the total number of unique people in view b is represented as:
V={n+q}
where n is the number of unique people captured in views a and b and q is captured in view b only. Furthermore, the cardinality of set U and set V are not equal, i.e. in the open set, all camera views may observe different people.
Now, the joint sample space of the network (view a and view b) is randomly divided into a training set and a test set. The training set contains 50% of the total samples, while the remaining 50% of the samples form the test set.
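The open-set construction above — U = {n + p}, V = {n + q}, and a random 50/50 person-level split of the joint sample space — can be sketched as below. The function name and the seed parameter are assumptions; with an odd population the split is as close to 50/50 as possible.

```python
import random

def build_open_set_split(view_a_ids, view_b_ids, seed=0):
    """Form the joint sample space of a two-view open-set network and
    randomly split it 50/50 into training and test sets.

    view_a_ids / view_b_ids: sets of person ids captured in each view.
    U = {n + p} people appear in view a, V = {n + q} in view b; the n
    shared people appear in both, p and q are view-exclusive.
    """
    n_both = view_a_ids & view_b_ids           # the n people seen in both views
    all_people = sorted(view_a_ids | view_b_ids)
    rng = random.Random(seed)
    rng.shuffle(all_people)
    half = len(all_people) // 2
    return set(all_people[:half]), set(all_people[half:]), n_both
```

Note that |U| and |V| need not be equal, which is exactly the open-set property described above.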
And S20, segmenting the training sample according to the mode to obtain segmentation data corresponding to each mode.
In this embodiment, to enhance the feature representation of a person, color and texture features are extracted from each image. The color features (RGB, YUV and HSV histograms) describe the color distribution of a person's hair, clothing and body, while the texture features (DenseSIFT, SILTP and LBP) capture the corresponding texture attributes of the face and clothing. Correspondingly, segmenting the training sample by modality to obtain segmentation data for each modality specifically includes:
s21, extracting color features and texture features of each target image in the training sample;
s22, obtaining a plurality of modes according to the color features and the texture features of each target image;
and S23, segmenting the training sample according to the mode to obtain segmentation data corresponding to each mode.
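The color part of the feature extraction in S21 can be sketched with a simple per-channel histogram; this is a stand-in for the RGB/YUV/HSV histograms named above (the texture descriptors DenseSIFT, SILTP and LBP are omitted here), and the function name and bin count are assumptions.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Concatenated per-channel color histogram, normalised per channel.
    A minimal stand-in for the RGB/YUV/HSV histogram color features.

    img: (H, W, 3) uint8 array.
    """
    feats = []
    for c in range(3):
        h, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())              # normalise each channel to sum 1
    return np.concatenate(feats)
```

The full descriptor of the patent would concatenate several such color histograms with the texture features before KNN and metric computation.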
In this embodiment, after the features of the entire training set of the open-set pedestrian re-identification network have been extracted, the multi-modal structure of the joint image space of views a and b is addressed.
And S30, grouping each piece of segmentation data, and obtaining a distance metric corresponding to each modality based on the grouped data to obtain the preset relationship between the modalities and the distance metrics.
In this embodiment, the joint image space of the pair of views a and b is divided into a limited number of modalities. Choosing the right number of modalities in the open set is important for finding the correct match for a given random query, so in this application a modality is considered valid only if it contains at least 10% of the total training sample (in the open set the network is large, so 10% of the training sample is enough to form a stable modality). We divide the joint image space into K modalities using a method similar to that explained in [6].
In addition, in open-set pedestrian re-identification the number of people observed in the two views (a and b) keeps growing. Therefore, to keep the modalities stable and, at runtime, either associate new people with existing modalities or form new modalities in the joint space, we adopt a dynamic partitioning model: the joint image space is re-partitioned for every 15% of new people entering the network. Although the dynamic partitioning model runs at runtime, its incremental cost remains linear, because a new partition is formed only when the population of the m-view network has grown by at least 15%. If 10000 people have been trained across the m views, 15% corresponds to 1500 new people; in an open set it takes a certain time interval (hours) to accumulate 1500 people, so the dynamic partitioning model does not run continuously but only after such an interval (one to several hours), and it therefore remains scalable in time. For a training set of n people, the complexity of the segmentation is about O(t × K × n), where t is the number of iterations, kept fixed (t = 3), and K is the number of modalities to be found, which grows linearly by 1, 2 or 3 after each interval of a few hours.
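The 15% re-partitioning rule described above reduces to a simple trigger; a minimal sketch follows, with the function name and the exact comparison (>= the threshold fraction) as assumptions.

```python
def should_repartition(trained_count, new_count, threshold=0.15):
    """Dynamic partitioning rule: re-divide the joint image space only
    once the population has grown by at least `threshold` (15%).
    E.g. with 10000 trained people, repartition after 1500 new arrivals.
    The re-partition itself costs about O(t * K * n) with t fixed (t = 3).
    """
    return new_count >= threshold * trained_count
```

In practice this check would run at coarse time intervals (one to several hours), matching the scalability argument above.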
In this way the joint image space is divided into K modalities, and the modality association of every person is known. We now take a single modality k from the K modalities and group the people located in modality k into three groups. This is because some people in modality k have images captured in both views a and b, some only in view a, and some only in view b. These three sets are called set1_{ab,k}, set2_{a,k} and set3_{b,k} respectively. The people in these three sets of modality k are then used to generate the training pairs and triplets from which the distance metric Mk for modality k is learned.
Further, grouping each piece of segmented data and obtaining the distance metric corresponding to each modality based on the grouped data specifically includes:
grouping each piece of segmented data into three sets, the three sets comprising set1_{ab,k}, set2_{a,k} and set3_{b,k}, wherein set1_{ab,k} contains the people viewed under both cameras, set2_{a,k} the people viewed only under camera a, and set3_{b,k} the people viewed only under camera b;
generating a positive pair for each person in set2_{a,k} and set3_{b,k}, and updating set2_{a,k} and set3_{b,k} accordingly;
and obtaining the distance metric corresponding to each modality based on the three sets.
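The grouping step above — partitioning one modality's people into set1_{ab,k}, set2_{a,k} and set3_{b,k} — can be sketched with plain set operations; the function name is an assumption.

```python
def group_modality(people_in_k, captured_a, captured_b):
    """Split the people of one modality k into the three training sets:
    seen in both camera views, only in view a, or only in view b."""
    set1_ab = {p for p in people_in_k if p in captured_a and p in captured_b}
    set2_a  = {p for p in people_in_k if p in captured_a and p not in captured_b}
    set3_b  = {p for p in people_in_k if p in captured_b and p not in captured_a}
    return set1_ab, set2_a, set3_b
```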
Further, obtaining the distance metric corresponding to each modality based on the three sets specifically includes:
generating a negative gallery sample set from the three sets;
generating an impostor pair for each pair of images in the three sets;
and learning the distance metric corresponding to each modality from the negative gallery sample set and the impostor pairs.
Specifically, to generate training pairs and triplets we must have a positive pair for every training person in modality k. However, the people of set2_{a,k} and set3_{b,k} have only a single-shot image in modality k (single-shot here means that each person in these two groups is captured in only one instance, in view a or view b, rather than in both views). It is therefore necessary to give the people in set2_{a,k} and set3_{b,k} a positive pair. To this end we devised a new approach that uses the original instances of the people in set2_{a,k} and set3_{b,k}: each original instance is randomly rotated and mirror-flipped to generate another positive instance of that person, forming its pair.
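The random rotate-and-flip positive generation can be sketched as follows. This is a simplification: rotations are restricted to 90-degree steps so array shapes stay valid, whereas the patent only says "randomly rotated"; the function name is also an assumption.

```python
import numpy as np

def synth_positive(img, rng):
    """Generate a second positive instance for a single-shot person by a
    random rotation (here: 90-degree steps for simplicity) and a random
    mirror flip of the original image."""
    out = np.rot90(img, k=int(rng.integers(0, 4)))  # random rotation
    if rng.integers(0, 2):                          # random horizontal flip
        out = out[:, ::-1]
    return out
```

The generated instance contains exactly the same pixels rearranged, so it is a plausible same-identity sample for pairing.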
Because every person in modality k now has a pair of images, we can generate the training triplets used to learn the distance metric Mk of modality k. Initially we generate a negative gallery sample for each person present in modality k. In an open set, however, the identities in view a and view b may not overlap, and likewise differ across all the camera views of the network; so if view b were always defined as the gallery view, people never seen in view a would never be used to train the modal metric Mk, and vice versa. To solve this, we use view b as the gallery view for the people in set1_{ab,k} and set2_{a,k}, and view a as the gallery view for the people in set3_{b,k}. Now, if nn' people are present in modality k, the negative gallery sample of each nn'-th person is taken from the corresponding opposite view: from view b if the nn'-th person comes from view a (i.e. is in set1_{ab,k} or set2_{a,k}), and from view a if the nn'-th person comes from view b (i.e. is in set3_{b,k}). For each person, the selected negative gallery person is neither of the person's own identity nor an impostor (i.e. does not have a similarity to the person higher than the person's actual instance, whether that instance exists in the opposite view or is a randomly generated image instance). The negative gallery sample set of all nn' people in modality k is then:

G_k = { x_b^{iid'} } ∪ { x_a^{iid'} }

where x_b^{iid'} is a negative gallery sample taken from view b (for set1_{ab,k} and set2_{a,k}), x_a^{iid'} is a negative gallery sample taken from view a (for set3_{b,k}), and iid' is the identity of the person in modality k serving as the negative gallery sample of the nn'-th person.
Now we acquire an impostor pair for the image pair of every nn'-th person in modality k. In modality k, some people have a captured sample image in both views a and b (set1_{ab,k}), while others are initially captured only in view a or view b (i.e. in set2_{a,k} or set3_{b,k}), not in both. For the people captured in both views (set1_{ab,k}), each view-specific sample is used to obtain an impostor from the corresponding opposite view; e.g. the person's view-a image is used to obtain an impostor from view b, and vice versa. Their impostor pair is therefore given as

(xo_a, xo_b)

where xo denotes an impostor, obtained from view a and view b respectively. For a person captured only in view a or only in view b (i.e. in set2_{a,k} or set3_{b,k}), we use the original captured image (from view a or view b) together with its randomly generated instance in the same view. For example, if the original instance image of the nn'-th person was captured in view a, the impostor from the opposite view b is obtained using the randomly generated instance x_r, where x_r is the image generated by random rotation and flipping, and o_b is an impostor in view b.
Thereafter, both the original image instance of the nn'-th person captured in view a and the randomly generated instance in the same view a are used to obtain the impostor xo in the same view a. In this case the impostor xo in view a is chosen to be a common impostor of the original and randomly generated image instances of the nn'-th person.
The above process of obtaining negative gallery samples and impostor pairs is then repeated for all remaining people in modality k and for all remaining K−1 modalities of the joint image space.
Further, having obtained the negative gallery samples and impostor pairs of all nn' people in modality k, we are ready to learn the metric Mk of modality k. The method employed in this embodiment discriminates the important features of every nn'-th person well, owing to its inherent ability to handle the nonlinear complexity within modality k and to its multi-kernel approach. Selecting the appropriate kernels, however, is critical to improving performance and keeping the model scalable; in our invention, to preserve both discriminative power and runtime scalability, we chose three RBF kernels and one chi-square kernel. Finally, the modal metric Mk is learned by maximizing the ratio of between-class to within-class scatter:

J(w) = (w^T S_B w) / (w^T S_W w)        (a)

where S_B and S_W are the between-class and within-class scatter matrices. In our case, however, S_B and S_W must also be learned for the many people who have no cross-view image pair, i.e. the people in set2_{a,k} or set3_{b,k}. For these people we use their randomly generated pairs to compute the S_W matrix, which provides an inherent discriminative power for the open-set case of a person in set2_{a,k} when learning the metric Mk, that is, when there is no positive match in the opposite view b. Equation (a) can now be converted into a generalized eigenvalue problem and solved to obtain the metric Mk:

S_B w = λ S_W w

The eigenvalue problem is solved, and the first 300 eigenvectors corresponding to the largest eigenvalues are kept, yielding the modal metric Mk. Continuing this process, the metrics of all remaining K−1 modalities are learned.
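The generalized eigenvalue step S_B w = λ S_W w and the selection of the top eigenvectors can be sketched as below. The reduction to an ordinary eigenproblem via inv(S_W) S_B, and the construction Mk = W W^T from the kept eigenvectors, are assumptions about the implementation (the patent only states that the first 300 eigenvectors are kept); the kernel machinery is omitted.

```python
import numpy as np

def learn_modal_metric(SB, SW, n_vec=300):
    """Solve the generalized eigenvalue problem  SB w = lambda SW w,
    keep the eigenvectors of the largest eigenvalues (the patent keeps
    the first 300), and assemble a metric matrix Mk = W W^T.
    """
    # Reduce to an ordinary eigenproblem: inv(SW) SB w = lambda w
    vals, vecs = np.linalg.eig(np.linalg.solve(SW, SB))
    order = np.argsort(vals.real)[::-1]          # largest eigenvalues first
    W = vecs.real[:, order[:n_vec]]
    return W @ W.T
```

With diagonal toy scatter matrices the top eigenvector is easy to verify by hand.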
S400, determining the modality of the test sample, and selecting the distance metric corresponding to that modality.
S500, computing the feature distances under the selected metric with the KNN algorithm to obtain the matching result.
In this embodiment, re-identification in the open set is performed by first determining the modality of a given random query x_i. This is done with the KNN method; the match is then found in the network using the corresponding modal metric Mk as follows:

j* = argmin_j (x_i^a − x_j)^T M_k (x_i^a − x_j)

where x_i^a is a randomly given query captured in view a, and x_j are the instances observed in the network, i.e. captured in view a or view b. In the open set, it must first be determined whether the query x_i^a has already been captured in the network (in view a or view b) or is not present in the network at all, and only then is matching performed. The metric Mk of our invention performs both tasks simultaneously: if the query has been captured, its match is found in the network; otherwise there is no result.
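The open-set query procedure — pick the modality's metric, find the argmin over the gallery, and return "no result" when the person is not in the network — can be sketched as follows. The explicit `accept_threshold` is an assumption added for illustration: the patent attributes the accept/reject capability to the learned metric itself rather than to a stated threshold.

```python
import numpy as np

def open_set_match(x_query, modality, metrics, gallery, accept_threshold):
    """Open-set re-identification: minimise (x_q - x_j)^T Mk (x_q - x_j)
    over the gallery under the query's modal metric, and return None when
    even the best match is too far (the person is not in the network).

    metrics: dict modality -> Mk;  gallery: list of (id, feature) pairs.
    """
    Mk = metrics[modality]
    best_id, best_d = None, np.inf
    for pid, feat in gallery:
        d = x_query - feat
        dist = float(d @ Mk @ d)
        if dist < best_d:
            best_id, best_d = pid, dist
    return best_id if best_d <= accept_threshold else None
```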
In summary, compared with the prior art, the embodiment of the present application has the following advantages:
The application discloses a pedestrian re-identification method comprising: collecting target images of each pedestrian under two cameras, and forming a training sample and a test sample from the collected images; segmenting the training sample by modality to obtain segmentation data for each modality; and grouping each piece of segmentation data and learning a distance metric for each modality from the grouped data. Because the distance metrics are learned on the open set itself, they can find matches for a query within the complex nonlinear modalities present in open-set data; a target person can be found quickly with the learned metrics, yielding better accuracy in the field of pedestrian re-identification.
Based on the above pedestrian re-identification method, the present application also provides a pedestrian re-identification device, as shown in fig. 3, the device includes:
a first obtaining module 41, configured to obtain a first target image in a first scene, and extract a first color feature and a first texture feature corresponding to the first target image;
a modality determining module 42, configured to determine a first modality in which the first target image is located according to the first color feature and the first texture feature;
a metric determining module 43, configured to determine a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
a second obtaining module 44, configured to obtain all second target images in each second scene, and extract a second color feature and a second texture feature corresponding to each second target image;
a calculating module 45, configured to input the first color feature and the first texture feature, and the second color feature and the second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, where the plurality of feature distances include feature distances between the first target image and each second target image;
and a determining module 46, configured to select a third target image corresponding to a minimum value of the feature distances, and determine that the third target image and the first target image are the same person.
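The modules 41 to 46 above form a linear inference pipeline. The following sketch assumes hypothetical histogram-based color and texture extractors and a caller-supplied modality function and metric table; none of these are the patent's exact implementations:

```python
import numpy as np

def color_feature(img):
    """Hypothetical color feature: coarse 4x4x4 RGB histogram (module 41/44)."""
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(4, 4, 4),
                             range=((0, 256),) * 3)
    return hist.ravel() / img.size

def texture_feature(img):
    """Hypothetical texture feature: gradient-magnitude histogram (module 41/44)."""
    gray = img.mean(axis=2)
    gx, gy = np.gradient(gray)
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(mag, bins=16, range=(0, 64))
    return hist / max(mag.size, 1)

def reidentify(first_img, second_imgs, modality_of, metric_of):
    """Modules 41-46 in sequence: features -> modality -> metric -> argmin."""
    f1 = np.concatenate([color_feature(first_img), texture_feature(first_img)])
    k = modality_of(f1)                        # module 42: determine modality
    M = metric_of[k]                           # module 43: look up metric for modality
    dists = []
    for img in second_imgs:                    # modules 44-45: features + distances
        f2 = np.concatenate([color_feature(img), texture_feature(img)])
        diff = f1 - f2
        dists.append(float(diff @ M @ diff))
    return int(np.argmin(dists))               # module 46: same person = minimum
```

With the 4x4x4 color histogram and 16-bin texture histogram, the concatenated feature has 80 dimensions, so `metric_of[k]` would be an 80x80 matrix in this sketch.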
It should be noted that, as will be clear to those skilled in the art, the detailed implementation of the pedestrian re-identification apparatus and of each of its modules may refer to the corresponding description in the foregoing embodiments of the pedestrian re-identification method; for convenience and brevity of description, it is not repeated here.
The pedestrian re-identification means may be implemented in the form of a computer program that can be run on a pedestrian re-identification system as shown in fig. 4.
Based on the pedestrian re-identification method, the application also provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs can be executed by one or more processors to implement the steps in the pedestrian re-identification method according to the above embodiment.
Based on the pedestrian re-identification method, the present application further provides a pedestrian re-identification system, as shown in fig. 4, which includes at least one processor 20; a display screen 21; and a memory 22, and may further include a communication interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 can communicate with one another through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory. For example, a variety of media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, may be used; transient storage media are also possible.
In addition, the specific processes by which the instructions in the storage medium and the system are loaded and executed by the processor are described in detail in the method embodiments above and are not repeated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (13)

1. A pedestrian re-identification method, comprising:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
2. The pedestrian re-identification method according to claim 1, wherein before the acquiring a first target image in a first scene and extracting a first color feature and a first texture feature corresponding to the first target image, the method further comprises:
collecting target images of each pedestrian under the two cameras, and forming a training sample and a test sample according to the collected target images;
segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode;
grouping each segmentation data, and obtaining a distance measurement corresponding to each mode based on the grouped data so as to obtain the preset relationship between the modes and the distance measurement.
3. The pedestrian re-identification method according to claim 2, wherein the method further comprises:
determining the mode of a test sample, and selecting the distance measurement corresponding to the mode according to the mode of the test sample;
and calculating the distance measurement by adopting a KNN clustering algorithm to obtain a feature set.
4. The pedestrian re-identification method according to claim 2, wherein the segmenting the training sample according to modalities to obtain segmented data corresponding to each modality specifically comprises:
extracting color features and texture features of each target image in the training sample;
obtaining a plurality of modes according to the color features and the texture features of each target image;
and segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode.
5. The pedestrian re-identification method according to claim 2, wherein the grouping for each piece of segmented data and deriving the distance metric corresponding to each modality based on the grouped data specifically comprises:
grouping each piece of segmented data to obtain three sets, the three sets comprising: set1_{ab,k}, set2_{a,k} and set3_{b,k}; wherein set1_{ab,k} contains views under both cameras, set2_{a,k} contains views only under camera a, and set3_{b,k} contains views only under camera b;
and obtaining a distance metric corresponding to each modality based on the three sets.
6. The pedestrian re-identification method of claim 5, wherein said deriving a distance metric for each modality based on the three sets further comprises:
setting set2_{a,k} and set3_{b,k} as positive sets, and updating set2_{a,k} and set3_{b,k}.
7. The pedestrian re-identification method according to claim 6, wherein said deriving a distance metric corresponding to each modality based on the three sets specifically comprises:
generating a negative gallery sample set according to the three sets;
generating a pair of imposters according to each pair of images in the three sets;
and obtaining the distance metric corresponding to each modality from the negative gallery sample set and the imposter pairs generated for each pair of images.
8. A pedestrian re-identification apparatus, wherein the apparatus comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a first target image in a first scene and extracting a first color feature and a first texture feature corresponding to the first target image;
a modality determining module, configured to determine a first modality in which the first target image is located according to the first color feature and the first texture feature;
the measurement determining module is used for determining a first measurement corresponding to the first modality according to a preset relationship between the modality and the distance measurement;
the second obtaining module is used for obtaining all second target images in each second scene and extracting second color features and second texture features corresponding to the second target images;
a calculating module, configured to input the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, where the plurality of feature distances include feature distances between the first target image and each second target image;
and the judging module is used for selecting a third target image corresponding to the minimum value in the characteristic distances and judging that the third target image and the first target image are the same person.
9. A pedestrian re-identification system, comprising: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, is configured to perform the steps of:
acquiring a first target image in a first scene, and extracting a first color feature and a first texture feature corresponding to the first target image;
determining a first modality in which the first target image is located according to the first color feature and the first texture feature;
determining a first metric corresponding to the first modality according to a preset relationship between the modality and the distance metric;
acquiring all second target images in each second scene, and extracting second color features and second texture features corresponding to the second target images;
inputting the first color feature and the first texture feature, and a second color feature and a second texture feature corresponding to each second target image into the first metric to obtain a plurality of feature distances, wherein the plurality of feature distances comprise feature distances between the first target image and each second target image;
and selecting a third target image corresponding to the minimum value in the characteristic distances, and judging that the third target image and the first target image are the same person.
10. The pedestrian re-identification system of claim 9, wherein the processor, when executing the computer readable program, is further configured to:
collecting target images of each pedestrian under the two cameras, and forming a training sample and a test sample according to the collected target images;
segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode;
grouping each segmentation data, and obtaining a distance measurement corresponding to each mode based on the grouped data so as to obtain the preset relationship between the modes and the distance measurement.
11. The pedestrian re-identification system of claim 10, wherein the processor, when executing the computer readable program, is further configured to:
determining the mode of a test sample, and selecting the distance measurement corresponding to the mode according to the mode of the test sample;
and calculating the distance measurement by adopting a KNN clustering algorithm to obtain a feature set.
12. The pedestrian re-identification system of claim 10, wherein the processor, when executing the computer readable program, is further configured to:
extracting color features and texture features of each target image in the training sample;
obtaining a plurality of modes according to the color features and the texture features of each target image;
and segmenting the training sample according to the modes to obtain segmentation data corresponding to each mode.
13. The pedestrian re-identification system of claim 10, wherein the processor, when executing the computer readable program, is further configured to:
grouping each piece of segmented data to obtain three sets, the three sets comprising: set1_{ab,k}, set2_{a,k} and set3_{b,k}; wherein set1_{ab,k} contains views under both cameras, set2_{a,k} contains views only under camera a, and set3_{b,k} contains views only under camera b;
and obtaining a distance metric corresponding to each modality based on the three sets.
CN202080003202.5A 2020-12-04 2020-12-04 Pedestrian re-identification method, device and system Active CN112823356B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/133843 WO2022116135A1 (en) 2020-12-04 2020-12-04 Person re-identification method, apparatus and system

Publications (2)

Publication Number Publication Date
CN112823356A true CN112823356A (en) 2021-05-18
CN112823356B CN112823356B (en) 2024-05-28

Family

ID=75858129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080003202.5A Active CN112823356B (en) 2020-12-04 2020-12-04 Pedestrian re-identification method, device and system

Country Status (2)

Country Link
CN (1) CN112823356B (en)
WO (1) WO2022116135A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793702A (en) * 2014-02-28 2014-05-14 武汉大学 Pedestrian re-identifying method based on coordination scale learning
US9129148B1 (en) * 2012-11-09 2015-09-08 Orbeus Inc. System, method and apparatus for scene recognition
CN108805189A (en) * 2018-05-30 2018-11-13 大连理工大学 Various visual angles learning distance metric method based on KL divergences
CN108875448A (en) * 2017-05-09 2018-11-23 上海荆虹电子科技有限公司 A kind of pedestrian recognition methods and device again

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718882B (en) * 2016-01-19 2018-12-18 上海交通大学 A kind of resolution ratio self-adaptive feature extraction and the pedestrian's recognition methods again merged
CN106919909B (en) * 2017-02-10 2018-03-27 华中科技大学 The metric learning method and system that a kind of pedestrian identifies again
CN111539255B (en) * 2020-03-27 2023-04-18 中国矿业大学 Cross-modal pedestrian re-identification method based on multi-modal image style conversion

Also Published As

Publication number Publication date
CN112823356B (en) 2024-05-28
WO2022116135A1 (en) 2022-06-09

Similar Documents

Publication Publication Date Title
Li et al. Visual tracking via incremental log-euclidean riemannian subspace learning
Cheng et al. Person re-identification by multi-channel parts-based cnn with improved triplet loss function
Satta et al. Fast person re-identification based on dissimilarity representations
Zhang et al. Multi-observation visual recognition via joint dynamic sparse representation
Oszust et al. Polish sign language words recognition with Kinect
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
WO2021218238A1 (en) Image processing method and image processing apparatus
Wang et al. Multi-spectral dataset and its application in saliency detection
CN110516707B (en) Image labeling method and device and storage medium thereof
CN103942563A (en) Multi-mode pedestrian re-identification technology
Dixit et al. A fast technique to detect copy-move image forgery with reflection and non-affine transformation attacks
CN108921064B (en) Pedestrian re-identification method based on multi-feature fusion
CN111666976B (en) Feature fusion method, device and storage medium based on attribute information
CN113011387A (en) Network training and human face living body detection method, device, equipment and storage medium
Bąk et al. Exploiting feature correlations by Brownian statistics for people detection and recognition
Chandaliya et al. Child face age progression and regression using self-attention multi-scale patch gan
Agbo-Ajala et al. A lightweight convolutional neural network for real and apparent age estimation in unconstrained face images
Cheng et al. Person re-identification by the asymmetric triplet and identification loss function
Sokolova et al. Methods of gait recognition in video
Peng et al. Saliency-aware image-to-class distances for image classification
Di Martino et al. Rethinking shape from shading for spoofing detection
CN111626212A (en) Method and device for identifying object in picture, storage medium and electronic device
Fan et al. Siamese graph convolution network for face sketch recognition: an application using graph structure for face photo-sketch recognition
CN112823356B (en) Pedestrian re-identification method, device and system
Méndez-Llanes et al. On the use of local fixations and quality measures for deep face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant