CN113255598A - Pedestrian re-identification method based on Transformer - Google Patents

Pedestrian re-identification method based on Transformer

Info

Publication number
CN113255598A
CN113255598A (application number CN202110723088.XA; granted publication CN113255598B)
Authority
CN
China
Prior art keywords
pedestrian
image
human body
model
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110723088.XA
Other languages
Chinese (zh)
Other versions
CN113255598B (en)
Inventor
王乾宇
周金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co Ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co Ltd filed Critical Nanjing Inspector Intelligent Technology Co Ltd
Priority to CN202110723088.XA priority Critical patent/CN113255598B/en
Publication of CN113255598A publication Critical patent/CN113255598A/en
Application granted granted Critical
Publication of CN113255598B publication Critical patent/CN113255598B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a Transformer-based pedestrian re-identification method, which comprises the following steps. Step 1, constructing a bottom library, where the bottom library stores pictures containing the whole body of each target person: a YOLOv3 model detects the pedestrian frame R of a pedestrian in a picture, the image corresponding to R is cropped from the original picture, an HRNet model detects the human body key points, image blocks centered on the key points are cropped, and the input vectors Zi of all key points are fed into a Transformer network to obtain the pedestrian's features; the features and the corresponding target-person information are stored in the bottom library. Step 2, extracting the features of the pedestrian to be detected by the method of step 1. Step 3, performing pedestrian re-identification matching. Because the images fed to the Transformer for feature extraction are cropped around human body key points, the accuracy of pedestrian re-identification is improved.

Description

Pedestrian re-identification method based on Transformer
Technical Field
The invention relates to the field of image recognition, in particular to pedestrian re-identification, and specifically to a Transformer-based pedestrian re-identification method.
Background
Pedestrian re-identification is a computer-vision technique for judging whether two pedestrian images belong to the same person, and is widely used to search for specific people in surveillance scenes. Most existing re-identification methods use a convolutional neural network to extract pedestrian features, but features extracted by a convolutional neural network lose detailed information through convolution and downsampling, which imposes certain limitations.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a pedestrian re-identification method based on a Transformer, which can improve the accuracy of pedestrian re-identification. The technical scheme is as follows:
step 1, constructing a bottom library;
the bottom library stores pictures containing the whole body of each target person;
detecting the pedestrian frame R = (x, y, w, h) of a pedestrian in the picture using a first model, where (x, y) is the upper-left corner point of the pedestrian frame, w is the width of the pedestrian frame, and h is the height of the pedestrian frame;
intercepting pedestrian frame in original pictureRDetecting a human body key point P = (P) by using a second model according to the corresponding image1,p2,…,pk) K is the number of key points of the human body;
for each human body key point piAdding the coordinates of the upper left corner of the pedestrian frame R to restore the position of the original image to obtain the coordinates Q of the key points of the human body in the original imageiUsing the coordinates Q of each original image human body key pointiTaking the key point as the center, intercepting the image block corresponding to the key point, expanding outwards by taking the key point as the center in an intercepting mode to obtain a rectangular or circular image block, flattening the intercepted image block into an image vector Ii
adding a position code in front of each image vector, where the position code comprises the coordinates of the human body key point and a fixed random code, i.e. Ji = (xpi, ypi, ri), where (xpi, ypi) are the coordinates Qi of the key point pi and ri is a random code satisfying the following conditions: (1) ri is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modular length (norm) of ri is fixed; (3) the random codes corresponding to any two key points are different;
finally, connecting the position code and the image vector to obtain an input vector Zi=concat(Ji,Ii);
feeding the input vectors Zi of all key points into a Transformer network to obtain the pedestrian feature f, where f = T(Z1, Z2, …, Zk); finally obtaining the features F = (f1, f2, …, fm) of all target persons, and storing the features F and the corresponding target-person information in the bottom library.
Step 2, extracting the features of the pedestrian to be detected;
acquiring a pedestrian image to be detected;
detecting the positions and sizes of all pedestrians in the pedestrian image using the first model, obtaining pedestrian frames Ri = (xi, yi, wi, hi), where (xi, yi) is the upper-left corner point of the pedestrian frame, wi is its width and hi is its height; the aspect ratio and resolution of the pedestrian image are adjusted to meet the same requirements as the first model in step 1.
For each detected pedestrian frame, cropping the image corresponding to the frame Ri from the original pedestrian image and detecting the human body key points Pi = (pi,1, pi,2, …, pi,k) using the second model, where k is the number of human body key points and j ∈ {1, 2, …, k}; the resolution of the cropped image is adjusted to meet the same requirement as the second model in step 1.
For each human body key point pi,j, adding the upper-left-corner coordinates of the pedestrian frame Ri to restore its position in the original pedestrian image, obtaining the coordinates Qi,j of the key point in the original image; taking each Qi,j as the center, cropping the image block corresponding to the key point by expanding outward from the key point, obtaining a rectangular or circular image block, and flattening the cropped image block into an image vector Ii,j; the cropping is performed in the same way as in step 1.
Adding a position code in front of each image vector, where the position code comprises the coordinates of the human body key point and a fixed random code, i.e. Ji,j = (xpi,j, ypi,j, ri,j), where (xpi,j, ypi,j) are the coordinates Qi,j of the key point pi,j and ri,j is a random code satisfying the following conditions: (1) ri,j is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modular length (norm) of ri,j is fixed; (3) the random codes corresponding to any two key points are different.
Finally, connecting the position code and the image vector to obtain an input vector Zi,j=concat(Ji,j,Ii,j);
Feeding the input vectors Zi,1, Zi,2, …, Zi,k of all key points into the Transformer network to obtain the pedestrian feature gi = T(Zi,1, Zi,2, …, Zi,k).
Step 3, carrying out pedestrian re-identification matching;
comparing the pedestrian feature gi with the features in the bottom library to determine whether the detected pedestrian is a target person.
Preferably, the first model in step 1 is a YOLOv3 model.
Preferably, the HRNet model is used as the second model in step 1.
Preferably, when the first model is used to detect the pedestrian frame in step 1, if the aspect ratio or the resolution of the picture does not meet the requirement of the first model, black borders are added to the picture, so that the aspect ratio of the picture is adjusted to the aspect ratio corresponding to the first model, and the picture is scaled to the resolution required by the first model.
Preferably, when the second model is used to detect the key points of the human body, if the resolution of the captured image does not meet the requirement of the second model, the image is scaled to the resolution required by the second model.
Preferably, the human key points correspond to the right ankle, the right knee, the right hip, the left knee, the left ankle, the right wrist, the right elbow, the right shoulder, the left elbow, the left wrist, the neck, the head center and the body center of the human body.
Preferably, the cropping in step 1 expands outward from the key point as the center, specifically: taking the key point as the center, cropping a rectangle whose side length is 1/3 of the width of the pedestrian frame, or a circle whose radius is 1/6 of the width of the pedestrian frame.
Preferably, step 3 specifically comprises: calculating the similarity s between the pedestrian feature gi and each feature in the bottom library; when the maximum similarity is greater than the similarity threshold, the matching succeeds, i.e. the detected pedestrian is a target person in the bottom library; otherwise, the matching fails.
Preferably, in step 3 the similarity s between the pedestrian feature gi and each feature in the bottom library is calculated as follows: for each feature fa in the bottom library, a ∈ {1, 2, …, m}, calculate the similarity s between gi and fa:
(the formula for the similarity s is given as an image in the original document)
the maximum similarity and the corresponding target person information t are as follows:
smax = max{s1, s2, …, sm}, with t the target-person information corresponding to the feature achieving this maximum (formula given as an image in the original document).
setting a similarity threshold sthreshold: if smax > sthreshold, the matching succeeds and the target-person information corresponding to t is returned; otherwise, the matching fails.
Compared with the prior art, this technical scheme has the following beneficial effects: by cropping the image around the human body key points and feeding the crops to the Transformer for feature extraction, each key-point region sent into the network is kept complete, so the network can better extract the pedestrian's features and the accuracy of pedestrian re-identification improves. Meanwhile, because the images sent to the network are cropped around key points, the influence of the background is also removed.
Detailed Description
To clarify the technical solution and working principle of the present invention, the embodiments of the present disclosure are described in further detail below. All of the optional technical solutions above may be combined arbitrarily to form optional embodiments of the present disclosure, which are not repeated here.
The terms "step 1", "step 2", "step 3", and the like in the description and claims of this application are used to distinguish similar elements and do not necessarily describe a particular sequential or chronological order. It should be understood that steps so labeled may be interchanged under appropriate circumstances, so that the embodiments described herein can be practiced in orders other than the one described.
The embodiment of the disclosure provides a pedestrian re-identification method based on a Transformer, which mainly comprises the following steps:
step 1, constructing a bottom library;
the bottom library stores pictures containing the whole body of each target person;
detecting the pedestrian frame R = (x, y, w, h) of a pedestrian in the picture using a first model (a YOLOv3 model), where (x, y) is the upper-left corner point of the pedestrian frame, w is the width of the pedestrian frame, and h is the height of the pedestrian frame.
Preferably, when the first model (YOLOv3) is used to detect the pedestrian frame, if the aspect ratio or resolution of the picture does not meet the model's requirements, black borders are added so that the aspect ratio matches the one expected by the model (for example 16:9), and the picture is then scaled to the required resolution (for example 512 × 288).
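The padding-then-scaling step above can be sketched as follows. This is a minimal illustration of the geometry only, under the example values from the text (16:9, 512 × 288); the function name and return layout are the author's own, not part of the patent.

```python
# Sketch of the letterboxing described above: pad a picture with black
# borders up to the detector's aspect ratio, then scale uniformly to its
# input resolution. Returns the padded size, the padding offsets, and the
# scale factor; the actual pixel copy/resize is left to an image library.

def letterbox_geometry(src_w, src_h, dst_w=512, dst_h=288):
    """Return (padded_w, padded_h, pad_x, pad_y, scale) for letterboxing."""
    dst_ratio = dst_w / dst_h
    src_ratio = src_w / src_h
    if src_ratio < dst_ratio:
        # picture is too tall: pad left/right until the ratio matches
        padded_w, padded_h = round(src_h * dst_ratio), src_h
    else:
        # picture is too wide (or already matches): pad top/bottom
        padded_w, padded_h = src_w, round(src_w / dst_ratio)
    pad_x = (padded_w - src_w) // 2
    pad_y = (padded_h - src_h) // 2
    scale = dst_w / padded_w  # uniform scale down to detector resolution
    return padded_w, padded_h, pad_x, pad_y, scale
```

For a portrait 1080 × 1920 frame this pads the sides to 3413 × 1920 before scaling, while a 1920 × 1080 frame needs no padding at all.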
Cropping the image corresponding to the pedestrian frame R from the original picture, and detecting the human body key points P = (p1, p2, …, pk) using a second model (an HRNet model), where k is the number of human body key points.
Preferably, when the second model (HRNet) is used to detect the human body key points, if the resolution of the cropped image does not meet the model's requirement, the image is scaled to the required resolution (for example 128 × 256).
Preferably, the human key points correspond to the right ankle, the right knee, the right hip, the left knee, the left ankle, the right wrist, the right elbow, the right shoulder, the left elbow, the left wrist, the neck, the head center and the body center of the human body.
For each human body key point pi, adding the upper-left-corner coordinates of the pedestrian frame R to restore its position in the original picture, obtaining the coordinates Qi of the key point in the original picture; taking each Qi as the center, cropping the image block corresponding to the key point by expanding outward from the key point, obtaining a rectangular or circular image block, and flattening the cropped image block into an image vector Ii.
Preferably, the cropping expands outward from the key point as the center, specifically: taking the key point as the center, cropping a rectangle whose side length is 1/3 of the width of the pedestrian frame, or a circle whose radius is 1/6 of the width of the pedestrian frame.
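The rectangular variant of this crop-and-flatten step can be sketched as below. The image is modeled as a nested list of pixel values purely for illustration, and zero-padding at the borders is an assumption the patent does not specify; the helper names are the author's own.

```python
# Sketch of the key-point patch extraction described above: around each
# key point, crop a square whose side is one third of the pedestrian-frame
# width, then flatten the patch into the image vector Ii.

def crop_patch(image, cx, cy, box_w):
    """Crop a square patch of side box_w // 3 centered on (cx, cy)."""
    side = max(1, box_w // 3)
    half = side // 2
    h, w = len(image), len(image[0])
    patch = []
    for y in range(cy - half, cy - half + side):
        row = []
        for x in range(cx - half, cx - half + side):
            # zero-pad when the patch sticks out of the image (assumption)
            if 0 <= y < h and 0 <= x < w:
                row.append(image[y][x])
            else:
                row.append(0)
        patch.append(row)
    return patch

def flatten_patch(patch):
    """Flatten a 2-D patch into a 1-D image vector."""
    return [v for row in patch for v in row]
```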
Existing methods directly divide the pedestrian image into fixed-size blocks along a grid, which can split a pedestrian's characteristic parts, for example cutting the head image into a left half and a right half. In contrast, the image blocks corresponding to the key points cover the human joints, the head, and the body: the joint regions reflect the posture of the human body and, to a certain extent, carry clothing-related information; the head region contains face-related information; and the body region helps identify clothing. Meanwhile, because the images sent to the network are cropped around key points, the influence of the background is also removed.
Adding a position code in front of each image vector, where the position code comprises the coordinates of the human body key point and a fixed random code, i.e. Ji = (xpi, ypi, ri), where (xpi, ypi) are the coordinates Qi of the key point pi and ri is a random code satisfying the following conditions: (1) ri is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution (for example n = 8); (2) the modular length (norm) of ri is fixed (for example ||ri|| = 1); (3) the random codes corresponding to any two key points are different.
Finally, concatenating the position code and the image vector gives the input vector Zi = concat(Ji, Ii). Adding the key-point positions to the position code expresses the relative spatial positions of the key points and helps the model learn human posture information; adding a fixed random code distinguishes different key points semantically. Using random vectors of fixed norm expresses that, for pedestrian re-identification, the key points are unordered and of equal importance.
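The construction of these fixed random codes and of Zi can be sketched as follows, using the example values n = 8 and ||ri|| = 1 from the text. The fixed seed, function names, and the choice of normalizing Gaussian draws to unit length are the author's illustrative assumptions.

```python
# Sketch of the fixed random position codes described above: one
# n-dimensional Gaussian vector per key point, normalized to unit norm so
# every code has the same modular length, then prepended together with the
# key-point coordinates to the flattened image vector.
import math
import random

def make_random_codes(num_keypoints, n=8, seed=0):
    """Generate one fixed, unit-norm random code ri per key point."""
    rng = random.Random(seed)  # fixed seed: codes stay constant across runs
    codes = []
    for _ in range(num_keypoints):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        codes.append([x / norm for x in v])  # fixed modular length ||ri|| = 1
    return codes

def make_input_vector(keypoint_xy, code, image_vector):
    """Zi = concat(Ji, Ii) with Ji = (x, y, ri)."""
    x, y = keypoint_xy
    return [x, y] + code + image_vector
```

With 13 key points (the preferred set listed above), this yields 13 distinct unit-norm codes, and each Zi has length 2 + n + len(Ii).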
Feeding the input vectors Zi of all key points into a Transformer network gives the pedestrian feature f, where f = T(Z1, Z2, …, Zk); finally the features F = (f1, f2, …, fm) of all target persons are obtained, and the features F and the corresponding target-person information are stored in the bottom library. The Transformer is used for feature extraction because its encoder strengthens the extraction of local features while its decoder learns the association between local and global features, so detailed information is retained as much as possible and recognition accuracy improves.
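The core operation by which the Transformer T mixes the key-point tokens can be illustrated with one scaled-dot-product self-attention layer in plain Python. This is only a sketch of the mechanism, not the patent's network: the real model has learned projections, multiple heads, and many layers, and all dimensions and names here are illustrative.

```python
# Minimal self-attention over the key-point tokens Z1..Zk, followed by
# average pooling into a single pedestrian feature vector.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """tokens: list of k equal-length vectors (the Zi). Returns mixed tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # attention weights of this token against every token (incl. itself)
        scores = [sum(qa * ka for qa, ka in zip(q, kvec)) / math.sqrt(d)
                  for kvec in tokens]
        w = softmax(scores)
        # attended representation = weighted sum over all tokens
        out.append([sum(wi * t[j] for wi, t in zip(w, tokens))
                    for j in range(d)])
    return out

def pool_feature(tokens):
    """Average the attended tokens into one pedestrian feature vector."""
    mixed = self_attention(tokens)
    k, d = len(mixed), len(mixed[0])
    return [sum(t[j] for t in mixed) / k for j in range(d)]
```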
Step 2, extracting the pedestrian features to be detected,
acquiring a pedestrian image to be detected;
the first model (YOLOv 3 model) is used for detecting the positions and sizes of all pedestrians in the pedestrian image to obtain a plurality of pedestrian frames Ri=(xi,yi,wi,hi) Whereinxi,yi As a point at the upper left corner of the pedestrian frame, wiWidth of pedestrian frame, hiIs the height of the pedestrian frame; the aspect ratio or resolution of the pedestrian image is adjusted to be the same as the requirement of the first model (YOLOv 3 model) in step 1.
For each detected pedestrian frame, the image corresponding to the frame Ri is cropped from the original pedestrian image and the human body key points Pi = (pi,1, pi,2, …, pi,k) are detected using the second model (HRNet), where k is the number of human body key points and j ∈ {1, 2, …, k}; the resolution of the cropped image is adjusted to meet the same requirement as the second model in step 1.
For each human body key point pi,j, the upper-left-corner coordinates of the pedestrian frame Ri are added to restore its position in the original pedestrian image, giving the coordinates Qi,j of the key point in the original image; taking each Qi,j as the center, the image block corresponding to the key point is cropped by expanding outward from the key point, giving a rectangular or circular image block, which is flattened into an image vector Ii,j; the cropping is performed in the same way as in step 1.
A position code is added in front of each image vector; it comprises the coordinates of the human body key point and a fixed random code, i.e. Ji,j = (xpi,j, ypi,j, ri,j), where (xpi,j, ypi,j) are the coordinates Qi,j of the key point pi,j and ri,j is a random code satisfying the following conditions: (1) ri,j is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution (for example n = 8); (2) the modular length (norm) of ri,j is fixed (for example ||ri,j|| = 1); (3) the random codes corresponding to any two key points are different.
Finally, connecting the position code and the image vector to obtain an input vector Zi,j=concat(Ji,j,Ii,j);
The input vectors Zi,1, Zi,2, …, Zi,k of all key points are fed into the Transformer network to obtain the pedestrian feature gi = T(Zi,1, Zi,2, …, Zi,k).
Step 3, carrying out pedestrian re-identification matching;
the pedestrian feature gi is compared with the features in the bottom library to determine whether the detected pedestrian is a target person: the similarity s between gi and each feature in the bottom library is calculated, and when the maximum similarity is greater than the similarity threshold the matching succeeds, i.e. the detected pedestrian is a target person in the bottom library; otherwise, the matching fails.
Preferably, in step 3 the similarity s between the pedestrian feature gi and each feature in the bottom library is calculated as follows: for each feature fa in the bottom library, a ∈ {1, 2, …, m}, calculate the similarity s between gi and fa:
(the formula for the similarity s is given as an image in the original document)
the maximum similarity and the corresponding target person information t are as follows:
smax = max{s1, s2, …, sm}, with t the target-person information corresponding to the feature achieving this maximum (formula given as an image in the original document).
A similarity threshold sthreshold is set: if smax > sthreshold, the matching succeeds and the target-person information corresponding to t is returned; otherwise, the matching fails.
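The matching step can be sketched as below. The patent shows the similarity formula only as an image, so cosine similarity is assumed here as a common choice, and the threshold value 0.6 is purely illustrative; the function names are the author's own.

```python
# Sketch of the step-3 matching: compute the similarity between the query
# feature gi and every bottom-library feature, take the maximum, and compare
# it against the threshold.
import math

def cosine_similarity(g, f):
    """Assumed similarity measure (the original gives the formula as an image)."""
    dot = sum(a * b for a, b in zip(g, f))
    ng = math.sqrt(sum(a * a for a in g))
    nf = math.sqrt(sum(b * b for b in f))
    return dot / (ng * nf)

def match(gi, gallery, threshold=0.6):
    """gallery: list of (feature fa, person info). Returns info or None."""
    s_max, t = -1.0, None
    for fa, info in gallery:
        s = cosine_similarity(gi, fa)
        if s > s_max:
            s_max, t = s, info
    if s_max > threshold:
        return t   # matching succeeded: the pedestrian is target person t
    return None    # matching failed
```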
The invention has been described above by way of example. Obviously, the specific implementation of the invention is not limited to the manner described above: various insubstantial modifications made using the method concepts and technical solutions of the invention, or direct applications of these concepts and solutions to other occasions without improvement, all fall within the protection scope of the invention.

Claims (9)

1. A pedestrian re-identification method based on a Transformer is characterized by comprising the following steps:
step 1, constructing a bottom library;
the bottom library stores pictures containing the whole body of each target person;
detecting a pedestrian frame R = (x, y, w, h) of a pedestrian in the picture by using a first model, wherein (x, y) is a point at the upper left corner of the pedestrian frame, w is the width of the pedestrian frame, and h is the height of the pedestrian frame;
intercepting an image corresponding to the pedestrian frame R in the original image, and detecting the human body key points P = (p1, p2, …, pk) using a second model, where k is the number of human body key points;
for each human body key point pi, adding the upper-left-corner coordinates of the pedestrian frame R to restore its position in the original image, obtaining the coordinates Qi of the key point in the original image; taking each Qi as the center, intercepting the image block corresponding to the key point by expanding outward from the key point, obtaining a rectangular or circular image block, and flattening the intercepted image block into an image vector Ii;
adding a position code in front of each image vector, where the position code comprises the coordinates of the human body key point and a fixed random code, i.e. Ji = (xpi, ypi, ri), where (xpi, ypi) are the coordinates Qi of the key point pi and ri is a random code satisfying the following conditions: (1) ri is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modular length (norm) of ri is fixed; (3) the random codes corresponding to any two key points are different;
finally, connecting the position code and the image vector to obtain an input vector Zi=concat(Ji,Ii);
feeding the input vectors Zi of all key points into a Transformer network to obtain the pedestrian feature f, where f = T(Z1, Z2, …, Zk); finally obtaining the features F = (f1, f2, …, fm) of all target persons, and storing the features F and the corresponding target-person information in the bottom library;
step 2, extracting the pedestrian features to be detected,
acquiring a pedestrian image to be detected;
detecting the positions and sizes of all pedestrians in the pedestrian image using the first model, obtaining pedestrian frames Ri = (xi, yi, wi, hi), where (xi, yi) is the upper-left corner point of the pedestrian frame, wi is its width and hi is its height; adjusting the aspect ratio and resolution of the pedestrian image to meet the same requirements as the first model in step 1;
for each detected pedestrian frame, intercepting the image corresponding to the frame Ri from the original pedestrian image and detecting the human body key points Pi = (pi,1, pi,2, …, pi,k) using the second model, where k is the number of human body key points and j ∈ {1, 2, …, k}; adjusting the resolution of the intercepted image to meet the same requirement as the second model in step 1;
for each human body key point pi,j, adding the upper-left-corner coordinates of the pedestrian frame Ri to restore its position in the original pedestrian image, obtaining the coordinates Qi,j of the key point in the original image; taking each Qi,j as the center, intercepting the image block corresponding to the key point by expanding outward from the key point, obtaining a rectangular or circular image block, and flattening the intercepted image block into an image vector Ii,j; the specific interception is the same as that employed in step 1;
adding a position code in front of each image vector, where the position code comprises the coordinates of the human body key point and a fixed random code, i.e. Ji,j = (xpi,j, ypi,j, ri,j), where (xpi,j, ypi,j) are the coordinates Qi,j of the key point pi,j and ri,j is a random code satisfying the following conditions: (1) ri,j is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modular length (norm) of ri,j is fixed; (3) the random codes corresponding to any two key points are different;
finally, connecting the position code and the image vector to obtain an input vector Zi,j=concat(Ji,j, Ii,j);
feeding the input vectors Zi,1, Zi,2, …, Zi,k of all key points into the Transformer network to obtain the pedestrian feature gi = T(Zi,1, Zi,2, …, Zi,k);
Step 3, carrying out pedestrian re-identification matching;
comparing the pedestrian feature gi with the features in the bottom library to determine whether the detected pedestrian is a target person.
2. The method for pedestrian re-identification based on Transformer as claimed in claim 1, wherein the first model in step 1 is a YOLOv3 model.
3. The method for pedestrian re-identification based on Transformer as claimed in claim 1, wherein the HRNet model is selected as the second model in step 1.
4. The method as claimed in claim 1, wherein when the first model is used to detect the pedestrian frame in step 1, if the aspect ratio or resolution of the picture does not meet the requirement of the first model, black borders are added to the picture, so that the aspect ratio of the picture is adjusted to the aspect ratio corresponding to the first model, and the picture is scaled to the resolution required by the first model.
5. The method as claimed in claim 1, wherein in the step 1, when the second model is used to detect the key points of the human body, if the resolution of the captured image does not meet the requirement of the second model, the image is scaled to the resolution required by the second model.
6. The method for re-identifying pedestrians based on Transformer according to claim 1, wherein the intercepting in step 1 expands outward from the key point as the center, specifically: taking the key point as the center, intercepting a rectangle whose side length is 1/3 of the width of the pedestrian frame, or a circle whose radius is 1/6 of the width of the pedestrian frame.
7. The Transformer-based pedestrian re-identification method according to any one of claims 1-6, wherein the human body key points correspond to the right ankle, right knee, right hip, left knee, left ankle, right wrist, right elbow, right shoulder, left elbow, left wrist, neck, head center, and body center of the human body.
8. The method for pedestrian re-identification based on Transformer according to claim 7, wherein step 3 specifically comprises: calculating the similarity s between the pedestrian feature g_i and each gallery feature in the gallery; when the maximum similarity is greater than the similarity threshold, the matching succeeds, i.e., the detected pedestrian is a target person in the gallery; otherwise, the matching fails.
9. The Transformer-based pedestrian re-identification method according to claim 8, wherein calculating the similarity s between the pedestrian feature g_i and each gallery feature in step 3 specifically comprises: for each gallery feature f_a, a ∈ {1, 2, …, m}, calculating the similarity s_a between g_i and f_a;
[similarity formula given in the original only as an equation image]
the maximum similarity and the corresponding target person information t are:
s_max = max{ s_a : a ∈ {1, 2, …, m} },  t = argmax_{a ∈ {1, 2, …, m}} s_a
setting a similarity threshold s_threshold; if s_max > s_threshold, the matching succeeds and the target person information corresponding to t is returned; otherwise, the matching fails.
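Claims 8-9 can be sketched end to end as follows; since the similarity formula itself appears only as an image in the source, cosine similarity is assumed here purely for illustration, as are the helper names:

```python
import math

def cosine_similarity(u, v):
    """Assumed similarity measure (the claim's actual formula is not
    reproduced in the source text)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def match(g, gallery, threshold):
    """Compute s_a for every gallery feature f_a, take s_max and the
    arg-max identity t, and accept only when s_max > threshold."""
    best_id, s_max = None, float("-inf")
    for person_id, f in gallery.items():
        s = cosine_similarity(g, f)
        if s > s_max:
            best_id, s_max = person_id, s
    if s_max > threshold:
        return best_id, s_max   # match succeeded: return target person t
    return None, s_max          # match failed

gallery = {"person_A": [1.0, 0.0], "person_B": [0.0, 1.0]}
result = match([1.0, 0.0], gallery, threshold=0.9)
```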
CN202110723088.XA 2021-06-29 2021-06-29 Pedestrian re-identification method based on Transformer Active CN113255598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110723088.XA CN113255598B (en) 2021-06-29 2021-06-29 Pedestrian re-identification method based on Transformer

Publications (2)

Publication Number Publication Date
CN113255598A true CN113255598A (en) 2021-08-13
CN113255598B CN113255598B (en) 2021-09-28

Family

ID=77190012

Country Status (1)

Country Link
CN (1) CN113255598B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688271A (en) * 2021-10-25 2021-11-23 浙江大华技术股份有限公司 Archive searching method and related device for target object
CN114091548A (en) * 2021-09-23 2022-02-25 昆明理工大学 Vehicle cross-domain re-identification method based on key point and graph matching
CN118015662A (en) * 2024-04-09 2024-05-10 沈阳二一三电子科技有限公司 Transformer multi-head self-attention mechanism-based pedestrian re-recognition method crossing cameras

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832672A (en) * 2017-10-12 2018-03-23 Beihang University A pedestrian re-identification method using pose information to design multiple loss functions
CN110334675A (en) * 2019-07-11 2019-10-15 Shandong University A pedestrian re-identification method based on skeleton key point segmentation and column convolution
US20200134321A1 (en) * 2018-02-12 2020-04-30 Beijing Sensetime Technology Development Co., Ltd. Pedestrian re-identification methods and apparatuses, electronic devices, and storage media

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHAO Haiyu et al.: "Spindle Net: Person Re-Identification With Human Body Region Guided Feature Decomposition and Fusion", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition *
CHEN Shoubing et al.: "Person re-identification based on Siamese network and re-ranking", Journal of Computer Applications *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: A pedestrian recognition method based on transformer
Effective date of registration: 20220705
Granted publication date: 20210928
Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch
Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.
Registration number: Y2022980009897
PC01 Cancellation of the registration of the contract for pledge of patent right
Date of cancellation: 20230720
Granted publication date: 20210928
Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch
Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.
Registration number: Y2022980009897
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: A Transformer based pedestrian re recognition method
Effective date of registration: 20230803
Granted publication date: 20210928
Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch
Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.
Registration number: Y2023980050832
PC01 Cancellation of the registration of the contract for pledge of patent right
Granted publication date: 20210928
Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch
Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.
Registration number: Y2023980050832