CN113255598B - Pedestrian re-identification method based on Transformer - Google Patents

Pedestrian re-identification method based on Transformer

Info

Publication number
CN113255598B
CN113255598B
Authority
CN
China
Prior art keywords
pedestrian
image
human body
model
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110723088.XA
Other languages
Chinese (zh)
Other versions
CN113255598A (en)
Inventor
王乾宇
周金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co Ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co Ltd filed Critical Nanjing Inspector Intelligent Technology Co Ltd
Priority to CN202110723088.XA priority Critical patent/CN113255598B/en
Publication of CN113255598A publication Critical patent/CN113255598A/en
Application granted granted Critical
Publication of CN113255598B publication Critical patent/CN113255598B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a Transformer-based pedestrian re-identification method, which comprises the following steps. Step 1, construct a base library, where the base library is built from pictures containing the whole body of each target person: detect the pedestrian frame of a pedestrian in a picture using a YOLOv3 model, crop the image corresponding to the pedestrian frame R from the original picture, detect the human body key points using an HRNet model, crop the image blocks corresponding to the key points, feed the input vectors Z_i of all key points into a Transformer network to obtain the pedestrian's features, and store the pedestrian's features and the corresponding target person information in the base library. Step 2, extract the features of the pedestrian to be detected by the method of step 1, and step 3, perform pedestrian re-identification matching. Because the image blocks are cropped with the human body key points as centers and used as the Transformer input for feature extraction, the accuracy of pedestrian re-identification is improved.

Description

Pedestrian re-identification method based on Transformer
Technical Field
The invention relates to the field of image recognition technology, in particular to pedestrian re-identification, and specifically to a pedestrian re-identification method based on a Transformer.
Background
Pedestrian re-identification is a computer vision technique for judging whether two pedestrian images belong to the same person; it is widely applied to searching for specific people in surveillance scenes. At present, most existing re-identification techniques use a convolutional neural network to extract pedestrian features, but the features extracted by a convolutional neural network are affected by convolution and downsampling and lose detailed information, which imposes certain limitations.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a Transformer-based pedestrian re-identification method that can improve the accuracy of pedestrian re-identification. The technical scheme is as follows:
Step 1, constructing a base library;
the base library is built from pictures containing the whole body of each target person;
detecting the pedestrian frame R = (x, y, w, h) of a pedestrian in a picture using a first model, where (x, y) is the point at the upper-left corner of the pedestrian frame, w is the width of the pedestrian frame, and h is the height of the pedestrian frame;
cropping the image corresponding to the pedestrian frame R from the original picture and detecting the human body key points P = (p_1, p_2, …, p_k) using a second model, where k is the number of human body key points;
for each human body key point p_i, adding the coordinates of the upper-left corner of the pedestrian frame R to restore its position in the original picture, obtaining the coordinates Q_i of the human body key point in the original picture; with each original-picture key point coordinate Q_i as the center, cropping the image block corresponding to the key point, the cropping expanding outward from the key point to yield a rectangular or circular image block; and flattening the cropped image block into an image vector I_i, i ∈ {1, 2, …, k}.
Adding a position code in front of each image vector, the position code comprising the coordinates of the human body key point and a fixed random code, namely J_i = (x_pi, y_pi, r_i), where (x_pi, y_pi) are the coordinates Q_i of the human body key point p_i and r_i is the random code; the random codes satisfy the following conditions: (1) r_i is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modulus of r_i is fixed; (3) the random codes of any two key points are different;
finally, concatenating the position code and the image vector to obtain the input vector Z_i = concat(J_i, I_i);
feeding the input vectors Z_i of all key points into a Transformer network to obtain the pedestrian feature f, where f = T(Z_1, Z_2, …, Z_k), and finally obtaining the features F = (f_1, f_2, …, f_m) of all target persons; storing the features F and the corresponding target person information in the base library. Here m is the number of target persons, T denotes the Transformer network, and T(Z_1, Z_2, …, Z_k) means feeding the input vectors Z_i of all key points into the Transformer network to obtain the pedestrian feature f.
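For concreteness, step 1 can be summarized in the following Python sketch. It is a minimal illustration, not the patent's implementation: detect_pedestrians stands in for the first model, detect_keypoints for the second model, crop_patch for the cropping rule, random_codes for the fixed random codes, and T for the Transformer network; all of these names are assumptions.

import numpy as np

def build_base_library(pictures, infos, detect_pedestrians, detect_keypoints,
                       crop_patch, random_codes, T):
    # One whole-body picture and one info record per target person.
    library = []
    for img, info in zip(pictures, infos):
        x, y, w, h = detect_pedestrians(img)[0]      # pedestrian frame R
        P = detect_keypoints(img[y:y + h, x:x + w])  # key points p_i in frame
        Z = []
        for (px, py), r_i in zip(P, random_codes):
            qx, qy = px + x, py + y                  # restore Q_i in original
            I_i = crop_patch(img, qx, qy, w)         # flattened image block
            J_i = np.concatenate(([qx, qy], r_i))    # position code J_i
            Z.append(np.concatenate((J_i, I_i)))     # Z_i = concat(J_i, I_i)
        f = T(np.stack(Z))                           # pedestrian feature f
        library.append((f, info))                    # store (f, target info)
    return library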
Step 2, extracting the features of the pedestrian to be detected;
acquiring the pedestrian image to be detected;
detecting the positions and sizes of all pedestrians in the pedestrian image using the first model, obtaining a number of pedestrian frames R_j = (x_j, y_j, w_j, h_j), where (x_j, y_j) is the point at the upper-left corner of the pedestrian frame, w_j is the width of the pedestrian frame, and h_j is the height of the pedestrian frame; the aspect ratio or resolution of the pedestrian image is adjusted to meet the same requirement as the first model in step 1.
For each detected pedestrian frame, cropping the image corresponding to the pedestrian frame R_j from the original pedestrian image and detecting the human body key points P_j = (p_j,1, p_j,2, …, p_j,k) using the second model, where k is the number of human body key points; the resolution of the cropped image is adjusted to meet the same requirement as the second model in step 1.
For each human body key point p_j,i, adding the coordinates of the upper-left corner of the pedestrian frame R_j to restore its position in the original pedestrian image, obtaining the coordinates Q_j,i of the human body key point in the original pedestrian image; with each key point coordinate Q_j,i as the center, cropping the image block corresponding to the key point, the cropping expanding outward from the key point to yield a rectangular or circular image block; and flattening the cropped image block into an image vector V_j,i; the specific cropping is the same as that adopted in step 1.
Adding a position code in front of each image vector, the position code comprising the coordinates of the human body key point and a fixed random code, namely J_j,i = (x_pj,i, y_pj,i, r_j,i), where (x_pj,i, y_pj,i) are the coordinates Q_j,i of the human body key point p_j,i and r_j,i is the random code; the random codes satisfy the following conditions: (1) r_j,i is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modulus of r_j,i is fixed; (3) the random codes of any two key points are different.
Finally, concatenating the position code and the image vector to obtain the input vector Z_j,i = concat(J_j,i, V_j,i);
feeding the input vectors Z_j,1, Z_j,2, …, Z_j,k of all key points into the Transformer network to finally obtain the pedestrian feature g_j, g_j = T(Z_j,1, Z_j,2, …, Z_j,k), where T denotes the Transformer network and T(Z_j,1, Z_j,2, …, Z_j,k) means feeding the input vectors Z_j,i of all key points into the Transformer network to obtain the pedestrian feature g_j.
Step 3, carrying out pedestrian re-identification matching;
comparing the pedestrian feature g_j with the features in the base library to determine whether the detected pedestrian is a target person.
Preferably, the first model in step 1 is a YOLOv3 model.
Preferably, the HRNet model is used as the second model in step 1.
Preferably, when the first model is used to detect the pedestrian frame in step 1, if the aspect ratio or the resolution of the picture does not meet the requirement of the first model, black borders are added to the picture, so that the aspect ratio of the picture is adjusted to the aspect ratio corresponding to the first model, and the picture is scaled to the resolution required by the first model.
Preferably, when the second model is used to detect the key points of the human body, if the resolution of the captured image does not meet the requirement of the second model, the image is scaled to the resolution required by the second model.
Preferably, the human key points correspond to the right ankle, the right knee, the right hip, the left knee, the left ankle, the right wrist, the right elbow, the right shoulder, the left elbow, the left wrist, the neck, the head center and the body center of the human body.
Preferably, the cropping in step 1 expands outward with the key point as the center, specifically: with the key point as the center, cropping a rectangle whose side length is 1/3 of the pedestrian frame width, or cropping a circle whose radius is 1/6 of the pedestrian frame width.
Preferably, step 3 specifically comprises: calculating the similarity s between the pedestrian feature g_j and each feature in the base library; when the maximum similarity is greater than the similarity threshold, the matching succeeds, that is, the detected pedestrian is the target person in the base library; otherwise, the matching fails.
Preferably, step 3 calculates the similarity s between the pedestrian feature g_j and each feature in the base library as follows: for each base-library feature f_a, a ∈ {1, 2, …, m}, calculating the similarity s between the pedestrian feature g_j and f_a,

s(g_j, f_a) = (g_j · f_a) / (||g_j|| ||f_a||)
the maximum similarity and the corresponding target person information t are:

s_max = max_a s(g_j, f_a), t = argmax_a s(g_j, f_a), a ∈ {1, 2, …, m}
setting a similarity threshold s_threshold; if s_max > s_threshold, the matching succeeds and the target person information corresponding to t is returned; otherwise, the matching fails.
Compared with the prior art, the technical scheme has the following beneficial effects: by cropping image blocks centered on the human body key points and using them as the Transformer input for feature extraction, each key-point part fed into the network is kept complete, so the network can better extract the pedestrian's features, improving the accuracy of pedestrian re-identification. At the same time, because the images fed into the network are cropped around the key points, the influence of the background is also removed.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail below. All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
The embodiment of the disclosure provides a pedestrian re-identification method based on a Transformer, which mainly comprises the following steps:
Step 1, constructing a base library;
the base library is built from pictures containing the whole body of each target person;
detecting the pedestrian frame R = (x, y, w, h) of a pedestrian in a picture using a first model (YOLOv3 model), where (x, y) is the point at the upper-left corner of the pedestrian frame, w is the width of the pedestrian frame, and h is the height of the pedestrian frame.
Preferably, when the first model (YOLOv3 model) is used to detect the pedestrian frame, if the aspect ratio or resolution of the picture does not meet the requirements of the first model, black borders are added to the picture so that its aspect ratio is adjusted to the aspect ratio corresponding to the first model (for example, 16:9), and the picture is then scaled to the resolution required by the first model (for example, 512 × 288).
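As an illustration, this letterboxing can be sketched as follows; a minimal sketch assuming OpenCV-style images, with the function name chosen here and the 16:9 / 512 × 288 targets taken from the example values above:

import cv2

def letterbox(img, target_w=512, target_h=288):
    # Pad with black borders to the target aspect ratio, then scale
    # to the target resolution (example values 512 x 288, i.e. 16:9).
    h, w = img.shape[:2]
    ratio = target_w / target_h
    if w / h < ratio:                        # too narrow: pad left/right
        pad = int(round(h * ratio)) - w
        img = cv2.copyMakeBorder(img, 0, 0, pad // 2, pad - pad // 2,
                                 cv2.BORDER_CONSTANT, value=(0, 0, 0))
    else:                                    # too wide: pad top/bottom
        pad = int(round(w / ratio)) - h
        img = cv2.copyMakeBorder(img, pad // 2, pad - pad // 2, 0, 0,
                                 cv2.BORDER_CONSTANT, value=(0, 0, 0))
    return cv2.resize(img, (target_w, target_h))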
Cropping the image corresponding to the pedestrian frame R from the original picture and detecting the human body key points P = (p_1, p_2, …, p_k) using a second model (HRNet model), where k is the number of human body key points.
Preferably, when the second model (HRNet model) is used to detect the human body key points, if the resolution of the cropped image does not meet the requirement of the second model, the image is scaled to the resolution required by the second model (for example, to a 128 × 256 resolution).
Preferably, the human key points correspond to the right ankle, the right knee, the right hip, the left knee, the left ankle, the right wrist, the right elbow, the right shoulder, the left elbow, the left wrist, the neck, the head center and the body center of the human body.
For each human body key point p_i, adding the coordinates of the upper-left corner of the pedestrian frame R to restore its position in the original picture, obtaining the coordinates Q_i of the human body key point in the original picture; with each original-picture key point coordinate Q_i as the center, cropping the image block corresponding to the key point, the cropping expanding outward from the key point to yield a rectangular or circular image block; and flattening the cropped image block into an image vector I_i.
Preferably, the cropping expands outward with the key point as the center, specifically: with the key point as the center, cropping a rectangle whose side length is 1/3 of the pedestrian frame width, or cropping a circle whose radius is 1/6 of the pedestrian frame width.
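This cropping rule can be sketched as follows; a minimal sketch in which the function name is an assumption, the image is a numpy array, and the circular case is approximated by zeroing the pixels outside the circle within its bounding square before flattening:

import numpy as np

def crop_patch(img, qx, qy, frame_w, shape="rect"):
    # Rectangle: side length frame_w / 3, so half-side frame_w / 6.
    # Circle: radius frame_w / 6, centered on the key point (qx, qy).
    half = int(round(frame_w / 6))
    h, w = img.shape[:2]
    x0, x1 = max(0, qx - half), min(w, qx + half)
    y0, y1 = max(0, qy - half), min(h, qy + half)
    patch = img[y0:y1, x0:x1].copy()
    if shape == "circle":
        yy, xx = np.mgrid[y0:y1, x0:x1]
        inside = (xx - qx) ** 2 + (yy - qy) ** 2 <= half ** 2
        patch[~inside] = 0                   # keep only the circular region
    return patch.reshape(-1)                 # flatten into image vector I_i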
Existing methods directly divide the pedestrian image into fixed-size image blocks along a grid, which can split a pedestrian's characteristic parts; for example, the head may be divided into a left half and a right half. In contrast, the image blocks corresponding to the key points are images of the human joints, head and body: the joint images reflect the human pose and to some extent carry clothing-related information, the head image contains face-related information, and the body image helps identify clothing-related information. At the same time, because the images fed into the network are cropped around the key points, the influence of the background is also removed.
Adding a position code in front of each image vector, the position code comprising the coordinates of the human body key point and a fixed random code, namely J_i = (x_pi, y_pi, r_i), where (x_pi, y_pi) are the coordinates Q_i of the human body key point p_i and r_i is the random code; the random codes satisfy the following conditions: (1) r_i is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution (for example, n = 8); (2) the modulus of r_i is fixed (for example, ||r_i|| = 1); (3) the random codes of any two key points are different.
Finally, concatenating the position code and the image vector to obtain the input vector Z_i = concat(J_i, I_i). Adding the positions of the human body key points to the position codes of the input vectors expresses the relative spatial positions of the key points and helps the model learn human pose information; adding fixed random codes to the position codes marks the different human body key points, distinguishing them semantically; and using random vectors of fixed modulus expresses that the human body key points are unordered for pedestrian re-identification and of equal importance.
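The fixed random codes and the assembly of Z_i can be sketched as follows, assuming n = 8 and unit modulus as in the examples above; the function names are illustrative:

import numpy as np

def make_random_codes(k, n=8, seed=0):
    # One n-dimensional Gaussian vector per key point, normalized to unit
    # length so every code has the same fixed modulus; generated once with
    # a fixed seed so the codes stay fixed. Continuous Gaussian draws make
    # any two codes different almost surely.
    rng = np.random.default_rng(seed)
    r = rng.standard_normal((k, n))
    return r / np.linalg.norm(r, axis=1, keepdims=True)

def build_input_vector(qx, qy, r_i, I_i):
    # Z_i = concat(J_i, I_i) with J_i = (x_pi, y_pi, r_i).
    J_i = np.concatenate(([qx, qy], r_i))
    return np.concatenate((J_i, I_i))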
Feeding the input vectors Z_i of all key points into the Transformer network to obtain the pedestrian feature f, where f = T(Z_1, Z_2, …, Z_k), and finally obtaining the features F = (f_1, f_2, …, f_m) of all target persons; storing the features F and the corresponding target person information in the base library. A Transformer is adopted to extract the features: the encoder of the Transformer model strengthens the extraction of local features, and the decoder learns the association between local features and global features, retaining as much detailed information as possible and improving the accuracy of identification.
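The patent does not fix the Transformer's configuration; the following is a minimal encoder-only PyTorch sketch (the patent also mentions a decoder, which is omitted here), in which z_dim, d_model, nhead, num_layers, and the mean pooling into a single feature f are all illustrative assumptions:

import torch
import torch.nn as nn

class KeypointReID(nn.Module):
    # Maps the k key-point input vectors Z_1..Z_k to one pedestrian feature f.
    def __init__(self, z_dim, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(z_dim, d_model)        # embed each Z_i
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, z):                            # z: (batch, k, z_dim)
        tokens = self.encoder(self.proj(z))          # attend across key points
        return tokens.mean(dim=1)                    # pool into feature f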
Step 2, extracting the features of the pedestrian to be detected;
acquiring a pedestrian image to be detected;
the first model (YOLOv 3 model) is used for detecting the positions and sizes of all pedestrians in the pedestrian image to obtain a plurality of pedestrian frames Rj=(xj,yj,wj,hj) Whereinxj,yj As a point at the upper left corner of the pedestrian frame, wjWidth of pedestrian frame, hjIs the height of the pedestrian frame; the aspect ratio or resolution of the pedestrian image is adjusted to be the same as the requirement of the first model (YOLOv 3 model) in step 1.
For each detected pedestrian frame, intercepting the pedestrian frame from the original pedestrian imageRDetecting key point P of human body by using second model (HRNet model) corresponding to the imagej=(pj,1,pj,2,…,pj,k) K is the number of key points of the human body; the resolution of the cut-out image is adjusted to the same level as the second model (HRNet model) in step 1.
For each human body key point pj,iAdding the coordinates of the upper left corner of the pedestrian frame R to restore the position of the original pedestrian image to obtain the coordinates Q of the key points of the human body in the original pedestrian imagej,iWith each human body key point coordinate Qj,iTaking the key point as the center, intercepting the image block corresponding to the key point, and outwards expanding the intercepted image block by taking the key point as the center to obtain a rectangular or circular imageFlattening the clipped image block into an image vector Vj,i(ii) a The specific interception is the same as the interception employed in step 1.
Adding position codes in front of the image vectors, wherein the position codes comprise coordinates of key points of a human body and fixed random codes, namely Ij=(xpj,i, ypj,i,rj,i) Wherein (x)pj,i, ypj,i) Is a human body key point pj,iCoordinate Q ofj,i,rj,iFor random encoding, the random encoding satisfies the following conditions: (1) r isj,iTo obey n-dimensional random vectors of n-dimensional Gaussian distribution (such as taking n to 8), (2) rj,iFixed die length (e.g. | r)j,iI | = 1), (3) any two key points, which have different corresponding random codes.
Finally, connecting the position code and the image vector to obtain an input vector Zj,i=concat(Ij,i,Vj,i);
Image Z of all key pointsj,1,Zj,2,…,Zj,kSending the pedestrian features into a Transformer network to finally obtain the pedestrian featuresjgj,gjj =T(Zj,1,Zj,2,…,Zj,k)。
Characterizing the pedestrian gjjAnd comparing the pedestrian with the features in the bottom library to determine whether the pedestrian is the target person.
Step 3, carrying out pedestrian re-identification matching;
calculating the similarity s between the pedestrian feature g_j and each feature in the base library; when the maximum similarity is greater than the similarity threshold, the matching succeeds, that is, the detected pedestrian is the target person in the base library; otherwise, the matching fails.
Preferably, step 3 calculates the similarity s between the pedestrian feature g_j and each feature in the base library as follows: for each base-library feature f_a, a ∈ {1, 2, …, m}, calculating the similarity s between the pedestrian feature g_j and f_a,

s(g_j, f_a) = (g_j · f_a) / (||g_j|| ||f_a||)
the maximum similarity and the corresponding target person information t are:

s_max = max_a s(g_j, f_a), t = argmax_a s(g_j, f_a), a ∈ {1, 2, …, m}
Setting a similarity threshold s_threshold: if s_max > s_threshold, the matching succeeds and the target person information corresponding to t is returned; otherwise, the matching fails.
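Step 3 can then be sketched as follows, using the cosine similarity written above; the function name and the example threshold of 0.5 are illustrative assumptions, since the patent leaves the threshold value open:

import numpy as np

def match(g_j, F, targets, s_threshold=0.5):
    # F: (m, d) base-library features; targets: the m target person records.
    F_n = F / np.linalg.norm(F, axis=1, keepdims=True)
    g_n = g_j / np.linalg.norm(g_j)
    s = F_n @ g_n                        # cosine similarity to every f_a
    a = int(np.argmax(s))                # t = argmax_a s(g_j, f_a)
    return targets[a] if s[a] > s_threshold else None   # None: match failed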
The invention has been described above by way of example. Obviously, the specific implementation of the invention is not limited to the manner described above. Various insubstantial modifications made using the method concepts and technical solutions of the invention, or direct application of the concepts and technical solutions of the invention to other occasions without improvement, all fall within the protection scope of the invention.

Claims (9)

1. A pedestrian re-identification method based on a Transformer is characterized by comprising the following steps:
Step 1, constructing a base library;
the base library is built from pictures containing the whole body of each target person;
detecting a pedestrian frame R = (x, y, w, h) of a pedestrian in the picture by using a first model, wherein (x, y) is a point at the upper left corner of the pedestrian frame, w is the width of the pedestrian frame, and h is the height of the pedestrian frame;
cropping the image corresponding to the pedestrian frame R from the original picture, and detecting the human body key points P = (p_1, p_2, …, p_k) by using a second model, k being the number of human body key points;
for each human body key point p_i, adding the coordinates of the upper-left corner of the pedestrian frame R to restore its position in the original picture to obtain the coordinates Q_i of the human body key point in the original picture; with each original-picture key point coordinate Q_i as the center, cropping the image block corresponding to the key point, the cropping expanding outward from the key point to yield a rectangular or circular image block, and flattening the cropped image block into an image vector I_i, i ∈ {1, 2, …, k};
adding a position code in front of each image vector, the position code comprising the coordinates of the human body key point and a fixed random code, namely J_i = (x_pi, y_pi, r_i), wherein (x_pi, y_pi) are the coordinates Q_i of the human body key point p_i and r_i is the random code, the random codes satisfying the following conditions: (1) r_i is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modulus of r_i is fixed; (3) the random codes of any two key points are different;
finally, concatenating the position code and the image vector to obtain the input vector Z_i = concat(J_i, I_i);
feeding the input vectors Z_i of all key points into a Transformer network to obtain the pedestrian feature f, wherein f = T(Z_1, Z_2, …, Z_k), and finally obtaining the features F = (f_1, f_2, …, f_m) of all target persons; storing the features F and the corresponding target person information in the base library; m being the number of target persons; T denoting the Transformer network, and T(Z_1, Z_2, …, Z_k) meaning feeding the input vectors Z_i of all key points into the Transformer network to obtain the pedestrian feature f;
Step 2, extracting the features of the pedestrian to be detected;
acquiring a pedestrian image to be detected;
detecting the positions and sizes of all pedestrians in the pedestrian image by using the first model to obtain a number of pedestrian frames R_j = (x_j, y_j, w_j, h_j), wherein (x_j, y_j) is the point at the upper-left corner of the pedestrian frame, w_j is the width of the pedestrian frame, and h_j is the height of the pedestrian frame; adjusting the aspect ratio or resolution of the pedestrian image to meet the same requirement as the first model in step 1;
for each detected pedestrian frame, cropping the image corresponding to the pedestrian frame R_j from the original pedestrian image, and detecting the human body key points P_j = (p_j,1, p_j,2, …, p_j,k) by using the second model, k being the number of human body key points; adjusting the resolution of the cropped image to meet the same requirement as the second model in step 1;
for each human body key point p_j,i, adding the coordinates of the upper-left corner of the pedestrian frame R_j to restore its position in the original pedestrian image to obtain the coordinates Q_j,i of the human body key point in the original pedestrian image; with each key point coordinate Q_j,i as the center, cropping the image block corresponding to the key point, the cropping expanding outward from the key point to yield a rectangular or circular image block, and flattening the cropped image block into an image vector V_j,i; the specific cropping being the same as that adopted in step 1;
adding a position code in front of each image vector, the position code comprising the coordinates of the human body key point and a fixed random code, namely J_j,i = (x_pj,i, y_pj,i, r_j,i), wherein (x_pj,i, y_pj,i) are the coordinates Q_j,i of the human body key point p_j,i and r_j,i is the random code, the random codes satisfying the following conditions: (1) r_j,i is an n-dimensional random vector drawn from an n-dimensional Gaussian distribution; (2) the modulus of r_j,i is fixed; (3) the random codes of any two key points are different;
finally, concatenating the position code and the image vector to obtain the input vector Z_j,i = concat(J_j,i, V_j,i);
feeding the input vectors Z_j,1, Z_j,2, …, Z_j,k of all key points into the Transformer network to finally obtain the pedestrian feature g_j, g_j = T(Z_j,1, Z_j,2, …, Z_j,k);
Step 3, carrying out pedestrian re-identification matching;
comparing the pedestrian feature g_j with the features in the base library to determine whether the detected pedestrian is a target person.
2. The method for pedestrian re-identification based on Transformer as claimed in claim 1, wherein the first model in step 1 is a YOLOv3 model.
3. The method for pedestrian re-identification based on Transformer as claimed in claim 1, wherein the HRNet model is selected as the second model in step 1.
4. The method as claimed in claim 1, wherein when the first model is used to detect the pedestrian frame in step 1, if the aspect ratio or resolution of the picture does not meet the requirement of the first model, black borders are added to the picture, so that the aspect ratio of the picture is adjusted to the aspect ratio corresponding to the first model, and the picture is scaled to the resolution required by the first model.
5. The method as claimed in claim 1, wherein in the step 1, when the second model is used to detect the key points of the human body, if the resolution of the captured image does not meet the requirement of the second model, the image is scaled to the resolution required by the second model.
6. The Transformer-based pedestrian re-identification method according to claim 1, wherein the cropping in step 1 expands outward with the key point as the center, specifically: with the key point as the center, cropping a rectangle whose side length is 1/3 of the pedestrian frame width, or cropping a circle whose radius is 1/6 of the pedestrian frame width.
7. The Transformer-based pedestrian re-identification method of any one of claims 1-6, wherein the human body key points correspond to the right ankle, right knee, right hip, left knee, left ankle, right wrist, right elbow, right shoulder, left elbow, left wrist, neck, head center, and body center of the human body.
8. The Transformer-based pedestrian re-identification method according to claim 7, wherein step 3 specifically comprises: calculating the similarity s between the pedestrian feature g_j and each feature in the base library; when the maximum similarity is greater than the similarity threshold, the matching succeeds, that is, the detected pedestrian is the target person in the base library; otherwise, the matching fails.
9. The Transformer-based pedestrian re-identification method according to claim 8, wherein step 3 calculates the similarity s between the pedestrian feature g_j and each feature in the base library as follows: for each base-library feature f_a, a ∈ {1, 2, …, m}, calculating the similarity s between the pedestrian feature g_j and f_a,

s(g_j, f_a) = (g_j · f_a) / (||g_j|| ||f_a||)
the maximum similarity and the corresponding target person information t being:

s_max = max_a s(g_j, f_a), t = argmax_a s(g_j, f_a), a ∈ {1, 2, …, m}
setting a similarity threshold s_threshold; if s_max > s_threshold, the matching succeeds and the target person information corresponding to t is returned; otherwise, the matching fails.
CN202110723088.XA 2021-06-29 2021-06-29 Pedestrian re-identification method based on Transformer Active CN113255598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110723088.XA CN113255598B (en) 2021-06-29 2021-06-29 Pedestrian re-identification method based on Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110723088.XA CN113255598B (en) 2021-06-29 2021-06-29 Pedestrian re-identification method based on Transformer

Publications (2)

Publication Number Publication Date
CN113255598A CN113255598A (en) 2021-08-13
CN113255598B true CN113255598B (en) 2021-09-28

Family

ID=77190012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110723088.XA Active CN113255598B (en) 2021-06-29 2021-06-29 Pedestrian re-identification method based on Transformer

Country Status (1)

Country Link
CN (1) CN113255598B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688271B (en) * 2021-10-25 2023-05-16 浙江大华技术股份有限公司 File searching method and related device for target object

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832672B (en) * 2017-10-12 2020-07-07 北京航空航天大学 Pedestrian re-identification method for designing multi-loss function by utilizing attitude information
CN108399381B (en) * 2018-02-12 2020-10-30 北京市商汤科技开发有限公司 Pedestrian re-identification method and device, electronic equipment and storage medium
CN110334675B (en) * 2019-07-11 2022-12-27 山东大学 Pedestrian re-identification method based on human skeleton key point segmentation and column convolution

Also Published As

Publication number Publication date
CN113255598A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN109657631B (en) Human body posture recognition method and device
CN103530599B (en) The detection method and system of a kind of real human face and picture face
WO2020042419A1 (en) Gait-based identity recognition method and apparatus, and electronic device
CN109325412B (en) Pedestrian recognition method, device, computer equipment and storage medium
CN108090435B (en) Parking available area identification method, system and medium
CN108305283B (en) Human behavior recognition method and device based on depth camera and basic gesture
JP4951498B2 (en) Face image recognition device, face image recognition method, face image recognition program, and recording medium recording the program
CN111914642B (en) Pedestrian re-identification method, device, equipment and medium
US20220180534A1 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
CN104200200B (en) Fusion depth information and half-tone information realize the system and method for Gait Recognition
CN103577815A (en) Face alignment method and system
CN112101195B (en) Crowd density estimation method, crowd density estimation device, computer equipment and storage medium
CN114187665A (en) Multi-person gait recognition method based on human body skeleton heat map
CN112200056B (en) Face living body detection method and device, electronic equipment and storage medium
CN113255598B (en) Pedestrian re-identification method based on Transformer
CN110991258A (en) Face fusion feature extraction method and system
CN114333023A (en) Face gait multi-mode weighting fusion identity recognition method and system based on angle estimation
CN112396036A (en) Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction
CN109784261B (en) Pedestrian segmentation and identification method based on machine vision
Vasconcelos et al. Methodologies to build automatic point distribution models for faces represented in images
CN112380966B (en) Monocular iris matching method based on feature point re-projection
CN113139504B (en) Identity recognition method, device, equipment and storage medium
CN114898287A (en) Method and device for dinner plate detection early warning, electronic equipment and storage medium
JP7479809B2 (en) Image processing device, image processing method, and program
Ferreira et al. Human detection and tracking using a Kinect camera for an autonomous service robot

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A pedestrian recognition method based on transformer

Effective date of registration: 20220705

Granted publication date: 20210928

Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch

Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.

Registration number: Y2022980009897

PC01 Cancellation of the registration of the contract for pledge of patent right
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230720

Granted publication date: 20210928

Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch

Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.

Registration number: Y2022980009897

PE01 Entry into force of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Transformer based pedestrian re recognition method

Effective date of registration: 20230803

Granted publication date: 20210928

Pledgee: China Construction Bank Corporation Nanjing Jianye sub branch

Pledgor: Nanjing inspector Intelligent Technology Co.,Ltd.

Registration number: Y2023980050832