CN114708617A - Pedestrian re-identification method and device and electronic equipment

Info

Publication number: CN114708617A
Application number: CN202210420732.0A
Authority: CN (China)
Prior art keywords: key point, pedestrian, target, human, key
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 何群, 吴婷, 闾凡兵, 廖代海
Current Assignee: Changsha Hisense Intelligent System Research Institute Co ltd
Original Assignee: Changsha Hisense Intelligent System Research Institute Co ltd
Priority date / Filing date: 2022-04-21
Publication date: 2022-07-05
Application filed by Changsha Hisense Intelligent System Research Institute Co ltd
Priority to CN202210420732.0A
Publication of CN114708617A

Classifications

    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
    • G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Neural network learning methods

Abstract

The invention provides a pedestrian re-identification method, a pedestrian re-identification device, and electronic equipment. In the method, the background is filtered by means of key point detection, and the background-filtered image is then used as the input of a pedestrian re-identification model for pedestrian re-identification. This alleviates the difficulty of matching pedestrians caused by background interference in pedestrian re-identification and improves re-identification accuracy, without changing the architecture of the pedestrian re-identification model and without increasing its complexity or training cost.

Description

Pedestrian re-identification method and device and electronic equipment
Technical Field
The invention relates to the technical field of image recognition, and in particular to a pedestrian re-identification method and device and electronic equipment.
Background
In recent years, with the development of artificial intelligence technology, pedestrian re-identification has attracted extensive research attention in application scenarios such as public security and image retrieval. However, compared with traditional biometric technologies such as face recognition and gesture recognition, pedestrian re-identification suffers from low accuracy due to uncontrollable factors such as surveillance video resolution, background interference, and variations in illumination and pose. The technology therefore faces great challenges in practical application scenarios. In addition, as cities develop, the extracted pedestrian samples contain large amounts of varied background information, which strongly affects the features extracted by the model; how to extract pure pedestrian feature information from pictures with complex backgrounds is a problem that urgently needs to be solved.
In the prior art, handling background interference requires either adding spatio-temporal information and retraining a more complex model, or using a generative model to synthesize samples under more scenes so as to distinguish the target from the background. These methods have a high training cost, and the resulting models are too complex to deploy and use easily in real scenes.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method, a pedestrian re-identification device, and electronic equipment that solve the problem of difficult pedestrian matching caused by background interference in pedestrian re-identification and improve the accuracy of pedestrian re-identification.
According to an aspect of an embodiment of the present invention, there is provided a pedestrian re-identification method, including the steps of:
acquiring a first image set, wherein the first image set comprises a plurality of first pictures, and each first picture comprises a target pedestrian;
inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian;
connecting key points in a key point set corresponding to each target pedestrian according to a preset connection algorithm to obtain a first picture with the key point connecting lines of the target pedestrians, wherein the number of the key point connecting lines of the target pedestrians is multiple, and the key point connecting lines of the target pedestrians correspond to multiple different body parts of the target pedestrians respectively;
inputting a first picture with key point connecting lines of target pedestrians into a trained background filtering model, extracting pixel regions of a plurality of different body parts corresponding to the target pedestrians according to the key point connecting lines of the target pedestrians by the background filtering model, and combining the pixel regions of the plurality of different body parts corresponding to the target pedestrians as the first picture after background filtering;
and inputting the first pictures after the plurality of backgrounds are filtered into a trained pedestrian re-recognition model as a second image set for pedestrian re-recognition to obtain a pedestrian re-recognition result.
As an optional example, in the method, the plurality of key point connecting lines respectively correspond to four-limb connecting lines, a trunk part connecting line and a face connecting line of the target pedestrian;
and the background filtering model respectively extracts pixel areas corresponding to the four limbs, the trunk and the face of the target pedestrian by taking the plurality of key point connecting lines as a basis, and combines the pixel areas corresponding to the four limbs, the trunk and the face of the target pedestrian to be used as a first picture after background filtering.
As an alternative example, in the method, the four-limb connecting line of the target pedestrian is a line segment extending along the four-limb direction of the target pedestrian, the trunk-part connecting line is a closed polygon of the trunk part surrounding the target pedestrian connected by a plurality of line segments, and the face connecting line is a closed polygon of the face surrounding the target pedestrian connected by a plurality of line segments.
As an optional example, in the method, the extracting, by the background filtering model, pixel regions corresponding to the four limbs, the trunk and the face of the target pedestrian respectively based on the plurality of key point connecting lines includes:
on the basis of the connecting line of the four limbs, a plurality of pixels are respectively expanded to two sides of the connecting line of the four limbs to obtain the four limb area;
and respectively extracting pixels in the four-limb area, the closed polygon of the face connecting line and the closed polygon of the trunk part connecting line, and finishing the extraction of the four-limb part, the trunk part and the face pixel area of the target pedestrian.
As an optional example, in the method, based on the connection line of the four limb portions, the extending of the plurality of pixels to both sides of the connection line of the four limb portions respectively to obtain the four limb area specifically includes:
determining the original slope of the connecting line of the four limb parts;
calculating an auxiliary slope perpendicular to the original slope of the connection line of the four limbs;
moving a plurality of pixels in parallel from the original slope to two sides to obtain two original edge lines;
moving a plurality of pixels in parallel from the auxiliary slope to two sides to obtain two auxiliary edge lines;
the two original edge lines and the two auxiliary edge lines enclose a four-limb area.
As an optional example, in the method, the set of key points corresponding to each target pedestrian includes 17 key points, which are:
key points K0 at the nose of the human face, key points K1 and K2 at the left and right eyes of the human face, key points K3 and K4 at the left and right ears of the human face, key points K5 and K6 at the left and right shoulders of the human skeleton, key points K7 and K8 at the left and right elbows of the human skeleton, key points K9 and K10 at the left and right wrists of the human skeleton, key points K11 and K12 at the left and right buttocks of the human skeleton, key points K13 and K14 at the left and right knees of the human skeleton, and key points K15 and K16 at the left and right ankles of the human skeleton.
The obtaining of the key point connecting line of each target pedestrian according to the preset connection algorithm and the key point set specifically includes:
respectively connecting key point K5 at the left shoulder of the human skeleton with key point K7 at the left elbow, K7 at the left elbow with K9 at the left wrist, K6 at the right shoulder with K8 at the right elbow, K8 at the right elbow with K10 at the right wrist, K11 at the left hip with K13 at the left knee, K13 at the left knee with K15 at the left ankle, K12 at the right hip with K14 at the right knee, and K14 at the right knee with K16 at the right ankle, to obtain eight limb connecting lines line0 to line7;
sequentially connecting key point K0 at the nose of the human face with K3 at the left ear, K3 at the left ear with K1 at the left eye, K1 at the left eye with K2 at the right eye, K2 at the right eye with K4 at the right ear, and K4 at the right ear back to K0 at the nose, to obtain the face connecting line polygon0;
sequentially connecting key point K5 at the left shoulder of the human skeleton with K11 at the left hip, K11 at the left hip with K12 at the right hip, K12 at the right hip with K6 at the right shoulder, and K6 at the right shoulder back to K5 at the left shoulder, to obtain the trunk part connecting line polygon1.
The background filtering model extracts pixel regions corresponding to a plurality of different body parts of each target pedestrian according to the key point connecting line of each target pedestrian, and the combination of the pixel regions corresponding to the plurality of different body parts of each target pedestrian as a first picture after background filtering specifically comprises:
based on the eight limb connecting lines line0 to line7, a plurality of pixels are respectively expanded to both sides of each connecting line to obtain eight limb regions area0 to area7;
a pixel region formed by the eight limb regions area0 to area7, the face connecting line polygon0 and the trunk part connecting line polygon1 is extracted and used as the background-filtered first picture.
As an optional example, in the method, the acquiring the first image set specifically includes:
acquiring a first video, carrying out target detection on the first video, and detecting a target pedestrian in the first video;
carrying out target tracking on target pedestrians in the first video, and classifying the same pedestrian in the first video;
performing frame extraction on the first video according to the results of target detection and target tracking and a preset optimal frame extraction algorithm to obtain a plurality of initial pictures;
clustering the plurality of initial pictures by using a clustering algorithm;
and marking each initial picture, and marking the figure ID, the camera ID, the shooting time and the picture sequence number of each initial picture to obtain a first image set comprising a plurality of first pictures which accord with the input format of the key point detection model.
As an optional example, in the method, inputting the first image set into a trained keypoint detection model for keypoint detection, and obtaining a keypoint set corresponding to each target pedestrian specifically includes:
expanding each first picture outwards by pixels with a preset proportion to obtain expanded data;
inputting the extended data into an STN module for affine transformation to obtain transformed data;
inputting the transformation data into an SPPE module for key point extraction to obtain key point coordinates;
inputting the key point coordinates into an SDTN module for reverse coordinate transformation to obtain a key point candidate set;
and inputting the key point candidate set into a poseNMS module for screening the key point candidate set to obtain a key point set of the target pedestrian.
According to another aspect of an embodiment of the present invention, there is provided a pedestrian re-recognition apparatus including:
the image acquisition unit is used for acquiring a first image set, wherein the first image set comprises a plurality of first images, and each first image comprises a target pedestrian;
the key point detection unit is used for inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian;
a connecting unit, configured to connect the key points in the key point set corresponding to each target pedestrian according to a preset connecting algorithm to obtain a first picture with the key point connecting lines of the target pedestrians, where the number of the key point connecting lines of the target pedestrians is multiple, and the multiple key point connecting lines correspond to multiple different body parts of the target pedestrians respectively
The background filtering unit is used for inputting a first picture with key point connecting lines of target pedestrians into a trained background filtering model, extracting pixel regions corresponding to a plurality of different body parts of the target pedestrians according to the key point connecting lines of the target pedestrians through the background filtering model, and combining the pixel regions corresponding to the plurality of different body parts of the target pedestrians as a first picture after background filtering;
and the pedestrian re-recognition unit is used for inputting the first pictures after the background filtering as a second image set into a trained pedestrian re-recognition model for pedestrian re-recognition to obtain a pedestrian re-recognition result.
According to another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: a memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above method.
The invention has the beneficial effects that: the invention provides a pedestrian re-identification method, a pedestrian re-identification device, and electronic equipment. The method comprises: acquiring a first image set, wherein the first image set comprises a plurality of first pictures and each first picture comprises a target pedestrian; inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian; connecting the key points in the key point set corresponding to each target pedestrian according to a preset connection algorithm to obtain a first picture with the key point connecting lines of the target pedestrian, wherein there are a plurality of key point connecting lines corresponding respectively to a plurality of different body parts of the target pedestrian; inputting the first picture with the key point connecting lines into a trained background filtering model, which extracts the pixel regions of the different body parts according to the connecting lines and combines them as the background-filtered first picture; and inputting the plurality of background-filtered first pictures as a second image set into a trained pedestrian re-identification model to obtain a pedestrian re-identification result. Because the background is filtered by means of key point detection and the background-filtered images are then used as the input of the pedestrian re-identification model, the difficulty of matching caused by background interference is alleviated and the accuracy of pedestrian re-identification is improved, without changing the architecture of the re-identification model or increasing its complexity and training cost.
Drawings
For a better understanding of the nature and technical aspects of the present invention, reference should be made to the following detailed description of the invention, taken in conjunction with the accompanying drawings, which are provided for purposes of illustration and description and are not intended to limit the invention.
In the drawings:
FIG. 1 is a flow chart of a pedestrian re-identification method of the present invention;
FIG. 2 is an architecture diagram of a key point detection model in the pedestrian re-identification method of the present invention;
FIG. 3 is a schematic diagram of a set of key points according to an embodiment of a pedestrian re-identification method of the present invention;
FIG. 4 is a schematic diagram of a key point connection according to an embodiment of the pedestrian re-identification method of the present invention;
FIG. 5 is a schematic diagram of a four-limb portion connection line, a face connection line and a trunk portion connection line according to an embodiment of the pedestrian re-identification method of the invention;
FIG. 6 is a schematic diagram of the pedestrian re-identification model in the pedestrian re-identification method according to the present invention;
FIG. 7 is a schematic view of a pedestrian re-identification apparatus of the present invention;
FIG. 8 is a schematic view of an electronic device of the present invention;
fig. 9 is a schematic diagram of steps S2 to S4 of the pedestrian re-identification method of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Referring to fig. 1 to 9, an embodiment of the invention provides a pedestrian re-identification method, including the following steps:
step S1, obtaining a first image set, where the first image set includes a plurality of first pictures, and each first picture includes a target pedestrian.
Specifically, acquiring the first image set in step S1 includes:
acquiring a first video, carrying out target detection on the first video, and detecting a target pedestrian in the first video;
carrying out target tracking on target pedestrians in the first video, and classifying the same pedestrian in the first video;
performing frame extraction on the first video according to the results of target detection and target tracking and a preset optimal frame extraction algorithm to obtain a plurality of initial pictures;
clustering the plurality of initial pictures by using a clustering algorithm;
and marking each initial picture, and marking the figure ID, the camera ID, the shooting time and the picture sequence number of each initial picture to obtain a first image set comprising a plurality of first pictures which accord with the input format of the key point detection model.
Further, in one embodiment of the present invention, the process of acquiring the first image set is:
First, a real project video is acquired as the first video; target detection, target tracking and frame extraction are then performed on the first video to obtain a plurality of initial pictures. The initial pictures may be stored in separate folders according to the different target pedestrians, so that the initial pictures in each folder correspond to the same pedestrian.
The initial pictures at this point are unordered and cannot be fed directly into the key point detection model for key point detection, nor into the pedestrian re-identification model for pedestrian re-identification, so they need to be preprocessed. Specifically, each initial picture is renamed in the form pedestrian number_shooting time_picture number, for example 1_20200901175946_12, where 1 is the pedestrian number, 20200901175946 is the shooting time, and 12 is the picture number.
Next, the plurality of initial pictures are clustered with a clustering algorithm to remove pictures that were identified incorrectly;
then each initial picture is labeled with its person ID, camera ID, shooting time and picture serial number. For example, in an embodiment of the present invention, the name of a labeled initial picture may be 1514_C6S1_20200901175946_180, where 1514 is the person ID, C6S1 is the camera ID, 20200901175946 is the shooting time, and 180 is the picture serial number;
finally, a picture set of the same person under different cameras is obtained, and the original folders are merged by script to obtain the first image set that can ultimately be used for key point detection and pedestrian re-identification.
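As an illustration only, the two naming conventions described above (pedestrian number_shooting time_picture number for the renamed pictures, and person ID_camera ID_shooting time_picture serial number for the labeled pictures) can be handled with a short Python helper; the class and function names below are assumptions made for this sketch, not part of the patent:

```python
from dataclasses import dataclass

@dataclass
class PictureLabel:
    """Label of one first picture: person ID, camera ID, shooting time, picture serial number."""
    person_id: str
    camera_id: str
    shot_time: str   # e.g. "20200901175946" means 2020-09-01 17:59:46
    serial: str

    def filename(self) -> str:
        # Labeled naming form used for the first image set, e.g. "1514_C6S1_20200901175946_180".
        return f"{self.person_id}_{self.camera_id}_{self.shot_time}_{self.serial}"

def parse_renamed(name: str):
    """Parse the intermediate naming form "pedestrian number_shooting time_picture number"."""
    pedestrian_no, shot_time, picture_no = name.split("_")
    return pedestrian_no, shot_time, picture_no

print(parse_renamed("1_20200901175946_12"))                              # ('1', '20200901175946', '12')
print(PictureLabel("1514", "C6S1", "20200901175946", "180").filename())  # 1514_C6S1_20200901175946_180
```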
Specifically, the preset optimal-frame extraction algorithm can be chosen as needed; frames in which the target pedestrian is captured clearly and completely should be selected during frame extraction.
Step S2, inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian.
Specifically, the key point detection mainly serves to facilitate the subsequent separation of the target person from the background information.
Further, with reference to fig. 2 and fig. 9, the key point detection model in step S2 adopts a stacked hourglass network, and its working process is as follows:
Each first picture is extended outward by a preset proportion of pixels to obtain extended data; for example, in some embodiments of the present invention, each first picture is extended outward by 20% of its pixels to obtain the extended data.
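A minimal sketch of this expansion step, assuming the pedestrian is given as a bounding box inside the original frame (whether the 20% is applied per side, as below, is an interpretation of the embodiment):

```python
def expand_box(x1: int, y1: int, x2: int, y2: int,
               img_w: int, img_h: int, ratio: float = 0.20):
    """Expand a pedestrian box outward by `ratio` of its size on every side,
    clamped to the image borders, before key point detection."""
    dx = int((x2 - x1) * ratio)
    dy = int((y2 - y1) * ratio)
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(img_w, x2 + dx), min(img_h, y2 + dy))

# Example: a 100x200-pixel detection inside a 256x384 frame.
print(expand_box(80, 30, 180, 230, img_w=256, img_h=384))  # (60, 0, 200, 270)
```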
Inputting the extended data into a spatial transformer network (STN) module for affine transformation to obtain transformed data;
inputting the transformed data into a single-person pose estimation (SPPE) module for key point extraction to obtain key point coordinates; further, in some embodiments of the present invention, a ResNet50 network is adopted as the backbone of the SPPE module.
Inputting the key point coordinates into a spatial de-transformer network (SDTN) module for inverse coordinate transformation to obtain a key point candidate set;
and inputting the key point candidate set into a pose non-maximum suppression (pose NMS) module for screening to obtain the key point set of the target pedestrian.
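For orientation only, the affine transformation performed by the STN module can be sketched with PyTorch's grid-sampling utilities; the localisation sub-network that predicts the affine matrix theta, and the exact crop size, are omitted and assumed given, so this is an illustration rather than the patent's implementation:

```python
import torch
import torch.nn.functional as F

def stn_transform(crops: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Warp a batch of expanded pedestrian crops with affine matrices theta of shape (B, 2, 3).
    The SDTN module later applies the corresponding inverse mapping to the predicted key point
    coordinates so that they line up with the original crops again."""
    grid = F.affine_grid(theta, size=list(crops.shape), align_corners=False)
    return F.grid_sample(crops, grid, align_corners=False)

# Example: the identity affine matrix leaves a batch of 3x128x96 crops unchanged.
crops = torch.randn(2, 3, 128, 96)
identity = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]).unsqueeze(0).repeat(2, 1, 1)
print(stn_transform(crops, identity).shape)  # torch.Size([2, 3, 128, 96])
```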
Specifically, as shown in fig. 3, in some embodiments of the present invention, the set of key points corresponding to each target pedestrian in step S2 includes 17 key points, which are key points K0 at the nose of the human face, key points K1 and K2 at the left and right eyes of the human face, key points K3 and K4 at the left and right ears of the human face, key points K5 and K6 at the left and right shoulders of the human skeleton, key points K7 and K8 at the left and right elbows of the human skeleton, key points K9 and K10 at the left and right wrists of the human skeleton, key points K11 and K12 at the left and right buttocks of the human skeleton, key points K13 and K14 at the left and right knees of the human skeleton, and ankle key points K15 and K16 at the left and right ankles of the human skeleton.
Further, each of the above key points is represented by a pair of coordinates on the corresponding pixel map: k0 = (x_0, y_0) is the key point at the nose of the face; k1 = (x_1, y_1) and k2 = (x_2, y_2) are the key points at the left and right eyes of the face; k3 = (x_3, y_3) and k4 = (x_4, y_4) are the key points at the left and right ears of the face; k5 = (x_5, y_5) and k6 = (x_6, y_6) are the key points at the left and right shoulders of the human skeleton; k7 = (x_7, y_7) and k8 = (x_8, y_8) are the key points at the left and right elbows; k9 = (x_9, y_9) and k10 = (x_10, y_10) are the key points at the left and right wrists; k11 = (x_11, y_11) and k12 = (x_12, y_12) are the key points at the left and right hips; k13 = (x_13, y_13) and k14 = (x_14, y_14) are the key points at the left and right knees; and k15 = (x_15, y_15) and k16 = (x_16, y_16) are the key points at the left and right ankles.
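For reference, the 17-point layout above can be written down directly in Python; the ordering below simply follows the K0 to K16 listing in this description:

```python
import numpy as np

# Index layout of the 17 key points K0..K16 listed above.
KEYPOINT_NAMES = [
    "nose",            # K0
    "left_eye",        # K1
    "right_eye",       # K2
    "left_ear",        # K3
    "right_ear",       # K4
    "left_shoulder",   # K5
    "right_shoulder",  # K6
    "left_elbow",      # K7
    "right_elbow",     # K8
    "left_wrist",      # K9
    "right_wrist",     # K10
    "left_hip",        # K11
    "right_hip",       # K12
    "left_knee",       # K13
    "right_knee",      # K14
    "left_ankle",      # K15
    "right_ankle",     # K16
]

# One detected pedestrian is then a (17, 2) array of (x_i, y_i) pixel coordinates.
keypoints = np.zeros((17, 2), dtype=np.float32)
keypoints[5] = (120.0, 80.0)  # e.g. K5 = (x_5, y_5), the left-shoulder key point
```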
Step S3, connecting the key points in the key point set corresponding to each target pedestrian according to a preset connection algorithm, to obtain a first picture with the key point connections of the target pedestrian, where the number of the key point connections of the target pedestrian is multiple, and the multiple key point connections of the target pedestrian correspond to multiple different body parts of the target pedestrian respectively.
As an alternative embodiment, in the method, the plurality of key point connecting lines respectively correspond to a limb part connecting line, a trunk part connecting line and a face connecting line of the target pedestrian; the four-limb-part connecting line of the target pedestrian is a line segment extending along the four-limb direction of the target pedestrian, the trunk-part connecting line is a closed polygon surrounding the trunk part of the target pedestrian and formed by connecting a plurality of line segments, and the face connecting line is a closed polygon surrounding the face of the target pedestrian and formed by connecting a plurality of line segments.
Specifically, in some embodiments of the present invention, a result obtained by connecting the key points in the key point set corresponding to each target pedestrian according to a preset connection algorithm is shown in fig. 4 and 9, where the result corresponds to the key point set of the 17 key points, and includes:
respectively connecting key point K5 at the left shoulder of the human skeleton with key point K7 at the left elbow, K7 at the left elbow with K9 at the left wrist, K6 at the right shoulder with K8 at the right elbow, K8 at the right elbow with K10 at the right wrist, K11 at the left hip with K13 at the left knee, K13 at the left knee with K15 at the left ankle, K12 at the right hip with K14 at the right knee, and K14 at the right knee with K16 at the right ankle, to obtain eight limb connecting lines line0 to line7;
sequentially connecting key point K0 at the nose of the human face with K3 at the left ear, K3 at the left ear with K1 at the left eye, K1 at the left eye with K2 at the right eye, K2 at the right eye with K4 at the right ear, and K4 at the right ear back to K0 at the nose, to obtain the face connecting line polygon0;
sequentially connecting key point K5 at the left shoulder of the human skeleton with K11 at the left hip, K11 at the left hip with K12 at the right hip, K12 at the right hip with K6 at the right shoulder, and K6 at the right shoulder back to K5 at the left shoulder, to obtain the trunk part connecting line polygon1.
The eight limb connecting lines line0 to line7 are specifically represented as follows:
line0 = {(x_5, y_5), (x_7, y_7)},
line1 = {(x_7, y_7), (x_9, y_9)},
line2 = {(x_6, y_6), (x_8, y_8)},
line3 = {(x_8, y_8), (x_10, y_10)},
line4 = {(x_11, y_11), (x_13, y_13)},
line5 = {(x_13, y_13), (x_15, y_15)},
line6 = {(x_12, y_12), (x_14, y_14)},
and line7 = {(x_14, y_14), (x_16, y_16)};
the face connecting line polygon0 is specifically expressed as follows:
polygon0 = {(x_0, y_0), (x_3, y_3), (x_1, y_1), (x_2, y_2), (x_4, y_4)};
the trunk part connecting line polygon1 is specifically expressed as follows:
polygon1 = {(x_5, y_5), (x_11, y_11), (x_12, y_12), (x_6, y_6)}.
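A minimal sketch of this preset connection algorithm, assuming the key points are given as a (17, 2) NumPy array in the K0 to K16 order above:

```python
import numpy as np

# Index pairs of the eight limb connecting lines line0..line7, in the order defined above.
LIMB_LINES = [(5, 7), (7, 9), (6, 8), (8, 10), (11, 13), (13, 15), (12, 14), (14, 16)]
FACE_POLYGON = [0, 3, 1, 2, 4]   # polygon0: nose -> left ear -> left eye -> right eye -> right ear
TRUNK_POLYGON = [5, 11, 12, 6]   # polygon1: left shoulder -> left hip -> right hip -> right shoulder

def connect_keypoints(kps: np.ndarray):
    """Build line0..line7, polygon0 and polygon1 from a (17, 2) key point array."""
    lines = [np.stack([kps[a], kps[b]]) for a, b in LIMB_LINES]  # each line is a (2, 2) array
    polygon0 = kps[FACE_POLYGON]                                 # (5, 2) closed face polygon
    polygon1 = kps[TRUNK_POLYGON]                                # (4, 2) closed trunk polygon
    return lines, polygon0, polygon1
```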
Step S4, as shown in fig. 5, inputting the first picture with the key point connecting lines of the target pedestrian into a trained background filtering model, where the background filtering model extracts the pixel regions corresponding to a plurality of different body parts of the target pedestrian according to the key point connecting lines, and combines these pixel regions as the background-filtered first picture;
specifically, the background filtering model extracts pixel regions corresponding to the four limbs, the trunk and the face of the target pedestrian respectively based on the connection lines of the plurality of key points, and combines the pixel regions corresponding to the four limbs, the trunk and the face of the target pedestrian as a first picture after background filtering.
Further, in some embodiments of the present invention, the extracting pixel regions corresponding to the four-limb part, the trunk part and the face of the target pedestrian according to the plurality of key point connecting lines by the background filtering model respectively includes:
on the basis of the connecting line of the four limbs, a plurality of pixels are respectively expanded to two sides of the connecting line of the four limbs to obtain the four limb area;
and respectively extracting pixels in the four-limb area, the closed polygon of the face connecting line and the closed polygon of the trunk part connecting line, and finishing the extraction of the four-limb part, the trunk part and the face pixel area of the target pedestrian.
Wherein, on the basis of the four-limb part connecting line, a plurality of pixels are respectively expanded to two sides of the four-limb part connecting line, and the obtaining of the four-limb area specifically comprises:
determining the original slope of a connecting line of the four limb parts;
calculating an auxiliary slope vertical to the original slope of the connection line of the four limbs;
moving a plurality of pixels in parallel from the original slope to two sides to obtain two original edge lines;
moving a plurality of pixels in parallel from the auxiliary slope to two sides to obtain two auxiliary edge lines;
the two original edge lines and the two auxiliary edge lines enclose to form a four-limb area.
For example, in some embodiments of the present invention, based on the eight limb connecting lines line0 to line7, a plurality of pixels are respectively expanded to both sides of each connecting line to obtain eight limb regions area0 to area7;
a pixel region formed by the eight limb regions area0 to area7, the face connecting line polygon0 and the trunk part connecting line polygon1 is extracted and used as the background-filtered first picture.
Further, in some embodiments of the present invention, the pixel size of the first picture is 384 × 256. In this case, expanding a plurality of pixels to both sides of each limb connecting line means expanding 15 pixels inward and outward from each of line0 to line7 to obtain the eight limb regions area0 to area7.
Specifically, expanding a plurality of pixels to both sides of each of the eight limb connecting lines line0 to line7 to obtain the eight regions area0 to area7 includes:
determining an original slope of a connecting line of four limb parts;
calculating an auxiliary slope perpendicular to the original slope of the connection line of the four limbs;
moving the original slope to two sides by a plurality of pixels in parallel to obtain two original edge lines, such as 15 pixels in the above embodiment;
moving the auxiliary slope to two sides by a plurality of pixels in parallel to obtain two auxiliary edge lines, such as 15 pixels in the above embodiment;
the two original edge lines and the two auxiliary edge lines enclose a four-limb area.
The original slope is calculated as:
L = (ye - ys) / (xe - xs)
where ye and ys are the y-axis coordinates of the two key points of the limb connecting line, xe and xs are their x-axis coordinates, and L is the original slope;
the auxiliary slope is calculated as:
Lτ = -1/L, where Lτ is the auxiliary slope.
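A minimal sketch, assuming NumPy and OpenCV, of how the background filtering of step S4 could be realized: each limb connecting line is expanded into a quadrilateral of 15 pixels on each side using the original and auxiliary (perpendicular) directions, the face and trunk polygons are filled, and only pixels inside the resulting mask are kept. The exact corner placement is an interpretation of the construction above:

```python
import numpy as np
import cv2

def limb_quadrilateral(p_start, p_end, half_width: float = 15.0) -> np.ndarray:
    """Expand one limb connecting line into a limb region (a rotated rectangle)."""
    p_start, p_end = np.asarray(p_start, float), np.asarray(p_end, float)
    direction = p_end - p_start
    u = direction / (np.linalg.norm(direction) + 1e-6)  # unit vector along the original slope
    n = np.array([-u[1], u[0]])                         # unit vector along the auxiliary (perpendicular) slope
    a = p_start - half_width * u                        # move outward along the line ...
    b = p_end + half_width * u
    return np.array([a + half_width * n, b + half_width * n,   # ... and sideways to both sides
                     b - half_width * n, a - half_width * n], dtype=np.int32)

def filter_background(image: np.ndarray, lines, polygon0, polygon1) -> np.ndarray:
    """Keep only pixels inside the eight limb regions, the face polygon and the trunk polygon."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for line in lines:                                   # eight limb regions area0..area7
        cv2.fillPoly(mask, [limb_quadrilateral(line[0], line[1])], 255)
    cv2.fillPoly(mask, [np.int32(polygon0)], 255)        # face region from polygon0
    cv2.fillPoly(mask, [np.int32(polygon1)], 255)        # trunk region from polygon1
    return cv2.bitwise_and(image, image, mask=mask)      # background pixels become zero
```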
Step S5, inputting the plurality of background-filtered first pictures as a second image set into a trained pedestrian re-identification model for pedestrian re-identification to obtain a pedestrian re-identification result.
Specifically, referring to fig. 6, in some embodiments of the present invention, the pedestrian re-identification model in step S5 includes a data preprocessing module, a backbone network, an aggregation network, a head network, a feature vector output module, a distance calculation module, and an evaluation and visualization module;
the backbone network is a ResNet101 network with an instance-batch normalization (IBN) module added and is used for feature extraction; the aggregation network aggregates the features with a GeM pooling operation; the head network uses a BNNeck to obtain the final prediction. During training of the re-identification model, cross-entropy loss and triplet loss are combined, CircleSoftmax is introduced into the classification layer, and an Adam optimizer is used, so that the model learns the data distribution of the data set from multiple angles and achieves a better fit.
The distance is calculated as the Euclidean distance:
d(X, Y) = sqrt((x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_n - y_n)^2)
where X is the feature vector output by the feature vector output module, Y is a feature vector in the retrieval base, x_i and y_i denote the i-th dimension of X and Y respectively, and n is the total dimensionality of the vectors.
Thus, the feature vector output by the feature vector output module is matched against the retrieval base to find the most similar feature vector, which gives the pedestrian re-identification result; the pictures in the retrieval base are likewise passed through the key point detection of the key point detection model and the background filtering of the background filtering model before being matched against the output feature vector.
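A minimal sketch of this matching step, assuming the query descriptor and the retrieval-base descriptors are plain NumPy arrays:

```python
import numpy as np

def euclidean_rank(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Rank retrieval-base entries from most to least similar to the query.
    query: (n,) feature vector X; gallery: (m, n) feature vectors Y of the retrieval base."""
    dists = np.sqrt(np.sum((gallery - query) ** 2, axis=1))  # d(X, Y) for every gallery entry
    return np.argsort(dists)                                 # smallest distance = best match

# Example with random 2048-dimensional descriptors.
rng = np.random.default_rng(0)
query = rng.normal(size=2048)
gallery = rng.normal(size=(100, 2048))
print(euclidean_rank(query, gallery)[:5])  # indices of the five closest retrieval-base pictures
```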
Finally, verification shows that the method solves the difficulty of pedestrian matching caused by background interference in pedestrian re-identification: the accuracy of the re-identification result improves by 5% with the method compared with without it, and the improvement is obtained without re-annotating re-identification data, without retraining, and without changing the architecture of the pedestrian re-identification model. The use cost is therefore low, which favors applying the method to pedestrian re-identification in real, complex scenes.
Referring to fig. 7, the present invention further provides a pedestrian re-identification apparatus, including:
the image processing device comprises an acquisition unit 10, a processing unit and a processing unit, wherein the acquisition unit acquires a first image set, the first image set comprises a plurality of first images, and each first image comprises a target pedestrian;
the key point detection unit 20 is configured to input the first image set into a trained key point detection model for key point detection, so as to obtain a key point set corresponding to each target pedestrian;
a connection unit 30, configured to connect, according to a preset connection algorithm, the key points in the key point set corresponding to each target pedestrian to obtain a first picture with the key point connecting lines of the target pedestrian, where there are a plurality of key point connecting lines and they respectively correspond to a plurality of different body parts of the target pedestrian;
The background filtering unit 40 is configured to input a first picture with a key point connecting line of a target pedestrian into a trained background filtering model, extract pixel regions corresponding to a plurality of different body parts of the target pedestrian according to the key point connecting line of each target pedestrian through the background filtering model, and combine the pixel regions corresponding to the plurality of different body parts of the target pedestrian as a first picture after background filtering;
and the pedestrian re-recognition unit 50 is configured to input the multiple background-filtered first pictures as a second image set into a trained pedestrian re-recognition model for pedestrian re-recognition, so as to obtain a pedestrian re-recognition result.
Referring to fig. 8, the present invention further provides an electronic device, including: a memory 100 and a processor 200, the memory 100 storing a computer program which, when executed by the processor 200, causes the processor 200 to perform the steps of the method as described above.
In summary, the present invention provides a pedestrian re-identification method, a pedestrian re-identification device, and an electronic apparatus. The method comprises the following steps: acquiring a first image set, wherein the first image set comprises a plurality of first pictures, and each first picture comprises a target pedestrian; inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian; connecting key points in a key point set corresponding to each target pedestrian according to a preset connection algorithm to obtain a first picture with the key point connecting lines of the target pedestrians; inputting a first picture with key point connecting lines of a target pedestrian into a trained background filtering model for background filtering to obtain a plurality of first pictures after background filtering; the method comprises the steps of inputting a plurality of first pictures after background filtering into a trained pedestrian re-recognition model as a second image set to perform pedestrian re-recognition, obtaining a pedestrian re-recognition result, performing background filtering through key point detection, and then performing pedestrian re-recognition by taking the images after background filtering as the input of the pedestrian re-recognition model.
As described above, it is obvious to those skilled in the art that other various changes and modifications can be made based on the technical solution and the technical idea of the present invention, and all such changes and modifications should fall within the protection scope of the claims of the present invention.

Claims (10)

1. A pedestrian re-identification method is characterized by comprising the following steps:
acquiring a first image set, wherein the first image set comprises a plurality of first pictures, and each first picture comprises a target pedestrian;
inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian;
connecting key points in a key point set corresponding to each target pedestrian according to a preset connection algorithm to obtain a first picture with the key point connecting lines of the target pedestrians, wherein the number of the key point connecting lines of the target pedestrians is multiple, and the key point connecting lines of the target pedestrians correspond to multiple different body parts of the target pedestrians respectively;
inputting a first picture with key point connecting lines of target pedestrians into a trained background filtering model, extracting pixel regions of a plurality of different body parts corresponding to the target pedestrians according to the key point connecting lines of the target pedestrians by the background filtering model, and combining the pixel regions of the plurality of different body parts corresponding to the target pedestrians as the first picture after background filtering;
and inputting the first pictures after the plurality of backgrounds are filtered into a trained pedestrian re-recognition model as a second image set for pedestrian re-recognition to obtain a pedestrian re-recognition result.
2. The pedestrian re-identification method according to claim 1, wherein the plurality of key point connecting lines respectively correspond to a four-limb connecting line, a trunk connecting line and a face connecting line of the target pedestrian;
and the background filtering model respectively extracts pixel regions corresponding to the four limbs, the trunk and the face of the target pedestrian according to the connection lines of the plurality of key points, and combines the pixel regions corresponding to the four limbs, the trunk and the face of the target pedestrian to be used as a first picture after background filtering.
3. The pedestrian re-identification method according to claim 2, wherein the four-limb-part connecting line of the target pedestrian is a line segment extending in a direction of four limbs of the target pedestrian, the trunk-part connecting line is a closed polygon of a trunk part surrounding the target pedestrian connected by a plurality of line segments, and the face connecting line is a closed polygon of a face surrounding the target pedestrian connected by a plurality of line segments.
4. The pedestrian re-identification method of claim 3, wherein the extracting pixel regions corresponding to the four limbs, the trunk and the face of the target pedestrian respectively by the background filtering model based on the plurality of key point connecting lines comprises:
on the basis of the connecting line of the four limbs, a plurality of pixels are respectively expanded to two sides of the connecting line of the four limbs to obtain the four limb area;
and respectively extracting pixels in the four-limb area, the closed polygon of the face connecting line and the closed polygon of the trunk part connecting line, and finishing the extraction of the four-limb part, the trunk part and the face pixel area of the target pedestrian.
5. The pedestrian re-identification method according to claim 3, wherein the step of expanding a plurality of pixels to both sides of the four-limb-part connecting line on the basis of the four-limb-part connecting line to obtain the four-limb area specifically comprises:
determining the original slope of a connecting line of the four limb parts;
calculating an auxiliary slope perpendicular to the original slope of the connection line of the four limbs;
moving a plurality of pixels in parallel from the original slope to two sides to obtain two original edge lines;
moving a plurality of pixels in parallel from the auxiliary slope to two sides to obtain two auxiliary edge lines;
the two original edge lines and the two auxiliary edge lines enclose a four-limb area.
6. The pedestrian re-identification method according to claim 1, wherein the set of key points corresponding to each target pedestrian includes 17 key points, which are respectively:
key points K0 at the nose of the human face, key points K1 and K2 at the left and right eyes of the human face, key points K3 and K4 at the left and right ears of the human face, key points K5 and K6 at the left and right shoulders of the human skeleton, key points K7 and K8 at the left and right elbows of the human skeleton, key points K9 and K10 at the left and right wrists of the human skeleton, key points K11 and K12 at the left and right buttocks of the human skeleton, key points K13 and K14 at the left and right knees of the human skeleton, and key points K15 and K16 at the left and right ankles of the human skeleton;
the obtaining of the key point connecting line of each target pedestrian according to the preset connection algorithm and the key point set specifically includes:
respectively connecting key point K5 at the left shoulder of the human skeleton with key point K7 at the left elbow, K7 at the left elbow with K9 at the left wrist, K6 at the right shoulder with K8 at the right elbow, K8 at the right elbow with K10 at the right wrist, K11 at the left hip with K13 at the left knee, K13 at the left knee with K15 at the left ankle, K12 at the right hip with K14 at the right knee, and K14 at the right knee with K16 at the right ankle, to obtain eight limb connecting lines line0 to line7;
sequentially connecting key point K0 at the nose of the human face with K3 at the left ear, K3 at the left ear with K1 at the left eye, K1 at the left eye with K2 at the right eye, K2 at the right eye with K4 at the right ear, and K4 at the right ear back to K0 at the nose, to obtain the face connecting line polygon0;
sequentially connecting key point K5 at the left shoulder of the human skeleton with K11 at the left hip, K11 at the left hip with K12 at the right hip, K12 at the right hip with K6 at the right shoulder, and K6 at the right shoulder back to K5 at the left shoulder, to obtain the trunk part connecting line polygon1;
the background filtering model extracts pixel regions corresponding to a plurality of different body parts of each target pedestrian according to the key point connecting line of each target pedestrian, and the combination of the pixel regions corresponding to the plurality of different body parts of each target pedestrian as a first picture after background filtering specifically comprises:
on the basis of the eight limb connecting lines line0 to line7, a plurality of pixels are respectively expanded to both sides of each connecting line to obtain eight limb regions area0 to area7;
a pixel region formed by the eight limb regions area0 to area7, the face connecting line polygon0 and the trunk part connecting line polygon1 is extracted and used as the background-filtered first picture.
7. The pedestrian re-identification method according to claim 1,
the acquiring of the first image set specifically includes:
acquiring a first video, carrying out target detection on the first video, and detecting a target pedestrian in the first video;
carrying out target tracking on target pedestrians in the first video, and classifying the same pedestrian in the first video;
performing frame extraction on the first video according to the results of target detection and target tracking and a preset optimal frame extraction algorithm to obtain a plurality of initial pictures;
clustering the plurality of initial pictures by using a clustering algorithm;
and marking each initial picture, and marking the figure ID, the camera ID, the shooting time and the picture sequence number of each initial picture to obtain a first image set comprising a plurality of first pictures which accord with the input format of the key point detection model.
8. The pedestrian re-identification method of claim 1, wherein the step of inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian specifically comprises:
extending each first picture outwards by pixels with a preset proportion to obtain extended data;
inputting the extended data into an STN module for affine transformation to obtain transformed data;
inputting the transformation data into an SPPE module for key point extraction to obtain key point coordinates;
inputting the key point coordinates into an SDTN module for reverse coordinate transformation to obtain a key point candidate set;
and inputting the key point candidate set into a poseNMS module for screening the key point candidate set to obtain a key point set of the target pedestrian.
9. A pedestrian re-recognition apparatus, comprising:
the image acquisition unit is used for acquiring a first image set, wherein the first image set comprises a plurality of first images, and each first image comprises a target pedestrian;
the key point detection unit is used for inputting the first image set into a trained key point detection model for key point detection to obtain a key point set corresponding to each target pedestrian;
a connecting unit, configured to connect the key points in the key point set corresponding to each target pedestrian according to a preset connecting algorithm to obtain a first picture with the key point connecting lines of the target pedestrians, where the number of the key point connecting lines of the target pedestrians is multiple, and the multiple key point connecting lines correspond to multiple different body parts of the target pedestrians respectively
The background filtering unit is used for inputting a first picture with key point connecting lines of target pedestrians into a trained background filtering model, extracting pixel regions corresponding to a plurality of different body parts of the target pedestrians according to the key point connecting lines of the target pedestrians through the background filtering model, and combining the pixel regions corresponding to the plurality of different body parts of the target pedestrians as a first picture after background filtering;
and the pedestrian re-recognition unit is used for inputting the first pictures after the background filtering as a second image set into a trained pedestrian re-recognition model for pedestrian re-recognition to obtain a pedestrian re-recognition result.
10. An electronic device, comprising: a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1-8.
CN202210420732.0A 2022-04-21 2022-04-21 Pedestrian re-identification method and device and electronic equipment Pending CN114708617A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210420732.0A CN114708617A (en) 2022-04-21 2022-04-21 Pedestrian re-identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210420732.0A CN114708617A (en) 2022-04-21 2022-04-21 Pedestrian re-identification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114708617A true CN114708617A (en) 2022-07-05

Family

ID=82174666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210420732.0A Pending CN114708617A (en) 2022-04-21 2022-04-21 Pedestrian re-identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114708617A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273154A (en) * 2022-09-26 2022-11-01 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium
CN115273154B (en) * 2022-09-26 2023-01-17 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium
CN116206332A (en) * 2023-01-31 2023-06-02 北京数美时代科技有限公司 Pedestrian re-recognition method, system and storage medium based on attitude estimation
CN116206332B (en) * 2023-01-31 2023-08-08 北京数美时代科技有限公司 Pedestrian re-recognition method, system and storage medium based on attitude estimation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination