CN113627380B - Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Info

Publication number
CN113627380B
CN113627380B (application CN202110959012.7A)
Authority
CN
China
Prior art keywords
distance
network
samples
pedestrian
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110959012.7A
Other languages
Chinese (zh)
Other versions
CN113627380A (en)
Inventor
Kou Qiqi
Cheng Deqiang
Li Yunlong
Ma Shang
Wang Xiaoyi
Zhang Haoxiang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Huatu Mining Technology Co ltd
China University of Mining and Technology CUMT
Original Assignee
Jiangsu Huatu Mining Technology Co ltd
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Huatu Mining Technology Co ltd, China University of Mining and Technology CUMT filed Critical Jiangsu Huatu Mining Technology Co ltd
Priority to CN202110959012.7A
Publication of CN113627380A
Application granted
Publication of CN113627380B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a cross-vision pedestrian re-identification method and system for intelligent security and early warning, wherein the method comprises the following steps: step 1, preprocessing the source domain and target domain samples; step 2, extracting sample feature vectors through ResNet50; step 3, splicing the extracted feature vectors; step 4, calculating distances and generating pseudo labels; and step 5, re-identifying, generating pseudo labels, and calculating the loss. The invention effectively occludes regions located by the introduced pose estimation points together with the cluttered background, uses the occlusion to prevent the network from focusing its attention on background information, and strengthens the network's learning ability, thereby increasing the recognition accuracy of unsupervised pedestrian re-identification.

Description

Cross-vision pedestrian re-identification method and system for intelligent security and early warning
Technical Field
The invention relates to a pedestrian re-identification method, in particular to a cross-vision pedestrian re-identification method for intelligent security and early warning.
Background
Awareness of the need to keep people safe in public places is continuously increasing. Governments and organizations focus on the safety of public areas such as parks, schools and shopping centers, and devote huge financial and material resources to public security. In building security and early warning, video monitoring systems play a key role. Today, large numbers of cameras are developing into reliable tools for solving various security problems, such as finding lost children, preventing crime and protecting restricted areas. Existing processing of camera image data is mainly carried out in manual video monitoring systems: although surveillance cameras are widely deployed in public places, a monitoring operator observes a single camera at a time and analyses specific events or abnormal conditions, ignoring the other available camera resources as well as factors such as the number of pedestrians and the cluttered background information in each image, which makes this a challenging task that consumes considerable manpower and material resources. Pedestrian re-identification (Person Re-ID) based on machine vision can locate a queried pedestrian across domains in images from different cameras and promptly track the pedestrian's trajectory within a certain range, so it plays a key role in security and early warning for public places; the effective use of cross-camera pedestrian re-identification (Re-ID) is therefore of great significance for the security of public places.
Pedestrian re-identification means recognizing pedestrian images or videos with the same identity across cameras through a designed algorithm model. Given a pedestrian sample to be queried, the goal of Re-ID is to determine whether that person appears at another place at a different time as captured by a different camera; an image, a video sequence or a text description can be used to represent the person to be queried. Objective evaluation with an algorithm model can replace human eyes in quickly finding similar pedestrian images under multiple non-overlapping cameras. Re-ID methods mainly follow two settings: closed environments and open environments.
The datasets in a closed environment have the following characteristics: (1) consistent data types; (2) sufficient and accurate labels; (3) a limited number of pedestrian identities; (4) closed scenes.
The datasets in an open environment have the following characteristics: (1) unprocessed images/videos; (2) unlabeled or noisy labels; (3) open scenes; (4) a large number of pedestrians.
Supervised pedestrian re-identification in closed environments has now reached a good level, but under the large data volumes, noisy labels and unprocessed data of open environments, unsupervised pedestrian re-identification still faces great challenges. Its main current problems are: 1. manual label calibration cannot be used; 2. initial clustering generates noisy labels, which affects accuracy.
Disclosure of Invention
In order to achieve the above purpose, the invention provides a cross-vision pedestrian re-identification method for intelligent security and early warning, which extracts more salient features after cropping, splices global features and local features, and further screens out related pedestrian pictures using the camera index and frame-number information in the pictures.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
The invention relates to a cross-vision pedestrian re-identification method for intelligent security and early warning, which comprises the following steps:
Step 1, preprocessing the source domain and target domain samples: identify pose points by pose estimation on the source domain and target domain pictures, occlude the pictures using the pose points, and pre-train the model with the source domain pictures and ID information so that the model has the ability to extract key features.
The preprocessing of the source domain and target domain samples specifically comprises the following steps:
Step 1-1, dividing the rectangular area: identify pose points by pose estimation on the source domain and target domain pictures, estimate the pedestrian joint points from the pose points (OpenPose model), separate the foreground from the cluttered background through the joint points, and select the head joint point (x1, y1), the foot joint point (x2, y2) and the left and right joint points (x3, y3), (x4, y4). A rectangular area is calculated from the 4 coordinate points and the image size, such that the upper-left corner coordinate of the rectangular area where the pedestrian is located is (W1, H1) and the lower-right corner coordinate is (W2, H2).
Step 1-2, after dividing the rectangular area, the image is occluded: after the pedestrian's specific parts are highlighted, the upper half-body and the lower half-body are occluded respectively according to the pose estimation points, and the image with global features together with the half-body occluded images are input into the network.
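To make step 1 concrete, the following Python sketch shows one way the rectangular region and the half-body occlusion could be implemented. It assumes the four joint points have already been produced by an OpenPose-style model; the margin term and the midline split between upper and lower body are illustrative assumptions, since the patent's exact corner formula is not reproduced above.

```python
import numpy as np

def pedestrian_rect(keypoints, img_w, img_h, margin=0.1):
    # keypoints: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)] for head, foot,
    # left and right joints, as in step 1-1. The margin is an assumption.
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    dx, dy = margin * img_w, margin * img_h
    w1 = max(min(xs) - dx, 0)            # (W1, H1): upper-left corner
    h1 = max(min(ys) - dy, 0)
    w2 = min(max(xs) + dx, img_w)        # (W2, H2): lower-right corner
    h2 = min(max(ys) + dy, img_h)
    return int(w1), int(h1), int(w2), int(h2)

def occlude_halves(img, rect):
    # img: H x W x 3 uint8 array. Returns an upper-body-occluded copy and a
    # lower-body-occluded copy; the rectangle midline splits the two halves.
    w1, h1, w2, h2 = rect
    mid = (h1 + h2) // 2
    upper_occ = img.copy()
    upper_occ[h1:mid, w1:w2] = 0         # mask out the upper half-body
    lower_occ = img.copy()
    lower_occ[mid:h2, w1:w2] = 0         # mask out the lower half-body
    return upper_occ, lower_occ
```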
Step 2, extracting sample feature vectors through ResNet50: load the model pre-trained in step 1, input the global feature images and the half-body occluded images of the source domain and target domain into the network, extract pedestrian-related features through ResNet50, and obtain feature vectors through global feature processing and local feature processing. Obtaining the feature vectors specifically comprises the following steps:
the global feature processing: following the ResNet50 network structure, the global feature image and a half-body occluded image pass through a global average pooling (GAP) layer in turn, and after the GAP output is batch-normalized, a fully connected (FC) layer is attached to obtain two 512-dimensional feature vectors;
the local feature processing: the two half-body occluded images are input and then passed through a global maximum pooling (GMP) layer in turn, and after the GMP output is batch-normalized, a fully connected (FC) layer is attached to obtain two 512-dimensional feature vectors.
Step 3, splicing the feature vectors after feature extraction: connect the feature vector obtained from the global features in step 2 with the feature vector from the local feature processing. The 4 512-dimensional vectors obtained from the global feature branch and the local feature branch are spliced together into one 2048-dimensional feature vector, so that more latent detail features can be obtained.
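A minimal PyTorch sketch of the branch heads and the splicing in steps 2 and 3 follows, assuming a ResNet50 backbone truncated before its own pooling layer; exactly which image feeds which branch is an assumption based on the description above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BranchHead(nn.Module):
    # One branch head: pooling, then batch normalization, then an FC layer
    # producing a 512-dimensional vector, as described in steps 2 and 3.
    def __init__(self, pool, in_dim=2048, out_dim=512):
        super().__init__()
        self.pool = pool                  # GAP for global, GMP for local
        self.bn = nn.BatchNorm1d(in_dim)
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, fmap):              # fmap: (B, 2048, H, W)
        v = self.pool(fmap).flatten(1)    # (B, 2048)
        return self.fc(self.bn(v))        # (B, 512)

# ResNet50 without its average pool and classifier, so it outputs feature maps.
backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
gap_head = BranchHead(nn.AdaptiveAvgPool2d(1))   # global branch
gmp_head = BranchHead(nn.AdaptiveMaxPool2d(1))   # local branch

def spliced_feature(global_img, upper_occ, lower_occ):
    # Four 512-dim vectors spliced into one 2048-dim vector f_all.
    f_g  = gap_head(backbone(global_img))        # global image, GAP branch
    f_g2 = gap_head(backbone(upper_occ))         # half-body image, GAP branch
    f_up = gmp_head(backbone(upper_occ))         # half-body image, GMP branch
    f_lo = gmp_head(backbone(lower_occ))         # half-body image, GMP branch
    return torch.cat([f_g, f_g2, f_up, f_lo], dim=1)   # (B, 2048)
```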
Step 4, calculating distances to generate pseudo labels: after extracting the spliced feature vectors, calculate the distance between target domain samples and the distance between source domain samples to obtain a final inter-sample distance matrix, obtain the clustering radius R from the distance matrix, and mark the target domain samples with more accurate pseudo labels by a density clustering method. In step 4, the Jaccard distance from the k-reciprocal nearest neighbors and the Mahalanobis distance are used to calculate the distance between target domain samples and the distance between source domain samples; the camera index and frame-number information in the pictures are used during the calculation to further screen out related pedestrian pictures, and a constraint distance d_f is added to pull samples of the same identity closer, thereby enlarging the distances between different identities.
Step 5, re-identifying, namely generating pseudo labels and ranking them: calculate the loss using the pseudo labels obtained in step 4 and the spliced feature vectors obtained in step 3, and update the network; after updating the network, return to step 2 and sequentially iterate the extraction of features, calculation of distances, generation of pseudo labels, calculation of the loss and updating of the network until the network converges or the number of iterations is completed.
In a further improvement of the invention, the distance calculation process in step 5 is as follows:
(1) Construct a matrix W of the same dimension as the matrix d, assigning the value 0 to the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s, assigning the value 1 to the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s, and assigning the value 0.5 in all other cases;
(2) Construct a zero matrix d_f of the same dimension as the matrix d; d_f is then obtained by screening through the dot (element-wise) multiplication of W and d. This operation pulls the distance between the same identities closer and enlarges the distance between different identities, finally giving the final distance
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
wherein d_f is the constraint distance; d_j is the Jaccard distance; d is the Mahalanobis distance; d_all is the final distance; λ1 and λ2 are hyperparameters controlling the ratio between the distances; q is a picture in the probe set; g_i is a picture in the gallery set.
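Numerically, the fusion above is just a weighted sum of three precomputed distance matrices. A sketch with placeholder weights follows; the patent does not disclose tuned values for λ1 and λ2.

```python
import numpy as np

def final_distance(d_f, d_j, d_m, lam1=0.3, lam2=0.2):
    # d_f: constraint distance, d_j: Jaccard distance, d_m: Mahalanobis
    # distance, all square matrices of the same shape. lam1/lam2 are
    # illustrative placeholders, not the patent's hyperparameter values.
    return (1.0 - lam1 - lam2) * d_f + lam1 * d_j + lam2 * d_m
```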
In a further improvement of the invention, the overall loss function in step 5 is:
L_tri = L_triplet(f_all, ŷ_t) + L_triplet(f_all_e, ŷ_t)
wherein L_tri is the overall loss, L_triplet is the triplet loss, f_all is the spliced feature vector, f_all_e is the globally embedded feature vector, and ŷ_t is the cluster-generated pseudo label.
The beneficial effects of the invention are as follows:
First, the regions located by the introduced pose estimation points and the cluttered background are effectively occluded; the occlusion prevents the network from focusing its attention on background information and strengthens the network's learning ability, so locally salient features are learned better, the foreground is separated from the background, detail features are extracted better, better pseudo labels are obtained, intra-class aggregation is achieved better, the network learns more distinctive features, and the recognition accuracy of unsupervised pedestrian re-identification increases;
Second, the camera index and time information of the pictures are used effectively; considering that pedestrian images under the same camera have high confidence while pedestrian images under different cameras have low confidence provides a good constraint for cross-camera pedestrian retrieval.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic structural diagram of a re-identification apparatus according to an embodiment of the present invention.
Fig. 3 is a picture of the occlusion process of the image in step 1 of the present invention.
Fig. 4 is a basic structural diagram of the ResNet50 of the present invention.
FIG. 5 is a schematic diagram of the k-reciprocal nearest neighbors algorithm of the present invention.
Fig. 6 is a schematic diagram showing that the method of the present invention pulls together pedestrians with the same identity under the same camera within a close time window and pushes apart pedestrians with different identities under different cameras within a close time window.
FIG. 7 is a schematic diagram of a cross-view pedestrian re-recognition system of the present invention.
Detailed Description
Embodiments of the invention are disclosed in the drawings, and for purposes of explanation, numerous practical details are set forth in the following description. However, it should be understood that these practical details are not to be taken as limiting the invention. That is, in some embodiments of the invention, these practical details are unnecessary.
As shown in fig. 7, the invention is a cross-vision pedestrian re-identification system for intelligent security and early warning, comprising a data preprocessing module, a data acquisition module, a network construction module, a network training module and a model evaluation module, wherein: the data preprocessing module marks joint point information on the dataset through the pose estimation model and then occludes the images; the data acquisition module loads the source domain and target domain datasets into the network, where the source domain data is a labeled dataset and the target domain data is an unlabeled dataset; the network construction module modifies the backbone network that extracts features from the input images, processes global features and local features separately, and finally splices the feature vectors; the network training process of the network training module is as follows: extract the spliced feature vectors of the source domain and target domain data through the network, calculate the distances between samples, obtain the clustering radius R from the distances, generate pseudo labels as supervision signals, calculate the loss function, and update the network parameters. After training ends, the model evaluation module inputs the dataset into the network for evaluation without updating network parameters.
The cross-vision pedestrian re-identification system for intelligent security and early warning is realized using the cross-vision pedestrian re-identification method, whose flow diagram is shown in fig. 1. The first part inputs the source domain and target domain samples; the second part is the backbone, a ResNet50 with part of its structure replaced; the third part splices the features extracted by the branches; the fourth part calculates the distances; the fifth part extracts features of the target domain test set, calculates each distance, weights them to obtain the final distance, and ranks and scores by the final distance. The recognition system is realized by a re-identification apparatus comprising an image acquisition unit, an image-processing core board based on the Atlas 200 Developer Kit, a PC control unit and a display module; the structure of the PC control unit is shown in fig. 2.
Step 1: preprocess the source domain and target domain samples and input them into the network
Identify pose points by pose estimation on the source domain and target domain pictures, occlude the pictures using the pose points, and pre-train the model with the source domain pictures and ID information so that the model has the ability to extract key features; the ID information refers to the camera index and frame-number information;
at the beginning, the source domain image and ID information are input, and pre-training of the model is carried out, so that the model has the capability of extracting key features. Firstly, estimating the joint point of a pedestrian through a gesture point (Openpost model), separating a foreground and a mixed background through the joint point, and selecting a head joint point (x 1 ,y 1 ) Foot joint (x) 2 ,y 2 ) Left and right joint points (x 3 ,y 3 )、(x 4 ,y 4 ) A rectangular area is calculated through 4 coordinate points and image sizes, and the calculation formula is as follows:
the coordinate value of the upper left corner of the rectangular area where the pedestrian is located is (W) 1 ,H 1 ) The lower right corner coordinate value is (W 2 ,H 2 ). After the matrix area is divided, the image is subjected to occlusion processing as shown in fig. 3. After the specific part of the pedestrian protrudes, the upper half body and the lower half body are respectively subjected to shielding treatment through the gesture estimation points, and the images with global characteristics and the shielding images of the half body are input into a network.
The source domain data is trained with the identity (ID) loss and the triplet loss; the model is pre-trained for 10 epochs, and the pre-trained model is then used for training. Once the pre-training accuracy exceeds 70%, data enhancement such as flipping, random erasing and random cropping is applied to the source domain data to prevent the model from overfitting. In addition, at the last fully connected (FC) layer of the network, note that during pre-training the dimension of this layer equals the number of pedestrian IDs, and the loss functions on the source domain are the softmax function and the triplet loss:
L_src-id = -(1/n_s) Σ_{i=1}^{n_s} log p(y_s^(i) | f_s^(i))
where L_src-id represents the ID loss function, n_s represents the number of samples in the source domain data, and p(y_s^(i) | f_s^(i)) represents the probability that the i-th source domain sample belongs to pedestrian y^(i).
L_src-tri = Σ_i max(0, α + d(f_a^(i), f_p^(i)) - d(f_a^(i), f_n^(i)))
where L_src-tri represents the triplet loss, α is the margin hyperparameter, f_a^(i) represents the features extracted from the anchor sample, f_p^(i) represents the features extracted from another positive sample with the same identity as the i-th sample, and f_n^(i) represents the features extracted from a negative sample with a different identity from the i-th sample.
The overall loss function is:
L_src = L_src-tri + L_src-id
wherein L_src represents the overall loss of the source domain, L_src-tri represents the triplet loss of the source domain, and L_src-id represents the identity loss of the source domain.
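A sketch of this pre-training objective in PyTorch, assuming cross-entropy as the softmax ID loss and a standard margin-based triplet loss; how the anchor/positive/negative triplets are mined (for example, batch-hard mining) is not specified above and is left to the caller.

```python
import torch.nn as nn

id_loss_fn = nn.CrossEntropyLoss()               # softmax ID loss over pedestrian IDs
tri_loss_fn = nn.TripletMarginLoss(margin=0.3)   # margin alpha: illustrative value

def source_loss(logits, labels, anchor, positive, negative):
    # L_src = L_src-tri + L_src-id. logits come from the final FC layer whose
    # dimension equals the number of pedestrian IDs during pre-training.
    l_id = id_loss_fn(logits, labels)                 # L_src-id
    l_tri = tri_loss_fn(anchor, positive, negative)   # L_src-tri
    return l_tri + l_id
```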
Step 2, extracting sample feature vectors through ResNet50
Load the model pre-trained in step 1, input the images with global features and the half-body occluded images of the source domain and target domain into the network, extract pedestrian-related features through ResNet50, and obtain feature vectors through global feature processing and local feature processing. The features are extracted after the images pass through the 5 stages of ResNet50, and the global features and local features are processed separately; the 5 stages of the backbone ResNet50 are shown in fig. 4.
The global feature processing is that according to the network structure of ResNet50, a global feature image and a half-body shielding image sequentially pass through a global average pool layer (GAP layer), and after the output of the global average pool layer (GAP layer) is subjected to batch normalization, a complete connection layer (FC) is connected to obtain two 512-dimensional feature vectors;
the local feature processing is to input two half-body shielding images, sequentially access a global maximum pool layer (GMP layer) after the two half-body shielding images, and obtain two feature vectors with 512 dimensions at an access complete connection layer (FC) after the global maximum pool layer (GMP layer) output is subjected to batch normalization.
Step 3, feature vector splicing after feature extraction
After extracting the features of the input images, we connect the feature vectors of the two branches: the 4 512-dimensional vectors obtained through the global feature branch and the local feature branch are spliced into one 2048-dimensional feature vector, so that more latent detail features can be obtained:
f_i_all = [f_i, f_i', f_i_up, f_i_low]
After the base model, f_i is also passed through a further FC layer to obtain a 512-dimensional global embedded vector f_i_e; like f_i, this FC layer is also updated during training and shares the same pseudo label.
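As a small illustrative sketch, the extra FC layer producing f_i_e could look as follows; the 512-to-512 shape follows the text, everything else is assumed.

```python
import torch.nn as nn

embed_fc = nn.Linear(512, 512)   # extra FC layer after the base model

def global_embedded(f_i):
    # f_i: (B, 512) global-branch vector. Returns the embedded vector f_i_e,
    # trained with the same pseudo label as f_i, per the text above.
    return embed_fc(f_i)
```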
Step 4, calculating distances to generate pseudo labels
After the 2048-dimensional features are extracted and fused: the model has been pre-trained on the source domain data and performs well there, but a domain gap arises when training on the target domain. Meanwhile, since the target domain samples are unlabeled, this problem is addressed by generating pseudo labels through clustering; before the pseudo labels are generated, the global clustering radius R is determined.
R is obtained by weighting the Jaccard distance from the k-reciprocal nearest neighbors algorithm and the Mahalanobis distance between the source domain and the target domain. The main idea of the k-reciprocal nearest neighbors algorithm is that if two pictures A and B are similar, then B should be within the first k neighbors of A and, conversely, A should be within the first k neighbors of B, as shown in fig. 5.
The Mahalanobis distance and the Jaccard distance of the source and target domains are calculated from the feature vectors. The initial ranking is obtained by calculating the Mahalanobis distance, the Jaccard distance is the distance between the k-reciprocal neighbor sets, and the final distance is obtained by mixing the two distances with a certain proportion coefficient:
d*(q, g_i) = (1 - λ) d_J(q, g_i) + λ d(q, g_i)
wherein d_J is the Jaccard distance, d is the Mahalanobis distance, d* is the resulting mixed distance, and λ is a hyperparameter controlling the ratio between the two distances.
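The mutual-neighbor rule itself is compact. The following sketch returns, for each sample, its k-reciprocal neighbor set from a square distance matrix; it omits the neighborhood-expansion and local-query refinements of the full re-ranking algorithm.

```python
import numpy as np

def k_reciprocal_neighbors(dist, k=20):
    # dist: (N, N) distance matrix. Sample j is a k-reciprocal neighbor of i
    # when j is in i's top-k list AND i is in j's top-k list. The top-k list
    # of each row includes the sample itself (distance 0), as usual.
    ranks = np.argsort(dist, axis=1)[:, :k]
    topk = [set(row) for row in ranks]
    return [{j for j in topk[i] if i in topk[j]} for i in range(dist.shape[0])]
```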
Step 5, re-identifying: generating pseudo labels and ranking
Calculate the loss using the pseudo labels obtained in step 4 and the spliced feature vectors obtained in step 3, and update the network; after updating the network, return to step 2 and sequentially iterate the extraction of features, calculation of distances, generation of pseudo labels, calculation of the loss and updating of the network until the network converges or the number of iterations is completed.
To find highly similar samples more accurately, a new constraint distance d_f(q, g_i) is introduced on the basis of the k-reciprocal nearest neighbors algorithm. The main idea of this distance is to exploit the fact that the cameras' fields of view do not overlap and their angles differ within a specific time period; for example, taking the period as 10 s, the Market-1501 dataset has 25 frames per second, so the frame difference is 250. On one hand, a pedestrian with a given identity under one camera cannot suddenly appear under another camera's view angle, and the pose similarity of the same identity under the same camera is high; on the other hand, because the viewing angles are not uniform across cameras, the similarity of the same pedestrian under different cameras is poor, so supposedly same-identity, highly similar candidates under different cameras within the specific time period are eliminated, as shown in fig. 6.
The introduced constraint distance therefore better pulls up the similarity of the same pedestrian and enlarges the gap between different pedestrians.
The distance calculation process is as follows:
(1) A matrix W of the same dimension as the matrix d is constructed: the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s are assigned the value 0, the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s are assigned the value 1, and all other cases are assigned the value 0.5.
(2) A zero matrix d_f of the same dimension as the matrix d is constructed; d_f is then obtained by screening through the dot (element-wise) multiplication of W and d. In this way, the distance between the same identities can be shortened and the distance between different identities enlarged.
The improved algorithm thus yields the final distance d_all as
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
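One reading of the two-step procedure above is sketched below in NumPy: build W from the camera indexes and frame numbers, then screen the base distance matrix element-wise to obtain d_f. The 25 fps figure follows the Market-1501 example in the text; treating the "dot multiplication" as element-wise masking of d is an assumption.

```python
import numpy as np

FPS = 25              # Market-1501: 25 frames per second, per the text above
WINDOW = 10 * FPS     # a 10 s window corresponds to a frame difference of 250

def constraint_distance(d, cam_ids, frames):
    # d: (N, N) base distance matrix; cam_ids, frames: (N,) integer arrays.
    d = np.asarray(d, dtype=float)
    same_cam = cam_ids[:, None] == cam_ids[None, :]
    close = np.abs(frames[:, None] - frames[None, :]) <= WINDOW
    W = np.full(d.shape, 0.5)
    W[same_cam & close] = 0.0      # same camera, close in time: pull together
    W[~same_cam & close] = 1.0     # different cameras, close in time: push apart
    return W * d                   # element-wise screening yields d_f
```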
The newly generated d_all can be used for the next round of density clustering to generate pseudo labels.
Through the density clustering algorithm, the global clustering radius R is calculated according to d_all, and pseudo labels are generated as supervision signals according to R. Adding d_f better distinguishes the differences between identities, reduces the generation of noisy labels during training, and contributes better labels to later iterations.
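A sketch of the pseudo-label step with scikit-learn's DBSCAN on the precomputed d_all follows; deriving the radius R from a small percentile of the pairwise distances is a common heuristic and stands in for the patent's radius computation, which is not detailed above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def generate_pseudo_labels(d_all, pct=1.6, min_samples=4):
    # d_all: (N, N) final distance matrix. R is taken as the pct-th
    # percentile of the upper-triangular pairwise distances (an assumption).
    tri = d_all[np.triu_indices_from(d_all, k=1)]
    R = np.percentile(tri, pct)
    labels = DBSCAN(eps=R, min_samples=min_samples,
                    metric="precomputed").fit_predict(d_all)
    return labels    # label -1 marks noise samples left out of training
```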
After the pseudo labels are generated, given an image, each feature vector f and its corresponding self-label y are used as the two inputs of the triplet loss:
L_triplet(f, y), where f is the feature vector and y is the cluster-generated pseudo label.
Grouping the images according to the feature vector f_all, a pseudo label can be obtained for each image. Labels can thus be generated from the feature vectors to create a new target dataset in which each image carries a pseudo label.
Finally, using the self-labels as supervision information, the triplet loss is used to fine-tune the pre-trained model for cross-dataset adaptation, with the overall loss function given above: L_tri = L_triplet(f_all, ŷ_t) + L_triplet(f_all_e, ŷ_t).
the existing unsupervised pedestrian re-identification is characterized in that feature vectors are extracted from unlabeled pictures through ResNet50, pseudo labels are generated through density clustering, meanwhile, pedestrian nodes are positioned through posture estimation, and background areas outside the nodes are shielded, so that a network is focused on distinguishing features of non-shielded areas, and the influence of mixed backgrounds on the network is avoided.
The method is further characterized in that a new model-independent k-reciprocal nearest neighbors method is introduced into the network: the top-k images most similar to the probe are selected from the gallery by the k-reciprocal nearest neighbors algorithm, and on this basis the ranking is further constrained using the pictures' camera indexes and frame numbers, raising the rank of pedestrian images under the same camera ID index and lowering the rank of pedestrian images under different camera ID indexes, which improves overall reliability; the remaining sample images are handled by suppression.
The camera index and time information of the pictures are used effectively; this information can effectively improve the association of pedestrians under the same camera and effectively distinguish the identity information of pedestrians under different cameras, so hard positive samples and hard negative samples are further divided well.
The foregoing description is only illustrative of the invention and is not to be construed as limiting the invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present invention, should be included in the scope of the claims of the present invention.

Claims (5)

1. A cross-vision pedestrian re-identification method for intelligent security and early warning, characterized in that the identification method comprises the following steps:
step 1, preprocessing the source domain and target domain samples: identifying pose points by pose estimation on the source domain and target domain pictures, occluding the pictures using the pose points, and pre-training the model with the source domain pictures and ID information so that the model has the ability to extract key features;
step 2, extracting sample feature vectors through ResNet50: loading the model pre-trained in step 1, inputting the images with global features and the half-body occluded images of the source domain and target domain into the network, extracting pedestrian-related features through ResNet50, and obtaining feature vectors through global feature processing and local feature processing;
step 3, splicing the feature vectors after feature extraction: connecting the feature vector obtained from the global features in step 2 with the feature vector from the local feature processing to obtain a spliced feature vector;
step 4, calculating distances to generate pseudo labels: after extracting the spliced feature vectors, calculating the distance between target domain samples and the distance between source domain samples to obtain a final inter-sample distance matrix, obtaining a clustering radius R from the distance matrix, and marking the target domain samples with pseudo labels by a density clustering method;
step 5, re-identifying, namely generating pseudo labels and ranking them: calculating the loss through a triplet loss using the pseudo labels obtained in step 4 and the spliced feature vectors obtained in step 3, updating the network, returning to step 2 after updating the network, and sequentially iterating the extraction of features, calculation of distances, generation of pseudo labels, calculation of the loss and updating of the network until the network converges or the number of iterations is completed, wherein
in step 4, the Jaccard distance from the k-reciprocal nearest neighbors and the Mahalanobis distance are used to calculate the distance between target domain samples and the distance between source domain samples, the camera index and frame-number information in the pictures are used during the calculation to screen out related pedestrian pictures, and a constraint distance d_f is added to pull samples of the same identity closer, thereby enlarging the distances between different identities;
the distance calculation process in step 5 is as follows:
(1) constructing a matrix W of the same dimension as the matrix d, assigning the value 0 to the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s, assigning the value 1 to the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s, and assigning the value 0.5 in all other cases;
(2) constructing a zero matrix d_o of the same dimension as the matrix d, and obtaining d_f after screening W and d_o through dot multiplication, which pulls the distance between the same identities closer and enlarges the distance between different identities; the final distance is
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
wherein d_f is the constraint distance; d_j is the Jaccard distance; d is the Mahalanobis distance; d_all is the final distance; λ1 and λ2 are hyperparameters controlling the ratio between the distances.
2. The cross-vision pedestrian re-identification method for intelligent security and early warning according to claim 1, characterized in that preprocessing the source domain and target domain samples in step 1 specifically comprises the following steps:
step 1-1, dividing the rectangular area: identifying pose points by pose estimation on the source domain and target domain pictures, estimating the pedestrian joint points from the pose points, separating the foreground from the cluttered background through the joint points, selecting the head joint point (x1, y1), the foot joint point (x2, y2) and the left and right joint points (x3, y3), (x4, y4), and calculating a rectangular area from the 4 coordinate points and the image size, such that the upper-left corner coordinate of the rectangular area where the pedestrian is located is (W1, H1) and the lower-right corner coordinate is (W2, H2);
step 1-2, after dividing the rectangular area, occluding the image: after the pedestrian's specific parts are highlighted, occluding the upper half-body and the lower half-body respectively according to the pose estimation points.
3. The cross-vision pedestrian re-identification method for intelligent security and early warning according to claim 1, characterized in that the overall loss function in step 5 is:
L_tri = L_triplet(f_all, ŷ_t) + L_triplet(f_all_e, ŷ_t)
wherein L_tri is the overall loss, L_triplet is the triplet loss, and f_all represents the spliced feature vector.
4. The cross-vision pedestrian re-identification method for intelligent security and early warning according to claim 1, characterized in that obtaining the feature vectors in step 2 specifically comprises:
the global feature processing: following the ResNet50 network structure, the original image and the background-occluded image pass through ResNet50 and a global average pooling layer in turn, and after the global average pooling output is batch-normalized, a fully connected layer is attached to obtain two 512-dimensional feature vectors;
the local feature processing: the two half-body occluded images are input and then passed through a global maximum pooling layer in turn, and after the global maximum pooling output is batch-normalized, a fully connected layer is attached to obtain two 512-dimensional feature vectors.
5. A system using the cross-vision pedestrian re-identification method for intelligent security and early warning according to any one of claims 1-4, characterized in that the system comprises a data preprocessing module, a data acquisition module, a network construction module, a network training module and a model evaluation module, wherein:
the data preprocessing module marks joint point information on the dataset through the pose estimation model and then occludes the images;
the data acquisition module loads the datasets of the source domain and the target domain into the network, wherein the source domain data is a labeled dataset and the target domain data is an unlabeled dataset;
the network construction module modifies the backbone network for extracting features of the input images, processes global features and local features respectively, and finally splices the feature vectors;
the network training process of the network training module is as follows: extracting the spliced feature vectors of the source domain and target domain data through the network, calculating the distances between samples, obtaining the clustering radius R according to the distances, generating pseudo labels as supervision signals, calculating the loss function, and updating the network parameters;
the model evaluation module inputs the dataset into the network for evaluation after the network training ends, without updating network parameters, wherein
the Jaccard distance from the k-reciprocal nearest neighbors and the Mahalanobis distance are used to calculate the distance between target domain samples and the distance between source domain samples, the camera index and frame-number information in the pictures are used during the calculation to screen out related pedestrian pictures, and a constraint distance d_f is added to pull samples of the same identity closer, thereby enlarging the distances between different identities;
the process of calculating the distance between samples is as follows:
(1) constructing a matrix W of the same dimension as the matrix d, assigning the value 0 to the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s, assigning the value 1 to the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s, and assigning the value 0.5 in all other cases;
(2) constructing a zero matrix d_o of the same dimension as the matrix d, and obtaining d_f after screening W and d_o through dot multiplication, which shortens the distance between the same identities and enlarges the distance between different identities; the final distance is
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
wherein d_f is the constraint distance; d_j is the Jaccard distance; d is the Mahalanobis distance; d_all is the final distance; λ1 and λ2 are hyperparameters controlling the ratio between the distances.
CN202110959012.7A 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning Active CN113627380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110959012.7A CN113627380B (en) 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110959012.7A CN113627380B (en) 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Publications (2)

Publication Number Publication Date
CN113627380A CN113627380A (en) 2021-11-09
CN113627380B true CN113627380B (en) 2024-03-15

Family

ID=78386846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110959012.7A Active CN113627380B (en) 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Country Status (1)

Country Link
CN (1) CN113627380B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116405651B (en) * 2023-06-09 2023-08-01 中国矿业大学(北京) Multi-camera cross-view pedestrian data generation method and system
CN118230362B (en) * 2024-05-22 2024-10-18 西北工业大学 Target re-identification method based on visual angle self-adaptive mechanism
CN118351340B (en) * 2024-06-17 2024-08-20 中国海洋大学 Double-branch non-supervision target re-identification method and system based on sample mining

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111476168A (en) * 2020-04-08 2020-07-31 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages
CN111666843A (en) * 2020-05-25 2020-09-15 湖北工业大学 Pedestrian re-identification method based on global feature and local feature splicing
CN111967294A (en) * 2020-06-23 2020-11-20 南昌大学 Unsupervised domain self-adaptive pedestrian re-identification method
CN112036322A (en) * 2020-09-01 2020-12-04 清华大学 Method, system and device for constructing cross-domain pedestrian re-identification model of multi-task network
CN112069940A (en) * 2020-08-24 2020-12-11 武汉大学 Cross-domain pedestrian re-identification method based on staged feature learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11721028B2 (en) * 2018-05-28 2023-08-08 Universiteit Gent Motion segmentation in video from non-stationary cameras

Also Published As

Publication number Publication date
CN113627380A (en) 2021-11-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Kou Qiqi

Inventor after: Cheng Deqiang

Inventor after: Li Yunlong

Inventor after: Ma Shang

Inventor after: Wang Xiaoyi

Inventor after: Zhang Haoxiang

Inventor before: Li Yunlong

Inventor before: Ma Shang

Inventor before: Cheng Deqiang

Inventor before: Kou Qiqi

Inventor before: Wang Xiaoyi

Inventor before: Zhang Haoxiang

GR01 Patent grant