CN113627380B - Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Info

Publication number
CN113627380B
CN113627380B (application CN202110959012.7A)
Authority
CN
China
Prior art keywords
distance
network
samples
pedestrian
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110959012.7A
Other languages
Chinese (zh)
Other versions
CN113627380A (en)
Inventor
Kou Qiqi
Cheng Deqiang
Li Yunlong
Ma Shang
Wang Xiaoyi
Zhang Haoxiang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Huatu Mining Technology Co ltd
China University of Mining and Technology CUMT
Original Assignee
Jiangsu Huatu Mining Technology Co ltd
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Huatu Mining Technology Co ltd, China University of Mining and Technology CUMT filed Critical Jiangsu Huatu Mining Technology Co ltd
Priority to CN202110959012.7A
Publication of CN113627380A
Application granted
Publication of CN113627380B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a cross-vision pedestrian re-identification method and system for intelligent security and early warning, wherein the method comprises the following steps: step 1, preprocessing the source domain and target domain samples; step 2, extracting sample feature vectors through ResNet50; step 3, splicing the extracted feature vectors; step 4, calculating distances and generating pseudo labels; and step 5, re-identifying, generating pseudo labels, and calculating the loss. The invention effectively occludes regions located by the introduced pose estimation points together with the cluttered background, uses the occlusion to prevent the network from focusing its attention on background information, and strengthens the network's learning ability, thereby increasing the recognition accuracy of unsupervised pedestrian re-identification.

Description

Cross-vision pedestrian re-identification method and system for intelligent security and early warning
Technical Field
The invention relates to a pedestrian re-identification method, in particular to a cross-vision pedestrian re-identification method for intelligent security and early warning.
Background
Awareness of the need to keep people safe in public places is continuously increasing. Governments and organizations focus on the safety of public areas such as parks, schools and shopping centers, and devote huge financial and material resources to public security. In building security and early warning, video monitoring systems play a key role. Today, large numbers of cameras are developing into reliable tools for solving various security problems, such as finding lost children, preventing crime and protecting restricted areas. Existing processing of camera image data is mainly carried out in manual video monitoring systems: although surveillance cameras are widely deployed in public places, a monitoring operator observes a single camera at a time and analyses specific events or abnormal conditions, ignoring the other available camera resources as well as factors such as the number of pedestrians and the cluttered background information in each image, which makes this a challenging task that consumes considerable manpower and material resources. Pedestrian re-identification (Person Re-ID) based on machine vision can locate a queried pedestrian across domains in images from different cameras and promptly track the pedestrian's trajectory within a certain range, so it plays a key role in security and early warning for public places; the effective use of cross-camera pedestrian re-identification (Re-ID) is therefore of great significance for the security of public places.
Pedestrian re-identification means recognizing pedestrian images or videos with the same identity across cameras through a designed algorithm model. Given a pedestrian sample to be queried, the goal of Re-ID is to determine whether that person appears at another place at a different time as captured by a different camera; an image, a video sequence or a text description can be used to represent the person to be queried. Objective evaluation with an algorithm model can replace human eyes in quickly finding similar pedestrian images under multiple non-overlapping cameras. Re-ID methods mainly follow two settings: closed environments and open environments.
The datasets in a closed environment have the following characteristics: (1) consistent data types; (2) sufficient and accurate labels; (3) a limited number of pedestrian identities; (4) closed scenes.
The datasets in an open environment have the following characteristics: (1) unprocessed images/videos; (2) unlabeled or noisy labels; (3) open scenes; (4) a large number of pedestrians.
Supervised pedestrian re-identification in closed environments has now reached a good level, but under the large data volumes, noisy labels and unprocessed data of open environments, unsupervised pedestrian re-identification still faces great challenges. Its main current problems are: 1. manual label calibration cannot be used; 2. initial clustering generates noisy labels, which affects accuracy.
Disclosure of Invention
In order to achieve the above purpose, the invention provides a cross-vision pedestrian re-identification method for intelligent security and early warning, which extracts more salient features after cropping, splices global features and local features, and further screens out related pedestrian pictures using the camera index and frame-number information in the pictures.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
The invention relates to a cross-vision pedestrian re-identification method for intelligent security and early warning, which comprises the following steps:
Step 1, preprocessing the source domain and target domain samples: identify pose points by pose estimation on the source domain and target domain pictures, occlude the pictures using the pose points, and pre-train the model with the source domain pictures and ID information so that the model has the ability to extract key features.
The preprocessing of the source domain and target domain samples specifically comprises the following steps:
Step 1-1, dividing the rectangular area: identify pose points by pose estimation on the source domain and target domain pictures, estimate the pedestrian joint points from the pose points (OpenPose model), separate the foreground from the cluttered background through the joint points, and select the head joint point (x1, y1), the foot joint point (x2, y2) and the left and right joint points (x3, y3), (x4, y4). A rectangular area is calculated from the 4 coordinate points and the image size, such that the upper-left corner coordinate of the rectangular area where the pedestrian is located is (W1, H1) and the lower-right corner coordinate is (W2, H2).
Step 1-2, after dividing the rectangular area, the image is occluded: after the pedestrian's specific parts are highlighted, the upper half-body and the lower half-body are occluded respectively according to the pose estimation points, and the image with global features together with the half-body occluded images are input into the network.
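To make step 1 concrete, the following Python sketch shows one way the rectangular region and the half-body occlusion could be implemented. It assumes the four joint points have already been produced by an OpenPose-style model; the margin term and the midline split between upper and lower body are illustrative assumptions, since the patent's exact corner formula is not reproduced above.

```python
import numpy as np

def pedestrian_rect(keypoints, img_w, img_h, margin=0.1):
    # keypoints: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)] for head, foot,
    # left and right joints, as in step 1-1. The margin is an assumption.
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    dx, dy = margin * img_w, margin * img_h
    w1 = max(min(xs) - dx, 0)            # (W1, H1): upper-left corner
    h1 = max(min(ys) - dy, 0)
    w2 = min(max(xs) + dx, img_w)        # (W2, H2): lower-right corner
    h2 = min(max(ys) + dy, img_h)
    return int(w1), int(h1), int(w2), int(h2)

def occlude_halves(img, rect):
    # img: H x W x 3 uint8 array. Returns an upper-body-occluded copy and a
    # lower-body-occluded copy; the rectangle midline splits the two halves.
    w1, h1, w2, h2 = rect
    mid = (h1 + h2) // 2
    upper_occ = img.copy()
    upper_occ[h1:mid, w1:w2] = 0         # mask out the upper half-body
    lower_occ = img.copy()
    lower_occ[mid:h2, w1:w2] = 0         # mask out the lower half-body
    return upper_occ, lower_occ
```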
Step 2, extracting sample feature vectors through ResNet50: load the model pre-trained in step 1, input the global feature images and the half-body occluded images of the source domain and target domain into the network, extract pedestrian-related features through ResNet50, and obtain feature vectors through global feature processing and local feature processing. Obtaining the feature vectors specifically comprises the following steps:
the global feature processing: following the ResNet50 network structure, the global feature image and a half-body occluded image pass through a global average pooling (GAP) layer in turn, and after the GAP output is batch-normalized, a fully connected (FC) layer is attached to obtain two 512-dimensional feature vectors;
the local feature processing: the two half-body occluded images are input and then passed through a global maximum pooling (GMP) layer in turn, and after the GMP output is batch-normalized, a fully connected (FC) layer is attached to obtain two 512-dimensional feature vectors.
Step 3, splicing the feature vectors after feature extraction: connect the feature vector obtained from the global features in step 2 with the feature vector from the local feature processing. The 4 512-dimensional vectors obtained from the global feature branch and the local feature branch are spliced together into one 2048-dimensional feature vector, so that more latent detail features can be obtained.
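A minimal PyTorch sketch of the branch heads and the splicing in steps 2 and 3 follows, assuming a ResNet50 backbone truncated before its own pooling layer; exactly which image feeds which branch is an assumption based on the description above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BranchHead(nn.Module):
    # One branch head: pooling, then batch normalization, then an FC layer
    # producing a 512-dimensional vector, as described in steps 2 and 3.
    def __init__(self, pool, in_dim=2048, out_dim=512):
        super().__init__()
        self.pool = pool                  # GAP for global, GMP for local
        self.bn = nn.BatchNorm1d(in_dim)
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, fmap):              # fmap: (B, 2048, H, W)
        v = self.pool(fmap).flatten(1)    # (B, 2048)
        return self.fc(self.bn(v))        # (B, 512)

# ResNet50 without its average pool and classifier, so it outputs feature maps.
backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
gap_head = BranchHead(nn.AdaptiveAvgPool2d(1))   # global branch
gmp_head = BranchHead(nn.AdaptiveMaxPool2d(1))   # local branch

def spliced_feature(global_img, upper_occ, lower_occ):
    # Four 512-dim vectors spliced into one 2048-dim vector f_all.
    f_g  = gap_head(backbone(global_img))        # global image, GAP branch
    f_g2 = gap_head(backbone(upper_occ))         # half-body image, GAP branch
    f_up = gmp_head(backbone(upper_occ))         # half-body image, GMP branch
    f_lo = gmp_head(backbone(lower_occ))         # half-body image, GMP branch
    return torch.cat([f_g, f_g2, f_up, f_lo], dim=1)   # (B, 2048)
```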
Step 4, calculating distances to generate pseudo labels: after extracting the spliced feature vectors, calculate the distance between target domain samples and the distance between source domain samples to obtain a final inter-sample distance matrix, obtain the clustering radius R from the distance matrix, and mark the target domain samples with more accurate pseudo labels by a density clustering method. In step 4, the Jaccard distance from the k-reciprocal nearest neighbors and the Mahalanobis distance are used to calculate the distance between target domain samples and the distance between source domain samples; the camera index and frame-number information in the pictures are used during the calculation to further screen out related pedestrian pictures, and a constraint distance d_f is added to pull samples of the same identity closer, thereby enlarging the distances between different identities.
Step 5, re-identifying, namely generating pseudo labels and ranking them: calculate the loss using the pseudo labels obtained in step 4 and the spliced feature vectors obtained in step 3, and update the network; after updating the network, return to step 2 and sequentially iterate the extraction of features, calculation of distances, generation of pseudo labels, calculation of the loss and updating of the network until the network converges or the number of iterations is completed.
In a further improvement of the invention, the distance calculation process in step 5 is as follows:
(1) Construct a matrix W of the same dimension as the matrix d, assigning the value 0 to the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s, assigning the value 1 to the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s, and assigning the value 0.5 in all other cases;
(2) Construct a zero matrix d_f of the same dimension as the matrix d; d_f is then obtained by screening through the dot (element-wise) multiplication of W and d. This operation pulls the distance between the same identities closer and enlarges the distance between different identities, finally giving the final distance
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
wherein d_f is the constraint distance; d_j is the Jaccard distance; d is the Mahalanobis distance; d_all is the final distance; λ1 and λ2 are hyperparameters controlling the ratio between the distances; q is a picture in the probe set; g_i is a picture in the gallery set.
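Numerically, the fusion above is just a weighted sum of three precomputed distance matrices. A sketch with placeholder weights follows; the patent does not disclose tuned values for λ1 and λ2.

```python
import numpy as np

def final_distance(d_f, d_j, d_m, lam1=0.3, lam2=0.2):
    # d_f: constraint distance, d_j: Jaccard distance, d_m: Mahalanobis
    # distance, all square matrices of the same shape. lam1/lam2 are
    # illustrative placeholders, not the patent's hyperparameter values.
    return (1.0 - lam1 - lam2) * d_f + lam1 * d_j + lam2 * d_m
```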
In a further improvement of the invention, the overall loss function in step 5 is:
L_tri = L_triplet(f_all, ŷ_t) + L_triplet(f_all_e, ŷ_t)
wherein L_tri is the overall loss, L_triplet is the triplet loss, f_all is the spliced feature vector, f_all_e is the globally embedded feature vector, and ŷ_t is the cluster-generated pseudo label.
The beneficial effects of the invention are as follows:
First, the regions located by the introduced pose estimation points and the cluttered background are effectively occluded; the occlusion prevents the network from focusing its attention on background information and strengthens the network's learning ability, so locally salient features are learned better, the foreground is separated from the background, detail features are extracted better, better pseudo labels are obtained, intra-class aggregation is achieved better, the network learns more distinctive features, and the recognition accuracy of unsupervised pedestrian re-identification increases;
Second, the camera index and time information of the pictures are used effectively; considering that pedestrian images under the same camera have high confidence while pedestrian images under different cameras have low confidence provides a good constraint for cross-camera pedestrian retrieval.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic structural diagram of a re-identification apparatus according to an embodiment of the present invention.
Fig. 3 is a picture of the occlusion process of the image in step 1 of the present invention.
Fig. 4 is a basic structural diagram of the ResNet50 of the present invention.
FIG. 5 is a schematic diagram of the k-reciprocal nearest neighbors algorithm of the present invention.
Fig. 6 is a schematic diagram showing that the method of the present invention pulls together pedestrians with the same identity under the same camera within a close time window and pushes apart pedestrians with different identities under different cameras within a close time window.
FIG. 7 is a schematic diagram of a cross-view pedestrian re-recognition system of the present invention.
Detailed Description
Embodiments of the invention are disclosed in the drawings, and for purposes of explanation, numerous practical details are set forth in the following description. However, it should be understood that these practical details are not to be taken as limiting the invention. That is, in some embodiments of the invention, these practical details are unnecessary.
As shown in fig. 7, the invention is a cross-vision pedestrian re-identification system for intelligent security and early warning, comprising a data preprocessing module, a data acquisition module, a network construction module, a network training module and a model evaluation module, wherein: the data preprocessing module marks joint point information on the dataset through the pose estimation model and then occludes the images; the data acquisition module loads the source domain and target domain datasets into the network, where the source domain data is a labeled dataset and the target domain data is an unlabeled dataset; the network construction module modifies the backbone network that extracts features from the input images, processes global features and local features separately, and finally splices the feature vectors; the network training process of the network training module is as follows: extract the spliced feature vectors of the source domain and target domain data through the network, calculate the distances between samples, obtain the clustering radius R from the distances, generate pseudo labels as supervision signals, calculate the loss function, and update the network parameters. After training ends, the model evaluation module inputs the dataset into the network for evaluation without updating network parameters.
The cross-vision pedestrian re-identification system for intelligent security and early warning is realized using the cross-vision pedestrian re-identification method, whose flow diagram is shown in fig. 1. The first part inputs the source domain and target domain samples; the second part is the backbone, a ResNet50 with part of its structure replaced; the third part splices the features extracted by the branches; the fourth part calculates the distances; the fifth part extracts features of the target domain test set, calculates each distance, weights them to obtain the final distance, and ranks and scores by the final distance. The recognition system is realized by a re-identification apparatus comprising an image acquisition unit, an image-processing core board based on the Atlas 200 Developer Kit, a PC control unit and a display module; the structure of the PC control unit is shown in fig. 2.
Step 1: preprocess the source domain and target domain samples and input them into the network
Identify pose points by pose estimation on the source domain and target domain pictures, occlude the pictures using the pose points, and pre-train the model with the source domain pictures and ID information so that the model has the ability to extract key features; the ID information refers to the camera index and frame-number information;
at the beginning, the source domain image and ID information are input, and pre-training of the model is carried out, so that the model has the capability of extracting key features. Firstly, estimating the joint point of a pedestrian through a gesture point (Openpost model), separating a foreground and a mixed background through the joint point, and selecting a head joint point (x 1 ,y 1 ) Foot joint (x) 2 ,y 2 ) Left and right joint points (x 3 ,y 3 )、(x 4 ,y 4 ) A rectangular area is calculated through 4 coordinate points and image sizes, and the calculation formula is as follows:
the coordinate value of the upper left corner of the rectangular area where the pedestrian is located is (W) 1 ,H 1 ) The lower right corner coordinate value is (W 2 ,H 2 ). After the matrix area is divided, the image is subjected to occlusion processing as shown in fig. 3. After the specific part of the pedestrian protrudes, the upper half body and the lower half body are respectively subjected to shielding treatment through the gesture estimation points, and the images with global characteristics and the shielding images of the half body are input into a network.
The source domain data is trained with the identity (ID) loss and the triplet loss; the model is pre-trained for 10 epochs, and the pre-trained model is then used for training. Once the pre-training accuracy exceeds 70%, data enhancement such as flipping, random erasing and random cropping is applied to the source domain data to prevent the model from overfitting. In addition, at the last fully connected (FC) layer of the network, note that during pre-training the dimension of this layer equals the number of pedestrian IDs, and the loss functions on the source domain are the softmax function and the triplet loss:
L_src-id = -(1/n_s) Σ_{i=1}^{n_s} log p(y_s^(i) | f_s^(i))
where L_src-id represents the ID loss function, n_s represents the number of samples in the source domain data, and p(y_s^(i) | f_s^(i)) represents the probability that the i-th source domain sample belongs to pedestrian y^(i).
L_src-tri = Σ_i max(0, α + d(f_a^(i), f_p^(i)) - d(f_a^(i), f_n^(i)))
where L_src-tri represents the triplet loss, α is the margin hyperparameter, f_a^(i) represents the features extracted from the anchor sample, f_p^(i) represents the features extracted from another positive sample with the same identity as the i-th sample, and f_n^(i) represents the features extracted from a negative sample with a different identity from the i-th sample.
The overall loss function is:
L_src = L_src-tri + L_src-id
wherein L_src represents the overall loss of the source domain, L_src-tri represents the triplet loss of the source domain, and L_src-id represents the identity loss of the source domain.
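A sketch of this pre-training objective in PyTorch, assuming cross-entropy as the softmax ID loss and a standard margin-based triplet loss; how the anchor/positive/negative triplets are mined (for example, batch-hard mining) is not specified above and is left to the caller.

```python
import torch.nn as nn

id_loss_fn = nn.CrossEntropyLoss()               # softmax ID loss over pedestrian IDs
tri_loss_fn = nn.TripletMarginLoss(margin=0.3)   # margin alpha: illustrative value

def source_loss(logits, labels, anchor, positive, negative):
    # L_src = L_src-tri + L_src-id. logits come from the final FC layer whose
    # dimension equals the number of pedestrian IDs during pre-training.
    l_id = id_loss_fn(logits, labels)                 # L_src-id
    l_tri = tri_loss_fn(anchor, positive, negative)   # L_src-tri
    return l_tri + l_id
```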
Step 2, extracting sample feature vectors through ResNet50
Load the model pre-trained in step 1, input the images with global features and the half-body occluded images of the source domain and target domain into the network, extract pedestrian-related features through ResNet50, and obtain feature vectors through global feature processing and local feature processing. The features are extracted after the images pass through the 5 stages of ResNet50, and the global features and local features are processed separately; the 5 stages of the backbone ResNet50 are shown in fig. 4.
The global feature processing is that according to the network structure of ResNet50, a global feature image and a half-body shielding image sequentially pass through a global average pool layer (GAP layer), and after the output of the global average pool layer (GAP layer) is subjected to batch normalization, a complete connection layer (FC) is connected to obtain two 512-dimensional feature vectors;
the local feature processing is to input two half-body shielding images, sequentially access a global maximum pool layer (GMP layer) after the two half-body shielding images, and obtain two feature vectors with 512 dimensions at an access complete connection layer (FC) after the global maximum pool layer (GMP layer) output is subjected to batch normalization.
Step 3, feature vector splicing after feature extraction
After extracting the features of the input images, we connect the feature vectors of the two branches: the 4 512-dimensional vectors obtained through the global feature branch and the local feature branch are spliced into one 2048-dimensional feature vector, so that more latent detail features can be obtained:
f_i_all = [f_i, f_i', f_i_up, f_i_low]
After the base model, f_i is also passed through a further FC layer to obtain a 512-dimensional global embedded vector f_i_e; like f_i, this FC layer is also updated during training and shares the same pseudo label.
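As a small illustrative sketch, the extra FC layer producing f_i_e could look as follows; the 512-to-512 shape follows the text, everything else is assumed.

```python
import torch.nn as nn

embed_fc = nn.Linear(512, 512)   # extra FC layer after the base model

def global_embedded(f_i):
    # f_i: (B, 512) global-branch vector. Returns the embedded vector f_i_e,
    # trained with the same pseudo label as f_i, per the text above.
    return embed_fc(f_i)
```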
Step 4, calculating distances to generate pseudo labels
After the 2048-dimensional features are extracted and fused: the model has been pre-trained on the source domain data and performs well there, but a domain gap arises when training on the target domain. Meanwhile, since the target domain samples are unlabeled, this problem is addressed by generating pseudo labels through clustering; before the pseudo labels are generated, the global clustering radius R is determined.
R is obtained by weighting the Jaccard distance from the k-reciprocal nearest neighbors algorithm and the Mahalanobis distance between the source domain and the target domain. The main idea of the k-reciprocal nearest neighbors algorithm is that if two pictures A and B are similar, then B should be within the first k neighbors of A and, conversely, A should be within the first k neighbors of B, as shown in fig. 5.
The Mahalanobis distance and the Jaccard distance of the source and target domains are calculated from the feature vectors. The initial ranking is obtained by calculating the Mahalanobis distance, the Jaccard distance is the distance between the k-reciprocal neighbor sets, and the final distance is obtained by mixing the two distances with a certain proportion coefficient:
d*(q, g_i) = (1 - λ) d_J(q, g_i) + λ d(q, g_i)
wherein d_J is the Jaccard distance, d is the Mahalanobis distance, d* is the resulting mixed distance, and λ is a hyperparameter controlling the ratio between the two distances.
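The mutual-neighbor rule itself is compact. The following sketch returns, for each sample, its k-reciprocal neighbor set from a square distance matrix; it omits the neighborhood-expansion and local-query refinements of the full re-ranking algorithm.

```python
import numpy as np

def k_reciprocal_neighbors(dist, k=20):
    # dist: (N, N) distance matrix. Sample j is a k-reciprocal neighbor of i
    # when j is in i's top-k list AND i is in j's top-k list. The top-k list
    # of each row includes the sample itself (distance 0), as usual.
    ranks = np.argsort(dist, axis=1)[:, :k]
    topk = [set(row) for row in ranks]
    return [{j for j in topk[i] if i in topk[j]} for i in range(dist.shape[0])]
```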
Step 5, re-identifying: generating pseudo labels and ranking
Calculate the loss using the pseudo labels obtained in step 4 and the spliced feature vectors obtained in step 3, and update the network; after updating the network, return to step 2 and sequentially iterate the extraction of features, calculation of distances, generation of pseudo labels, calculation of the loss and updating of the network until the network converges or the number of iterations is completed.
To find highly similar samples more accurately, a new constraint distance d_f(q, g_i) is introduced on the basis of the k-reciprocal nearest neighbors algorithm. The main idea of this distance is to exploit the fact that the cameras' fields of view do not overlap and their angles differ within a specific time period; for example, taking the period as 10 s, the Market-1501 dataset has 25 frames per second, so the frame difference is 250. On one hand, a pedestrian with a given identity under one camera cannot suddenly appear under another camera's view angle, and the pose similarity of the same identity under the same camera is high; on the other hand, because the viewing angles are not uniform across cameras, the similarity of the same pedestrian under different cameras is poor, so supposedly same-identity, highly similar candidates under different cameras within the specific time period are eliminated, as shown in fig. 6.
The introduced constraint distance therefore better pulls up the similarity of the same pedestrian and enlarges the gap between different pedestrians.
The distance calculation process is as follows:
(1) A matrix W of the same dimension as the matrix d is constructed: the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s are assigned the value 0, the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s are assigned the value 1, and all other cases are assigned the value 0.5.
(2) A zero matrix d_f of the same dimension as the matrix d is constructed; d_f is then obtained by screening through the dot (element-wise) multiplication of W and d. In this way, the distance between the same identities can be shortened and the distance between different identities enlarged.
The improved algorithm thus yields the final distance d_all as
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
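One reading of the two-step procedure above is sketched below in NumPy: build W from the camera indexes and frame numbers, then screen the base distance matrix element-wise to obtain d_f. The 25 fps figure follows the Market-1501 example in the text; treating the "dot multiplication" as element-wise masking of d is an assumption.

```python
import numpy as np

FPS = 25              # Market-1501: 25 frames per second, per the text above
WINDOW = 10 * FPS     # a 10 s window corresponds to a frame difference of 250

def constraint_distance(d, cam_ids, frames):
    # d: (N, N) base distance matrix; cam_ids, frames: (N,) integer arrays.
    d = np.asarray(d, dtype=float)
    same_cam = cam_ids[:, None] == cam_ids[None, :]
    close = np.abs(frames[:, None] - frames[None, :]) <= WINDOW
    W = np.full(d.shape, 0.5)
    W[same_cam & close] = 0.0      # same camera, close in time: pull together
    W[~same_cam & close] = 1.0     # different cameras, close in time: push apart
    return W * d                   # element-wise screening yields d_f
```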
The newly generated d_all can be used for the next round of density clustering to generate pseudo labels.
Through the density clustering algorithm, the global clustering radius R is calculated according to d_all, and pseudo labels are generated as supervision signals according to R. Adding d_f better distinguishes the differences between identities, reduces the generation of noisy labels during training, and contributes better labels to later iterations.
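A sketch of the pseudo-label step with scikit-learn's DBSCAN on the precomputed d_all follows; deriving the radius R from a small percentile of the pairwise distances is a common heuristic and stands in for the patent's radius computation, which is not detailed above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def generate_pseudo_labels(d_all, pct=1.6, min_samples=4):
    # d_all: (N, N) final distance matrix. R is taken as the pct-th
    # percentile of the upper-triangular pairwise distances (an assumption).
    tri = d_all[np.triu_indices_from(d_all, k=1)]
    R = np.percentile(tri, pct)
    labels = DBSCAN(eps=R, min_samples=min_samples,
                    metric="precomputed").fit_predict(d_all)
    return labels    # label -1 marks noise samples left out of training
```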
After the pseudo labels are generated, given an image, each feature vector f and its corresponding self-label y are used as the two inputs of the triplet loss:
L_triplet(f, y), where f is the feature vector and y is the cluster-generated pseudo label.
Grouping the images according to the feature vector f_all, a pseudo label can be obtained for each image. Labels can thus be generated from the feature vectors to create a new target dataset in which each image carries a pseudo label.
Finally, using the self-labels as supervision information, the triplet loss is used to fine-tune the pre-trained model for cross-dataset adaptation, with the overall loss function given above: L_tri = L_triplet(f_all, ŷ_t) + L_triplet(f_all_e, ŷ_t).
the existing unsupervised pedestrian re-identification is characterized in that feature vectors are extracted from unlabeled pictures through ResNet50, pseudo labels are generated through density clustering, meanwhile, pedestrian nodes are positioned through posture estimation, and background areas outside the nodes are shielded, so that a network is focused on distinguishing features of non-shielded areas, and the influence of mixed backgrounds on the network is avoided.
The method is further characterized in that a new model-independent k-reciprocal nearest neighbors method is introduced into the network: the top-k images most similar to the probe are selected from the gallery by the k-reciprocal nearest neighbors algorithm, and on this basis the ranking is further constrained using the pictures' camera indexes and frame numbers, raising the rank of pedestrian images under the same camera ID index and lowering the rank of pedestrian images under different camera ID indexes, which improves overall reliability; the remaining sample images are handled by suppression.
The camera index and time information of the pictures are used effectively; this information can effectively improve the association of pedestrians under the same camera and effectively distinguish the identity information of pedestrians under different cameras, so hard positive samples and hard negative samples are further divided well.
The foregoing description is only illustrative of the invention and is not to be construed as limiting the invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present invention, should be included in the scope of the claims of the present invention.

Claims (5)

1. A cross-vision pedestrian re-identification method for intelligent security and early warning, characterized in that the identification method comprises the following steps:
step 1, preprocessing the source domain and target domain samples: identifying pose points by pose estimation on the source domain and target domain pictures, occluding the pictures using the pose points, and pre-training the model with the source domain pictures and ID information so that the model has the ability to extract key features;
step 2, extracting sample feature vectors through ResNet50: loading the model pre-trained in step 1, inputting the images with global features and the half-body occluded images of the source domain and target domain into the network, extracting pedestrian-related features through ResNet50, and obtaining feature vectors through global feature processing and local feature processing;
step 3, splicing the feature vectors after feature extraction: connecting the feature vector obtained from the global features in step 2 with the feature vector from the local feature processing to obtain a spliced feature vector;
step 4, calculating distances to generate pseudo labels: after extracting the spliced feature vectors, calculating the distance between target domain samples and the distance between source domain samples to obtain a final inter-sample distance matrix, obtaining a clustering radius R from the distance matrix, and marking the target domain samples with pseudo labels by a density clustering method;
step 5, re-identifying, namely generating pseudo labels and ranking them: calculating the loss through a triplet loss using the pseudo labels obtained in step 4 and the spliced feature vectors obtained in step 3, updating the network, returning to step 2 after updating the network, and sequentially iterating the extraction of features, calculation of distances, generation of pseudo labels, calculation of the loss and updating of the network until the network converges or the number of iterations is completed, wherein
in step 4, the Jaccard distance from the k-reciprocal nearest neighbors and the Mahalanobis distance are used to calculate the distance between target domain samples and the distance between source domain samples, the camera index and frame-number information in the pictures are used during the calculation to screen out related pedestrian pictures, and a constraint distance d_f is added to pull samples of the same identity closer, thereby enlarging the distances between different identities;
the distance calculation process in step 5 is as follows:
(1) constructing a matrix W of the same dimension as the matrix d, assigning the value 0 to the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s, assigning the value 1 to the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s, and assigning the value 0.5 in all other cases;
(2) constructing a zero matrix d_o of the same dimension as the matrix d, and obtaining d_f after screening W and d_o through dot multiplication, which pulls the distance between the same identities closer and enlarges the distance between different identities; the final distance is
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
wherein d_f is the constraint distance; d_j is the Jaccard distance; d is the Mahalanobis distance; d_all is the final distance; λ1 and λ2 are hyperparameters controlling the ratio between the distances.
2. The cross-vision pedestrian re-identification method for intelligent security and early warning according to claim 1, characterized in that preprocessing the source domain and target domain samples in step 1 specifically comprises the following steps:
step 1-1, dividing the rectangular area: identifying pose points by pose estimation on the source domain and target domain pictures, estimating the pedestrian joint points from the pose points, separating the foreground from the cluttered background through the joint points, selecting the head joint point (x1, y1), the foot joint point (x2, y2) and the left and right joint points (x3, y3), (x4, y4), and calculating a rectangular area from the 4 coordinate points and the image size, such that the upper-left corner coordinate of the rectangular area where the pedestrian is located is (W1, H1) and the lower-right corner coordinate is (W2, H2);
step 1-2, after dividing the rectangular area, occluding the image: after the pedestrian's specific parts are highlighted, occluding the upper half-body and the lower half-body respectively according to the pose estimation points.
3. The cross-vision pedestrian re-identification method for intelligent security and early warning according to claim 1, characterized in that the overall loss function in step 5 is:
L_tri = L_triplet(f_all, ŷ_t) + L_triplet(f_all_e, ŷ_t)
wherein L_tri is the overall loss, L_triplet is the triplet loss, and f_all represents the spliced feature vector.
4. The cross-vision pedestrian re-identification method for intelligent security and early warning according to claim 1, characterized in that obtaining the feature vectors in step 2 specifically comprises:
the global feature processing: following the ResNet50 network structure, the original image and the background-occluded image pass through ResNet50 and a global average pooling layer in turn, and after the global average pooling output is batch-normalized, a fully connected layer is attached to obtain two 512-dimensional feature vectors;
the local feature processing: the two half-body occluded images are input and then passed through a global maximum pooling layer in turn, and after the global maximum pooling output is batch-normalized, a fully connected layer is attached to obtain two 512-dimensional feature vectors.
5. A system using the cross-vision pedestrian re-identification method for intelligent security and early warning according to any one of claims 1-4, characterized in that the system comprises a data preprocessing module, a data acquisition module, a network construction module, a network training module and a model evaluation module, wherein:
the data preprocessing module marks joint point information on the dataset through the pose estimation model and then occludes the images;
the data acquisition module loads the datasets of the source domain and the target domain into the network, wherein the source domain data is a labeled dataset and the target domain data is an unlabeled dataset;
the network construction module modifies the backbone network for extracting features of the input images, processes global features and local features respectively, and finally splices the feature vectors;
the network training process of the network training module is as follows: extracting the spliced feature vectors of the source domain and target domain data through the network, calculating the distances between samples, obtaining the clustering radius R according to the distances, generating pseudo labels as supervision signals, calculating the loss function, and updating the network parameters;
the model evaluation module inputs the dataset into the network for evaluation after the network training ends, without updating network parameters, wherein
the Jaccard distance from the k-reciprocal nearest neighbors and the Mahalanobis distance are used to calculate the distance between target domain samples and the distance between source domain samples, the camera index and frame-number information in the pictures are used during the calculation to screen out related pedestrian pictures, and a constraint distance d_f is added to pull samples of the same identity closer, thereby enlarging the distances between different identities;
the process of calculating the distance between samples is as follows:
(1) constructing a matrix W of the same dimension as the matrix d, assigning the value 0 to the positions of W corresponding to sample pairs with the same camera index and a time difference within 10 s, assigning the value 1 to the positions of W corresponding to sample pairs with different camera indexes and a time difference within 10 s, and assigning the value 0.5 in all other cases;
(2) constructing a zero matrix d_o of the same dimension as the matrix d, and obtaining d_f after screening W and d_o through dot multiplication, which shortens the distance between the same identities and enlarges the distance between different identities; the final distance is
d_all(q, g_i) = (1 - λ1 - λ2) d_f(q, g_i) + λ1 d_j(q, g_i) + λ2 d(q, g_i)
wherein d_f is the constraint distance; d_j is the Jaccard distance; d is the Mahalanobis distance; d_all is the final distance; λ1 and λ2 are hyperparameters controlling the ratio between the distances.
CN202110959012.7A 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning Active CN113627380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110959012.7A CN113627380B (en) 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110959012.7A CN113627380B (en) 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Publications (2)

Publication Number Publication Date
CN113627380A CN113627380A (en) 2021-11-09
CN113627380B true CN113627380B (en) 2024-03-15

Family

ID=78386846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110959012.7A Active CN113627380B (en) 2021-08-20 2021-08-20 Cross-vision pedestrian re-identification method and system for intelligent security and early warning

Country Status (1)

Country Link
CN (1) CN113627380B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116405651B (en) * 2023-06-09 2023-08-01 中国矿业大学(北京) Multi-camera cross-view pedestrian data generation method and system
CN118230362B (en) * 2024-05-22 2024-10-18 西北工业大学 Target re-identification method based on visual angle self-adaptive mechanism
CN118351340B (en) * 2024-06-17 2024-08-20 中国海洋大学 Double-branch non-supervision target re-identification method and system based on sample mining

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111476168A (en) * 2020-04-08 2020-07-31 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages
CN111666843A (en) * 2020-05-25 2020-09-15 湖北工业大学 Pedestrian re-identification method based on global feature and local feature splicing
CN111967294A (en) * 2020-06-23 2020-11-20 南昌大学 Unsupervised domain self-adaptive pedestrian re-identification method
CN112036322A (en) * 2020-09-01 2020-12-04 清华大学 Method, system and device for constructing cross-domain pedestrian re-identification model of multi-task network
CN112069940A (en) * 2020-08-24 2020-12-11 武汉大学 Cross-domain pedestrian re-identification method based on staged feature learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11721028B2 (en) * 2018-05-28 2023-08-08 Universiteit Gent Motion segmentation in video from non-stationary cameras

Also Published As

Publication number Publication date
CN113627380A (en) 2021-11-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Kou Qiqi

Inventor after: Cheng Deqiang

Inventor after: Li Yunlong

Inventor after: Ma Shang

Inventor after: Wang Xiaoyi

Inventor after: Zhang Haoxiang

Inventor before: Li Yunlong

Inventor before: Ma Shang

Inventor before: Cheng Deqiang

Inventor before: Kou Qiqi

Inventor before: Wang Xiaoyi

Inventor before: Zhang Haoxiang

GR01 Patent grant