CN112381024B - Multi-modal fused unsupervised pedestrian re-identification re-ranking method

Info

Publication number: CN112381024B (granted); other version CN112381024A
Application number: CN202011313048.XA
Authority: CN (China)
Prior art keywords: pedestrian, information, time, camera, wifi
Original language: Chinese (zh)
Inventors: 吕建明, 林少川, 梁天保, 胡超杰, 莫晚成
Current and original assignee: South China University of Technology (SCUT)
Application filed by South China University of Technology (SCUT), with priority to CN202011313048.XA; filing date 2020-11-20
Publication of CN112381024A: 2021-02-19; grant publication of CN112381024B: 2023-06-23
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G06F18/22 Pattern recognition: matching criteria, e.g. proximity measures
    • G06F18/2415 Pattern recognition: classification based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods
    • Y02T10/40 Engine management systems (Y-section climate-change tag)

Abstract

The invention discloses a multi-modal fused unsupervised pedestrian re-identification re-ranking method, comprising the following steps: collecting multi-modal information of pedestrians while they walk; extracting pedestrian features with a convolutional neural network model and computing visual similarity; constructing an image spatio-temporal distribution from the image spatio-temporal information; constructing a WiFi spatio-temporal distribution from the WiFi information; and fusing the visual similarity, the image spatio-temporal distribution and the WiFi spatio-temporal distribution to re-rank the pedestrian re-identification results. By combining multi-modal information for a second-stage re-ranking, the method effectively reduces the search space and overcomes the sensitivity of traditional appearance-based pedestrian re-identification to the monitoring environment.

Description

Multi-modal fused unsupervised pedestrian re-identification re-ranking method
Technical Field
The invention belongs to the field of multi-modal intelligent security, and particularly relates to a multi-modal fused unsupervised pedestrian re-identification re-ranking method.
Background
Pedestrian re-identification, also called person re-identification, aims to quickly and effectively retrieve a target person from massive surveillance video. It supports tracking target persons, confirming identities, locating missing persons and similar tasks, and plays an important role in safe-city applications. Pedestrian re-identification has attracted extensive research because of its great application value and its challenging problems such as viewing angle, illumination, occlusion and face blurring.
Mainstream pedestrian re-identification methods train models on labeled datasets, but labeling data consumes considerable manpower and money and is difficult to obtain at scale. Existing unsupervised pedestrian re-identification research is mainly based on the appearance features of pedestrian images, and researchers have developed many approaches around feature extraction and similarity metrics. The former focuses on designing a robust and reliable representation of pedestrian images, one that distinguishes different pedestrians while remaining insensitive to illumination and viewing-angle changes; the latter focuses on learning a distance function that fits the distribution of pedestrian image features, so that images of the same pedestrian have small feature distances while images of different pedestrians have large ones. However, applying these methods to real monitoring services still faces significant challenges. The pictures in the re-identification problem come from different cameras, and the appearance of the same pedestrian changes to some extent with each camera's angle, illumination and other environmental conditions; conversely, because of variations in pedestrian pose and camera angle, different pedestrians may appear more similar than the same person seen by different cameras.
To minimize the interference of uncontrollable environmental factors, existing pedestrian re-identification systems have to present a group of candidate images from each monitoring device for a person to select from, and then refine the result through interactive relevance feedback. This not only increases the manual screening workload and reduces the degree of automation of video analysis; because differences in viewing angle and illumination can greatly change a pedestrian's appearance, the higher-ranked results returned are also not necessarily the more reliable ones.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a multi-modal fused unsupervised pedestrian re-identification re-ranking method.
The aim of the invention is achieved by the following technical scheme:
A multi-modal fused unsupervised pedestrian re-identification re-ranking method, comprising the following steps:
S1, collecting multi-modal information of pedestrians while they walk, including pedestrian images, the ID of the camera that captured each image, the time at which each image was captured, and the WiFi information captured as the pedestrian passes each camera;
S2, extracting pedestrian features from the pedestrian images obtained in step S1 with a convolutional neural network model, computing the visual similarity between pedestrian images, and ranking them to obtain the original pedestrian re-identification ranking;
S3, performing statistics on the camera IDs and capture times of the pedestrian images obtained in step S1 to construct the image spatio-temporal distribution;
S4, performing statistics on the WiFi information captured as pedestrians pass each camera in step S1 to construct the WiFi spatio-temporal distribution;
S5, fusing the visual similarity from step S2, the image spatio-temporal distribution from step S3 and the WiFi spatio-temporal distribution from step S4, and re-ranking the original pedestrian re-identification ranking from step S2 a second time.
Further, the process of step S1 is as follows:
S11, obtaining pedestrian images from surveillance video captured by the cameras of a specified road section: splitting the video into individual frames, then running an SSD or Faster R-CNN pedestrian detector on the frames;
S12, while collecting each pedestrian image, recording the time displayed in the surveillance video as the time information of the image under that camera, and recording the camera device ID as the spatial information of the image;
S13, during pedestrian movement, using WiFi collectors placed near the cameras to capture the WiFi signals emitted by mobile terminals as pedestrians pass each camera, the WiFi information including the unique MAC address of the mobile terminal, the time the WiFi information was captured, and the ID of the camera device at which it was captured;
S14, dividing the pedestrian images collected in step S11 into a query set and a candidate set, randomly or proportionally.
Further, in step S2, pedestrian features are extracted with a ResNet-50 convolutional neural network model, the visual similarity between pedestrian images is computed with cosine similarity, and the obtained similarities are sorted in descending order to obtain the original pedestrian re-identification ranking, wherein the ResNet-50 network is connected sequentially from input layer to output layer as: a 7×7 convolution layer, a batch normalization layer, a ReLU activation, a 3×3 max pooling layer, four stages of residual bottleneck blocks containing 3, 4, 6 and 3 blocks respectively, a global average pooling layer, and a fully connected output layer.
Further, the process of step S3 is as follows:
S31, assembling the visual similarities between pedestrian images obtained in step S2 into a visual similarity matrix of size Q×P, where Q is the number of pedestrian images in the query set and P is the number of pedestrian images in the candidate set;
S32, sorting each row of the visual similarity matrix in descending order and keeping the K most similar images per row to obtain a Q×K screening matrix;
S33, treating the images in each row of the screening matrix as the same person, then computing the image spatio-temporal distribution.
Further, the process of step S33 is as follows:
S331, calculating the bucketed time difference of the same person migrating across cameras, using the time and spatial information of the pedestrian images obtained in step S1:

$$\Delta_{ij} = \left\lfloor \frac{t_j - t_i}{t_{interval}} \right\rfloor$$

where $t_i$ and $t_j$ denote the times at which the two pedestrian images appear under cameras $c_i$ and $c_j$ respectively, and $t_{interval}$ is the width of the buckets into which the cross-camera migration time differences are divided;

S332, performing frequency statistics on the calculated values:

$$\hat{p}(l) = \frac{n_l}{\sum_{k} n_k}$$

where $l$ is any one bucketed time difference, $n_l$ is the count of occurrences of bucket $l$, and $\sum_k n_k$ is the sum of the counts over all buckets;

S333, plotting connected points on two-dimensional rectangular coordinates with the bucketed time difference $l$ on the horizontal axis and the frequency $\hat{p}(l)$ on the vertical axis, obtaining the image spatio-temporal distribution.
Further, the process of step S4 is as follows:
S41, calculating the bucketed time difference of the same MAC information migrating across camera devices, using the unique MAC address of the mobile terminal, the capture time of the WiFi information, and the ID of the capturing camera device obtained in step S13:

$$\Delta'_{ij} = \left\lfloor \frac{t_j - t_i}{t'_{interval}} \right\rfloor$$

where $t_i$ and $t_j$ denote the times at which the two WiFi records appear under cameras $c_i$ and $c_j$ respectively, and $t'_{interval}$ is the width of the buckets into which the cross-camera migration time differences are divided;

S42, performing frequency statistics on the calculated values:

$$\hat{p}'(l') = \frac{n'_{l'}}{\sum_{k'} n'_{k'}}$$

where $l'$ is any one bucketed time difference, $n'_{l'}$ is the count of occurrences of bucket $l'$, and $\sum_{k'} n'_{k'}$ is the sum of the counts over all buckets;

S43, plotting connected points on two-dimensional rectangular coordinates with the bucketed time difference $l'$ on the horizontal axis and the frequency $\hat{p}'(l')$ on the vertical axis, obtaining the WiFi spatio-temporal distribution.
Further, the process of step S5 is as follows:
S51, calculating the visual probability from the visual similarity obtained in step S2:

$$Pr_{visual} = \frac{1}{1 + \delta e^{-\beta\, s(v_m, v_n)}}$$

where $s(v_m, v_n)$ denotes the visual similarity of two images $v_m$ and $v_n$ obtained under cameras $c_i$ and $c_j$ respectively, and $\delta$ and $\beta$ are hyperparameters;

S52, calculating the image spatio-temporal probability from the image spatio-temporal distribution obtained in step S3:

$$Pr_{st} = \frac{1}{1 + \varepsilon e^{-\alpha\, \hat{p}(k)}}$$

where $\hat{p}(k)$ denotes the frequency with which pedestrians migrate between cameras $c_i$ and $c_j$ with bucketed time difference $k$, and $\varepsilon$ and $\alpha$ are hyperparameters;

S53, calculating the WiFi spatio-temporal probability from the WiFi spatio-temporal distribution obtained in step S4:

$$Pr_{wifi} = \frac{1}{1 + \mu e^{-\gamma\, \hat{p}'(k')}}$$

where $\hat{p}'(k')$ denotes the frequency with which MAC information migrates between cameras $c'_i$ and $c'_j$ with bucketed time difference $k'$, and $\mu$ and $\gamma$ are hyperparameters;

S54, fusing the visual probability, the image spatio-temporal probability and the WiFi spatio-temporal probability:

$$Pr_{fuse} = Pr_{visual} \cdot Pr_{st} \cdot Pr_{wifi}$$

S55, re-ranking the original pedestrian re-identification ranking obtained in step S2 with $Pr_{fuse}$.
Compared with the prior art, the invention has the following advantages and effects:
the thought of merging multiple modes to conduct rearrangement is an effective measure for reducing search space, has good popularization value, and has reference function on the detection, tracking and retrieval of suspected targets in massive monitoring video big data. Compared with the traditional pedestrian re-identification method based on the visual characteristics of the appearance of the human body, the method of the invention has the following advantages and positive effects:
(1) According to the method, the pedestrian migration in the monitoring equipment is skillfully utilized, the WiFi information sent by the mobile terminal equipment is migrated, the image space-time probability and the WiFi space-time probability are introduced to jointly measure the matching probability of various pedestrians on the basis of the original visual probability, and the reliability of the pedestrian re-identification result is remarkably improved;
(2) The method introduces the image and WiFi space-time information, and the space-time information is not influenced by the shooting environments such as illumination, visual angles and the like, so that the defect that the traditional pedestrian re-identification method based on visual characteristics is sensitive to the shooting environments is effectively overcome.
Drawings
FIG. 1 is a flow chart of the multi-modal fused unsupervised pedestrian re-identification re-ranking method of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
In practice, if the pedestrian images recognized by the monitoring devices along a travel path are viewed as a whole, there should be strong spatio-temporal dependencies among them. For example, the same pedestrian cannot appear at the same moment in monitoring devices at different physical positions; the time difference between a pedestrian's appearances in different devices must bear a reasonable relationship to the distance between the devices and a common-sense walking speed; and the pedestrian should not appear at a later device on the path earlier than at a preceding one. However, building image-related spatio-temporal dependencies requires knowing in advance which pedestrian images belong to the same cross-camera migration, so dependencies built under unsupervised conditions carry considerable noise.
Meanwhile, pedestrian-related WiFi information also carries strong spatio-temporal dependency information. Moreover, WiFi information has a natural advantage in the unsupervised setting: since each mobile terminal has a unique MAC address, which WiFi records have migrated across cameras can be known in advance. However, WiFi collection also captures much information that does not belong to pedestrians, so the WiFi spatio-temporal dependency contains noise as well.
The method therefore fuses the visual information of the pedestrian images, the image-related spatio-temporal dependency, and the pedestrian-related WiFi spatio-temporal dependency, and performs a second-stage re-ranking of the pedestrian re-identification results. This reduces the search space, overcomes the sensitivity to the imaging environment of methods that rely on visual features alone, and dampens the noise in both the image and the WiFi spatio-temporal dependencies.
Based on the above ideas, this embodiment discloses a multi-modal fused unsupervised pedestrian re-identification re-ranking method comprising the following steps:
S1, collecting multi-modal information of pedestrians while they walk, including pedestrian images, the ID of the camera that captured each image, the time at which each image was captured, and the WiFi information captured as the pedestrian passes each camera;
S2, extracting pedestrian features from the pedestrian images obtained in step S1 with a convolutional neural network model, computing the visual similarity between pedestrian images, and ranking them to obtain the original pedestrian re-identification ranking;
S3, performing statistics on the camera IDs and capture times of the pedestrian images obtained in step S1 to construct the image spatio-temporal distribution;
S4, performing statistics on the WiFi information captured as pedestrians pass each camera in step S1 to construct the WiFi spatio-temporal distribution;
S5, fusing the visual similarity from step S2, the image spatio-temporal distribution from step S3 and the WiFi spatio-temporal distribution from step S4, and performing a second-stage re-ranking of the original pedestrian re-identification ranking from step S2.
In this embodiment, the specific implementation process of the foregoing step S1 is as follows:
s11, firstly, acquiring a pedestrian image from a monitoring video acquired by a certain road section crossing the camera equipment.
For example, the method for acquiring the pedestrian image from the monitoring video may be that firstly, the monitoring video acquired by the camera device is divided into video frames of one frame and one frame, and then pedestrian detection is performed on the video frames through a pedestrian detection algorithm. The pedestrian detection algorithm can adopt an SSD algorithm or a Faster RCNN algorithm, and can achieve the purpose of acquiring a pedestrian image in a frame of video frame. The embodiment of the invention does not limit the pedestrian detection algorithm, and the person skilled in the art can select the pedestrian detection algorithm according to actual conditions.
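As an illustration only (not part of the patent's disclosure), the following sketch splits a video into frames with OpenCV and detects persons with an off-the-shelf torchvision Faster R-CNN; the function name, score threshold and frame stride are assumptions chosen for the example.

```python
# Illustrative sketch: frame splitting plus pedestrian detection with a
# pretrained Faster R-CNN (COCO class 1 is "person").
import cv2
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_pedestrians(video_path, camera_id, score_thresh=0.8, frame_stride=25):
    """Yield (camera_id, time_sec, cropped pedestrian image) for each detection."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS is unavailable
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
            with torch.no_grad():
                out = model([tensor])[0]
            for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
                if label.item() == 1 and score.item() >= score_thresh:
                    x1, y1, x2, y2 = box.int().tolist()
                    yield camera_id, idx / fps, frame[y1:y2, x1:x2]
        idx += 1
    cap.release()
```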
S12, while collecting each pedestrian image, the time displayed in the surveillance video is recorded as the time information of the image under that camera, and the camera device ID is recorded as the spatial information of the image. This embodiment refers to the time and spatial information of a pedestrian image together as its spatio-temporal information.
S13, during pedestrian movement, WiFi collectors placed near the cameras capture the WiFi signals emitted by mobile terminals as pedestrians pass each camera. The WiFi information includes the unique MAC address of the mobile terminal; the capture time serves as the time information of the WiFi record near that camera, and the ID of the capturing camera device serves as its spatial information. In this embodiment, the time and spatial information of a WiFi record are collectively called its spatio-temporal information.
Illustratively, the WiFi collector in this embodiment was developed on a HiSilicon Kirin 970 (HiKey 970) development board; a person skilled in the art can develop a WiFi collector independently according to the actual situation, for instance along the lines of the sketch below.
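For illustration, a minimal collector could be sketched with scapy as below, assuming a Linux WiFi adapter in monitor mode; the interface name, camera ID and the use of probe-request frames are assumptions of this sketch, not details from the patent.

```python
# Illustrative sketch: logging (MAC, capture time, camera ID) triples from
# nearby mobile terminals by sniffing 802.11 probe requests.
import time
from scapy.all import sniff, Dot11

CAMERA_ID = "cam_03"  # assumed ID of the camera this collector sits next to

def handle(pkt):
    # Management frames (type 0) of subtype 4 are probe requests from phones
    if pkt.haslayer(Dot11) and pkt.type == 0 and pkt.subtype == 4:
        mac = pkt.addr2                 # sender MAC of the mobile terminal
        record = (mac, time.time(), CAMERA_ID)
        print(record)                   # in practice: append to a database

sniff(iface="wlan0mon", prn=handle, store=False)
```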
S14, the pedestrian images collected in S11 are divided into a query set and a candidate set.
For example, the split may be random: 30% of the pedestrian images may be used as the query set and the remaining images as the candidate set. The way the query and candidate sets are obtained is not restricted and can be decided as needed.
In this embodiment, the specific implementation process of the foregoing step S2 is as follows:
Pedestrian features are extracted from the pedestrian images obtained in step S1 with a convolutional neural network model, and the visual similarity between pedestrian images is computed with cosine similarity. The obtained similarities are sorted in descending order to obtain the original pedestrian re-identification ranking. However, relying solely on visual features in this way is sensitive to the imaging environment; the subsequent steps therefore re-rank this original ranking.
Illustratively, this embodiment uses a ResNet-50 model as the convolutional neural network. Its structure, connected sequentially from input layer to output layer, is: a 7×7 convolution layer, a batch normalization layer, a ReLU activation, a 3×3 max pooling layer, four stages of residual bottleneck blocks containing 3, 4, 6 and 3 blocks respectively, a global average pooling layer, and a fully connected output layer. A feature-extraction sketch follows.
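A minimal sketch of this step, assuming an ImageNet-pretrained ResNet-50 used as a fixed feature extractor (the patent does not specify its training procedure here); the resize resolution and batching are assumptions of the example.

```python
# Illustrative sketch: 2048-d ResNet-50 features and a cosine-similarity matrix.
import torch
import torchvision
from torchvision import transforms

backbone = torchvision.models.resnet50(weights="DEFAULT")
backbone.fc = torch.nn.Identity()    # keep the pooled 2048-d feature vector
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((256, 128)),   # a common pedestrian crop resolution
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(pil_images):
    batch = torch.stack([preprocess(im) for im in pil_images])
    feats = backbone(batch)
    return torch.nn.functional.normalize(feats, dim=1)   # unit-length features

def visual_similarity(query_images, candidate_images):
    q = extract_features(query_images)
    p = extract_features(candidate_images)
    return q @ p.T   # Q x P matrix of cosine similarities
```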
In this embodiment, the specific implementation process of the foregoing step S3 is as follows:
s31, splicing the visual similarity between the pedestrian images obtained in the step S2 into a visual similarity matrix, wherein the matrix size is Q multiplied by P. Wherein Q is the number of pedestrian images in the query set, and P is the number of pedestrian images in the candidate set.
S32, sorting the visual similarity matrix obtained in S31 in a descending order according to rows, and further selecting K images with the largest similarity for each row to finally obtain a Q multiplied by K screening matrix.
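A small sketch of the screening step; K is a tunable parameter whose value the embodiment leaves open.

```python
# Illustrative sketch: keep the K most visually similar candidates per query row.
import numpy as np

def screening_matrix(sim, k=10):
    """sim: Q x P visual-similarity matrix; returns the Q x K matrix of
    candidate indices with the largest similarity in each query row."""
    order = np.argsort(-sim, axis=1)   # descending similarity within each row
    return order[:, :k]
```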
S33, each row of the screening matrix is provisionally treated as the same person, and the image spatio-temporal distribution is then computed.
In this embodiment, the bucketed time difference of the same person migrating across cameras is calculated from the spatio-temporal information of the pedestrian images obtained in step S1:

$$\Delta_{ij} = \left\lfloor \frac{t_j - t_i}{t_{interval}} \right\rfloor$$

where $t_i$ and $t_j$ denote the times at which the two pedestrian images appear under cameras $c_i$ and $c_j$ respectively, and $t_{interval}$ is the bucket width for the cross-camera migration time differences. The purpose of bucketing the migration time differences is to make the final statistics smoother. Then frequency statistics are computed on the calculated values:

$$\hat{p}(l) = \frac{n_l}{\sum_{k} n_k}$$

where $l$ is any one bucketed time difference, $n_l$ is the count of occurrences of bucket $l$, and $\sum_k n_k$ is the sum of the counts over all buckets. Finally, connected points are plotted on two-dimensional rectangular coordinates with the bucketed time difference $l$ on the horizontal axis and the frequency $\hat{p}(l)$ on the vertical axis, giving the image spatio-temporal distribution. A counting sketch follows.
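A sketch of the bucketing and frequency statistics, assuming records of the form (row identity, camera ID, time in seconds) produced from the screening matrix; the 30-second bucket width is an assumed example value.

```python
# Illustrative sketch: per camera pair, count bucketed cross-camera time
# differences of the same identity and normalize the counts into frequencies.
from collections import Counter, defaultdict

def spatiotemporal_distribution(records, t_interval=30.0):
    """records: iterable of (identity, camera_id, time_sec) observations.
    Returns {(cam_i, cam_j): {bucket: frequency}} built from forward-in-time
    cross-camera pairs of the same identity."""
    by_id = defaultdict(list)
    for ident, cam, t in records:
        by_id[ident].append((cam, t))
    counts = defaultdict(Counter)
    for obs in by_id.values():
        for ci, ti in obs:
            for cj, tj in obs:
                if ci != cj and tj >= ti:
                    bucket = int((tj - ti) // t_interval)  # bucketed time difference
                    counts[(ci, cj)][bucket] += 1
    dist = {}
    for pair, ctr in counts.items():
        total = sum(ctr.values())
        dist[pair] = {b: n / total for b, n in ctr.items()}  # normalized frequency
    return dist
```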
In this embodiment, the specific implementation process of the foregoing step S4 is as follows:
according to the embodiment of the invention, the MAC information and the space-time information of the WiFi information obtained in the step S1 are utilized to calculate the barrel time difference of the same MAC information in the migration of the cross-camera equipment:
Figure BDA0002790425460000129
wherein t is i And t j Respectively denoted by c i And c j Time, t ', of each of two pieces of WiFi information appearing under camera' interval Representing the buckets of the time differences migrated across the cameras. The purpose of binning the migration time differences is to make the final statistics smoother. Then, the calculated value is subjected to frequency statistics:
Figure BDA0002790425460000131
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA0002790425460000132
is->
Figure BDA0002790425460000133
The sum of the frequency, l' is any one of the sub-bucket time differences, < >>
Figure BDA0002790425460000134
Statistics of occurrence frequency for any one sub-bucket time difference and +.>
Figure BDA0002790425460000135
Is the sum of all frequency statistics. Next, on a two-dimensional rectangular coordinate axis, to
Figure BDA0002790425460000136
Is a horizontal axis, in->
Figure BDA0002790425460000137
And (5) drawing a point connecting line for the longitudinal axis to obtain WiFi space-time distribution.
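The WiFi statistics can reuse the same sketch, with the MAC address playing the role of the person identity (spatiotemporal_distribution is the function sketched in step S3 above); the sample records below are invented for illustration.

```python
# Illustrative sketch: WiFi spatio-temporal distribution from captured records.
# Invented sample records: (MAC address, capture time in seconds, camera ID).
captured_wifi = [
    ("aa:bb:cc:dd:ee:ff", 0.0, "cam_01"),
    ("aa:bb:cc:dd:ee:ff", 95.0, "cam_02"),
    ("11:22:33:44:55:66", 10.0, "cam_01"),
]
# Reorder to (identity, camera_id, time_sec) and reuse the step-S3 statistic.
wifi_records = [(mac, cam, t) for (mac, t, cam) in captured_wifi]
wifi_dist = spatiotemporal_distribution(wifi_records, t_interval=30.0)
```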
In this embodiment, the specific implementation process of the foregoing step S5 is as follows:
S51, the visual probability is calculated from the visual similarity obtained in step S2:

$$Pr_{visual} = \frac{1}{1 + \delta e^{-\beta\, s(v_m, v_n)}}$$

where $s(v_m, v_n)$ denotes the visual similarity of two images $v_m$ and $v_n$ obtained under cameras $c_i$ and $c_j$ respectively, and $\delta$ and $\beta$ are hyperparameters. In this embodiment, $\delta$ and $\beta$ are set to 5 and 1 respectively; the invention does not fix $\delta$ and $\beta$, which can be set according to the actual situation.

S52, the image spatio-temporal probability is calculated from the image spatio-temporal distribution obtained in step S3:

$$Pr_{st} = \frac{1}{1 + \varepsilon e^{-\alpha\, \hat{p}(k)}}$$

where $\hat{p}(k)$ denotes the frequency with which pedestrians migrate between cameras $c_i$ and $c_j$ with bucketed time difference $k$, and $\varepsilon$ and $\alpha$ are hyperparameters. In this embodiment, both $\varepsilon$ and $\alpha$ are set to 10; the invention does not fix $\varepsilon$ and $\alpha$, which can be set according to the actual situation.

S53, the WiFi spatio-temporal probability is calculated from the WiFi spatio-temporal distribution obtained in step S4:

$$Pr_{wifi} = \frac{1}{1 + \mu e^{-\gamma\, \hat{p}'(k')}}$$

where $\hat{p}'(k')$ denotes the frequency with which MAC information migrates between cameras $c'_i$ and $c'_j$ with bucketed time difference $k'$, and $\mu$ and $\gamma$ are hyperparameters. In this embodiment, $\mu$ and $\gamma$ are set to 1 and 10 respectively; the invention does not fix $\mu$ and $\gamma$, which can be set according to the actual situation.

S54, the visual probability, the image spatio-temporal probability and the WiFi spatio-temporal probability are fused:

$$Pr_{fuse} = Pr_{visual} \cdot Pr_{st} \cdot Pr_{wifi}$$

S55, the original pedestrian re-identification ranking obtained in step S2 is re-ranked with $Pr_{fuse}$. Fusing the multi-modal information for this second-stage re-ranking is an effective measure for reducing the search space: it not only overcomes the sensitivity to the imaging environment of relying on visual features alone, but also reduces the noise influence of the pedestrian-related and WiFi spatio-temporal dependencies. A fusion sketch follows.
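A sketch of the fusion and re-ranking, assuming the logistic smoothing form reconstructed above and the embodiment's hyperparameter values (δ=5, β=1, ε=α=10, μ=1, γ=10); the function names are assumptions of the example.

```python
# Illustrative sketch of steps S51-S55: logistic smoothing of each modality's
# score, product fusion, and re-ranking by the fused probability.
import math

def smooth(x, lam, gam):
    """Logistic smoothing 1 / (1 + lam * exp(-gam * x))."""
    return 1.0 / (1.0 + lam * math.exp(-gam * x))

def fused_probability(s_visual, p_image, p_wifi,
                      delta=5.0, beta=1.0, eps=10.0, alpha=10.0,
                      mu=1.0, gamma=10.0):
    """s_visual: cosine similarity of the image pair; p_image / p_wifi: bucket
    frequencies looked up in the image / WiFi spatio-temporal distributions
    for the corresponding camera pair and bucketed time difference."""
    return (smooth(s_visual, delta, beta)
            * smooth(p_image, eps, alpha)
            * smooth(p_wifi, mu, gamma))

def rerank(candidates, scores):
    """Sort one query's candidates by fused probability, highest first."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]
```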
Example 2
This embodiment also provides a computer storage medium storing computer-executable instructions that can perform the multi-modal fused unsupervised pedestrian re-identification re-ranking method of Example 1. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
The above examples are apparently given by way of illustration only and do not limit the embodiments. Other variations or modifications based on the above description will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.

Claims (5)

1. A multi-modal fused unsupervised pedestrian re-identification re-ranking method, characterized by comprising the following steps:
S1, collecting multi-modal information of pedestrians while they walk, including pedestrian images, the ID of the camera that captured each image, the time at which each image was captured, and the WiFi information captured as the pedestrian passes each camera;
S2, extracting pedestrian features from the pedestrian images obtained in step S1 with a convolutional neural network model, computing the visual similarity between pedestrian images, and ranking them to obtain the original pedestrian re-identification ranking;
S3, performing statistics on the camera IDs and capture times of the pedestrian images obtained in step S1 to construct the image spatio-temporal distribution;
S4, performing statistics on the WiFi information captured as pedestrians pass each camera in step S1 to construct the WiFi spatio-temporal distribution; the process of step S4 is as follows:
S41, calculating the bucketed time difference of the same MAC information migrating across camera devices, using the unique MAC address of the mobile terminal, the capture time of the WiFi information, and the ID of the capturing camera device obtained in step S13:

$$\Delta'_{ij} = \left\lfloor \frac{t_j - t_i}{t'_{interval}} \right\rfloor$$

where $t_i$ and $t_j$ denote the times at which the two WiFi records appear under cameras $c_i$ and $c_j$ respectively, and $t'_{interval}$ is the width of the buckets into which the cross-camera migration time differences are divided;

S42, performing frequency statistics on the calculated values:

$$\hat{p}'(l') = \frac{n'_{l'}}{\sum_{k'} n'_{k'}}$$

where $l'$ is any one bucketed time difference, $n'_{l'}$ is the count of occurrences of bucket $l'$, and $\sum_{k'} n'_{k'}$ is the sum of the counts over all buckets;

S43, plotting connected points on two-dimensional rectangular coordinates with the bucketed time difference $l'$ on the horizontal axis and the frequency $\hat{p}'(l')$ on the vertical axis, obtaining the WiFi spatio-temporal distribution;
S5, fusing the visual similarity obtained in step S2, the image spatio-temporal distribution obtained in step S3 and the WiFi spatio-temporal distribution obtained in step S4, and re-ranking the original pedestrian re-identification ranking obtained in step S2; the process of step S5 is as follows:
S51, calculating the visual probability from the visual similarity obtained in step S2:

$$Pr_{visual} = \frac{1}{1 + \delta e^{-\beta\, s(v_m, v_n)}}$$

where $s(v_m, v_n)$ denotes the visual similarity of two images $v_m$ and $v_n$ obtained under cameras $c_i$ and $c_j$ respectively, and $\delta$ and $\beta$ are hyperparameters;

S52, calculating the image spatio-temporal probability from the image spatio-temporal distribution obtained in step S3:

$$Pr_{st} = \frac{1}{1 + \varepsilon e^{-\alpha\, \hat{p}(k)}}$$

where $\hat{p}(k)$ denotes the frequency with which pedestrians migrate between cameras $c_i$ and $c_j$ with bucketed time difference $k$, and $\varepsilon$ and $\alpha$ are hyperparameters;

S53, calculating the WiFi spatio-temporal probability from the WiFi spatio-temporal distribution obtained in step S4:

$$Pr_{wifi} = \frac{1}{1 + \mu e^{-\gamma\, \hat{p}'(k')}}$$

where $\hat{p}'(k')$ denotes the frequency with which MAC information migrates between cameras $c'_i$ and $c'_j$ with bucketed time difference $k'$, and $\mu$ and $\gamma$ are hyperparameters;

S54, fusing the visual probability, the image spatio-temporal probability and the WiFi spatio-temporal probability:

$$Pr_{fuse} = Pr_{visual} \cdot Pr_{st} \cdot Pr_{wifi}$$

S55, re-ranking the original pedestrian re-identification ranking obtained in step S2 with $Pr_{fuse}$.
2. The multi-modal fused unsupervised pedestrian re-identification re-ranking method according to claim 1, wherein the process of step S1 is as follows:
S11, obtaining pedestrian images from surveillance video captured by the cameras of a specified road section: splitting the video into individual frames, then running an SSD or Faster R-CNN pedestrian detector on the frames;
S12, while collecting each pedestrian image, recording the time displayed in the surveillance video as the time information of the image under that camera, and recording the camera device ID as the spatial information of the image;
S13, during pedestrian movement, using WiFi collectors placed near the cameras to capture the WiFi signals emitted by mobile terminals as pedestrians pass each camera, the WiFi information including the unique MAC address of the mobile terminal, the time the WiFi information was captured, and the ID of the camera device at which it was captured;
S14, dividing the pedestrian images collected in step S11 into a query set and a candidate set, randomly or proportionally.
3. The multi-modal fused unsupervised pedestrian re-identification re-ranking method according to claim 1, wherein in step S2, pedestrian features are extracted with a ResNet-50 convolutional neural network model, the visual similarity between pedestrian images is computed with cosine similarity, and the obtained similarities are sorted in descending order to obtain the original pedestrian re-identification ranking, wherein the ResNet-50 network is connected sequentially from input layer to output layer as: a 7×7 convolution layer, a batch normalization layer, a ReLU activation, a 3×3 max pooling layer, four stages of residual bottleneck blocks containing 3, 4, 6 and 3 blocks respectively, a global average pooling layer, and a fully connected output layer.
4. The multi-modal fused unsupervised pedestrian re-identification re-ranking method according to claim 1, wherein the process of step S3 is as follows:
S31, assembling the visual similarities between pedestrian images obtained in step S2 into a visual similarity matrix of size Q×P, where Q is the number of pedestrian images in the query set and P is the number of pedestrian images in the candidate set;
S32, sorting each row of the visual similarity matrix in descending order and keeping the K most similar images per row to obtain a Q×K screening matrix;
S33, treating the images in each row of the screening matrix as the same person, then computing the image spatio-temporal distribution.
5. The multi-modal fused unsupervised pedestrian re-identification re-ranking method according to claim 4, wherein the process of step S33 is as follows:
S331, calculating the bucketed time difference of the same person migrating across cameras, using the time and spatial information of the pedestrian images obtained in step S1:

$$\Delta_{ij} = \left\lfloor \frac{t_j - t_i}{t_{interval}} \right\rfloor$$

where $t_i$ and $t_j$ denote the times at which the two pedestrian images appear under cameras $c_i$ and $c_j$ respectively, and $t_{interval}$ is the width of the buckets into which the cross-camera migration time differences are divided;

S332, performing frequency statistics on the calculated values:

$$\hat{p}(l) = \frac{n_l}{\sum_{k} n_k}$$

where $l$ is any one bucketed time difference, $n_l$ is the count of occurrences of bucket $l$, and $\sum_k n_k$ is the sum of the counts over all buckets;

S333, plotting connected points on two-dimensional rectangular coordinates with the bucketed time difference $l$ on the horizontal axis and the frequency $\hat{p}(l)$ on the vertical axis, obtaining the image spatio-temporal distribution.
CN202011313048.XA 2020-11-20 2020-11-20 Multi-modal fused unsupervised pedestrian re-identification re-ranking method Active CN112381024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011313048.XA CN112381024B (en) 2020-11-20 2020-11-20 Multi-modal fused unsupervised pedestrian re-identification re-ranking method

Publications (2)

Publication Number Publication Date
CN112381024A CN112381024A (en) 2021-02-19
CN112381024B (en) 2023-06-23

Family

ID=74584553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011313048.XA Active CN112381024B (en) 2020-11-20 2020-11-20 Multi-modal fused unsupervised pedestrian re-identification re-ranking method

Country Status (1)

Country Link
CN (1) CN112381024B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111778B (en) * 2021-04-12 2022-11-15 内蒙古大学 Large-scale crowd analysis method with video and wireless integration

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977917A * 2019-04-09 2019-07-05 中通服公众信息产业股份有限公司 Unsupervised transfer learning pedestrian re-identification method and system
CN110263697A * 2019-06-17 2019-09-20 哈尔滨工业大学(深圳) Unsupervised-learning-based pedestrian re-identification method, device and medium
CN111444758A * 2019-12-26 2020-07-24 珠海大横琴科技发展有限公司 Pedestrian re-identification method and device based on spatio-temporal information
CN111160297A * 2019-12-31 2020-05-15 武汉大学 Pedestrian re-identification method and device based on a residual attention mechanism spatio-temporal joint model
CN111178284A * 2019-12-31 2020-05-19 珠海大横琴科技发展有限公司 Pedestrian re-identification method and system based on a spatio-temporal joint model of map data

Also Published As

Publication number Publication date
CN112381024A (en) 2021-02-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant