CN110807434A - Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes - Google Patents

Info

Publication number
CN110807434A
Authority
CN
China
Prior art keywords
pedestrian
human body
module
image
video data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911078998.6A
Other languages
Chinese (zh)
Other versions
CN110807434B (en)
Inventor
陈彬
赵聪聪
白雪峰
于水
胡明亮
朴铁军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weihai Ruowi Information Technology Co Ltd
Original Assignee
Weihai Ruowi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weihai Ruowi Information Technology Co Ltd filed Critical Weihai Ruowi Information Technology Co Ltd
Priority to CN201911078998.6A priority Critical patent/CN110807434B/en
Publication of CN110807434A publication Critical patent/CN110807434A/en
Application granted granted Critical
Publication of CN110807434B publication Critical patent/CN110807434B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A pedestrian re-identification system based on human body analysis with combined coarse and fine granularity comprises a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module. The parameter pre-training initialization module pre-trains parameters on a public data set to initialize the network and obtain a pedestrian re-identification network model. The monitoring video data reading module uploads and reads video data and sends the video data to the video image analysis module. Compared with the prior art, the invention has the following beneficial effects: for human body analysis, the neural network model is designed by combining coarse and fine granularity and emphasizes human body semantics at different levels, so that more discriminative pedestrian features are extracted and accuracy is improved; in addition, a loss function is designed using the idea of knowledge distillation, and optimizing the training of the network effectively reduces the recognition time of pedestrian re-identification and improves efficiency.

Description

Pedestrian re-identification system and method based on human body analysis with combined coarse and fine granularity
Technical Field
The invention relates to the field of pedestrian re-identification, and in particular to a pedestrian re-identification system based on human body analysis with combined coarse and fine granularity.
Background
With massive volumes of video, the traditional approach of analyzing video manually is labor-intensive, and prolonged observation easily causes visual fatigue in workers and leads to errors. To solve the problems of manual search, attention has turned to using computer vision to retrieve pedestrians of interest from massive video accurately and efficiently; pedestrian re-identification (Person Re-Identification) technology in computer vision is therefore used to assist or even replace workers in analyzing pedestrians in videos from different cameras.
Pedestrian re-identification aims to retrieve the same pedestrian from a given image set captured under different camera views. It differs from face recognition technology, which needs the cooperation of the person being identified and requires high-quality pictures.
Current research on pedestrian re-identification mainly aims at extracting features from pedestrian pictures that are robust to the complex changes between camera scenes, so that target pedestrians can be matched accurately. Traditional pedestrian re-identification methods focus on two aspects: 1) feature representation learning, which copes with changes in pedestrian appearance under different camera views by designing feature representations that are to some degree invariant with respect to pedestrian identity; 2) metric learning, which learns a mapping of the high-dimensional features into a new feature space in which features of the same person are closer together and features of different persons are farther apart. In 2014, researchers introduced deep learning into the field of pedestrian re-identification; feature representation learning and metric learning could then be jointly optimized end to end by a convolutional neural network, performance exceeded that of traditional methods, and deep learning gradually became the mainstream approach in the field.
As pedestrian re-identification has developed from the two-stage feature extraction and metric learning of traditional methods to end-to-end learning based on deep learning, deep-learning-based techniques have used a data-driven approach to improve, through end-to-end learning, the robustness and discriminative power of features against the variations of pedestrian pictures across cameras. Deep-learning-based methods now achieve good results on most public data sets. However, because the pedestrian pictures in these data sets are usually obtained by manual cropping and screening, and because pre-training is done on the large ImageNet data set, the prior information about human body structure used by existing techniques often shows a large domain gap with respect to surveillance scenes, and dividing pedestrian pictures according to wrong prediction results increases the re-identification error rate. In addition, in the attention paid to features of different regions of the image details to be recognized, existing techniques often suffer re-identification failures caused by differing illumination, camera angles and the like.
Disclosure of Invention
The invention provides a pedestrian re-identification system and method based on human body analysis with combined coarse and fine granularity, which can effectively improve the accuracy and efficiency of pedestrian re-identification under changes in viewing angle, posture and illumination.
A pedestrian re-recognition system based on human body analysis coarse and fine granularity combination is characterized by comprising a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module;
the parameter pre-training initialization module is used for pre-training parameters on a public data set to initialize the network and obtain a pedestrian re-identification network model;
the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module;
the user retrieval module is used for uploading a human body image to be retrieved and sending the human body image to the video image analysis module;
the video image analysis module comprises a video decoding submodule and an image preprocessing submodule, and the video decoding submodule is used for decoding the video data uploaded by the monitoring video data reading module and processing the video data into a processable image; the image preprocessing submodule is used for improving the visual effects of the image after video decoding and the human body image to be retrieved;
the pedestrian feature extraction module is used for designing a coarse-and-fine-granularity combined neural network, and learning coarse-granularity branches and fine-granularity branches in the coarse-and-fine-granularity combined neural network to respectively extract pedestrian features of the video-decoded image and the human body image to be retrieved and store the pedestrian features;
and the human body re-recognition model loading module is used for carrying out retrieval matching by using the pedestrian re-recognition network model according to the stored pedestrian characteristics and the human body image to be retrieved, and calculating to obtain the similarity.
In the foregoing technical solution, the user retrieval module is further configured to set a similarity threshold. By setting the similarity threshold, pedestrian pictures of different degrees of similarity can be identified, which makes the identification criterion more flexible.
In the above technical solution, the human body re-recognition model loading module is further configured to feed the calculated similarity back to the user retrieval module.
A method using the pedestrian re-identification system described above comprises the following steps:
Step A: pre-training parameters on the public data set to initialize the network and obtain a pedestrian re-identification network model;
Step B: uploading and reading video data in the monitoring video data reading module; the video decoding submodule decodes the video data into a usable picture format, and image preprocessing is performed on it; a designed neural network model combining coarse and fine granularity is then used, the model comprising a coarse-grained branch and a fine-grained branch; for the coarse-grained branch, a knowledge distillation loss function is adopted to enhance the extraction of global features, and for the fine-grained branch, the knowledge distillation loss function and a triplet loss function are adopted to enhance the extraction of detailed features; the learned features are concatenated to obtain a pedestrian feature set $f_i$; an SE Block is then used to learn a feature vector importance weight $W$ that selectively enhances strongly discriminative features and suppresses weakly discriminative ones:
$W = \mathrm{Sigmoid}(\mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(f_i))))$
wherein the two FC layers, from inside to outside, are used for compression and activation respectively;
after the importance weight $W$ of the pedestrian feature vector is obtained, the pedestrian feature $f_o$ is output as
$f_o = f_i \odot W + f_i$
and stored;
Step C: uploading the human body image to be retrieved in the user retrieval module, and computing and outputting the pedestrian features of the human body image to be retrieved as in step B;
Step D: according to the pedestrian features of the human body image to be retrieved, the pedestrian re-identification network model detects pedestrians in the video-decoded images at a certain frame interval, extracts their pedestrian features and calculates the similarity; if the similarity is higher than the threshold, the result is stored and returned in the form of the similarity.
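As an illustration of the threshold-based matching in step D, a minimal sketch follows. The text does not state which similarity measure is used, so cosine similarity is an assumption here; the function names, the frame identifiers and the threshold value are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero feature matches nothing
    return dot / (norm_a * norm_b)

def retrieve(query_feature, gallery, threshold):
    """Return (frame_id, similarity) pairs above the threshold, best first.

    gallery maps a frame identifier to a stored pedestrian feature vector.
    """
    hits = []
    for frame_id, feature in gallery.items():
        score = cosine_similarity(query_feature, feature)
        if score > threshold:
            hits.append((frame_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical stored features for three sampled frames.
gallery = {
    "frame_0010": [0.9, 0.1, 0.0],
    "frame_0020": [0.0, 1.0, 0.0],
    "frame_0030": [0.8, 0.2, 0.1],
}
results = retrieve([1.0, 0.0, 0.0], gallery, threshold=0.8)
for frame_id, score in results:
    print(frame_id, round(score, 3))
```

Raising or lowering `threshold` corresponds to the user-set similarity threshold described above: a higher value returns fewer, closer matches.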
In the above technical solution, further, in step B, the picture format may be JPG or PNG. This supports pictures in multiple formats and widens the range of application.
In the above technical solution, further, in step B, the video data comes from a monitoring camera.
In the above technical solution, further, in step B, the image preprocessing refers to performing distortion processing on the image. This improves image quality and reduces the influence of interference information on the extraction of pedestrian features.
In the above technical solution, further, in step B, the pedestrian feature $f_o$ is stored in a mat file. This facilitates later queries.
In the above technical solution, further, in step D, FPN-Person is used to perform detection on the video-decoded image.
In the above technical solution, further, in step D, if the image after video decoding has been subjected to pedestrian feature extraction, the CFNet is used to extract pedestrian features from the uploaded human body image to be retrieved, and the pedestrian features appearing in the corresponding mat file are read.
Compared with the prior art, the invention has the following beneficial effects: for human body analysis, the neural network model is designed by combining coarse and fine granularity and emphasizes human body semantics at different levels, so that more discriminative pedestrian features are extracted and accuracy is improved; in addition, the loss function is designed using the idea of knowledge distillation, and optimizing the training of the network effectively reduces the recognition time of pedestrian re-identification and improves efficiency.
Drawings
Fig. 1 is a block diagram of a pedestrian re-identification system according to the present invention.
Fig. 2 is a flowchart of a method of the pedestrian re-identification system according to the present invention.
Fig. 3 is a human semantic attention diagram in the method of the pedestrian re-identification system according to the present invention.
Fig. 4 is a schematic diagram of a pedestrian re-identification network model of the pedestrian re-identification system according to the present invention.
Fig. 5 is a schematic diagram illustrating three categories of similarity information in the pedestrian re-identification system according to the present invention.
Fig. 6 is a schematic diagram illustrating a starting process of the user retrieval module in the pedestrian re-identification system according to the present invention.
Detailed Description
The following examples further describe the invention in conjunction with the accompanying drawings.
As shown in fig. 1 to 6, a pedestrian re-identification system based on human body analysis combination of coarse and fine granularity comprises a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-identification model loading module and a user retrieval module;
the parameter pre-training initialization module is used for pre-training parameters on a public data set to initialize the network and obtain a pedestrian re-identification network model;
the pedestrian re-identification network model is used for computing the similarity between the image features of the uploaded pedestrian image and those of the video data;
the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module;
the monitoring video data reading module is responsible for managing the input and output of image and video data, including reading the pedestrian retrieval pictures uploaded by the user and the monitoring video data for the specified time period and camera serial number.
The user retrieval module is used for uploading a human body image to be retrieved and sending the human body image to the video image analysis module. After the user uploads the pedestrian picture to be queried, specifies the video to be searched against and clicks the query button, the module reads and displays the uploaded pedestrian picture, then reads the video data for the time period and camera serial number specified by the user, and saves and returns the processing result once the system has finished processing.
The video image analysis module comprises a video decoding submodule and an image preprocessing submodule, and the video decoding submodule is used for decoding the video data uploaded by the monitoring video data reading module and processing the video data into a processable image; the image preprocessing submodule is used for improving the visual effects of the image after video decoding and the human body image to be retrieved;
video data decoding is a mature prior art, which is not described in detail in this embodiment;
the main purposes of image preprocessing are to eliminate irrelevant information in images, recover useful real information, enhance the detectability of relevant information, and simplify data to the maximum extent, thereby improving the reliability of feature extraction, image segmentation, matching and recognition.
The pedestrian feature extraction module is used for designing a coarse-and-fine-granularity combined neural network, and learning coarse-granularity branches and fine-granularity branches in the coarse-and-fine-granularity combined neural network to respectively extract pedestrian features of the video-decoded image and the human body image to be retrieved and store the pedestrian features;
and the human body re-recognition model loading module is used for carrying out retrieval matching by using the pedestrian re-recognition network model according to the stored pedestrian characteristics and the human body image to be retrieved, and calculating to obtain the similarity.
The method of the pedestrian re-identification system of the invention comprises the following steps:
first, the network parameters are pre-trained on the large public ImageNet data set to initialize the network. Neural network models generally rely on stochastic gradient descent for training and parameter updating; the final performance of the network is directly related to the solution reached at convergence, and the convergence result depends heavily on the initialization of the network parameters. A good parameter initialization lets model training achieve twice the result with half the effort, whereas a poor initialization scheme not only hinders convergence but can even cause vanishing or exploding gradients. During parameter pre-training initialization, Batch Normalization (BN) is used to bring the distribution of the input data to a Gaussian distribution, ensuring that the input of every layer of the neural network keeps the same distribution. This counteracts the vanishing of gradients during backpropagation. BN uses standardization to force the distribution of the input value of any neuron in each layer back to the standard normal distribution with mean 0 and variance 1, so that the input to the activation falls in the sensitive region of the nonlinear function, the gradient becomes larger, and convergence is greatly accelerated;
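The standardization step that BN applies can be sketched in a few lines. This is a simplified illustration for a single feature over one mini-batch; it omits the learnable scale and shift parameters and the running statistics of a full BN layer, and the function name and the small constant `eps` are assumptions.

```python
import math

def batch_normalize(values, eps=1e-5):
    """Standardize one feature across a mini-batch to mean 0 and variance 1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps guards against division by zero when the batch has no spread.
    return [(v - mean) / math.sqrt(variance + eps) for v in values]

# Hypothetical pre-activation values of one neuron for a batch of four images.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_normalize(activations)
print([round(v, 3) for v in normalized])
```

After this step the values are centered on zero, which is where activation functions such as the sigmoid have their largest slope.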
after the pedestrian re-identification network model is obtained, video data is uploaded and read in the monitoring video data reading module; the video decoding submodule decodes the video data into a usable picture format, and image preprocessing is then performed. In this embodiment, the image preprocessing may be distortion processing of the image: the image preprocessing submodule uses image enhancement operations to strengthen the useful information in the image. Image enhancement is a distorting process whose purpose is to improve the visual effect of the image; for the application at hand it purposefully emphasizes global or local characteristics of the image, makes an originally unclear image clear or emphasizes certain features of interest, enlarges the differences between the features of different objects, and suppresses features that are not of interest, thereby improving image quality and information content, strengthening image interpretation and recognition, and meeting the needs of analysis;
a coarse-and-fine-granularity combined neural network model is then designed. To enable the network to extract pedestrian features of different granularities, the model (Coarse Fine Net, CFNet) selects ResNet-50 as the backbone network and divides the part after the Res Block2 convolution module into two types of branches: a coarse-grained branch (Coarse Branch) and a fine-grained branch (Fine Branch), the latter being further divided into two sub-branches, an upper-body branch and a lower-body branch. As shown in fig. 3, the human body analysis attention mechanism operates as follows: a geometric transformation based on the acquired human body analysis key points is used to compute the same cross-view region between two pedestrian images. Because the parsing probability maps resemble currently popular attention maps, the probability maps of 20 body parts are combined to generate human body semantic attention maps of 7 body parts at different levels:
$M_{\text{shoes}} = \{\text{Socks}, \text{LeftShoe}, \text{RightShoe}\}$,
$M_{\text{head}} = \{\text{Hat}, \text{Hair}, \text{Sunglasses}, \text{Face}\}$,
$M_{\text{upper body}} = \{\text{Glove}, \text{UpperClothes}, \text{Coat}, \text{Scarf}, \text{LeftArm}, \text{RightArm}\}$,
$M_{\text{lower body}} = \{\text{Dress}, \text{Pants}, \text{Jumpsuits}, \text{Skirt}, \text{LeftLeg}, \text{RightLeg}\}$,
$M_{\text{upper half}} = M_{\text{upper body}} + M_{\text{head}}$,
$M_{\text{lower half}} = M_{\text{lower body}} + M_{\text{shoes}}$,
$M_{\text{whole body}} = M_{\text{upper half}} + M_{\text{lower half}}$.
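The grouping of part probability maps into the seven attention maps can be sketched as follows. The part names are those listed above; representing each probability map as a small nested list, summing maps element by element, and all function and key names are assumptions made for illustration.

```python
# Part names as listed in the text, grouped into the four base regions.
PART_GROUPS = {
    "shoes": ["Socks", "LeftShoe", "RightShoe"],
    "head": ["Hat", "Hair", "Sunglasses", "Face"],
    "upper_body": ["Glove", "UpperClothes", "Coat", "Scarf", "LeftArm", "RightArm"],
    "lower_body": ["Dress", "Pants", "Jumpsuits", "Skirt", "LeftLeg", "RightLeg"],
}

def sum_maps(maps):
    """Element-wise sum of equally sized 2-D probability maps."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(cols)] for r in range(rows)]

def build_attention_maps(part_probs):
    """Combine per-part probability maps into the seven semantic attention maps."""
    maps = {
        group: sum_maps([part_probs[part] for part in parts])
        for group, parts in PART_GROUPS.items()
    }
    maps["upper_half"] = sum_maps([maps["upper_body"], maps["head"]])
    maps["lower_half"] = sum_maps([maps["lower_body"], maps["shoes"]])
    maps["whole_body"] = sum_maps([maps["upper_half"], maps["lower_half"]])
    return maps

# Hypothetical 1x2 probability maps: a hat on the left pixel, pants on the right.
part_probs = {part: [[0.0, 0.0]] for parts in PART_GROUPS.values() for part in parts}
part_probs["Hat"] = [[0.7, 0.0]]
part_probs["Pants"] = [[0.0, 0.6]]
attention_maps = build_attention_maps(part_probs)
print(sorted(attention_maps))
```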
Through the semantic attention maps, different parts of the human body can be located. Because the outputs of different layers of a convolutional neural network carry different semantic information, the human body semantic attention maps are combined, in a manner similar to an attention mechanism, with the features of different levels at different stages of the convolutional network so that the network attends to local body regions: more macroscopic semantic maps are provided at shallow layers to capture more detailed features, and higher-level semantic information is gradually provided at deeper layers to strengthen the capture of abstract features. The formal definition is:
$F_{\text{attention}} = F_i \odot M + F_i$
where $M \in \{M_{\text{whole body}}, M_{\text{upper half}}, M_{\text{lower half}}, M_{\text{upper body}}, M_{\text{lower body}}, M_{\text{head}}, M_{\text{shoes}}\}$ is a semantic attention map at one of the different levels, $F_i$ is the feature map output by each layer of the network, and $F_{\text{attention}}$ is the feature map with enhanced attention to local regions;
when the resolution is very low and the model fails to output a good segmentation result, $M$ is close to 0 and $F_{\text{attention}}$ is therefore close to $F_i$. In this way, poor segmentation results have no negative effect, while good segmentation results provide sufficient information to improve recognition accuracy. Because the semantic attention maps generated by human body analysis are combined with the network in a manner similar to an attention mechanism, the prior information of the human body can, compared with other methods, be fully exploited without harming the performance of the model;
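The residual form of the attention formula, and the fall-back behaviour when the attention map is near zero, can be checked with a small single-channel sketch. The function name and the toy values are assumptions; a real feature map would have many channels, each multiplied by the same spatial map.

```python
def apply_attention(feature_map, attention_map):
    """Compute F * M + F element by element for one feature channel."""
    return [
        [f * m + f for f, m in zip(feature_row, mask_row)]
        for feature_row, mask_row in zip(feature_map, attention_map)
    ]

features = [[1.0, 2.0], [3.0, 4.0]]

# A confident segmentation doubles the features inside the attended region.
good_mask = [[1.0, 0.0], [0.0, 1.0]]
enhanced = apply_attention(features, good_mask)

# A failed segmentation (all zeros) leaves the features untouched.
failed_mask = [[0.0, 0.0], [0.0, 0.0]]
unchanged = apply_attention(features, failed_mask)

print(enhanced)
print(unchanged)
```

The added `+ F` term is what keeps a poor segmentation from erasing information: the worst case is the original feature map.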
in training pedestrian re-identification network models, many works treat pedestrian re-identification as a classification task and train with a cross entropy function with One-Hot encoded labels as the loss function. A one-hot encoded label, however, typically contains no similarity information between categories.
For the pedestrian re-identification task, the common current practice is to treat it as a classification task in the training stage and predict with a cross entropy loss function with one-hot encoded labels, then discard the classification layer in the testing stage and directly use the feature vector after the global pooling layer as the pedestrian feature representation for similarity calculation. The goals of training and testing therefore differ considerably: the final goal of pedestrian re-identification is to judge the similarity of pictures of different pedestrians with unknown identities rather than simply to classify the training set, while one-hot encoding labels the class to which the data belongs as 1 and all other classes as 0, ignores the similarity information between pedestrian pictures and easily over-fits the training set, so this approach may not be optimal. Drawing on the idea of knowledge distillation, we aim to introduce more similarity information in the training stage to optimize the network training process and further reduce the gap between training and testing, and we propose a Knowledge Distillation Loss function to improve on the cross entropy loss function with one-hot encoded labels.
First, CFNet is trained for classification on a re-identification data set as a teacher model that predicts soft labels containing similarity information between pedestrian images; the model is then retrained using a knowledge distillation loss function formed from the soft labels and the one-hot encoded labels, whose mathematical expression is shown in the formula:
[Formula image BDA0002262651370000081 is not reproduced in this text.]
where $H(\cdot)$ is the cross entropy, $p_t$ is the soft label output by the teacher model, $p_s$ is the standard softmax output of the student model, $\tau$ is the temperature parameter controlling the smoothness of the probability distribution, and $\alpha$ is the balance factor weighting the two terms.
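Since the formula itself is not available in this text, the following sketch uses one common form of knowledge distillation loss: a weighted sum of the cross entropy against the teacher's temperature-softened output and the cross entropy against the one-hot label. This form, the function names, and the default values of `alpha` and `tau` are assumptions and may differ from the expression in the patent.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; a larger temperature gives a smoother output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    """H(target, predicted) for two discrete distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, tau=4.0):
    """Assumed form: alpha * H(p_t, softened p_s) + (1 - alpha) * H(one_hot, p_s)."""
    p_t = softmax(teacher_logits, tau)        # soft label from the teacher
    p_s_soft = softmax(student_logits, tau)   # student output at the same temperature
    p_s = softmax(student_logits)             # standard student softmax
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    return alpha * cross_entropy(p_t, p_s_soft) + (1.0 - alpha) * cross_entropy(one_hot, p_s)

# Hypothetical logits over three identities; the true identity is index 0.
teacher = [2.0, 0.0, 0.0]
loss_agree = kd_loss([2.0, 0.0, 0.0], teacher, label=0)
loss_disagree = kd_loss([0.0, 2.0, 0.0], teacher, label=0)
print(loss_agree < loss_disagree)
```

With `alpha=0.0` the expression reduces to ordinary cross entropy with the one-hot label, which is the baseline the text sets out to improve.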
Meanwhile, to enable the network to learn complementary features, different loss functions are used for the different branches so that each emphasizes a different aspect of feature extraction. For the coarse-grained branch, the knowledge distillation loss function is adopted to focus on the extraction of global features; for the fine-grained branch, the knowledge distillation loss function and a triplet loss function are adopted together to enhance the extraction of detailed features.
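The per-branch supervision can be illustrated with the standard margin-based triplet loss. The text does not give the margin, the distance measure, or how the terms are weighted, so the Euclidean distance, the margin of 0.3 and the unweighted sum below are assumptions, as are the function names.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Penalize triplets where the negative is not farther than the positive by the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def total_loss(kd_coarse, kd_fine, triplet_fine):
    """Coarse branch: KD loss only. Fine branch: KD loss plus triplet loss."""
    coarse_branch = kd_coarse
    fine_branch = kd_fine + triplet_fine
    return coarse_branch + fine_branch

anchor = [0.0, 0.0]
positive = [0.1, 0.0]          # same identity, close to the anchor
easy_negative = [1.0, 0.0]     # different identity, already far away
hard_negative = [0.2, 0.0]     # different identity, too close

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
print(easy, round(hard, 3))
```

Only the hard triplet contributes a non-zero loss, which pushes the fine-grained branch to separate identities that look alike.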
The pedestrian feature extraction process is as follows: first, a backbone network (ResNet-50 in this embodiment) is trained on a pedestrian re-identification data set, using a cross entropy loss based on pedestrian ID; then, the backbone network is fused with the obtained pedestrian component region prediction result to obtain a pedestrian component feature map, by point-wise multiplication of the feature map of the backbone network with the feature map of the predicted pedestrian component region; global average pooling is applied to the feature map output by the backbone network, the pedestrian component feature map and the component region feature map to obtain the global features, the component region feature vectors and the component visual probability; the component feature weight is obtained by a 1×1 convolution of the component region feature vector and the component visual probability, and is point-wise multiplied with the component region feature vector to obtain the final local component features;
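The two basic operations in this process, point-wise multiplication with a predicted region map and global average pooling, can be sketched for a single channel as follows. The function names and toy values are assumptions, and the 1×1 convolution that produces the component weights is left out.

```python
def masked_feature_map(feature_map, region_map):
    """Point-wise product of a feature channel with a predicted part region map."""
    return [
        [f * r for f, r in zip(feature_row, region_row)]
        for feature_row, region_row in zip(feature_map, region_map)
    ]

def global_average_pool(feature_map):
    """Average all spatial positions of one channel into a single value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

features = [[1.0, 2.0], [3.0, 4.0]]
upper_region = [[1.0, 1.0], [0.0, 0.0]]  # hypothetical mask covering the top row

global_feature = global_average_pool(features)
part_feature = global_average_pool(masked_feature_map(features, upper_region))
print(global_feature, part_feature)
```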
the pedestrian features learned by the different branches are concatenated, and a Feature Select Module (FSM) then highlights the more discriminative features to obtain the final pedestrian feature representation. Directly concatenating the feature vectors of different branches ignores the differing importance of the features; inspired by Hu et al., we consider that the elements of the learned pedestrian feature vector differ in importance. The invention therefore uses an SE Block to learn an importance weight $W$ that selectively enhances strongly discriminative features and suppresses weakly discriminative ones, as shown in the following formula:
W = Sigmoid(FC(ReLU(FC(f_i))))
where f_i is the concatenated pedestrian feature vector and the two FC layers, from the inside out, perform the compression and activation operations. After the feature vector importance weight W is obtained, the output feature f_o is calculated as follows:
f_o = f_i * W + f_i
where * and + are element-wise operations. Adding the enhanced features to the original feature vector further strengthens the discriminative ability of the features.
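The feature selection step can be illustrated with a small NumPy sketch. It is only a sketch under stated assumptions: the reduction ratio `r`, the randomly initialized weights and the function name `feature_select` are hypothetical and are not taken from the patent, which specifies only the two formulas.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def feature_select(f_i, w1, b1, w2, b2):
    """Apply W = Sigmoid(FC(ReLU(FC(f_i)))) and f_o = f_i * W + f_i.

    f_i: (d,) concatenated branch features.
    w1, b1: inner FC layer, compresses d -> d // r (r is an assumed ratio).
    w2, b2: outer FC layer, restores d // r -> d.
    """
    hidden = np.maximum(w1 @ f_i + b1, 0.0)   # inner FC followed by ReLU
    weight = sigmoid(w2 @ hidden + b2)        # outer FC followed by Sigmoid
    return f_i * weight + f_i, weight         # element-wise rescale plus residual


# Toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
d, r = 8, 4
f_i = rng.normal(size=d)
w1, b1 = rng.normal(size=(d // r, d)), np.zeros(d // r)
w2, b2 = rng.normal(size=(d, d // r)), np.zeros(d)
f_o, weight = feature_select(f_i, w1, b1, w2, b2)
```

Because every entry of the weight lies between 0 and 1, each element of the output equals the input element scaled by a factor between 1 and 2, so no feature is discarded outright; weakly weighted features are merely amplified less.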
In order for the two types of branches to attend to information of different granularities of the human body, human Semantic Attention Maps (SAM) of different levels are generated by a human body analysis model, providing different semantic information to guide network learning in the different branches. In addition, after analyzing the shortcomings of the cross-entropy loss function commonly used in training pedestrian re-identification models, a knowledge distillation loss function (KD Loss) is designed following the idea of knowledge distillation; it provides the network with soft labels containing pedestrian identity similarity information to optimize model training. Meanwhile, so that the pedestrian features learned by the two types of branches are as complementary as possible, the coarse-grained branch is supervised only by the knowledge distillation loss function to extract global features, while the fine-grained branch is jointly supervised by the triplet loss function (Triplet Loss) and the knowledge distillation loss function to strengthen the network's attention to fine-grained features. Fig. 4 is a schematic diagram of the pedestrian re-identification network model based on human body analysis with combined coarse and fine granularity.
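For readers unfamiliar with the triplet loss used on the fine-grained branch, the following is a minimal sketch of its common margin-based form. The Euclidean distance, the margin value of 0.3 and the function name are assumptions made for illustration; the patent does not state which variant or margin is used.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss on Euclidean distances.

    anchor and positive share a pedestrian ID; negative has a different ID.
    margin=0.3 is an illustrative value, not taken from the patent.
    """
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_ap - d_an + margin))


anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([0.0, 1.0])
easy = triplet_loss(anchor, positive, negative)   # negative already far away
hard = triplet_loss(anchor, negative, positive)   # roles swapped on purpose
```

The loss is zero once the negative sample is farther from the anchor than the positive sample by at least the margin, and positive otherwise, which pushes features of the same identity together and features of different identities apart.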
The human body image to be retrieved is uploaded in the retrieval module, and its pedestrian features are calculated and output by means of step B;
the pedestrian re-identification network model then uses FPN-Person to detect the pedestrians appearing in the video data at a certain frame interval, for comparison with the pedestrian features of the human body image to be retrieved.
After detection is completed, for surveillance video data that is queried for the first time, CFNet extracts the features of both the query pedestrian picture uploaded by the user and the detected pedestrian pictures, and the extracted features are stored in a mat file to facilitate later queries.
For surveillance video data whose features have already been extracted, CFNet only needs to extract the features of the query pedestrian picture uploaded by the user; the features of the pedestrians appearing in the video are then read directly from the corresponding mat file.
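The first-query and repeat-query paths described above amount to a simple feature cache. The sketch below illustrates that logic only. Two substitutions are made and should be read as assumptions: NumPy's `.npz` format stands in for the mat file so that the example needs no extra dependency, and `toy_extract` is a hypothetical placeholder for the CFNet forward pass.

```python
import os
import tempfile

import numpy as np


def gallery_features(cache_path, detections, extract_fn):
    """Return gallery features, extracting them only on the first query.

    cache_path: file that stores the features of one video segment.
    detections: pedestrian crops from the detector (unused on a cache hit).
    extract_fn: hypothetical stand-in for the CFNet forward pass.
    """
    if os.path.exists(cache_path):
        with np.load(cache_path) as data:       # later queries: read stored features
            return data["features"], True
    features = np.stack([extract_fn(img) for img in detections])
    np.savez(cache_path, features=features)     # first query: extract, then store
    return features, False


def toy_extract(img):
    # Placeholder "network": mean over height and width, one value per channel.
    return img.mean(axis=(0, 1))


crops = [np.full((4, 2, 3), i, dtype=float) for i in range(3)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cam01_segment.npz")
    first, hit_first = gallery_features(path, crops, toy_extract)
    second, hit_second = gallery_features(path, [], toy_extract)
```

On the second call no detections are passed at all, which mirrors the text: only the query picture needs a forward pass once the features of a video segment are on disk.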
The similarity between the features of the pedestrian to be retrieved and the features of the detected pedestrian pictures is calculated, and the pedestrian pictures whose similarity exceeds a given threshold are sorted by similarity and returned to the user;
regarding similarity information, as shown in fig. 5, when a model is trained for a three-class classification task over car, horse and zebra, the labels are usually [1,0,0], [0,1,0] and [0,0,1], and the prediction of the trained network is usually a probability distribution generated by the Softmax function, whose basic form is as follows:
p_i = exp(z_i) / Σ_j exp(z_j)
where z_i is the logit output by the last layer of the network for class i, and p_i is the probability of the corresponding class after processing by the softmax function.
For the car in fig. 5a), the network's predicted probability distribution over the classes may be [0.95, 0.03, 0.02]; for the horse in fig. 5b), [0.06, 0.73, 0.21]; and for the zebra in fig. 5c), [0.09, 0.19, 0.72]. The distribution predicted for fig. 5b) assigns a probability of 0.21 to zebra and 0.06 to car, indicating that a horse resembles a zebra more than it resembles a car; the predicted values thus contain information about the similarity between classes. The final purpose of the pedestrian re-identification task is to identify pedestrians by comparing the similarity of pedestrian image features. The above analysis shows that training with one-hot labels ignores the similarity information among pedestrian identities; therefore, following the idea of knowledge distillation, labels containing pedestrian similarity information are introduced to optimize network training and feature extraction.
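The difference between a one-hot label and a soft label can be made concrete with a short numerical sketch. The prediction for the horse picture is taken from the text; the soft label values, the optional temperature parameter and the use of plain cross-entropy as the distillation-style loss are assumptions for illustration, since the patent does not give the exact form of its KD Loss.

```python
import numpy as np


def softmax(z, temperature=1.0):
    """p_i = exp(z_i / T) / sum_j exp(z_j / T); T = 1 gives the basic form."""
    z = np.asarray(z, dtype=float) / temperature
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()


def cross_entropy(target, predicted):
    """-sum_i target_i * log(predicted_i) for a single sample."""
    return float(-np.sum(target * np.log(predicted)))


horse_pred = np.array([0.06, 0.73, 0.21])   # [car, horse, zebra], from the text
one_hot = np.array([0.0, 1.0, 0.0])         # hard label: horse only
soft = np.array([0.05, 0.75, 0.20])         # hypothetical soft label

hard_loss = cross_entropy(one_hot, horse_pred)
soft_loss = cross_entropy(soft, horse_pred)
probs = softmax([2.0, 1.0, 0.1])            # example logits, also hypothetical
```

With the one-hot label only the horse entry contributes to the loss, so the network receives no signal about how zebra-like or car-like the picture is. With the soft label every class contributes, which is how similarity information between identities can enter training.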
The user retrieval module handles user query interaction, including uploading the picture of the pedestrian to be retrieved, specifying the time period and camera number, and displaying and browsing the retrieval results. The user selects the pedestrian picture to upload, specifies the time period and camera number to search, and finally views and browses the retrieval results returned by the system. The implementation flow chart of the module is shown in FIG. 5;
the user selects the pedestrian picture to be queried through the Choose File button, and the picture is then read in and displayed by the input and output module.
The user then selects the camera number to be queried from the camera list and specifies the time period to be queried in the time input box.
After the query button is clicked, the surveillance video reading module reads the video data for the specified time period and camera number and sends it to the video image analysis module, the pedestrian feature extraction module and the human body re-recognition model loading module;
finally, the result returned by the human body re-recognition model loading module is displayed on the retrieval result display interface for the user to browse: for the camera number specified by the user, the retrieval result consists of the pedestrian pictures in the search library whose similarity exceeds the given threshold, limited to the top 30 in descending order of similarity.
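The final ranking step (keep matches above the threshold, sort by descending similarity, return at most 30) could be sketched as follows. Cosine similarity is an assumption here, because the patent does not name the similarity measure, and the function name and toy data are hypothetical.

```python
import numpy as np


def rank_gallery(query, gallery, threshold, top_k=30):
    """Return (indices, similarities) of gallery rows above threshold, best first.

    Cosine similarity is assumed; top_k=30 follows the top-30 rule in the text.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                                    # one similarity per gallery row
    keep = np.flatnonzero(sims > threshold)         # drop matches below the threshold
    order = keep[np.argsort(-sims[keep])][:top_k]   # descending, truncated to top_k
    return order, sims[order]


query = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0],    # orthogonal to the query
                    [1.0, 1.0],    # 45 degrees from the query
                    [2.0, 0.0],    # same direction as the query
                    [3.0, 1.0]])   # close to the query direction
indices, sims = rank_gallery(query, gallery, threshold=0.5, top_k=2)
all_indices, _ = rank_gallery(query, gallery, threshold=0.5)
```

Applying the threshold before truncation means the result may contain fewer than 30 pictures when few gallery entries are similar enough, which matches the behaviour described for the retrieval result interface.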
The present invention is not limited to the above-described embodiments; those skilled in the art can make various changes within the scope of their knowledge without departing from the spirit of the present invention.

Claims (10)

1. A pedestrian re-recognition system based on human body analysis coarse and fine granularity combination is characterized by comprising a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module;
the parameter pre-training initialization module is used for pre-training parameters on a public data set to initialize the network and obtain a pedestrian re-identification network model;
the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module;
the user retrieval module is used for uploading a human body image to be retrieved and sending the human body image to the video image analysis module;
the video image analysis module comprises a video decoding submodule and an image preprocessing submodule, and the video decoding submodule is used for decoding the video data uploaded by the monitoring video data reading module and processing the video data into a processable image; the image preprocessing submodule is used for improving the visual effects of the image after video decoding and the human body image to be retrieved;
the pedestrian feature extraction module is used for designing a coarse-and-fine-granularity combined neural network, and learning coarse-granularity branches and fine-granularity branches in the coarse-and-fine-granularity combined neural network to respectively extract pedestrian features of the video-decoded image and the human body image to be retrieved and store the pedestrian features;
and the human body re-recognition model loading module is used for carrying out retrieval matching by using the pedestrian re-identification network model according to the stored pedestrian features and the human body image to be retrieved, and calculating to obtain the similarity.
2. The pedestrian re-identification system based on human body analysis combination of coarse and fine granularity according to claim 1, wherein the user retrieval module is further configured to set a similarity threshold.
3. The pedestrian re-identification system based on human body analysis combination of coarse and fine granularity according to claim 1, wherein the human body re-recognition model loading module is further configured to feed back the calculated similarity to the user retrieval module.
4. A method of a pedestrian re-identification system according to claim 1, comprising the steps of:
step A: performing parameter pre-training on the public data set to initialize a network to obtain a pedestrian re-identification network model;
step B: uploading and reading video data in the monitoring video data reading module; the video decoding submodule decodes the video data into a usable picture format and carries out image preprocessing on it; a designed neural network model combining coarse and fine granularity is then used, the neural network model comprising coarse-grained branches and fine-grained branches; for the coarse-grained branches, the extraction of global features is enhanced by a knowledge distillation loss function; for the fine-grained branches, the extraction of detailed features is enhanced by a knowledge distillation loss function and a triplet loss function; the learned features are concatenated to obtain a pedestrian feature set f_i; an SE Block is then used to learn a feature vector importance weight W so as to selectively enhance strongly discriminative features and suppress weakly discriminative features;
W = Sigmoid(FC(ReLU(FC(f_i))))
wherein the two FC layers, from the inside out, are used for compression and activation;
after the importance weight W of the pedestrian feature vector is obtained, the pedestrian feature f_o is output
f_o = f_i * W + f_i
and stored;
step C: uploading the human body image to be retrieved in the retrieval module, and calculating and outputting the pedestrian features of the human body image to be retrieved by using step B;
step D: according to the pedestrian features of the human body image to be retrieved, the pedestrian re-identification network model detects pedestrians in the video-decoded images at a certain frame interval, extracts their pedestrian features and calculates the similarity; if the similarity is higher than a threshold value, the pedestrian features are stored and returned sorted by similarity.
5. The method of pedestrian re-identification system according to claim 4, wherein in step B, the picture format is JPG or PNG.
6. The method of pedestrian re-identification system of claim 4 wherein in step B, said video data is from a surveillance camera.
7. The method of pedestrian re-identification system according to claim 4, wherein in step B, said image preprocessing is a distortion processing of the image.
8. A method of pedestrian re-identification system according to claim 4, characterized in that, in step B, the pedestrian feature f_o is stored in a mat file.
9. The method of pedestrian re-identification system as claimed in claim 4, wherein in the step D, the video-decoded image is detected using FPN-Person.
10. The method of pedestrian re-identification system according to claim 8, wherein in step D, if the image after video decoding has been subjected to pedestrian feature extraction, the pedestrian features are extracted from the uploaded human body image to be retrieved by using CFNet, and the pedestrian features appearing in the corresponding mat file are read.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911078998.6A CN110807434B (en) 2019-11-06 2019-11-06 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination

Publications (2)

Publication Number Publication Date
CN110807434A true CN110807434A (en) 2020-02-18
CN110807434B CN110807434B (en) 2023-08-15

Family

ID=69501407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911078998.6A Active CN110807434B (en) 2019-11-06 2019-11-06 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination

Country Status (1)

Country Link
CN (1) CN110807434B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354548A (en) * 2015-10-30 2016-02-24 武汉大学 Surveillance video pedestrian re-recognition method based on ImageNet retrieval
CN109271895A (en) * 2018-08-31 2019-01-25 西安电子科技大学 Pedestrian's recognition methods again based on Analysis On Multi-scale Features study and Image Segmentation Methods Based on Features
CN109165738A (en) * 2018-09-19 2019-01-08 北京市商汤科技开发有限公司 Optimization method and device, electronic equipment and the storage medium of neural network model
CN109871821A (en) * 2019-03-04 2019-06-11 中国科学院重庆绿色智能技术研究院 The pedestrian of adaptive network recognition methods, device, equipment and storage medium again
CN109919246A (en) * 2019-03-18 2019-06-21 西安电子科技大学 Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN110188611A (en) * 2019-04-26 2019-08-30 华中科技大学 A kind of pedestrian recognition methods and system again introducing visual attention mechanism
CN110245592A (en) * 2019-06-03 2019-09-17 上海眼控科技股份有限公司 A method of for promoting pedestrian's weight discrimination of monitoring scene
CN110414368A (en) * 2019-07-04 2019-11-05 华中科技大学 A kind of unsupervised pedestrian recognition methods again of knowledge based distillation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEOFFREY HINTON ET AL.: "Distilling the Knowledge in a Neural Network" *
ZHONG ZHANG ET AL.: "Coarse-Fine Convolutional Neural Network for Person Re-Identification in Camera Sensor Networks" *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553205A (en) * 2020-04-12 2020-08-18 西安电子科技大学 Vehicle weight recognition method, system, medium and video monitoring system without license plate information
CN111553205B (en) * 2020-04-12 2022-11-15 西安电子科技大学 Vehicle weight recognition method, system, medium and video monitoring system without license plate information
CN111753092B (en) * 2020-06-30 2024-01-26 青岛创新奇智科技集团股份有限公司 Data processing method, model training method, device and electronic equipment
CN111753092A (en) * 2020-06-30 2020-10-09 深圳创新奇智科技有限公司 Data processing method, model training device and electronic equipment
CN111832514A (en) * 2020-07-21 2020-10-27 内蒙古科技大学 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels
CN111832514B (en) * 2020-07-21 2023-02-28 内蒙古科技大学 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels
CN111950411A (en) * 2020-07-31 2020-11-17 上海商汤智能科技有限公司 Model determination method and related device
CN111738362A (en) * 2020-08-03 2020-10-02 成都睿沿科技有限公司 Object recognition method and device, storage medium and electronic equipment
CN112233776A (en) * 2020-11-09 2021-01-15 江苏科技大学 Dermatosis self-learning auxiliary judgment system based on visual asymptotic cavity network
CN113277388A (en) * 2021-04-02 2021-08-20 东南大学 Data acquisition control method for electric hanging basket
CN113269117A (en) * 2021-06-04 2021-08-17 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN113269117B (en) * 2021-06-04 2022-12-13 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN114049609A (en) * 2021-11-24 2022-02-15 大连理工大学 Multilevel aggregation pedestrian re-identification method based on neural architecture search
CN114049609B (en) * 2021-11-24 2024-05-31 大连理工大学 Multi-stage aggregation pedestrian re-identification method based on neural architecture search
CN116052220B (en) * 2023-02-07 2023-11-24 北京多维视通技术有限公司 Pedestrian re-identification method, device, equipment and medium
CN116052220A (en) * 2023-02-07 2023-05-02 北京多维视通技术有限公司 Pedestrian re-identification method, device, equipment and medium
CN116824695A (en) * 2023-06-07 2023-09-29 南通大学 Pedestrian re-identification non-local defense method based on feature denoising
CN116935447A (en) * 2023-09-19 2023-10-24 华中科技大学 Self-adaptive teacher-student structure-based unsupervised domain pedestrian re-recognition method and system
CN116935447B (en) * 2023-09-19 2023-12-26 华中科技大学 Self-adaptive teacher-student structure-based unsupervised domain pedestrian re-recognition method and system

Also Published As

Publication number Publication date
CN110807434B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN110807434A (en) Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
Leng et al. A survey of open-world person re-identification
CN111368815B (en) Pedestrian re-identification method based on multi-component self-attention mechanism
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN111783576B (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN104715023A (en) Commodity recommendation method and system based on video content
CN110580460A (en) Pedestrian re-identification method based on combined identification and verification of pedestrian identity and attribute characteristics
CN109829467A (en) Image labeling method, electronic device and non-transient computer-readable storage medium
CN111563452A (en) Multi-human body posture detection and state discrimination method based on example segmentation
CN112069940A (en) Cross-domain pedestrian re-identification method based on staged feature learning
CN111738048B (en) Pedestrian re-identification method
CN110298297A (en) Flame identification method and device
CN110728216A (en) Unsupervised pedestrian re-identification method based on pedestrian attribute adaptive learning
CN110413825B (en) Street-clapping recommendation system oriented to fashion electronic commerce
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN113221770B (en) Cross-domain pedestrian re-recognition method and system based on multi-feature hybrid learning
CN110858276A (en) Pedestrian re-identification method combining identification model and verification model
CN111815582B (en) Two-dimensional code region detection method for improving background priori and foreground priori
Saqib et al. Intelligent dynamic gesture recognition using CNN empowered by edit distance
CN111882000A (en) Network structure and method applied to small sample fine-grained learning
CN113435329B (en) Unsupervised pedestrian re-identification method based on video track feature association learning
Yang et al. Bottom-up foreground-aware feature fusion for practical person search
Matzen et al. Bubblenet: Foveated imaging for visual discovery
CN110688512A (en) Pedestrian image search algorithm based on PTGAN region gap and depth neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant