CN112784674B - Cross-domain identification method of key personnel search system based on class center self-adaption - Google Patents


Info

Publication number
CN112784674B
CN112784674B (application CN202011267881.5A)
Authority
CN
China
Prior art keywords: pedestrian, data, image, images, cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011267881.5A
Other languages
Chinese (zh)
Other versions
CN112784674A
Inventor
冷彪 (Leng Biao)
李子涵 (Li Zihan)
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202011267881.5A
Publication of CN112784674A
Application granted
Publication of CN112784674B

Classifications

    • G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling
    • G06V20/42 — Higher-level semantic understanding of sport video content
    • G06F18/23 — Clustering techniques
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N3/04 — Neural network architecture, e.g. interconnection topology
    • G06N3/08 — Neural network learning methods
    • G06V20/46 — Extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands


Abstract

The invention provides a cross-domain identification method for a key-person search system based on class-center adaptation, comprising a training stage and a testing stage. The invention not only overcomes the sharp drop in performance that deep learning models suffer when trained and tested on different data domains, but also clusters more data from unlabeled data domains through a dedicated learning strategy and applies them to training the pedestrian re-identification model, so that the key-person search system attains better generalization and stronger cross-domain identification capability.

Description

Cross-domain identification method of key personnel search system based on class center self-adaption
Technical Field
The invention relates to a cross-domain identification method of a key personnel search system based on class center self-adaptation, and belongs to the field of machine learning.
Background
The rapid development of many industries is being profoundly shaped by artificial intelligence, which over nearly 70 years of development has accumulated a vast body of knowledge and achievements and is now widely applied. Driven by the internet, big data and supercomputers, AI technology has entered a new stage characterized by deep learning, cross-disciplinary fusion and human-machine collaboration. Computer vision is the most rapidly developing and most widely applied technical direction in the field of artificial intelligence.
Person re-identification, also known as pedestrian re-identification, is a computer vision technique that, given a monitored pedestrian image, retrieves images of the same pedestrian captured by other devices; it is also regarded as a sub-problem of image retrieval. Pedestrian re-identification compensates for the visual limitations of fixed cameras, can be combined with pedestrian detection and pedestrian tracking, and is widely applicable to intelligent video surveillance, intelligent security and related fields. With the progress of deep learning in recent years, deep learning techniques have been widely applied to the pedestrian re-identification task, and many deep-learning-based methods have raised the performance of same-domain pedestrian re-identification to a very high level.
However, the performance of these deep learning models collapses once they are applied to a new, non-homologous dataset, which is the chief obstacle when pedestrian re-identification technology is put to practical use. A key-person search system must process data collected by different cameras, in different scenes, at different times, and must locate the video clips in which the person of interest appears. Because body shape, illumination, environment and so on vary across cameras and scenes, generalizing a model to different domains is the problem of cross-domain pedestrian re-identification: a model trained on a source domain, whose data carry annotation information, must generalize to a target domain whose data carry none. The cross-domain problem arises from inconsistent distributions between datasets, known as domain shift or domain bias. Moreover, in practical application scenarios the cost of annotating data for every new scene is very high, and pedestrian re-identification additionally requires cross-camera annotation, which raises the cost further. A cross-domain re-identification method for pedestrian data is therefore needed to resolve these problems for the key-person search system and to improve its usability and robustness.
Disclosure of Invention
The technical problem solved by the invention: overcoming the shortcomings of the prior art, a class-center-adaptive cross-domain identification method for a key-person search system is provided, which can efficiently exploit large-scale unlabeled data to train a pedestrian re-identification model, maintains undiminished performance in cross-domain scenarios, and achieves high-accuracy cross-domain identification.
In the invention, against the background of the cross-domain pedestrian re-identification task of key-person search, a key-person search system with strong cross-domain pedestrian re-identification capability is built from a class-center adaptive strategy and a deep learning model. The technical problems to be solved by the application are: 1. pedestrian annotation of surveillance data is expensive, and a generic pedestrian re-identification model cannot efficiently exploit large amounts of data that lack pedestrian labels; 2. an ordinary key-person search system only re-identifies pedestrians well within a single data source or across several highly similar sources, and cannot effectively handle re-identification between data sources that differ substantially.
The technical problem to be solved by the invention is as follows: a cross-domain re-identification method for a key-person search system based on a class-center adaptive strategy is provided, which overcomes the sharp drop in training and testing performance of deep learning models across data domains, clusters more data from unlabeled data domains through a dedicated learning strategy, and applies them to training the pedestrian re-identification model, so that the key-person search system attains better generalization and stronger cross-domain identification capability.
The technical scheme is as follows:
the cross-domain identification method of the key personnel search system based on class center self-adaptation comprises the following steps:
Training stage:
step one: New data collection and data pre-processing
In the running process of the pedestrian monitoring equipment, the data collection module continuously collects new pedestrian video data collected by different cameras; the pedestrian video data are respectively stored according to different collection scenes, and the pedestrian video data collected in the same scene are stored in an independent database; for pedestrian video data of the same scene, after the number of new data reaches a threshold value, entering the subsequent step;
step two: pedestrian detection
Inputting pedestrian video data to be detected, detecting pedestrians for each frame of image in the pedestrian video data by using a real-time target detection model realized based on a region proposal network, and judging the positions of the pedestrians in each frame of image to obtain a pedestrian detection frame;
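The region-proposal detector of step two emits many overlapping candidate boxes per frame; a standard post-processing step, non-maximum suppression (NMS), keeps only the highest-scoring non-overlapping detections. The patent does not spell this step out, so the sketch below is an illustrative NumPy implementation of generic greedy NMS, not the invention's specific detector:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep
```

The IoU threshold of 0.5 is a conventional default, not a value specified by the patent.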
step three: pedestrian tracking
A convolutional neural network extracts features of the pedestrian image inside each detection frame as appearance features. Using these appearance features, the detection frames of consecutive frames are matched, so that the same pedestrian is tracked through successive video frames; the matched pedestrian images are cropped out according to the detection-frame positions, finally yielding, for each pedestrian appearing in the input pedestrian video data, a pedestrian image cluster formed from one or more runs of consecutive video frames;
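The frame-to-frame matching of detection boxes by appearance features can be sketched as a greedy assignment on the pairwise cosine-similarity matrix. This is a minimal illustration; the similarity threshold and the greedy (rather than globally optimal) assignment are assumptions, since the patent does not fix them:

```python
import numpy as np

def match_detections(prev_feats, curr_feats, sim_thresh=0.5):
    """Greedily match detections between consecutive frames by cosine similarity.
    Returns a list of (prev_index, curr_index) pairs."""
    a = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    b = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    sim = a @ b.T                       # pairwise cosine similarity
    matches = []
    while sim.size and sim.max() > sim_thresh:
        i, j = np.unravel_index(sim.argmax(), sim.shape)
        matches.append((int(i), int(j)))
        sim[i, :] = -1.0                # each detection matched at most once
        sim[:, j] = -1.0
    return matches
```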
step four: pedestrian image cluster key frame extraction
The pedestrian image clusters obtained in step three consist of consecutive video frames, and the pedestrian images of adjacent frames differ very little; a key-frame extraction algorithm is therefore applied to each pedestrian image cluster to obtain its key frames, forming a new, non-redundant pedestrian image cluster. Each key frame is highly similar to the frames around it, so a single key frame satisfies the subsequent processing requirements;
To extract key frames, image features are first extracted from the pedestrian image cluster with a convolutional neural network; the distance between adjacent video frames is then computed from these features, and frames whose distance exceeds a threshold are marked as key frames;
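A minimal sketch of this threshold rule, assuming Euclidean distance between frame features and that the first frame of a cluster is always kept (neither detail is fixed by the patent):

```python
import numpy as np

def extract_keyframes(features: np.ndarray, dist_thresh: float) -> list:
    """Mark frame i as a key frame when its feature distance to the previous
    frame exceeds dist_thresh; the first frame is kept by assumption."""
    keyframes = [0]
    for i in range(1, len(features)):
        dist = np.linalg.norm(features[i] - features[i - 1])
        if dist > dist_thresh:
            keyframes.append(i)
    return keyframes
```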
step five: pedestrian image clustering
Features are extracted from the pedestrian image clusters obtained in step four using a convolutional neural network, and a distance matrix is obtained by computing the cosine distances between features; the distance matrix is then clustered with a density-based spatial clustering algorithm with noise (DBSCAN), producing n classes. Assuming the pedestrian image clusters used for clustering contain N images in total, n ranges from 1 to N;
For clustering, the minimum number m of samples constituting a class and the minimum distance e between samples within a class are first set; a cluster is then sought for every point of the distance matrix according to e. For a sample c in the distance matrix, if the neighborhood of radius e centered on c contains more than m samples, a cluster with c as its core object is created; every sample within distance e of any sample already in the cluster is then added to the cluster, and this process iterates until every sample has either found its cluster or belongs to no cluster;
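The clustering procedure of step five can be sketched directly from this description. The implementation below is a compact DBSCAN variant over a precomputed distance matrix; the parameter names e and m follow the text, and label -1 marks noise samples that belong to no cluster:

```python
import numpy as np

def dbscan_precomputed(dist: np.ndarray, e: float, m: int) -> np.ndarray:
    """Density-based clustering on a precomputed distance matrix: a sample
    whose e-neighborhood holds more than m samples seeds a cluster, which is
    grown by adding every sample within distance e of any member."""
    n = dist.shape[0]
    labels = np.full(n, -1)
    cluster_id = 0
    for c in range(n):
        if labels[c] != -1:
            continue
        neighbors = np.where(dist[c] <= e)[0]
        if len(neighbors) <= m:           # c is not a core object
            continue
        labels[c] = cluster_id
        queue = list(neighbors)
        while queue:                      # expand the cluster iteratively
            p = queue.pop()
            if labels[p] != -1:
                continue
            labels[p] = cluster_id
            reachable = np.where(dist[p] <= e)[0]
            if len(reachable) > m:        # p is itself a core object
                queue.extend(reachable)
        cluster_id += 1
    return labels
```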
step six: class center adaptive strategy-based pedestrian re-recognition training model
To train with unlabeled data, the invention innovatively introduces a class-center projection transduction layer into a class-center adaptive learning mechanism, replacing the fully connected layer used for supervised classification. The class-center projection transduction layer is adapted from a fully connected layer: for labeled data its projection matrix coincides with the original fully connected layer, while for unlabeled data the anchors of the projection matrix are built from the respective class centers;
Let χ_L denote the labeled dataset, which contains M classes, and let χ_U denote the unlabeled dataset, i.e. the clustered data obtained in step five. A mini-batch size below a preset threshold is first fixed, and in each training iteration a mini-batch

B = B_L ∪ B_U

is constructed, containing annotated data B_L = {x_1^L, …, x_p^L} and unlabeled data B_U = {x_1^U, …, x_q^U}, where p is the number of labeled samples in the mini-batch and q the number of unlabeled samples. B_L is randomly selected from the labeled data, while B_U is built by randomly selecting r unlabeled clusters and then randomly drawing s samples from each cluster, so that q = r·s. Note that for each mini-batch the r selected clusters change dynamically. After the mini-batch B is fed into the network, the features extracted just before the class-center projection transduction layer are written

f = [f_L, f_U]^T ∈ R^((p+q)×D),

where D is the feature dimension and f_L and f_U are the feature vectors of the labeled and unlabeled data, respectively. The projection matrix of the class-center projection transduction layer is

W = [W_M, W_r] ∈ R^(D×(M+r)),

where the first M columns are the anchors of the labeled classes and the remaining r columns are the class-center vectors w_1, …, w_r of the selected unlabeled clusters. Within a mini-batch, the class-center vector of the j-th selected unlabeled cluster is computed as

w_j = α · c_j / ||c_j||_2,  with  c_j = (1/s) Σ_{i=1}^{s} f_i^{U,j}  and  α = (1/M) Σ_{m=1}^{M} ||W_m||_2,

where the adaptive coefficient α is the average magnitude (l2 norm) of the class-center anchors of the labeled data and M is the number of labeled classes. For each feature vector f, the class-center projection transduction layer thus produces the bias-free output y = W^T f, which is passed into a softmax loss layer. The role of the adaptive coefficient α is to normalize the class centers of the unlabeled data: for stable and fast-converging training, a suitable scale factor is needed so that the activation map y_U of the unlabeled data is comparable to the activation map y_L of the annotated data. In fact, the l2 norm of each class anchor inherently provides a reasonable prior scale for mapping the input features f_L to the activations y_L; multiplying the class centers of the unlabeled data by α therefore scales them to the same order as the labeled anchors, so that the activations of unlabeled and labeled data follow similar distributions, ensuring stable training and fast convergence;
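The forward pass of the class-center projection transduction layer can be sketched in NumPy. Because the patent's formula images are lost, the anchor formula for unlabeled clusters used below (the α-scaled, l2-normalized cluster mean, with α the mean l2 norm of the labeled anchors) is a reconstruction inferred from the surrounding text, not a verbatim transcription:

```python
import numpy as np

def class_center_projection(f_L, f_U, W_M, r, s):
    """Sketch of the class-center projection transduction layer forward pass.

    f_L : (p, D) features of labeled samples
    f_U : (q, D) features of unlabeled samples, q = r * s, arranged so that
          consecutive blocks of s rows belong to the same unlabeled cluster
    W_M : (D, M) anchor matrix of the labeled classes (usual FC weights)
    """
    alpha = np.linalg.norm(W_M, axis=0).mean()        # adaptive coefficient
    centers = f_U.reshape(r, s, -1).mean(axis=1)      # cluster means c_j
    # assumed reconstruction: anchor w_j = alpha * c_j / ||c_j||_2
    W_r = (alpha * centers / np.linalg.norm(centers, axis=1, keepdims=True)).T
    W = np.concatenate([W_M, W_r], axis=1)            # (D, M + r)
    f = np.concatenate([f_L, f_U], axis=0)            # (p + q, D)
    return f @ W                                      # bias-free logits W^T f
```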
The pedestrian re-identification model is trained with the class-center adaptive strategy above; training has converged when the loss no longer fluctuates, yielding a pedestrian re-identification model with efficient recognition capability;
step seven: model compression quantization
The pedestrian re-identification model obtained in step six is compressed and quantized with the deep learning inference optimizer TensorRT. Compression comprises horizontal fusion, which merges convolution, bias and activation layers into a single structure, and vertical fusion, which merges layers with the same structure but different weights into one wider layer. Quantization maps the compressed model's 32-bit floating-point data to 8-bit integer data, finally reducing the pedestrian re-identification model to a quarter of its original size and increasing inference speed by a factor of 3 to 5;
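TensorRT's INT8 mode uses calibration and is more sophisticated than a single linear mapping, but the basic float32-to-int8 quantization principle described in step seven can be illustrated with a symmetric linear scheme (an illustration only, not TensorRT's actual algorithm):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric linear quantization of float32 weights to int8: the largest
    magnitude maps to 127, everything else scales proportionally."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * scale
```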
finally obtaining a pedestrian re-identification model for testing;
Testing stage:
step one: Data pre-processing
During data preprocessing, the pedestrian video data to be recognized are processed exactly as in the training stage: pedestrian detection and pedestrian tracking must be completed to obtain data that can be fed directly into the pedestrian re-identification model; these data serve as the image library for pedestrian search;
step two: pedestrian search
Pedestrian search uses a query image to find, in the image library, the t pedestrian images sharing the query's identity. Features are first extracted from the query image and from the library images with the pedestrian re-identification model obtained at the end of the training stage; the cosine distances between the query features and the library features are then computed, yielding a distance matrix and the image-pair correspondences;
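The ranking step can be sketched as follows, assuming feature vectors have already been extracted by the re-identification model:

```python
import numpy as np

def search(query_feat: np.ndarray, gallery_feats: np.ndarray, t: int):
    """Rank gallery images by cosine distance to the query feature and
    return the indices of the t closest images."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - g @ q                  # cosine distance per gallery image
    return np.argsort(dist)[:t]
```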
step three: reordering
The distance matrix is re-ranked with a query-expansion algorithm. The distance matrix from step two is sorted to obtain the t images closest to the query image; the features of these t images and of the query image are summed and averaged to form a new query feature; the cosine distances between the new query feature and the library image features are computed, giving a new distance matrix and image-pair correspondences; finally this matrix is sorted, and its first t images are the t pedestrian images in the library that share the query image's identity.
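The query-expansion re-ranking just described can be sketched as follows (feature extraction again assumed done; re-normalizing the expanded query is an assumption the patent does not state):

```python
import numpy as np

def expanded_query_rerank(query_feat: np.ndarray, gallery_feats: np.ndarray, t: int):
    """Average the query feature with its t nearest gallery features, then
    rank the gallery again with the expanded query."""
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    top = np.argsort(1.0 - g @ q)[:t]                 # initial top-t neighbours
    expanded = np.vstack([q[None, :], g[top]]).mean(axis=0)
    expanded /= np.linalg.norm(expanded)
    return np.argsort(1.0 - g @ expanded)[:t]         # final top-t ranking
```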
Compared with the prior art, the invention has the advantages and effects that:
(1) The invention innovatively proposes a class-center projection transduction layer applied within a class-center adaptive learning mechanism, replacing the fully connected layer used for supervised classification. By detecting and clustering large amounts of unlabeled cross-domain pedestrian data from real scenes, dynamically estimating class centers after clustering each mini-batch of unlabeled data, and adding these class centers as new anchors to the projection matrix of the re-identification model's class-center projection transduction layer, the model's ability to learn from cross-domain data is strengthened. The invention overcomes the sharp decline in cross-domain data recognition performance of ordinary key-person search systems and exhibits strong cross-domain re-identification capability.
(2) In horizontal comparison, the method learns the characteristics of richer data domains through the class-center adaptive strategy and offers better cross-domain identification than other classical pedestrian re-identification methods. In vertical comparison, the method effectively combines unlabeled and labeled data for learning instead of considering only one of them, improving the utilization efficiency and value of pedestrian surveillance data.
Drawings
FIG. 1 is a flow chart of a key person searching system based on a class center adaptive strategy.
Detailed Description
For a better understanding of the present invention, some concepts will be explained.
1. The labeled pedestrian dataset used in the invention is the Market-1501 dataset published by Tsinghua University, which contains 1501 pedestrians and 32668 detected pedestrian bounding boxes captured by 6 cameras (5 high-definition and 1 low-definition).
2. Softmax: the normalized exponential function, which "compresses" a K-dimensional vector z of arbitrary real numbers into a K-dimensional real vector σ(z) whose elements each lie in (0, 1) and sum to 1; it is computed as

σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k),   j = 1, …, K.
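A direct NumPy transcription of the softmax formula, with the standard max-subtraction trick for numerical stability (the subtraction cancels in the ratio, so σ(z) is unchanged):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Normalized exponential: elements lie in (0, 1) and sum to 1."""
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()
```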
As shown in FIG. 1, the method for identifying the cross-domain key personnel search system based on class center self-adaptation of the invention comprises two stages of training and testing, and comprises the following implementation steps:
a training stage:
the method comprises the following steps: new data collection and data pre-processing
In the running process of the pedestrian monitoring equipment, the data collection module continuously collects new pedestrian video data collected by different cameras; the pedestrian video data are respectively stored according to different acquisition scenes, and the pedestrian video data acquired in the same scene are stored in an independent database; for pedestrian video data of the same scene, after the number of new data reaches a threshold value, entering a subsequent step;
step two: pedestrian detection
Inputting pedestrian video data to be detected, detecting pedestrians in each frame of image in the pedestrian video data by using a real-time target detection model realized based on a regional proposal network, and judging the positions of the pedestrians in each frame of image to obtain a pedestrian detection frame;
step three: pedestrian tracking
Extracting the features of the pedestrian images in the pedestrian detection frames of each frame of image by using a convolutional neural network as apparent features, matching the pedestrian detection frames of the previous and next frames by using the apparent features of the pedestrian images in the pedestrian detection frames between each frame of image, realizing the tracking of the same pedestrian in the continuous video frames, cutting out the matched pedestrian images according to the positions of the pedestrian detection frames, and finally obtaining a pedestrian image cluster formed by one or more continuous video frames of each pedestrian appearing in input pedestrian video data;
step four: pedestrian image cluster key frame extraction
The pedestrian image clusters obtained in the step three are formed by continuous video frames, and the pedestrian image difference of adjacent continuous video frames is very small, so that the key frame extraction algorithm is used for extracting the key frame of each pedestrian image cluster to obtain the key frame in the pedestrian image cluster, and a new non-redundant pedestrian image cluster is formed; each key frame is very similar to the frames around the key frame, and one key frame can meet the subsequent use requirement;
when extracting key frames, firstly extracting image characteristics of pedestrian image clusters by using a convolutional neural network, then calculating the distance between adjacent video frames according to the image characteristics, and marking the video frames with the distance larger than a threshold value as the key frames;
step five: pedestrian image clustering
Extracting features from the pedestrian image clusters obtained in the fourth step by using a convolutional neural network, obtaining a distance matrix by calculating cosine distances between the features, clustering the distance matrix by using a density-based spatial clustering algorithm with noise to obtain N classes, and assuming that N images are shared in the pedestrian image clusters for clustering, the value range of N is 1 to N;
when clustering is carried out, firstly, the minimum sample number m belonging to a class and the minimum distance e between samples in the class are set, then, clusters are searched for each data in a distance matrix according to the minimum distance e, and for a certain sample c in the distance matrix, if the number of samples contained in a neighborhood taking c as a center e as a radius is more than m, a cluster taking c as a core object is created; then, adding the sample with the distance less than e from any sample in the cluster into the cluster, and continuously iterating the process until each sample finds the cluster or does not belong to any cluster;
step six: class center adaptive strategy-based pedestrian re-recognition training model
In order to train by using unlabeled data, the invention innovatively provides a class center projection transfer layer to be applied to a class center self-adaptive learning mechanism to replace a full connection layer used for supervised classification before; the center-like projection conversion layer is improved from a full-connection layer, the projection matrix of the layer is consistent with the previous full-connection layer for labeled data, and the projection matrix is formed by simulating anchors by using respective center-like projection layers for unlabeled data;
let's ChiLRepresenting a labeled data set, χ, containing M classesURepresenting the unmarked data set, namely the data obtained by the fifth step; firstly, setting a threshold value, wherein the threshold value is smaller than the threshold value, a small batch is set, and in a training turn, constructing data of the small batch
Figure BDA0002776774610000071
Including annotated data therein
Figure BDA0002776774610000072
And unlabeled data
Figure BDA0002776774610000073
p represents the number of data marked in the small batch of data, and q represents the number of data unmarked in the small batch of data;
Figure BDA0002776774610000074
randomly selected from the annotated data; and then
Figure BDA0002776774610000075
Randomly selecting r unmarked clusters from unmarked data, and randomly selecting s samples from each cluster to construct, namely q is r.s; it should be noted that, for each small batch of data, the selected r clusters are dynamically changed; thus, this small batch of data B is extracted after being fed into the network, before the class-centric projection transduction layerThe characteristic is expressed as f ═ fL,fU]T∈R(p+q)·DWhere D is the feature dimension, fLAnd fURespectively labeling feature vectors of data and unlabeled data; the projection matrix of the class center projection transduction layer is represented as W ═ WM,Wr]∈R(M+r)·(p+q)Where the first M columns represent anchors with labeled data and the remaining r columns represent class center vectors for selected unlabeled data
Figure BDA0002776774610000076
In a small amount of the data of a lot,
Figure BDA0002776774610000077
the calculation formula of (a) is as follows:
Figure BDA0002776774610000078
the self-adaptive coefficient alpha is the average size of the class center of the labeled data cluster, and M is the number of the classes of the labeled data; therefore, the output y ═ W without the bias term can be obtained by the quasi-center projection transduction layerTf, then transferring y into a softmax loss layer; the function of the self-adaptive coefficient alpha is to normalize the class center of the unlabeled data; for stability and fast convergence of training, a suitable scale factor is required for scaling, so that the activation mapping y for unlabeled dataUActivation mapping y similar to annotation dataL(ii) a In fact, the l2 norm inherently provides a reasonable a priori scale for each class center to map the input features fLTo an output yLActivation of (2); therefore, the similar scale scaling with the labeled data is carried out by multiplying the class center of the unlabeled data by the adaptive coefficient alpha, so that the activation of the unlabeled data and the activation of the labeled data have similar distribution, and the stability and the quick convergence of the training are ensured;
training a pedestrian re-recognition model based on the above class center self-adaptive strategy, and converging when the size of the loss function does not fluctuate any more to obtain the pedestrian re-recognition model with high-efficiency recognition capability;
step seven: model compression quantization
Compressing and quantizing the pedestrian re-identification model obtained in the step six by using a deep learning inference optimizer TensrT, wherein the compression comprises transverse combination for combining convolution, bias and activation layers into an independent structure and longitudinal combination for combining layers with the same structure but different weights into a wider layer; the quantization is to map the data of the 32-bit floating point number type of the compressed model into 8-bit integer data; finally, the size of the pedestrian re-identification model can be reduced to one fourth of the original size, and the testing speed is improved by 3 to 5 times;
finally obtaining a pedestrian re-identification model for testing;
and (3) a testing stage:
the method comprises the following steps: data pre-processing
In data preprocessing, the pedestrian video data to be recognized is processed in the same way as in the training stage, i.e., pedestrian detection and pedestrian tracking must be completed to obtain data that can be fed directly into the pedestrian re-identification model; this data serves as the image library for pedestrian search;
step two: pedestrian search
Pedestrian search uses a query image to find t pedestrian images in the image library with the same identity as the query image; first, features are extracted from the query image and from the images in the image library using the pedestrian re-identification model obtained at the end of the training stage; then the cosine distances between the query features and the image-library features are computed to obtain a distance matrix and the image-pair correspondence;
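The distance matrix computation can be sketched directly from the definition of cosine distance; this is a minimal illustrative sketch, with the function name and feature layout (one row per image) assumed for illustration:

```python
import numpy as np

def cosine_distance_matrix(query_feats, gallery_feats):
    """Pairwise cosine distance between query and image-library features.

    query_feats:   (num_query, d)  one feature row per query image.
    gallery_feats: (num_gallery, d) one feature row per library image.
    Returns a (num_query, num_gallery) distance matrix in [0, 2].
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return 1.0 - q @ g.T   # cosine distance = 1 - cosine similarity
```

Sorting each row of the returned matrix then yields, for each query, the library images ranked from most to least similar.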
step three: reordering
The distance matrix is reordered using a query expansion algorithm: the distance matrix from step two is first sorted to obtain the t images closest to the query image; the features of these t images and the query image are summed and averaged to form the feature of a new query image; the cosine distances between the new query feature and the image-library features are computed, giving a new distance matrix and image-pair correspondence; finally this distance matrix is sorted, and its first t images are the t pedestrian images in the image library with the same identity as the query image.
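The query expansion steps above can be sketched as follows; this is an illustrative sketch under the assumption of row-wise features, and the function names are not from the patent:

```python
import numpy as np

def query_expansion_search(query_feat, gallery_feats, t):
    """Re-rank by query expansion: find the t library images closest to the
    query, average their features together with the query feature to form a
    new query, then search the library again with the expanded query."""
    def cosine_dist(qf):
        q = qf / np.linalg.norm(qf)
        g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
        return 1.0 - g @ q              # distance of each library image to qf

    top = np.argsort(cosine_dist(query_feat))[:t]            # first ranking
    new_query = np.vstack([gallery_feats[top], query_feat]).mean(axis=0)
    return np.argsort(cosine_dist(new_query))[:t]            # final ranking
```

Averaging in the top-t neighbors makes the expanded query more robust to viewpoint and lighting variation in any single image, which is the usual motivation for this re-ranking step.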
The invention innovatively proposes a class center projection transduction layer, applied within a class-center adaptive learning mechanism to replace the fully connected layer used for supervised classification. By detecting and clustering a large amount of unlabeled cross-domain pedestrian data from real scenes, dynamically estimating class centers after each small batch of unlabeled data is clustered, and adding these class centers as new anchors to the projection matrix of the re-identification model's class center projection transduction layer, the unlabeled data and the labeled data are effectively combined for learning. This overcomes the sharp drop in cross-domain recognition performance suffered by ordinary key personnel search systems and provides strong cross-domain re-identification capability.
The cross-domain re-identification method can be applied to searching for key personnel in real scenes; it improves system performance by making full use of a large amount of unlabeled pedestrian data and achieves an excellent cross-domain re-identification effect.
In summary, with the class-center adaptive cross-domain re-identification method, the key personnel search system gains strong generalization and cross-domain re-identification capability while solving the problem of performance degradation in cross-domain pedestrian re-identification.
Portions of the invention not described in detail are well within the skill of the art.
The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be included within the scope of the invention.

Claims (1)

1. A key personnel search system cross-domain identification method based on class center self-adaptation is characterized by comprising two stages of training and testing:
the training phase is implemented as follows:
the method comprises the following steps: new data collection and data pre-processing
During operation of the pedestrian monitoring equipment, the data collection module continuously collects new pedestrian video data captured by different cameras; the video data are stored separately by collection scene, with data collected in the same scene stored in an independent database; for pedestrian video data of the same scene, the subsequent steps are entered once the amount of new data reaches a threshold;
step two: pedestrian detection
Inputting the pedestrian video data to be detected, detecting pedestrians in each frame of the video using a real-time object detection model based on a region proposal network, and determining the position of each pedestrian in each frame to obtain pedestrian detection frames;
step three: pedestrian tracking
Extracting, with a convolutional neural network, features of the pedestrian images within the pedestrian detection frames of each frame as appearance features; matching the pedestrian detection frames of consecutive frames using these appearance features, thereby tracking the same pedestrian across consecutive video frames; cropping the matched pedestrian images according to the positions of the detection frames; finally obtaining, for each pedestrian appearing in the input pedestrian video data, a pedestrian image cluster formed from one or more consecutive video frames;
step four: pedestrian image cluster key frame extraction
Extracting a key frame of each pedestrian image cluster by using a key frame extraction algorithm to obtain key frames in the pedestrian image clusters and form a new non-redundant pedestrian image cluster;
when extracting the key frame, firstly extracting image characteristics of the pedestrian image cluster by using a convolutional neural network, then calculating the distance between adjacent video frames according to the image characteristics, and marking the video frames with the distance larger than a threshold value as the key frames;
step five: pedestrian image clustering
Extracting features from the pedestrian image clusters obtained in step four using a convolutional neural network, obtaining a distance matrix by computing the cosine distances between the features, and clustering the distance matrix using a density-based spatial clustering algorithm with noise to obtain n classes; assuming there are N images in total in the pedestrian image clusters to be clustered, n ranges from 1 to N;
when clustering, the minimum number m of samples required to form a class and the neighborhood radius e are first set; clusters are then searched for each sample in the distance matrix according to e: for a sample c in the distance matrix, if the neighborhood centered at c with radius e contains more than m samples, a cluster with c as its core object is created; then every sample whose distance to any sample in the cluster is less than e is added to the cluster, and this process is iterated until every sample has either found its cluster or belongs to no cluster;
step six: class center adaptive strategy-based pedestrian re-recognition training model
A pedestrian re-identification network framework is constructed using a convolutional neural network, and a class center projection transduction layer replaces the usual fully connected layer for supervised classification; the class center projection transduction layer is an improvement on the fully connected layer: for labeled data its projection matrix is consistent with that of a fully connected layer, the projection matrix being formed by using the respective class centers of the labeled data to simulate anchors;
Let χ_L denote the labeled dataset containing M classes, and χ_U the unlabeled dataset, i.e., the data obtained in step five; a threshold is set, and in each training round a small batch of data B, whose size is smaller than the threshold, is constructed; it contains labeled data B_L = {x_1^L, ..., x_p^L} and unlabeled data B_U = {x_1^U, ..., x_q^U}, where p is the number of labeled samples and q the number of unlabeled samples in the small batch; B_L is randomly selected from the labeled data, while B_U is constructed by randomly selecting r unlabeled clusters from the unlabeled data and then randomly selecting s samples from each cluster, i.e., q = r·s; after the small batch B is fed into the network, the feature extracted before the class center projection transduction layer is denoted f, and the class center projection transduction layer produces the bias-free output y = W^T f for classification; y is then passed into the softmax loss layer;
training the pedestrian re-identification model based on the class center adaptive strategy; training converges when the loss function no longer fluctuates, yielding a pedestrian re-identification model with efficient recognition capability;
step seven: model compression quantization
Compressing and quantizing the pedestrian re-identification model obtained in step six using the deep learning inference optimizer TensorRT; the compression comprises horizontal fusion, which merges convolution, bias, and activation layers into a single structure, and vertical fusion, which merges layers with the same structure but different weights into one wider layer; the quantization maps the 32-bit floating-point data of the compressed model to 8-bit integer data, finally reducing the size of the pedestrian re-identification model to one quarter of the original and improving the testing speed by 3 to 5 times;
finally obtaining a pedestrian re-identification model for testing;
the test phase is implemented as follows:
the method comprises the following steps: data pre-processing
In data preprocessing, the pedestrian video data to be recognized is processed in the same way as in the training stage, i.e., pedestrian detection and pedestrian tracking must be completed to obtain data that can be fed directly into the pedestrian re-identification model; this data serves as the image library for pedestrian search;
step two: pedestrian search
Using a query image to find t pedestrian images in the image library with the same identity as the query image: first, features are extracted from the query image and from the images in the image library using the pedestrian re-identification model obtained at the end of the training stage; then the cosine distances between the query features and the image-library features are computed to obtain a distance matrix and the image-pair correspondence;
step three: reordering
The distance matrix is reordered using a query expansion algorithm: the distance matrix from step two is first sorted to obtain the t images closest to the query image; the features of these t images and the query image are summed and averaged to form the feature of a new query image; the cosine distances between the new query feature and the image-library features are computed, giving a new distance matrix and image-pair correspondence; finally this distance matrix is sorted, and its first t images are the t pedestrian images in the image library with the same identity as the query image.
CN202011267881.5A 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption Active CN112784674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011267881.5A CN112784674B (en) 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011267881.5A CN112784674B (en) 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption

Publications (2)

Publication Number Publication Date
CN112784674A CN112784674A (en) 2021-05-11
CN112784674B true CN112784674B (en) 2022-07-15

Family

ID=75750488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011267881.5A Active CN112784674B (en) 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption

Country Status (1)

Country Link
CN (1) CN112784674B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961051B (en) * 2019-03-28 2022-11-15 湖北工业大学 Pedestrian re-identification method based on clustering and block feature extraction
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111639561A (en) * 2020-05-17 2020-09-08 西北工业大学 Unsupervised pedestrian re-identification method based on category self-adaptive clustering
CN111738172B (en) * 2020-06-24 2021-02-12 中国科学院自动化研究所 Cross-domain target re-identification method based on feature counterstudy and self-similarity clustering



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant