CN112784674B - Cross-domain identification method of key personnel search system based on class center self-adaption - Google Patents
- Publication number
- CN112784674B (application number CN202011267881.5A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- data
- image
- images
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
- G06F18/23—Clustering techniques
- G06F18/2415—Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/04—Neural networks; Architecture, e.g. interconnection topology
- G06N3/08—Neural networks; Learning methods
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The invention provides a cross-domain identification method for a key personnel search system based on class-center adaptation, comprising a training stage and a testing stage. The method not only overcomes the sharp drop in performance that deep learning models suffer when trained and tested on different data domains, but also clusters additional data from unlabeled data domains through a dedicated learning strategy and uses it to train the pedestrian re-identification model, so that the key personnel search system achieves better generalization and stronger cross-domain identification capability.
Description
Technical Field
The invention relates to a cross-domain identification method of a key personnel search system based on class center self-adaptation, and belongs to the field of machine learning.
Background
The rapid development of many industries is being profoundly shaped by artificial intelligence. Over nearly 70 years of development, AI has accumulated a vast body of knowledge and achievements and has been widely applied. Driven by the internet, big data and supercomputing, AI technology has entered a new stage characterized by deep learning, cross-domain fusion and human-machine collaboration. Computer vision is the most rapidly developing and most widely applied technical direction in the field of artificial intelligence.
Person re-identification, also known as pedestrian re-identification, is a computer vision technique that, given an image of a monitored pedestrian, retrieves images of the same pedestrian captured by other devices; it is also regarded as a sub-problem of image retrieval. Pedestrian re-identification compensates for the visual limitations of fixed cameras, can be combined with pedestrian detection and pedestrian tracking, and is widely applicable to intelligent video surveillance, intelligent security and related fields. With the development of deep learning in recent years, deep learning techniques have been widely applied to the pedestrian re-identification task, and many deep learning-based methods have raised the performance of same-domain pedestrian re-identification to a very high level.
However, the performance of these deep learning models degrades catastrophically once they are applied to a new, non-homologous data set, which is a major obstacle when pedestrian re-identification technology is put into practical use. A key personnel search system must process data collected by different cameras in different scenes and at different times, and find the video clips in which the person of interest appears. Because body shape, illumination, environment and so on vary across cameras and scenes, generalizing a model to different domains is the cross-domain pedestrian re-identification problem: after being trained on a source domain, the model should generalize to a target domain, where the source-domain data carries annotation information and the target-domain data does not. The cross-domain problem arises from inconsistent distributions between data sets, known as domain shift or domain bias. Moreover, in practical application scenarios the cost of annotating data for each new scene is very high, and pedestrian re-identification additionally requires cross-camera annotation, which raises the cost further. A cross-domain re-identification method for pedestrian data is therefore needed to solve the above problems faced by the key personnel search system and to improve its usability and robustness.
Disclosure of Invention
The technical problem solved by the invention: overcoming the defects of the prior art, a class-center-adaptive cross-domain identification method for a key personnel search system is provided, which can efficiently use large-scale unlabeled data to train a pedestrian re-identification model, keeps its performance unattenuated in cross-domain scenarios, and has high-accuracy cross-domain identification capability.
In the invention, against the background of the cross-domain pedestrian re-identification task of key personnel search, a key personnel search system with strong cross-domain pedestrian re-identification capability is constructed based on a class-center adaptive strategy and a deep learning model. The technical problems to be solved by the application are as follows: 1. annotating pedestrians in surveillance data is expensive, and a general pedestrian re-identification model cannot efficiently use a large amount of data without pedestrian annotations for training; 2. a common key personnel search system only re-identifies pedestrians well within a single data source or several highly similar data sources, and cannot effectively handle pedestrian re-identification between data sources that differ substantially.
The technical problem to be solved by the invention is as follows: a cross-domain re-identification method for a key personnel search system based on a class-center adaptive strategy is provided, which solves the sharp drop in training and testing performance of deep learning models across different data domains, clusters additional data from unlabeled data domains through a dedicated learning strategy and uses it to train the pedestrian re-identification model, so that the key personnel search system has better generalization performance and stronger cross-domain identification capability.
The technical scheme is as follows:
the cross-domain identification method of the key personnel search system based on class center self-adaptation comprises the following steps:
a training stage:
the method comprises the following steps: new data collection and data pre-processing
During operation of the pedestrian monitoring equipment, the data collection module continuously gathers new pedestrian video data captured by different cameras; the video data are stored separately according to the collection scene, with data collected in the same scene kept in an independent database; for pedestrian video data of the same scene, the subsequent steps are entered once the amount of new data reaches a threshold;
step two: pedestrian detection
Input the pedestrian video data to be detected, detect pedestrians in each frame of the video using a real-time object detection model based on a region proposal network, and determine the position of each pedestrian in the frame to obtain pedestrian detection boxes;
step three: pedestrian tracking
Using a convolutional neural network, extract features of the pedestrian image inside each detection box as appearance features; match detection boxes between consecutive frames by these appearance features to track the same pedestrian across consecutive video frames; crop out the matched pedestrian images according to the detection box positions; finally, for each pedestrian appearing in the input video data, obtain a pedestrian image cluster formed from one or more consecutive video frames;
step four: pedestrian image cluster key frame extraction
The pedestrian image clusters obtained in step three consist of consecutive video frames, and the difference between pedestrian images in adjacent frames is very small; therefore a key frame extraction algorithm is applied to each pedestrian image cluster to extract its key frames and form a new, non-redundant pedestrian image cluster. Since each key frame is very similar to the frames around it, a single key frame suffices for subsequent use;
when extracting the key frame, firstly extracting image characteristics of the pedestrian image cluster by using a convolutional neural network, then calculating the distance between adjacent video frames according to the image characteristics, and marking the video frames with the distance larger than a threshold value as the key frames;
step five: pedestrian image clustering
Extract features from the pedestrian image clusters obtained in step four using a convolutional neural network, compute the cosine distances between the features to obtain a distance matrix, and cluster the distance matrix using a density-based spatial clustering algorithm with noise (DBSCAN) to obtain n classes; assuming the pedestrian image clusters contain N images in total, n ranges from 1 to N;
When clustering, first set the minimum number of samples m required to form a class and the neighborhood radius e. Then search for a cluster for each sample in the distance matrix: for a sample c, if the neighborhood centered at c with radius e contains more than m samples, create a cluster with c as a core object; then add to the cluster every sample whose distance to any sample already in the cluster is less than e, and iterate this process until every sample has either found a cluster or belongs to no cluster;
step six: class center adaptive strategy-based pedestrian re-recognition training model
In order to train with unlabeled data, the invention innovatively proposes a class-center projection layer, applied within a class-center adaptive learning mechanism, to replace the fully connected layer used for supervised classification. The class-center projection layer is adapted from a fully connected layer: for labeled data its projection matrix is identical to that of the original fully connected layer, while for unlabeled data the projection matrix is extended with the class centers of the unlabeled clusters, which act as additional anchors;
Let χ_L denote the labeled data set containing M classes, and let χ_U denote the unlabeled data set, i.e. the data obtained in step five. A mini-batch size is set first. In each training iteration a mini-batch B = B_L ∪ B_U is constructed, containing labeled data B_L and unlabeled data B_U, where p is the number of labeled samples and q the number of unlabeled samples in the batch. B_L is sampled randomly from the labeled data, while B_U is constructed by randomly selecting r unlabeled clusters and randomly sampling s images from each of them, so that q = r·s. Note that the r selected clusters change dynamically from mini-batch to mini-batch. After the mini-batch B is fed into the network, the features extracted before the class-center projection layer are denoted f = [f_L, f_U]^T ∈ R^((p+q)×D), where D is the feature dimension and f_L and f_U are the feature vectors of the labeled and unlabeled data respectively. The projection matrix of the class-center projection layer is W = [W_M, W_r] ∈ R^(D×(M+r)), where the first M columns are the anchors of the labeled classes and the remaining r columns are the class-center vectors of the selected unlabeled clusters. For a mini-batch, the class center of the j-th selected cluster is computed as

w_(M+j) = α · c_j / ||c_j||_2, with c_j = (1/s) · Σ_(f_i ∈ cluster j) f_i,

where the adaptive coefficient α = (1/M) · Σ_(m=1..M) ||w_m||_2 is the average norm of the class centers of the labeled data, and M is the number of labeled classes. Through the class-center projection layer the bias-free output y = W^T f is obtained and passed into a softmax loss layer. The role of the adaptive coefficient α is to normalize the class centers of the unlabeled data: for training stability and fast convergence, a suitable scale factor is required so that the activation map y_U of the unlabeled data resembles the activation map y_L of the labeled data. In fact, the l2 norm inherently provides a reasonable prior scale for each class center when mapping the input features f_L to the output activations y_L; multiplying the normalized class centers of the unlabeled data by α therefore brings them to the same scale as the labeled data, so that the activations of unlabeled and labeled data follow similar distributions, which guarantees stable training and fast convergence;
A pedestrian re-identification model is trained with the class-center adaptive strategy described above; training converges when the loss no longer fluctuates, yielding a pedestrian re-identification model with efficient identification capability;
step seven: model compression quantization
Compress and quantize the pedestrian re-identification model obtained in step six using the deep learning inference optimizer TensorRT. Compression includes horizontal fusion, which merges convolution, bias and activation layers into a single structure, and vertical fusion, which merges layers with the same structure but different weights into a wider layer; quantization maps the 32-bit floating-point data of the compressed model to 8-bit integer data, which finally reduces the size of the pedestrian re-identification model to a quarter of the original and speeds up testing by a factor of 3 to 5;
finally obtaining a pedestrian re-identification model for testing;
and (3) a testing stage:
the method comprises the following steps: data pre-processing
In data preprocessing, the pedestrian video data to be recognized is processed in the same way as in the training stage: pedestrian detection and pedestrian tracking are performed to obtain data that can be fed directly into the pedestrian re-identification model, and this data serves as the image gallery for pedestrian search;
step two: pedestrian search
Pedestrian search requires finding, in the image gallery, t pedestrian images with the same identity as a query image; first, features are extracted from the query image and the gallery images using the pedestrian re-identification model obtained at the end of the training stage, and then the cosine distances between the query features and the gallery features are computed, yielding a distance matrix and the corresponding image pairs;
step three: reordering
Re-rank the distance matrix using a query expansion algorithm: sort the distance matrix from step two to obtain the t images closest to the query image; average the features of these t images together with the query feature to form a new query feature; compute the cosine distances between the new query feature and the gallery features to obtain a new distance matrix and image-pair correspondence; finally sort this distance matrix, and its first t images are the t pedestrian images in the gallery with the same identity as the query image.
Compared with the prior art, the invention has the advantages and effects that:
(1) The invention innovatively proposes a class-center projection layer applied within a class-center adaptive learning mechanism to replace the fully connected layer used for supervised classification. By detecting and clustering a large amount of unlabeled cross-domain pedestrian data from real scenes, dynamically estimating class centers after each mini-batch of unlabeled data is clustered, and adding those class centers as new anchors to the projection matrix of the re-identification model's class-center projection layer, the model's ability to learn from cross-domain data is enhanced. The invention overcomes the sharp degradation in cross-domain recognition performance of common key personnel search systems and has strong cross-domain re-identification capability.
(2) In horizontal comparison, by using the class-center adaptive strategy the method learns features from richer data domains and has better cross-domain identification capability than other classical pedestrian re-identification methods. In vertical comparison, the method effectively combines unlabeled and labeled data for learning instead of considering only one of them, improving the utilization efficiency and value of pedestrian surveillance data.
Drawings
FIG. 1 is a flow chart of a key person searching system based on a class center adaptive strategy.
Detailed Description
For a better understanding of the present invention, some concepts will be explained.
1. The labeled pedestrian data set used in the invention is the Market-1501 data set published by Tsinghua University, which contains 1,501 pedestrians and 32,668 detected pedestrian bounding boxes captured by 6 cameras (5 high-definition cameras and 1 low-definition camera).
2. Softmax: the normalized exponential function, which "compresses" a K-dimensional vector z of arbitrary real numbers into a K-dimensional real vector σ(z) whose elements each lie in (0, 1) and sum to 1, computed as:

σ(z)_j = exp(z_j) / Σ_(k=1..K) exp(z_k), j = 1, …, K.
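As a sketch, the softmax above can be implemented in a few lines (a numerically stable variant that subtracts the maximum before exponentiating, which does not change the result):

```python
import numpy as np

def softmax(z):
    """Normalized exponential: maps a K-dim real vector to a probability
    vector whose elements lie in (0, 1) and sum to 1."""
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
```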
As shown in FIG. 1, the method for identifying the cross-domain key personnel search system based on class center self-adaptation of the invention comprises two stages of training and testing, and comprises the following implementation steps:
a training stage:
the method comprises the following steps: new data collection and data pre-processing
During operation of the pedestrian monitoring equipment, the data collection module continuously gathers new pedestrian video data captured by different cameras; the video data are stored separately according to the collection scene, with data collected in the same scene kept in an independent database; for pedestrian video data of the same scene, the subsequent steps are entered once the amount of new data reaches a threshold;
step two: pedestrian detection
Input the pedestrian video data to be detected, detect pedestrians in each frame of the video using a real-time object detection model based on a region proposal network, and determine the position of each pedestrian in the frame to obtain pedestrian detection boxes;
step three: pedestrian tracking
Using a convolutional neural network, extract features of the pedestrian image inside each detection box as appearance features; match detection boxes between consecutive frames by these appearance features to track the same pedestrian across consecutive video frames; crop out the matched pedestrian images according to the detection box positions; finally, for each pedestrian appearing in the input video data, obtain a pedestrian image cluster formed from one or more consecutive video frames;
step four: pedestrian image cluster key frame extraction
The pedestrian image clusters obtained in step three consist of consecutive video frames, and the difference between pedestrian images in adjacent frames is very small; therefore a key frame extraction algorithm is applied to each pedestrian image cluster to extract its key frames and form a new, non-redundant pedestrian image cluster. Since each key frame is very similar to the frames around it, a single key frame suffices for subsequent use;
when extracting key frames, firstly extracting image characteristics of pedestrian image clusters by using a convolutional neural network, then calculating the distance between adjacent video frames according to the image characteristics, and marking the video frames with the distance larger than a threshold value as the key frames;
step five: pedestrian image clustering
Extract features from the pedestrian image clusters obtained in step four using a convolutional neural network, compute the cosine distances between the features to obtain a distance matrix, and cluster the distance matrix using a density-based spatial clustering algorithm with noise (DBSCAN) to obtain n classes; assuming the pedestrian image clusters contain N images in total, n ranges from 1 to N;
when clustering is carried out, firstly, the minimum sample number m belonging to a class and the minimum distance e between samples in the class are set, then, clusters are searched for each data in a distance matrix according to the minimum distance e, and for a certain sample c in the distance matrix, if the number of samples contained in a neighborhood taking c as a center e as a radius is more than m, a cluster taking c as a core object is created; then, adding the sample with the distance less than e from any sample in the cluster into the cluster, and continuously iterating the process until each sample finds the cluster or does not belong to any cluster;
step six: class center adaptive strategy-based pedestrian re-recognition training model
In order to train with unlabeled data, the invention innovatively proposes a class-center projection layer, applied within a class-center adaptive learning mechanism, to replace the fully connected layer previously used for supervised classification. The class-center projection layer is adapted from a fully connected layer: for labeled data its projection matrix is identical to that of the original fully connected layer, while for unlabeled data the projection matrix is extended with the class centers of the unlabeled clusters, which act as additional anchors;
let's ChiLRepresenting a labeled data set, χ, containing M classesURepresenting the unmarked data set, namely the data obtained by the fifth step; firstly, setting a threshold value, wherein the threshold value is smaller than the threshold value, a small batch is set, and in a training turn, constructing data of the small batchIncluding annotated data thereinAnd unlabeled datap represents the number of data marked in the small batch of data, and q represents the number of data unmarked in the small batch of data;randomly selected from the annotated data; and thenRandomly selecting r unmarked clusters from unmarked data, and randomly selecting s samples from each cluster to construct, namely q is r.s; it should be noted that, for each small batch of data, the selected r clusters are dynamically changed; thus, this small batch of data B is extracted after being fed into the network, before the class-centric projection transduction layerThe characteristic is expressed as f ═ fL,fU]T∈R(p+q)·DWhere D is the feature dimension, fLAnd fURespectively labeling feature vectors of data and unlabeled data; the projection matrix of the class center projection transduction layer is represented as W ═ WM,Wr]∈R(M+r)·(p+q)Where the first M columns represent anchors with labeled data and the remaining r columns represent class center vectors for selected unlabeled dataIn a small amount of the data of a lot,the calculation formula of (a) is as follows:
the self-adaptive coefficient alpha is the average size of the class center of the labeled data cluster, and M is the number of the classes of the labeled data; therefore, the output y ═ W without the bias term can be obtained by the quasi-center projection transduction layerTf, then transferring y into a softmax loss layer; the function of the self-adaptive coefficient alpha is to normalize the class center of the unlabeled data; for stability and fast convergence of training, a suitable scale factor is required for scaling, so that the activation mapping y for unlabeled dataUActivation mapping y similar to annotation dataL(ii) a In fact, the l2 norm inherently provides a reasonable a priori scale for each class center to map the input features fLTo an output yLActivation of (2); therefore, the similar scale scaling with the labeled data is carried out by multiplying the class center of the unlabeled data by the adaptive coefficient alpha, so that the activation of the unlabeled data and the activation of the labeled data have similar distribution, and the stability and the quick convergence of the training are ensured;
A pedestrian re-identification model is trained with the class-center adaptive strategy described above; training converges when the loss no longer fluctuates, yielding a pedestrian re-identification model with efficient identification capability;
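Assuming the cluster centers are L2-normalized before being rescaled by α, as the normalization discussion above suggests, the layer can be sketched in NumPy. All names and shapes here are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def class_center_projection(f_L, f_U, W_M, cluster_ids):
    """Sketch of the class-center projection layer: W_M (D x M) holds the
    anchors of the M labeled classes; the r unlabeled cluster centers are
    L2-normalized, rescaled by the adaptive coefficient alpha (the average
    anchor norm), appended as extra columns, and the layer outputs the
    bias-free logits y = f W."""
    alpha = np.linalg.norm(W_M, axis=0).mean()   # average labeled-anchor norm
    ids = np.asarray(cluster_ids)
    centers = []
    for j in sorted(set(cluster_ids)):
        c = f_U[ids == j].mean(axis=0)           # mean feature of cluster j
        centers.append(alpha * c / np.linalg.norm(c))
    W = np.concatenate([W_M, np.stack(centers, axis=1)], axis=1)  # D x (M+r)
    f = np.vstack([f_L, f_U])                    # (p+q) x D batch features
    return f @ W                                 # (p+q) x (M+r) logits

# Toy batch: p=2 labeled, q=4 unlabeled samples in r=2 clusters, D=4, M=3.
W_M = np.eye(4)[:, :3]
f_L = np.ones((2, 4))
f_U = np.array([[1.0, 0, 0, 0], [1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 1.0, 0, 0]])
y = class_center_projection(f_L, f_U, W_M, cluster_ids=[0, 0, 1, 1])
```

Because the unlabeled columns are rescaled to the average anchor norm, the logits for unlabeled clusters fall in the same range as those for labeled classes, which is the stability property the text describes.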
step seven: model compression quantization
Compress and quantize the pedestrian re-identification model obtained in step six using the deep learning inference optimizer TensorRT. Compression includes horizontal fusion, which merges convolution, bias and activation layers into a single structure, and vertical fusion, which merges layers with the same structure but different weights into a wider layer; quantization maps the 32-bit floating-point data of the compressed model to 8-bit integer data. This finally reduces the size of the pedestrian re-identification model to a quarter of the original and speeds up testing by a factor of 3 to 5;
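The INT8 idea can be illustrated with a symmetric per-tensor quantization sketch (the general principle only; TensorRT's actual calibration is more sophisticated):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric quantization: map float32 weights onto [-127, 127] int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 8).astype(np.float32)
q, scale = quantize_int8(w)
```

Each weight shrinks from 4 bytes to 1, which matches the "quarter of the original size" figure above, at the cost of a rounding error bounded by half the scale.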
finally obtaining a pedestrian re-identification model for testing;
and (3) a testing stage:
the method comprises the following steps: data pre-processing
In data preprocessing, the pedestrian video data to be recognized is processed in the same way as in the training stage: pedestrian detection and pedestrian tracking are performed to obtain data that can be fed directly into the pedestrian re-identification model, and this data serves as the image gallery for pedestrian search;
step two: pedestrian search
Pedestrian search requires finding, in the image gallery, t pedestrian images with the same identity as a query image; first, features are extracted from the query image and the gallery images using the pedestrian re-identification model obtained at the end of the training stage, and then the cosine distances between the query features and the gallery features are computed, yielding a distance matrix and the corresponding image pairs;
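The search step amounts to ranking the gallery by cosine distance to the query feature. A minimal sketch, with feature extraction assumed already done:

```python
import numpy as np

def search(query_feat, gallery_feats, t):
    """Return indices of the t gallery images closest to the query under
    cosine distance, plus the full distance row."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - g @ q                    # cosine distance in [0, 2]
    return np.argsort(dist)[:t], dist

query = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
top, dist = search(query, gallery, t=2)
```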
step three: reordering
Re-rank the distance matrix using a query expansion algorithm: sort the distance matrix from step two to obtain the t images closest to the query image; average the features of these t images together with the query feature to form a new query feature; compute the cosine distances between the new query feature and the gallery features to obtain a new distance matrix and image-pair correspondence; finally sort this distance matrix, and its first t images are the t pedestrian images in the gallery with the same identity as the query image.
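The re-ranking step can be sketched as plain query expansion, averaging the query feature with its t nearest gallery features before searching a second time (an illustration of the described procedure under these assumptions, not the exact implementation):

```python
import numpy as np

def cosine_dist(q, G):
    """Cosine distances from one query vector to every row of G."""
    qn = q / np.linalg.norm(q)
    Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
    return 1.0 - Gn @ qn

def expanded_query_search(query_feat, gallery_feats, t):
    """First search, then average the query with its t nearest gallery
    features to form a new query, and search again with it."""
    top = np.argsort(cosine_dist(query_feat, gallery_feats))[:t]
    new_query = np.vstack([gallery_feats[top], query_feat]).mean(axis=0)
    return np.argsort(cosine_dist(new_query, gallery_feats))[:t]

gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
result = expanded_query_search(np.array([1.0, 0.05]), gallery, t=2)
```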
The invention innovatively proposes a class-center projection transduction layer, applied within a class-center adaptive learning mechanism, to replace the fully connected layer used for supervised classification. By detecting and clustering large amounts of unlabeled cross-domain pedestrian data from real scenes, dynamically estimating class centers after each mini-batch of unlabeled data is clustered, and adding these class centers as new anchors to the projection matrix of the re-identification model's class-center projection transduction layer, unlabeled and labeled data are effectively combined for learning. The method overcomes the sharp drop in cross-domain recognition performance seen in conventional key-personnel search systems and provides strong cross-domain re-identification capability.
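A simplified sketch of such a class-center projection transduction layer: the projection matrix's columns are class-center anchors, the output y = Wᵀf has no bias term and is fed to a softmax, and cluster centers estimated from unlabeled mini-batches are appended as new anchors. Class and method names here are illustrative, not the patent's implementation:

```python
import numpy as np

class ClassCenterProjection:
    """Bias-free projection layer whose columns are class-center anchors
    (a minimal sketch; names and details are illustrative)."""

    def __init__(self, labeled_centers):
        # Columns of W: one anchor per labeled class, W has shape (d, M).
        self.W = np.array(labeled_centers, dtype=np.float32).T

    def add_cluster_centers(self, unlabeled_feats, cluster_labels):
        # After a mini-batch of unlabeled data is clustered, estimate each
        # cluster's center and append it to W as a new anchor.
        for c in np.unique(cluster_labels):
            center = unlabeled_feats[cluster_labels == c].mean(axis=0, keepdims=True)
            self.W = np.hstack([self.W, center.T.astype(np.float32)])

    def forward(self, f):
        logits = f @ self.W  # y = W^T f, no bias term
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)  # softmax probabilities
```

Because the anchors are feature-space class centers rather than freely learned weights, new (unlabeled) classes can be added without re-initializing a classifier head.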
The cross-domain re-identification method can be applied to searching for key personnel in real scenes, improves system performance by fully utilizing large amounts of unlabeled pedestrian data, and achieves an excellent cross-domain re-identification effect.
In summary, by using this class-center adaptive cross-domain re-identification method, the key-personnel search system gains strong generalization and cross-domain re-identification capability while resolving the performance degradation of cross-domain pedestrian re-identification.
Portions of the invention not described in detail are well known to those skilled in the art.
The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be included within the scope of the invention.
Claims (1)
1. A key personnel search system cross-domain identification method based on class center self-adaptation is characterized by comprising two stages of training and testing:
the training phase is implemented as follows:
step one: new data collection and data preprocessing
During operation of the pedestrian monitoring equipment, the data collection module continuously gathers new pedestrian video data captured by different cameras; the data are stored separately according to the collection scene, with pedestrian video data from the same scene stored in an independent database; for pedestrian video data of the same scene, the subsequent steps begin once the amount of new data reaches a threshold;
step two: pedestrian detection
Inputting the pedestrian video data to be detected, detecting the pedestrians in each frame of the video by using a real-time target detection model based on a region proposal network, and determining the position of each pedestrian in each frame to obtain pedestrian detection frames;
step three: pedestrian tracking
Extracting features of the pedestrian images inside the pedestrian detection frames of each frame using a convolutional neural network as appearance features; matching the pedestrian detection frames of consecutive frames by means of these appearance features, thereby tracking the same pedestrian across continuous video frames; cropping the matched pedestrian images according to the positions of the detection frames; and finally obtaining, for each pedestrian appearing in the input pedestrian video data, a pedestrian image cluster formed from one or more continuous video frames;
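The appearance-feature matching between consecutive frames can be sketched as a greedy cosine-similarity assignment. The similarity threshold and the greedy strategy are illustrative assumptions; practical trackers typically combine appearance with motion cues and use Hungarian assignment:

```python
import numpy as np

def match_detections(prev_feats: np.ndarray, curr_feats: np.ndarray, sim_threshold: float = 0.7):
    """Greedily match detection boxes across two frames by appearance
    similarity. Returns (prev_index, curr_index) pairs."""
    p = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    c = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    sim = p @ c.T  # cosine similarity between all box pairs
    matches, used = [], set()
    # process the most confident previous-frame boxes first
    for i in np.argsort(-sim.max(axis=1)):
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= sim_threshold and j not in used:
            matches.append((int(i), j))
            used.add(j)
    return matches
```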
step four: pedestrian image cluster key frame extraction
Extracting a key frame of each pedestrian image cluster by using a key frame extraction algorithm to obtain key frames in the pedestrian image clusters and form a new non-redundant pedestrian image cluster;
when extracting key frames, image features of the pedestrian image cluster are first extracted using a convolutional neural network; the distance between adjacent video frames is then calculated from these features, and video frames whose distance exceeds a threshold are marked as key frames;
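A minimal sketch of the key-frame selection rule above, assuming Euclidean distance between adjacent frame features and that the first frame of a cluster is always kept (both are assumptions, not stated in the original):

```python
import numpy as np

def select_key_frames(features, threshold: float):
    """Return the indices of frames whose feature distance to the
    previous frame exceeds the threshold; frame 0 is always kept."""
    feats = np.asarray(features, dtype=float)
    # distance between each pair of adjacent frames
    d = np.linalg.norm(feats[1:] - feats[:-1], axis=1)
    return [0] + [i + 1 for i in range(len(d)) if d[i] > threshold]
```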
step five: pedestrian image clustering
Extracting features from the pedestrian image clusters obtained in step four using a convolutional neural network, obtaining a distance matrix by calculating the cosine distances between the features, and clustering the distance matrix using a density-based spatial clustering algorithm with noise to obtain n classes; assuming the pedestrian image clusters used for clustering contain N images in total, n ranges from 1 to N;
when clustering, the minimum number of samples m belonging to a class and the minimum intra-class sample distance e are first set; clusters are then sought for each sample in the distance matrix according to e: for a sample c in the distance matrix, if the neighborhood with c as center and e as radius contains more than m samples, a cluster with c as its core object is created; then every sample whose distance to any sample in the cluster is less than e is added to the cluster, and this process is iterated until every sample either finds a cluster or belongs to no cluster;
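The density-based clustering with noise described in this step can be sketched directly from the wording above (minimum cluster size m, radius e, precomputed distance matrix). This is an illustrative simplification of DBSCAN-style clustering, not necessarily the exact variant used:

```python
import numpy as np

def dbscan_precomputed(dist: np.ndarray, e: float, m: int) -> np.ndarray:
    """Cluster samples given a precomputed distance matrix: a sample c whose
    e-neighborhood contains more than m samples seeds a cluster, which is
    grown by adding every sample within distance e of any cluster member.
    Returns a label per sample; -1 means the sample belongs to no cluster."""
    n = dist.shape[0]
    labels = np.full(n, -1)
    cluster_id = 0
    for c in range(n):
        if labels[c] != -1:
            continue
        neighbors = np.where(dist[c] <= e)[0]
        if len(neighbors) <= m:  # not dense enough to seed a cluster
            continue
        labels[c] = cluster_id
        queue = list(neighbors)
        while queue:  # grow the cluster by chaining e-neighborhoods
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id
                queue.extend(np.where(dist[j] <= e)[0])
        cluster_id += 1
    return labels
```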
step six: class center adaptive strategy-based pedestrian re-recognition training model
A pedestrian re-identification network framework is built with a convolutional neural network, and a class-center projection transduction layer replaces the ordinary fully connected layer used for supervised classification; the class-center projection transduction layer is derived from the fully connected layer: for labeled data its projection matrix is consistent with that of a fully connected layer, and the matrix is formed by using the class centers of the labeled data as anchors;
let χ_L denote a labeled dataset containing M classes and χ_U denote the unlabeled dataset, i.e., the data obtained in step five; a mini-batch size smaller than a preset threshold is set, and in a training round the mini-batch data B is constructed to include labeled data B_L and unlabeled data B_U, where p denotes the number of labeled samples and q the number of unlabeled samples in the mini-batch; B_L is randomly selected from the labeled data, while B_U is constructed by randomly selecting r unlabeled clusters from the unlabeled data and then randomly selecting s samples from each cluster, so that q = r·s; after the mini-batch B is fed into the network, the feature extracted before the class-center projection transduction layer is denoted f, and the class-center projection transduction layer produces the bias-free output y = Wᵀf for classification, after which y is passed into a softmax loss layer;
training the pedestrian re-identification model based on the class-center adaptive strategy; training converges when the loss function no longer fluctuates, yielding a pedestrian re-identification model with efficient recognition capability;
step seven: model compression quantization
Compressing and quantizing the pedestrian re-identification model obtained in step six by using the deep learning inference optimizer TensorRT, wherein the compression comprises transverse combination, which merges convolution, bias and activation layers into a single structure, and longitudinal combination, which merges layers with the same structure but different weights into a wider layer; the quantization maps the compressed model's 32-bit floating-point data to 8-bit integer data, finally reducing the size of the pedestrian re-identification model to one quarter of the original size and improving the testing speed by 3 to 5 times;
finally obtaining a pedestrian re-identification model for testing;
the test phase is implemented as follows:
step one: data preprocessing
In data preprocessing, the pedestrian video data to be recognized are processed in the same way as in the training stage, i.e., pedestrian detection and pedestrian tracking are completed to obtain data that can be directly input into the pedestrian re-recognition model; these data serve as the image library for pedestrian searching;
step two: pedestrian search
Using a query image to find the t pedestrian images in the image library that share the query image's identity: first, features are extracted from the query image and the images in the image library using the pedestrian re-identification model obtained at the end of the training stage, and then the cosine distance between the features of the query image and the features of the images in the image library is calculated to obtain a distance matrix and the corresponding image pairs;
step three: reordering
The distance matrix is reordered by using a query expansion algorithm: the distance matrix from step two is first sorted to obtain the t images closest to the query image; the features of these t images and of the query image are then averaged to form the feature of a new query image; the cosine distances between the new query feature and the features of the images in the image library are calculated to obtain a new distance matrix and image-pair correspondence; finally, the distance matrix is sorted, and the first t images are the t pedestrian images in the image library that share the query image's identity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011267881.5A CN112784674B (en) | 2020-11-13 | 2020-11-13 | Cross-domain identification method of key personnel search system based on class center self-adaption |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112784674A CN112784674A (en) | 2021-05-11 |
CN112784674B true CN112784674B (en) | 2022-07-15 |
Family
ID=75750488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011267881.5A Active CN112784674B (en) | 2020-11-13 | 2020-11-13 | Cross-domain identification method of key personnel search system based on class center self-adaption |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112784674B (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961051B (en) * | 2019-03-28 | 2022-11-15 | 湖北工业大学 | Pedestrian re-identification method based on clustering and block feature extraction |
CN110942025A (en) * | 2019-11-26 | 2020-03-31 | 河海大学 | Unsupervised cross-domain pedestrian re-identification method based on clustering |
CN111639561A (en) * | 2020-05-17 | 2020-09-08 | 西北工业大学 | Unsupervised pedestrian re-identification method based on category self-adaptive clustering |
CN111738172B (en) * | 2020-06-24 | 2021-02-12 | 中国科学院自动化研究所 | Cross-domain target re-identification method based on feature counterstudy and self-similarity clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||