CN112784674B - Cross-domain identification method of key personnel search system based on class center self-adaption - Google Patents


Info

Publication number
CN112784674B
CN112784674B (application CN202011267881.5A)
Authority
CN
China
Prior art keywords: pedestrian, data, image, images, cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011267881.5A
Other languages
Chinese (zh)
Other versions
CN112784674A
Inventor
冷彪 (Leng Biao)
李子涵 (Li Zihan)
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202011267881.5A
Publication of CN112784674A
Application granted
Publication of CN112784674B

Classifications

    • G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling
    • G06V20/42 — Higher-level semantic understanding of sport video content
    • G06F18/23 — Clustering techniques
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N3/04 — Neural network architecture, e.g. interconnection topology
    • G06N3/08 — Neural network learning methods
    • G06V20/46 — Extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands


Abstract

The invention provides a cross-domain identification method for a key-person search system based on class-center adaptation, comprising a training stage and a testing stage. The invention not only overcomes the sharp drop in performance that deep learning models suffer when trained and tested on different data domains, but also clusters more data from unlabeled data domains through a dedicated learning strategy and applies them to training the pedestrian re-identification model, so that the key-person search system attains better generalization and stronger cross-domain identification capability.

Description

Cross-domain identification method of key personnel search system based on class center self-adaption
Technical Field
The invention relates to a cross-domain identification method of a key personnel search system based on class center self-adaptation, and belongs to the field of machine learning.
Background
The rapid development of many industries is being profoundly shaped by artificial intelligence, which over nearly 70 years of development has accumulated a vast body of knowledge and achievements and is now widely applied. Driven by the internet, big data and supercomputers, AI technology has entered a new stage characterized by deep learning, cross-disciplinary fusion and human-machine collaboration. Computer vision is the most rapidly developing and most widely applied technical direction in the field of artificial intelligence.
Person re-identification, also known as pedestrian re-identification, is a computer vision technique that, given a monitored pedestrian image, retrieves images of the same pedestrian captured by other devices; it is also regarded as a sub-problem of image retrieval. Pedestrian re-identification compensates for the visual limitations of fixed cameras, can be combined with pedestrian detection and pedestrian tracking, and is widely applicable to intelligent video surveillance, intelligent security and related fields. With the progress of deep learning in recent years, deep learning techniques have been widely applied to the pedestrian re-identification task, and many deep-learning-based methods have raised the performance of same-domain pedestrian re-identification to a very high level.
However, the performance of these deep learning models collapses once they are applied to a new, non-homologous dataset, which is the chief obstacle when pedestrian re-identification technology is put to practical use. A key-person search system must process data collected by different cameras, in different scenes, at different times, and must locate the video clips in which the person of interest appears. Because body shape, illumination, environment and so on vary across cameras and scenes, generalizing a model to different domains is the problem of cross-domain pedestrian re-identification: a model trained on a source domain, whose data carry annotation information, must generalize to a target domain whose data carry none. The cross-domain problem arises from inconsistent distributions between datasets, known as domain shift or domain bias. Moreover, in practical application scenarios the cost of annotating data for every new scene is very high, and pedestrian re-identification additionally requires cross-camera annotation, which raises the cost further. A cross-domain re-identification method for pedestrian data is therefore needed to resolve these problems for the key-person search system and to improve its usability and robustness.
Disclosure of Invention
The technical problem solved by the invention: overcoming the shortcomings of the prior art, a class-center-adaptive cross-domain identification method for a key-person search system is provided, which can efficiently exploit large-scale unlabeled data to train a pedestrian re-identification model, maintains undiminished performance in cross-domain scenarios, and achieves high-accuracy cross-domain identification.
In the invention, against the background of the cross-domain pedestrian re-identification task of key-person search, a key-person search system with strong cross-domain pedestrian re-identification capability is built from a class-center adaptive strategy and a deep learning model. The technical problems to be solved by the application are: 1. pedestrian annotation of surveillance data is expensive, and a generic pedestrian re-identification model cannot efficiently exploit large amounts of data that lack pedestrian labels; 2. an ordinary key-person search system only re-identifies pedestrians well within a single data source or across several highly similar sources, and cannot effectively handle re-identification between data sources that differ substantially.
The technical problem to be solved by the invention is as follows: a cross-domain re-identification method for a key-person search system based on a class-center adaptive strategy is provided, which overcomes the sharp drop in training and testing performance of deep learning models across data domains, clusters more data from unlabeled data domains through a dedicated learning strategy, and applies them to training the pedestrian re-identification model, so that the key-person search system attains better generalization and stronger cross-domain identification capability.
The technical scheme is as follows:
the cross-domain identification method of the key personnel search system based on class center self-adaptation comprises the following steps:
Training stage:
step one: New data collection and data pre-processing
In the running process of the pedestrian monitoring equipment, the data collection module continuously collects new pedestrian video data collected by different cameras; the pedestrian video data are respectively stored according to different collection scenes, and the pedestrian video data collected in the same scene are stored in an independent database; for pedestrian video data of the same scene, after the number of new data reaches a threshold value, entering the subsequent step;
step two: pedestrian detection
Inputting pedestrian video data to be detected, detecting pedestrians for each frame of image in the pedestrian video data by using a real-time target detection model realized based on a region proposal network, and judging the positions of the pedestrians in each frame of image to obtain a pedestrian detection frame;
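The region-proposal detector of step two emits many overlapping candidate boxes per frame; a standard post-processing step, non-maximum suppression (NMS), keeps only the highest-scoring non-overlapping detections. The patent does not spell this step out, so the sketch below is an illustrative NumPy implementation of generic greedy NMS, not the invention's specific detector:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep
```

The IoU threshold of 0.5 is a conventional default, not a value specified by the patent.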
step three: pedestrian tracking
A convolutional neural network extracts features of the pedestrian image inside each detection frame as appearance features. Using these appearance features, the detection frames of consecutive frames are matched, so that the same pedestrian is tracked through successive video frames; the matched pedestrian images are cropped out according to the detection-frame positions, finally yielding, for each pedestrian appearing in the input pedestrian video data, a pedestrian image cluster formed from one or more runs of consecutive video frames;
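The frame-to-frame matching of detection boxes by appearance features can be sketched as a greedy assignment on the pairwise cosine-similarity matrix. This is a minimal illustration; the similarity threshold and the greedy (rather than globally optimal) assignment are assumptions, since the patent does not fix them:

```python
import numpy as np

def match_detections(prev_feats, curr_feats, sim_thresh=0.5):
    """Greedily match detections between consecutive frames by cosine similarity.
    Returns a list of (prev_index, curr_index) pairs."""
    a = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    b = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    sim = a @ b.T                       # pairwise cosine similarity
    matches = []
    while sim.size and sim.max() > sim_thresh:
        i, j = np.unravel_index(sim.argmax(), sim.shape)
        matches.append((int(i), int(j)))
        sim[i, :] = -1.0                # each detection matched at most once
        sim[:, j] = -1.0
    return matches
```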
step four: pedestrian image cluster key frame extraction
The pedestrian image clusters obtained in step three consist of consecutive video frames, and the pedestrian images of adjacent frames differ very little; a key-frame extraction algorithm is therefore applied to each pedestrian image cluster to obtain its key frames, forming a new, non-redundant pedestrian image cluster. Each key frame is highly similar to the frames around it, so a single key frame satisfies the subsequent processing requirements;
To extract key frames, image features are first extracted from the pedestrian image cluster with a convolutional neural network; the distance between adjacent video frames is then computed from these features, and frames whose distance exceeds a threshold are marked as key frames;
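A minimal sketch of this threshold rule, assuming Euclidean distance between frame features and that the first frame of a cluster is always kept (neither detail is fixed by the patent):

```python
import numpy as np

def extract_keyframes(features: np.ndarray, dist_thresh: float) -> list:
    """Mark frame i as a key frame when its feature distance to the previous
    frame exceeds dist_thresh; the first frame is kept by assumption."""
    keyframes = [0]
    for i in range(1, len(features)):
        dist = np.linalg.norm(features[i] - features[i - 1])
        if dist > dist_thresh:
            keyframes.append(i)
    return keyframes
```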
step five: pedestrian image clustering
Features are extracted from the pedestrian image clusters obtained in step four using a convolutional neural network, and a distance matrix is obtained by computing the cosine distances between features; the distance matrix is then clustered with a density-based spatial clustering algorithm with noise (DBSCAN), producing n classes. Assuming the pedestrian image clusters used for clustering contain N images in total, n ranges from 1 to N;
For clustering, the minimum number m of samples constituting a class and the minimum distance e between samples within a class are first set; a cluster is then sought for every point of the distance matrix according to e. For a sample c in the distance matrix, if the neighborhood of radius e centered on c contains more than m samples, a cluster with c as its core object is created; every sample within distance e of any sample already in the cluster is then added to the cluster, and this process iterates until every sample has either found its cluster or belongs to no cluster;
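The clustering procedure of step five can be sketched directly from this description. The implementation below is a compact DBSCAN variant over a precomputed distance matrix; the parameter names e and m follow the text, and label -1 marks noise samples that belong to no cluster:

```python
import numpy as np

def dbscan_precomputed(dist: np.ndarray, e: float, m: int) -> np.ndarray:
    """Density-based clustering on a precomputed distance matrix: a sample
    whose e-neighborhood holds more than m samples seeds a cluster, which is
    grown by adding every sample within distance e of any member."""
    n = dist.shape[0]
    labels = np.full(n, -1)
    cluster_id = 0
    for c in range(n):
        if labels[c] != -1:
            continue
        neighbors = np.where(dist[c] <= e)[0]
        if len(neighbors) <= m:           # c is not a core object
            continue
        labels[c] = cluster_id
        queue = list(neighbors)
        while queue:                      # expand the cluster iteratively
            p = queue.pop()
            if labels[p] != -1:
                continue
            labels[p] = cluster_id
            reachable = np.where(dist[p] <= e)[0]
            if len(reachable) > m:        # p is itself a core object
                queue.extend(reachable)
        cluster_id += 1
    return labels
```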
step six: class center adaptive strategy-based pedestrian re-recognition training model
To train with unlabeled data, the invention innovatively introduces a class-center projection transduction layer into a class-center adaptive learning mechanism, replacing the fully connected layer used for supervised classification. The class-center projection transduction layer is adapted from a fully connected layer: for labeled data its projection matrix coincides with the original fully connected layer, while for unlabeled data the anchors of the projection matrix are built from the respective class centers;
Let χ_L denote the labeled dataset, which contains M classes, and let χ_U denote the unlabeled dataset, i.e. the clustered data obtained in step five. A mini-batch size below a preset threshold is first fixed, and in each training iteration a mini-batch

B = B_L ∪ B_U

is constructed, containing annotated data B_L = {x_1^L, …, x_p^L} and unlabeled data B_U = {x_1^U, …, x_q^U}, where p is the number of labeled samples in the mini-batch and q the number of unlabeled samples. B_L is randomly selected from the labeled data, while B_U is built by randomly selecting r unlabeled clusters and then randomly drawing s samples from each cluster, so that q = r·s. Note that for each mini-batch the r selected clusters change dynamically. After the mini-batch B is fed into the network, the features extracted just before the class-center projection transduction layer are written

f = [f_L, f_U]^T ∈ R^((p+q)×D),

where D is the feature dimension and f_L and f_U are the feature vectors of the labeled and unlabeled data, respectively. The projection matrix of the class-center projection transduction layer is

W = [W_M, W_r] ∈ R^(D×(M+r)),

where the first M columns are the anchors of the labeled classes and the remaining r columns are the class-center vectors w_1, …, w_r of the selected unlabeled clusters. Within a mini-batch, the class-center vector of the j-th selected unlabeled cluster is computed as

w_j = α · c_j / ||c_j||_2,  with  c_j = (1/s) Σ_{i=1}^{s} f_i^{U,j}  and  α = (1/M) Σ_{m=1}^{M} ||W_m||_2,

where the adaptive coefficient α is the average magnitude (l2 norm) of the class-center anchors of the labeled data and M is the number of labeled classes. For each feature vector f, the class-center projection transduction layer thus produces the bias-free output y = W^T f, which is passed into a softmax loss layer. The role of the adaptive coefficient α is to normalize the class centers of the unlabeled data: for stable and fast-converging training, a suitable scale factor is needed so that the activation map y_U of the unlabeled data is comparable to the activation map y_L of the annotated data. In fact, the l2 norm of each class anchor inherently provides a reasonable prior scale for mapping the input features f_L to the activations y_L; multiplying the class centers of the unlabeled data by α therefore scales them to the same order as the labeled anchors, so that the activations of unlabeled and labeled data follow similar distributions, ensuring stable training and fast convergence;
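The forward pass of the class-center projection transduction layer can be sketched in NumPy. Because the patent's formula images are lost, the anchor formula for unlabeled clusters used below (the α-scaled, l2-normalized cluster mean, with α the mean l2 norm of the labeled anchors) is a reconstruction inferred from the surrounding text, not a verbatim transcription:

```python
import numpy as np

def class_center_projection(f_L, f_U, W_M, r, s):
    """Sketch of the class-center projection transduction layer forward pass.

    f_L : (p, D) features of labeled samples
    f_U : (q, D) features of unlabeled samples, q = r * s, arranged so that
          consecutive blocks of s rows belong to the same unlabeled cluster
    W_M : (D, M) anchor matrix of the labeled classes (usual FC weights)
    """
    alpha = np.linalg.norm(W_M, axis=0).mean()        # adaptive coefficient
    centers = f_U.reshape(r, s, -1).mean(axis=1)      # cluster means c_j
    # assumed reconstruction: anchor w_j = alpha * c_j / ||c_j||_2
    W_r = (alpha * centers / np.linalg.norm(centers, axis=1, keepdims=True)).T
    W = np.concatenate([W_M, W_r], axis=1)            # (D, M + r)
    f = np.concatenate([f_L, f_U], axis=0)            # (p + q, D)
    return f @ W                                      # bias-free logits W^T f
```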
The pedestrian re-identification model is trained with the class-center adaptive strategy above; training has converged when the loss no longer fluctuates, yielding a pedestrian re-identification model with efficient recognition capability;
step seven: model compression quantization
The pedestrian re-identification model obtained in step six is compressed and quantized with the deep learning inference optimizer TensorRT. Compression comprises horizontal fusion, which merges convolution, bias and activation layers into a single structure, and vertical fusion, which merges layers with the same structure but different weights into one wider layer. Quantization maps the compressed model's 32-bit floating-point data to 8-bit integer data, finally reducing the pedestrian re-identification model to a quarter of its original size and increasing inference speed by a factor of 3 to 5;
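TensorRT's INT8 mode uses calibration and is more sophisticated than a single linear mapping, but the basic float32-to-int8 quantization principle described in step seven can be illustrated with a symmetric linear scheme (an illustration only, not TensorRT's actual algorithm):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric linear quantization of float32 weights to int8: the largest
    magnitude maps to 127, everything else scales proportionally."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * scale
```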
finally obtaining a pedestrian re-identification model for testing;
Testing stage:
step one: Data pre-processing
During data preprocessing, the pedestrian video data to be recognized are processed exactly as in the training stage: pedestrian detection and pedestrian tracking must be completed to obtain data that can be fed directly into the pedestrian re-identification model; these data serve as the image library for pedestrian search;
step two: pedestrian search
Pedestrian search uses a query image to find, in the image library, the t pedestrian images sharing the query's identity. Features are first extracted from the query image and from the library images with the pedestrian re-identification model obtained at the end of the training stage; the cosine distances between the query features and the library features are then computed, yielding a distance matrix and the image-pair correspondences;
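The ranking step can be sketched as follows, assuming feature vectors have already been extracted by the re-identification model:

```python
import numpy as np

def search(query_feat: np.ndarray, gallery_feats: np.ndarray, t: int):
    """Rank gallery images by cosine distance to the query feature and
    return the indices of the t closest images."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - g @ q                  # cosine distance per gallery image
    return np.argsort(dist)[:t]
```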
step three: reordering
The distance matrix is re-ranked with a query-expansion algorithm. The distance matrix from step two is sorted to obtain the t images closest to the query image; the features of these t images and of the query image are summed and averaged to form a new query feature; the cosine distances between the new query feature and the library image features are computed, giving a new distance matrix and image-pair correspondences; finally this matrix is sorted, and its first t images are the t pedestrian images in the library that share the query image's identity.
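The query-expansion re-ranking just described can be sketched as follows (feature extraction again assumed done; re-normalizing the expanded query is an assumption the patent does not state):

```python
import numpy as np

def expanded_query_rerank(query_feat: np.ndarray, gallery_feats: np.ndarray, t: int):
    """Average the query feature with its t nearest gallery features, then
    rank the gallery again with the expanded query."""
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    top = np.argsort(1.0 - g @ q)[:t]                 # initial top-t neighbours
    expanded = np.vstack([q[None, :], g[top]]).mean(axis=0)
    expanded /= np.linalg.norm(expanded)
    return np.argsort(1.0 - g @ expanded)[:t]         # final top-t ranking
```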
Compared with the prior art, the invention has the advantages and effects that:
(1) The invention innovatively proposes a class-center projection transduction layer applied within a class-center adaptive learning mechanism, replacing the fully connected layer used for supervised classification. By detecting and clustering large amounts of unlabeled cross-domain pedestrian data from real scenes, dynamically estimating class centers after clustering each mini-batch of unlabeled data, and adding these class centers as new anchors to the projection matrix of the re-identification model's class-center projection transduction layer, the model's ability to learn from cross-domain data is strengthened. The invention overcomes the sharp decline in cross-domain data recognition performance of ordinary key-person search systems and exhibits strong cross-domain re-identification capability.
(2) In horizontal comparison, the method learns the characteristics of richer data domains through the class-center adaptive strategy and offers better cross-domain identification than other classical pedestrian re-identification methods. In vertical comparison, the method effectively combines unlabeled and labeled data for learning instead of considering only one of them, improving the utilization efficiency and value of pedestrian surveillance data.
Drawings
FIG. 1 is a flow chart of a key person searching system based on a class center adaptive strategy.
Detailed Description
For a better understanding of the present invention, some concepts will be explained.
1. The labeled pedestrian dataset used in the invention is the Market-1501 dataset published by Tsinghua University, which contains 1501 pedestrians and 32668 detected pedestrian bounding boxes captured by 6 cameras (5 high-definition and 1 low-definition).
2. Softmax: the normalized exponential function, which "compresses" a K-dimensional vector z of arbitrary real numbers into a K-dimensional real vector σ(z) whose elements each lie in (0, 1) and sum to 1; it is computed as

σ(z)_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k),   j = 1, …, K.
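A direct NumPy transcription of the softmax formula, with the standard max-subtraction trick for numerical stability (the subtraction cancels in the ratio, so σ(z) is unchanged):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Normalized exponential: elements lie in (0, 1) and sum to 1."""
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()
```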
As shown in FIG. 1, the method for identifying the cross-domain key personnel search system based on class center self-adaptation of the invention comprises two stages of training and testing, and comprises the following implementation steps:
a training stage:
the method comprises the following steps: new data collection and data pre-processing
In the running process of the pedestrian monitoring equipment, the data collection module continuously collects new pedestrian video data collected by different cameras; the pedestrian video data are respectively stored according to different acquisition scenes, and the pedestrian video data acquired in the same scene are stored in an independent database; for pedestrian video data of the same scene, after the number of new data reaches a threshold value, entering a subsequent step;
step two: pedestrian detection
Inputting pedestrian video data to be detected, detecting pedestrians in each frame of image in the pedestrian video data by using a real-time target detection model realized based on a regional proposal network, and judging the positions of the pedestrians in each frame of image to obtain a pedestrian detection frame;
step three: pedestrian tracking
Extracting the features of the pedestrian images in the pedestrian detection frames of each frame of image by using a convolutional neural network as apparent features, matching the pedestrian detection frames of the previous and next frames by using the apparent features of the pedestrian images in the pedestrian detection frames between each frame of image, realizing the tracking of the same pedestrian in the continuous video frames, cutting out the matched pedestrian images according to the positions of the pedestrian detection frames, and finally obtaining a pedestrian image cluster formed by one or more continuous video frames of each pedestrian appearing in input pedestrian video data;
step four: pedestrian image cluster key frame extraction
The pedestrian image clusters obtained in the step three are formed by continuous video frames, and the pedestrian image difference of adjacent continuous video frames is very small, so that the key frame extraction algorithm is used for extracting the key frame of each pedestrian image cluster to obtain the key frame in the pedestrian image cluster, and a new non-redundant pedestrian image cluster is formed; each key frame is very similar to the frames around the key frame, and one key frame can meet the subsequent use requirement;
when extracting key frames, firstly extracting image characteristics of pedestrian image clusters by using a convolutional neural network, then calculating the distance between adjacent video frames according to the image characteristics, and marking the video frames with the distance larger than a threshold value as the key frames;
step five: pedestrian image clustering
Extracting features from the pedestrian image clusters obtained in the fourth step by using a convolutional neural network, obtaining a distance matrix by calculating cosine distances between the features, clustering the distance matrix by using a density-based spatial clustering algorithm with noise to obtain N classes, and assuming that N images are shared in the pedestrian image clusters for clustering, the value range of N is 1 to N;
when clustering is carried out, firstly, the minimum sample number m belonging to a class and the minimum distance e between samples in the class are set, then, clusters are searched for each data in a distance matrix according to the minimum distance e, and for a certain sample c in the distance matrix, if the number of samples contained in a neighborhood taking c as a center e as a radius is more than m, a cluster taking c as a core object is created; then, adding the sample with the distance less than e from any sample in the cluster into the cluster, and continuously iterating the process until each sample finds the cluster or does not belong to any cluster;
step six: class center adaptive strategy-based pedestrian re-recognition training model
In order to train by using unlabeled data, the invention innovatively provides a class center projection transfer layer to be applied to a class center self-adaptive learning mechanism to replace a full connection layer used for supervised classification before; the center-like projection conversion layer is improved from a full-connection layer, the projection matrix of the layer is consistent with the previous full-connection layer for labeled data, and the projection matrix is formed by simulating anchors by using respective center-like projection layers for unlabeled data;
let's ChiLRepresenting a labeled data set, χ, containing M classesURepresenting the unmarked data set, namely the data obtained by the fifth step; firstly, setting a threshold value, wherein the threshold value is smaller than the threshold value, a small batch is set, and in a training turn, constructing data of the small batch
Figure BDA0002776774610000071
Including annotated data therein
Figure BDA0002776774610000072
And unlabeled data
Figure BDA0002776774610000073
p represents the number of data marked in the small batch of data, and q represents the number of data unmarked in the small batch of data;
Figure BDA0002776774610000074
randomly selected from the annotated data; and then
Figure BDA0002776774610000075
Randomly selecting r unmarked clusters from unmarked data, and randomly selecting s samples from each cluster to construct, namely q is r.s; it should be noted that, for each small batch of data, the selected r clusters are dynamically changed; thus, this small batch of data B is extracted after being fed into the network, before the class-centric projection transduction layerThe characteristic is expressed as f ═ fL,fU]T∈R(p+q)·DWhere D is the feature dimension, fLAnd fURespectively labeling feature vectors of data and unlabeled data; the projection matrix of the class center projection transduction layer is represented as W ═ WM,Wr]∈R(M+r)·(p+q)Where the first M columns represent anchors with labeled data and the remaining r columns represent class center vectors for selected unlabeled data
Figure BDA0002776774610000076
In a small amount of the data of a lot,
Figure BDA0002776774610000077
the calculation formula of (a) is as follows:
Figure BDA0002776774610000078
the self-adaptive coefficient alpha is the average size of the class center of the labeled data cluster, and M is the number of the classes of the labeled data; therefore, the output y ═ W without the bias term can be obtained by the quasi-center projection transduction layerTf, then transferring y into a softmax loss layer; the function of the self-adaptive coefficient alpha is to normalize the class center of the unlabeled data; for stability and fast convergence of training, a suitable scale factor is required for scaling, so that the activation mapping y for unlabeled dataUActivation mapping y similar to annotation dataL(ii) a In fact, the l2 norm inherently provides a reasonable a priori scale for each class center to map the input features fLTo an output yLActivation of (2); therefore, the similar scale scaling with the labeled data is carried out by multiplying the class center of the unlabeled data by the adaptive coefficient alpha, so that the activation of the unlabeled data and the activation of the labeled data have similar distribution, and the stability and the quick convergence of the training are ensured;
training a pedestrian re-recognition model based on the above class center self-adaptive strategy, and converging when the size of the loss function does not fluctuate any more to obtain the pedestrian re-recognition model with high-efficiency recognition capability;
step seven: model compression quantization
Compressing and quantizing the pedestrian re-identification model obtained in the step six by using a deep learning inference optimizer TensrT, wherein the compression comprises transverse combination for combining convolution, bias and activation layers into an independent structure and longitudinal combination for combining layers with the same structure but different weights into a wider layer; the quantization is to map the data of the 32-bit floating point number type of the compressed model into 8-bit integer data; finally, the size of the pedestrian re-identification model can be reduced to one fourth of the original size, and the testing speed is improved by 3 to 5 times;
finally obtaining a pedestrian re-identification model for testing;
and (3) a testing stage:
the method comprises the following steps: data pre-processing
In data preprocessing, the pedestrian video data to be recognized is processed in the same way as in the training stage, i.e., pedestrian detection and pedestrian tracking must be completed to obtain data that can be fed directly into the pedestrian re-identification model; this data serves as the image library for pedestrian search;
step two: pedestrian search
Pedestrian search uses a query image to find t pedestrian images in the image library with the same identity as the query image; first, features are extracted from the query image and from the images in the image library using the pedestrian re-identification model obtained at the end of the training stage; then the cosine distances between the query features and the image-library features are computed to obtain a distance matrix and the image-pair correspondence;
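The distance matrix computation can be sketched directly from the definition of cosine distance; this is a minimal illustrative sketch, with the function name and feature layout (one row per image) assumed for illustration:

```python
import numpy as np

def cosine_distance_matrix(query_feats, gallery_feats):
    """Pairwise cosine distance between query and image-library features.

    query_feats:   (num_query, d)  one feature row per query image.
    gallery_feats: (num_gallery, d) one feature row per library image.
    Returns a (num_query, num_gallery) distance matrix in [0, 2].
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return 1.0 - q @ g.T   # cosine distance = 1 - cosine similarity
```

Sorting each row of the returned matrix then yields, for each query, the library images ranked from most to least similar.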
step three: reordering
The distance matrix is reordered using a query expansion algorithm: the distance matrix from step two is first sorted to obtain the t images closest to the query image; the features of these t images and the query image are summed and averaged to form the feature of a new query image; the cosine distances between the new query feature and the image-library features are computed, giving a new distance matrix and image-pair correspondence; finally this distance matrix is sorted, and its first t images are the t pedestrian images in the image library with the same identity as the query image.
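The query expansion steps above can be sketched as follows; this is an illustrative sketch under the assumption of row-wise features, and the function names are not from the patent:

```python
import numpy as np

def query_expansion_search(query_feat, gallery_feats, t):
    """Re-rank by query expansion: find the t library images closest to the
    query, average their features together with the query feature to form a
    new query, then search the library again with the expanded query."""
    def cosine_dist(qf):
        q = qf / np.linalg.norm(qf)
        g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
        return 1.0 - g @ q              # distance of each library image to qf

    top = np.argsort(cosine_dist(query_feat))[:t]            # first ranking
    new_query = np.vstack([gallery_feats[top], query_feat]).mean(axis=0)
    return np.argsort(cosine_dist(new_query))[:t]            # final ranking
```

Averaging in the top-t neighbors makes the expanded query more robust to viewpoint and lighting variation in any single image, which is the usual motivation for this re-ranking step.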
The invention innovatively proposes a class center projection transduction layer, applied within a class-center adaptive learning mechanism to replace the fully connected layer used for supervised classification. By detecting and clustering a large amount of unlabeled cross-domain pedestrian data from real scenes, dynamically estimating class centers after each small batch of unlabeled data is clustered, and adding these class centers as new anchors to the projection matrix of the re-identification model's class center projection transduction layer, the unlabeled data and the labeled data are effectively combined for learning. This overcomes the sharp drop in cross-domain recognition performance suffered by ordinary key personnel search systems and provides strong cross-domain re-identification capability.
The cross-domain re-identification method can be applied to searching for key personnel in real scenes; it improves system performance by making full use of a large amount of unlabeled pedestrian data and achieves an excellent cross-domain re-identification effect.
In summary, with the class-center adaptive cross-domain re-identification method, the key personnel search system gains strong generalization and cross-domain re-identification capability while solving the problem of performance degradation in cross-domain pedestrian re-identification.
Portions of the invention not described in detail are well within the skill of the art.
The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be included within the scope of the invention.

Claims (1)

1. A key personnel search system cross-domain identification method based on class center self-adaptation is characterized by comprising two stages of training and testing:
the training phase is implemented as follows:
the method comprises the following steps: new data collection and data pre-processing
During operation of the pedestrian monitoring equipment, the data collection module continuously collects new pedestrian video data captured by different cameras; the video data are stored separately by collection scene, with data collected in the same scene stored in an independent database; for pedestrian video data of the same scene, the subsequent steps are entered once the amount of new data reaches a threshold;
step two: pedestrian detection
Inputting the pedestrian video data to be detected, detecting pedestrians in each frame of the video using a real-time object detection model based on a region proposal network, and determining the position of each pedestrian in each frame to obtain pedestrian detection frames;
step three: pedestrian tracking
Extracting, with a convolutional neural network, features of the pedestrian images within the pedestrian detection frames of each frame as appearance features; matching the pedestrian detection frames of consecutive frames using these appearance features, thereby tracking the same pedestrian across consecutive video frames; cropping the matched pedestrian images according to the positions of the detection frames; finally obtaining, for each pedestrian appearing in the input pedestrian video data, a pedestrian image cluster formed from one or more consecutive video frames;
step four: pedestrian image cluster key frame extraction
Extracting a key frame of each pedestrian image cluster by using a key frame extraction algorithm to obtain key frames in the pedestrian image clusters and form a new non-redundant pedestrian image cluster;
when extracting the key frame, firstly extracting image characteristics of the pedestrian image cluster by using a convolutional neural network, then calculating the distance between adjacent video frames according to the image characteristics, and marking the video frames with the distance larger than a threshold value as the key frames;
step five: pedestrian image clustering
Extracting features from the pedestrian image clusters obtained in step four using a convolutional neural network, obtaining a distance matrix by computing the cosine distances between the features, and clustering the distance matrix using a density-based spatial clustering algorithm with noise to obtain n classes; assuming there are N images in total in the pedestrian image clusters to be clustered, n ranges from 1 to N;
when clustering, the minimum number m of samples required to form a class and the neighborhood radius e are first set; clusters are then searched for each sample in the distance matrix according to e: for a sample c in the distance matrix, if the neighborhood centered at c with radius e contains more than m samples, a cluster with c as its core object is created; then every sample whose distance to any sample in the cluster is less than e is added to the cluster, and this process is iterated until every sample has either found its cluster or belongs to no cluster;
step six: class center adaptive strategy-based pedestrian re-recognition training model
A pedestrian re-identification network framework is constructed using a convolutional neural network, and a class center projection transduction layer replaces the usual fully connected layer for supervised classification; the class center projection transduction layer is an improvement on the fully connected layer: for labeled data its projection matrix is consistent with that of a fully connected layer, the projection matrix being formed by using the respective class centers of the labeled data to simulate anchors;
Let χ_L denote the labeled dataset containing M classes, and χ_U the unlabeled dataset, i.e., the data obtained in step five; a threshold is set, and in each training round a small batch of data B, whose size is smaller than the threshold, is constructed; it contains labeled data B_L = {x_1^L, ..., x_p^L} and unlabeled data B_U = {x_1^U, ..., x_q^U}, where p is the number of labeled samples and q the number of unlabeled samples in the small batch; B_L is randomly selected from the labeled data, while B_U is constructed by randomly selecting r unlabeled clusters from the unlabeled data and then randomly selecting s samples from each cluster, i.e., q = r·s; after the small batch B is fed into the network, the feature extracted before the class center projection transduction layer is denoted f, and the class center projection transduction layer produces the bias-free output y = W^T f for classification; y is then passed into the softmax loss layer;
training the pedestrian re-identification model based on the class center adaptive strategy; training converges when the loss function no longer fluctuates, yielding a pedestrian re-identification model with efficient recognition capability;
step seven: model compression quantization
Compressing and quantizing the pedestrian re-identification model obtained in step six using the deep learning inference optimizer TensorRT; the compression comprises horizontal fusion, which merges convolution, bias, and activation layers into a single structure, and vertical fusion, which merges layers with the same structure but different weights into one wider layer; the quantization maps the 32-bit floating-point data of the compressed model to 8-bit integer data, finally reducing the size of the pedestrian re-identification model to one quarter of the original and improving the testing speed by 3 to 5 times;
finally obtaining a pedestrian re-identification model for testing;
the test phase is implemented as follows:
the method comprises the following steps: data pre-processing
In data preprocessing, the pedestrian video data to be recognized is processed in the same way as in the training stage, i.e., pedestrian detection and pedestrian tracking must be completed to obtain data that can be fed directly into the pedestrian re-identification model; this data serves as the image library for pedestrian search;
step two: pedestrian search
Using a query image to find t pedestrian images in the image library with the same identity as the query image: first, features are extracted from the query image and from the images in the image library using the pedestrian re-identification model obtained at the end of the training stage; then the cosine distances between the query features and the image-library features are computed to obtain a distance matrix and the image-pair correspondence;
step three: reordering
The distance matrix is reordered using a query expansion algorithm: the distance matrix from step two is first sorted to obtain the t images closest to the query image; the features of these t images and the query image are summed and averaged to form the feature of a new query image; the cosine distances between the new query feature and the image-library features are computed, giving a new distance matrix and image-pair correspondence; finally this distance matrix is sorted, and its first t images are the t pedestrian images in the image library with the same identity as the query image.
CN202011267881.5A 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption Active CN112784674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011267881.5A CN112784674B (en) 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011267881.5A CN112784674B (en) 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption

Publications (2)

Publication Number Publication Date
CN112784674A CN112784674A (en) 2021-05-11
CN112784674B true CN112784674B (en) 2022-07-15

Family

ID=75750488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011267881.5A Active CN112784674B (en) 2020-11-13 2020-11-13 Cross-domain identification method of key personnel search system based on class center self-adaption

Country Status (1)

Country Link
CN (1) CN112784674B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961051B (en) * 2019-03-28 2022-11-15 湖北工业大学 Pedestrian re-identification method based on clustering and block feature extraction
CN110942025A (en) * 2019-11-26 2020-03-31 河海大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN111639561A (en) * 2020-05-17 2020-09-08 西北工业大学 Unsupervised pedestrian re-identification method based on category self-adaptive clustering
CN111738172B (en) * 2020-06-24 2021-02-12 中国科学院自动化研究所 Cross-domain target re-identification method based on feature counterstudy and self-similarity clustering



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant