CN111126249A

CN111126249A - Pedestrian re-identification method and device combining big data and Bayes

Info

Publication number: CN111126249A
Application number: CN201911327696.8A
Authority: CN
Inventors: 李宁; 张斯尧; 罗茜; 王思远; 蒋杰; 张�诚; 李乾; 谢喜林; 黄晋
Original assignee: Shenzhen Jiuling Software Technology Co ltd
Current assignee: Shenzhen Jiuling Software Technology Co ltd
Priority date: 2019-12-20
Filing date: 2019-12-20
Publication date: 2020-05-08

Abstract

The invention discloses a pedestrian re-identification method and device combining big data and Bayes, wherein the method comprises the following steps: carrying out distributed training on a pedestrian re-identification system model by utilizing a pedestrian image database; re-identifying and re-ordering the query object and a plurality of candidate objects in the ranking list based on Bayesian query expansion; performing PTGAN processing on the re-ordered query object and the candidate object; inputting the query object and the candidate objects subjected to the PTGAN processing into a trained Bayesian model, calculating the real matching probability of each candidate object according to the image distance in the training data, and reordering the candidate objects; adjusting parameter values of target parameters of the pedestrian re-identification system model according to the reasoning clue model; and inputting the image to be recognized into the trained pedestrian re-recognition system model, and searching out the pedestrian image with the highest similarity. The invention solves the problems of high cross-camera retrieval difficulty and low re-identification accuracy of the pedestrian re-identification method in the prior art.

Description

Pedestrian re-identification method and device combining big data and Bayes

Technical Field

The invention relates to the technical field of computer vision and smart cities, in particular to a pedestrian re-identification method and device combining big data and Bayes, a terminal device and a computer readable medium.

Background

With the continuous development of artificial intelligence, computer vision and hardware technology, video image processing technology has been widely applied to intelligent city systems.

Pedestrian re-identification (person re-identification), abbreviated Re-ID, is a technology that uses computer vision to judge whether a specific pedestrian is present in an image or a video sequence. It is widely regarded as a sub-problem of image retrieval: given a pedestrian image from one monitoring device, images of the same pedestrian are retrieved across other devices. Because camera devices differ from one another, and because pedestrians have both rigid and flexible characteristics, their appearance is easily affected by clothing, scale, occlusion, posture, viewing angle and the like, so pedestrian re-identification has become a hot topic in the field of computer vision that is both valuable for research and very challenging.

Currently, although the performance of pedestrian re-identification has improved significantly, many challenging problems remain unsolved in practice, such as complex scenes, differences in lighting, changes in viewing angle and posture, and the large number of pedestrians in a surveillance camera network. Under these conditions, cross-camera retrieval is generally difficult. Meanwhile, the labeling work required before training on video image samples is expensive and consumes a large amount of manpower, existing algorithms generally cannot achieve the expected effect, and the re-identification accuracy is low.

Disclosure of Invention

In view of the above, the present invention provides a pedestrian re-identification method, apparatus, terminal device and computer readable medium combining big data and bayesian, which can improve the accuracy of pedestrian re-identification under different cameras, and solve the problems of large cross-camera search difficulty and low re-identification accuracy of the pedestrian re-identification method in the prior art.

The first aspect of the embodiment of the invention provides a pedestrian re-identification method combining big data and Bayes, which comprises the following steps:

performing distributed training on a pedestrian re-recognition system model by using a pedestrian image database to obtain the trained pedestrian re-recognition system model, wherein the pedestrian image database comprises a plurality of matching image groups, and each matching image group comprises at least two matching images;

inputting the query object into the pedestrian re-identification system model to obtain a ranking list of a plurality of candidate objects;

re-identifying and re-ordering the query object and a plurality of candidate objects in the ranking list based on Bayesian query expansion;

performing PTGAN processing on the re-ordered query objects and candidate objects to realize the migration of a background difference area on the premise of keeping the foreground of the pedestrian unchanged;

inputting the query object and the candidate objects subjected to the PTGAN processing into a trained Bayesian model, calculating the real matching probability of each candidate object according to the image distance in training data, and reordering the candidate objects;

carrying out multi-dimensional feature extraction on the query object subjected to PTGAN processing and the candidate object subjected to reordering and determining a reasoning clue model;

adjusting the reasoning cue model by using a reasoning algorithm and determining a final reasoning cue model;

adjusting the parameter value of the target parameter of the pedestrian re-identification system model according to the reasoning clue model;

and inputting the image to be recognized into the trained pedestrian re-recognition system model, and searching out the pedestrian image with the highest similarity.

Further, the method for performing distributed training on the pedestrian re-recognition system model by using the pedestrian image database to obtain the trained pedestrian re-recognition system model comprises the following steps:

iteratively training the pedestrian re-identification system model by increasing batch size using a plurality of processors;

performing iterative training on the pedestrian re-recognition system model according to a linear scaling and preheating strategy algorithm;

applying layer-wise adaptive rate scaling (LARS) to use a different learning rate for each layer of the network in the pedestrian re-identification system model.

Further, performing bayesian query expansion-based re-identification reordering on the query object and the plurality of candidate objects in the ranking list, comprising:

training a Bayesian model by utilizing a pedestrian image database to obtain a trained Bayesian model;

predicting the real matching probability of each candidate object through the trained Bayesian model according to the distance between the query object and the plurality of candidate object images;

and performing query expansion according to the real matching probability of each candidate object, and generating a new ranking list through the query expansion.

Further, the multi-dimensional feature extraction and inference cue model determination for the query object after PTGAN processing and the candidate object after reordering includes:

extracting the appearance characteristics of the pedestrian;

extracting facial features of the pedestrian;

and constructing a positioning branch Markov chain according to the time and the positioning characteristics of the pedestrian at different cameras, and training a reasoning clue model according to the positioning branch Markov chain.

A second aspect of the embodiments of the present invention provides a pedestrian re-identification apparatus combining big data and bayes, which is characterized by including:

the distributed training module is used for carrying out distributed training on a pedestrian re-identification system model by utilizing a pedestrian image database to obtain the trained pedestrian re-identification system model, wherein the pedestrian image database comprises a plurality of matching image groups, and each matching image group comprises at least two matching images;

the ranking list acquisition module is used for inputting the query object into the pedestrian re-identification system model to obtain a ranking list of a plurality of candidate objects;

the re-identification module is used for re-identifying and re-ordering the query object and a plurality of candidate objects in the ranking list based on Bayesian query expansion;

the PTGAN processing module is used for carrying out PTGAN processing on the reordered query objects and candidate objects to realize the migration of a background difference area on the premise of keeping the foreground of the pedestrian unchanged;

the training module is used for inputting the query object and the candidate objects subjected to the PTGAN processing into a trained Bayesian model, calculating the real matching probability of each candidate object according to the image distance in the training data, and reordering the candidate objects;

the reasoning clue module is used for carrying out multi-dimensional feature extraction on the query object subjected to the PTGAN processing and the candidate object subjected to the reordering and determining a reasoning clue model;

the reasoning clue adjusting module is used for adjusting the reasoning clue model by using a reasoning algorithm and determining the final reasoning clue model;

the model adjusting module is used for adjusting the parameter value of the target parameter of the pedestrian re-identification system model according to the reasoning clue model;

and the recognition module is used for searching out the pedestrian image with the highest similarity by inputting the image to be recognized into the trained pedestrian re-recognition system model.

Further, the distributed training module comprises:

a processor addition module for iteratively training the pedestrian re-identification system model by increasing a batch size using a plurality of processors;

the batch algorithm module is used for carrying out iterative training on the pedestrian re-identification system model according to a linear scaling and preheating strategy algorithm;

a learning rate adjustment module for applying layer-wise adaptive rate scaling (LARS) to use a different learning rate for each layer of the network in the pedestrian re-identification system model.

Further, the re-identification module comprises:

the Bayes training module is used for training a Bayes model by utilizing a pedestrian image database to obtain a trained Bayes model;

the prediction module is used for predicting the real matching probability of each candidate object through the trained Bayesian model according to the distance between the query object and the plurality of candidate object images;

and the query expansion module is used for performing query expansion according to the real matching probability of each candidate object and generating a new ranking list through the query expansion.

Further, the inference cue module comprises:

the appearance extraction module is used for extracting appearance characteristics of pedestrians;

the face extraction module is used for extracting facial features of pedestrians;

and the positioning branch module is used for constructing a positioning branch Markov chain according to the time and the positioning characteristics of the pedestrian at different cameras and training a reasoning clue model according to the positioning branch Markov chain.

A third aspect of the embodiments of the present invention provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the pedestrian re-identification method combining big data and bayes when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the pedestrian re-identification method combining big data and Bayes are implemented.

In the embodiment of the invention, the model training speed is greatly improved by carrying out distributed training on the model of the pedestrian re-recognition system, and meanwhile, the accuracy of pedestrian re-recognition under complex conditions is improved and the robustness of the system is improved by re-recognition reordering based on Bayesian query expansion and PTGAN processing. The pedestrian re-identification method solves the problems that the cross-camera retrieval difficulty is high and the re-identification accuracy rate is low in the pedestrian re-identification method in the prior art.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of a pedestrian re-identification method combining big data and Bayes according to an embodiment of the present invention;

FIG. 2 is a comparison graph of real-time conversion effects of different pedestrian re-identification methods provided by the embodiment of the invention;

FIG. 3 is a schematic structural diagram of a pedestrian re-identification device combining big data and Bayes according to an embodiment of the present invention;

FIG. 4 is a detailed structure diagram of a distributed training module according to an embodiment of the present invention;

FIG. 5 is a detailed structure diagram of a re-identification module provided in an embodiment of the present invention;

FIG. 6 is a detailed structure diagram of an inference cue module provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Referring to fig. 1, fig. 1 is a flowchart of a pedestrian re-identification method combining big data and bayes according to an embodiment of the present invention. As shown in fig. 1, the pedestrian re-identification method combining big data and bayes of the present embodiment includes the following steps:

step S102, performing distributed training on the pedestrian re-recognition system model by using a pedestrian image database to obtain the trained pedestrian re-recognition system model, wherein the pedestrian image database comprises a plurality of matching image groups, and each matching image group comprises at least two matching images.

step 1, iteratively training a pedestrian re-recognition system model by increasing batch size by using a plurality of processors.

By means of an iterative algorithm, the algorithm is expanded and used to more processors, and more pedestrian image data are loaded in each iteration, so that the total training time is reduced;

Generally, a larger batch speeds up a single GPU to a certain extent, because the low-level matrix computation libraries become more efficient. For training the ResNet-50 model on ImageNet, the optimal batch size for each GPU is 512. To use many GPUs and keep every GPU busy, a larger total batch size is required. For example, with 16 GPUs the batch size should be set to 16 × 512 = 8192. Ideally, if the total number of data accesses is fixed and the batch size is increased linearly with the number of processors, the number of modified SGD (stochastic gradient descent) iterations decreases linearly while the time cost of each iteration remains the same, and thus the total time decreases linearly with the number of processors.

A specific modified stochastic gradient descent (SGD) iterative algorithm is as follows. Let w represent the weights of the DNN, X the training data, n the number of samples in X, and Y the labels of the training data X. Let x_i be a sample of X, and let l(x_i, w) be the loss calculated from x_i and its label y_i (i ∈ {1, 2, ..., n}). The present invention uses a loss function such as the cross-entropy function. The goal of DNN training is to minimize the loss function in equation (1):

$$L(w) = \frac{1}{n} \sum_{i=1}^{n} l(x_i, w) \qquad (1)$$

In the t-th iteration, the algorithm of the present invention uses forward and backward propagation to find the gradient of the loss function with respect to the weights. This gradient is then used to update the weights according to equation (2):

$$w_{t+1} = w_t - \eta \nabla L(w_t) \qquad (2)$$

where η is the learning rate. At the t-th iteration, the algorithm of the present invention picks a batch B_t of size b. The weights are then updated based on equation (3):

$$w_{t+1} = w_t - \frac{\eta}{b} \sum_{x \in B_t} \nabla l(x, w_t) \qquad (3)$$

This method is called mini-batch stochastic gradient descent. To simplify the notation, the update rule in equation (4) denotes that the gradient ∇w_t of the weights is used to update the weight w_t to w_{t+1}:

$$w_{t+1} = w_t - \eta \nabla w_t \qquad (4)$$

Iterating with this method and using as many processors as possible reduces the training time substantially, roughly linearly in the number of processors.
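As a minimal sketch of the mini-batch update in equation (3), the following Python code fits a one-parameter linear model y = w·x with a squared-error loss. The model, the toy data and the names `grad_sample` and `minibatch_sgd` are illustrative assumptions and are not taken from the invention, which uses a cross-entropy-like loss on a DNN. The sketch only shows that, for a fixed number of epochs, doubling the batch size halves the number of iterations.

```python
import random

def grad_sample(w, x, y):
    """Gradient of the per-sample loss l = 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def minibatch_sgd(data, w0, lr, batch_size, epochs, seed=0):
    """Mini-batch SGD: w <- w - (lr / b) * sum of per-sample gradients over the batch."""
    rng = random.Random(seed)
    w = w0
    iterations = 0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)          # each epoch touches the whole data set once
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            g = sum(grad_sample(w, x, y) for x, y in batch) / len(batch)
            w -= lr * g
            iterations += 1
    return w, iterations

# Toy data (assumption): 40 noise-free samples of y = 3x.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 41)]
w_small, it_small = minibatch_sgd(data, 0.0, lr=0.05, batch_size=4, epochs=50)
w_large, it_large = minibatch_sgd(data, 0.0, lr=0.05, batch_size=8, epochs=50)
print(it_small, it_large)
```

In a distributed setting each processor would compute the gradient sum for its own share of the batch; that part is omitted here.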

Step 2, performing iterative training on the pedestrian re-recognition system model according to a linear scaling and warm-up strategy algorithm.

When training with large batches, it is necessary to ensure that, for the same number of epochs, a test accuracy comparable to that of small batches is achieved. The number of epochs is fixed here because, statistically, one epoch means that the algorithm touches the entire data set once, and, computationally, fixing the number of epochs means fixing the number of floating-point operations. The method for training with large batches comprises two techniques:

(1) Linear scaling: when the batch size is increased from B to kB, the learning rate should also be increased from η to kη.

(2) Warm-up strategy: if a large learning rate η is to be used, training should start with a small learning rate and increase it to the large value η over the first few epochs.

With the linear scaling and warm-up strategy, relatively large batch data images can be used to a certain extent.
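The two techniques can be sketched as a single learning-rate schedule. This is an illustration only: the function name `scaled_lr_with_warmup`, the linear shape of the ramp and the choice of the unscaled rate as the starting point are assumptions, since the text only requires starting small and reaching the large value within the first few epochs.

```python
def scaled_lr_with_warmup(base_lr, base_batch, batch, epoch, warmup_epochs):
    """Learning rate at a (possibly fractional) epoch.

    Linear scaling: the target rate is base_lr * k, with k = batch / base_batch.
    Warm-up: ramp linearly from base_lr to the target over warmup_epochs.
    """
    k = batch / base_batch
    target_lr = base_lr * k
    if warmup_epochs > 0 and epoch < warmup_epochs:
        return base_lr + (target_lr - base_lr) * (epoch / warmup_epochs)
    return target_lr

# Example values (assumptions): base rate 0.1 at batch 256, scaled to batch 8192.
lr_start = scaled_lr_with_warmup(0.1, 256, 8192, epoch=0, warmup_epochs=5)
lr_mid = scaled_lr_with_warmup(0.1, 256, 8192, epoch=2.5, warmup_epochs=5)
lr_end = scaled_lr_with_warmup(0.1, 256, 8192, epoch=5, warmup_epochs=5)
```

Here k = 32, so the rate rises from 0.1 at the start to about 3.2 once the warm-up period is over.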

Step 3, applying layer-wise adaptive rate scaling (LARS) to use a different learning rate for each layer of the network in the pedestrian re-identification system model.

Applying LARS to each layer during large-batch training yields the final fast-training model.

In order to improve the accuracy of large-batch training, the method uses a new rule for updating the learning rate (LR). Consider first the single-machine case, which uses

$$w = w - \eta \nabla w$$

to update the weights. With the data-parallel approach, the multi-machine version can be handled in the same way.

Each layer has its own weights w and gradient ∇w. The standard SGD algorithm uses the same LR (η) for all layers. However, routine experiments show that different layers may require different LRs, because the ratio between ||w||_2 and ||∇w||_2 differs greatly from layer to layer.

The basic LR rule is defined in equation (5), where l is a scaling factor that is set to 0.001 in AlexNet and ResNet training:

$$\eta = l \times \frac{\|w\|_2}{\|\nabla w\|_2} \qquad (5)$$

γ is a user-tuned parameter; a good γ usually lies in [1, 50]. With β denoting the weight-decay coefficient and μ the momentum coefficient, the update of each layer proceeds as follows:

the local LR α of each learnable parameter is obtained by α = l × ||w||_2 / (||∇w||_2 + β||w||_2);

the true LR of each layer is obtained by η = γ × α;

the gradient is updated by ∇w = ∇w + βw;

the acceleration term a is updated by a = μa + η∇w;

the weights are updated by w = w − a.

Using this method together with warm-up, large-batch SGD can achieve the same accuracy as the baseline. To extend to larger batch sizes (e.g., 32k), local response normalization (LRN) needs to be replaced with batch normalization (BN); the method of the invention adds BN after each convolutional layer. LARS helps ResNet-50 maintain high test accuracy, whereas the existing approach (linear scaling and warm-up alone) is much less accurate for batch sizes of 16k and 32k.
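A minimal sketch of one LARS update for a single layer is given below. It follows the commonly published form of the rule: the local rate is proportional to the ratio of the weight norm to the gradient norm, and the acceleration term carries momentum. The weight-decay and momentum parameters, their default values and the function names are assumptions made for illustration; weights and gradients are flat Python lists.

```python
import math

def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))

def lars_step(w, grad, accel, trust=0.001, gamma=1.0, weight_decay=0.0005, momentum=0.9):
    """One LARS update for one layer; returns (new_w, new_accel, layer_lr)."""
    w_norm = l2_norm(w)
    g_norm = l2_norm(grad)
    denom = g_norm + weight_decay * w_norm
    local_lr = trust * w_norm / denom if denom > 0 else 0.0    # local LR (alpha)
    layer_lr = gamma * local_lr                                 # true LR of the layer
    grad = [g + weight_decay * wi for g, wi in zip(grad, w)]    # add weight decay
    accel = [momentum * a + layer_lr * g for a, g in zip(accel, grad)]
    w = [wi - a for wi, a in zip(w, accel)]
    return w, accel, layer_lr

# Two layers with the same weights but gradients that differ by a factor of 10.
w1, a1, lr1 = lars_step([3.0, 4.0], [0.6, 0.8], [0.0, 0.0], weight_decay=0.0)
w2, a2, lr2 = lars_step([3.0, 4.0], [6.0, 8.0], [0.0, 0.0], weight_decay=0.0)
```

In this example the layer with the tenfold gradient receives a tenfold smaller learning rate, so both layers move by the same amount. That is the motivation given above for using a per-layer rate.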

And step S104, inputting the query object into the pedestrian re-identification system model to obtain a ranking list of a plurality of candidate objects.

The pedestrian re-identification system model can be any existing pedestrian re-identification network model, such as MA-CNN or an RNN.

And step S106, carrying out re-identification and re-ordering on the query object and the candidate objects in the ranking list based on Bayesian query expansion.

Step one, training a Bayesian model by utilizing a pedestrian image database to obtain a trained Bayesian model;

Step two, predicting the real matching probability of each candidate object through the trained Bayesian model according to the distance between the query object and the plurality of candidate object images;

Step three, performing query expansion according to the real matching probability of each candidate object, and generating a new ranking list through the query expansion.

BQE (Bayesian query expansion) generates a new query from the information in the initial gallery rank list and uses it to retrieve the gallery images again. Specifically, the data set is divided into three parts: query, gallery and training data. In the offline process, Bayesian posterior estimation is first trained on the training data. Given a distance metric, the Bayesian model can predict the true matching probability of a candidate object. In online retrieval, an initial rank list is obtained by computing the similarity between the query and the gallery images, and the true matching probability of each candidate object in the rank list is calculated with the Bayesian model. The features of the high-probability images in the initial rank list are then merged with the original query, and the resulting new query is used for another round of retrieval. After a new round of retrieval, the query expansion process can be applied again, which yields an iterative algorithm.

Formally, each image is represented by a d-dimensional feature vector x ∈ R^d. Let T = {t_i}, i = 1, ..., M, be the training set and G = {g_i}, i = 1, ..., N, be the gallery set. The labels of a training image t_i, the query image q and a gallery image g_i are denoted l^{t_i}, l^q and l^{g_i}, respectively. Let d(q, g_i) be the distance between the query image q and the gallery image g_i. The initial rank list is then expressed as

$$R(q, G) = \{g_1^0, g_2^0, \ldots, g_N^0\}, \quad \text{where } d(q, g_i^0) < d(q, g_{i+1}^0).$$

Thus, the initial rank list may be reordered based on the offline-trained Bayesian model.
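The construction of the initial rank list can be sketched as follows. The Euclidean distance is used here only as a stand-in; the method itself relies on a learned metric, and the function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def initial_rank_list(query, gallery):
    """Return gallery indices sorted by increasing distance to the query,
    together with the sorted distances."""
    dists = [euclidean(query, g) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: dists[i])
    return order, [dists[i] for i in order]

# Toy 2-dimensional features (assumption).
order, sorted_dists = initial_rank_list([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
```

The sorted distances are what the Bayesian model later converts into matching probabilities.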

And step S108, performing PTGAN processing on the reordered query objects and candidate objects, and realizing the migration of the background difference area on the premise of keeping the foreground of the pedestrian unchanged.

PTGAN (Person Transfer GAN) is a generative adversarial network aimed at the Re-ID problem. In the invention, the most important characteristic of PTGAN is that it transfers the differences of the background region while keeping the pedestrian foreground as unchanged as possible. The loss function of the PTGAN network consists of two parts:

$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where L_Style represents the style loss (domain difference loss) of the generated image, i.e., whether the generated image resembles the style of the new data set; L_ID represents the identity loss of the generated image, which verifies that the generated image shows the same person as the original image; and λ_1 is a weight that balances the two losses. The two losses are defined as follows.

The first part is L_Style, with the specific formula:

$$L_{Style} = L_{GAN}(G, D_B, A, B) + L_{GAN}(\bar{G}, D_A, B, A) + \lambda_2 L_{Cyc}(G, \bar{G})$$

where L_GAN represents the standard adversarial loss, L_Cyc represents the cycle-consistency loss, A and B are the two sets of images processed by the GAN, G is the style mapping function from A to B, \bar{G} is the style mapping function from B to A, D_A and D_B are the discriminators for A and B, and λ_2 is the weight of the cycle-consistency loss.

These are the standard losses of PTGAN, which ensure that the generated picture has the same style (domain) as the desired data set.

Secondly, to ensure that the foreground is not changed during image transfer, PSPNet is first used to segment the foreground of the video image and obtain a mask region. Conventional generative adversarial networks such as CycleGAN are generally not designed for the Re-ID task and therefore do not need to keep the identity information of the foreground object unchanged; as a result, the foreground may be of poor quality (for example blurred) and, worse, the appearance of the pedestrian may change. To solve this problem, the present invention proposes the L_ID loss, computed on the foreground mask extracted by PSPNet. The final identity loss is:

$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\left\|(G(a) - a) \odot M(a)\right\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\left\|(\bar{G}(b) - b) \odot M(b)\right\|_2\right]$$

where G(a) is the pedestrian image transferred from image a, \bar{G}(b) is the pedestrian image transferred from image b, p_data(a) is the data distribution of A, p_data(b) is the data distribution of B, and M(a) and M(b) are the two segmented foreground mask regions. This identity loss constrains the pedestrian foreground to remain as unchanged as possible during the transfer.
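The effect of the foreground mask in the identity loss can be illustrated with a small sketch on flattened pixel lists. The function names and the toy pixel values are assumptions; a real implementation would operate on image tensors produced by the generator and on masks produced by PSPNet.

```python
import math

def masked_l2(original, transferred, mask):
    """L2 norm of (transferred - original) multiplied element-wise by the mask."""
    return math.sqrt(sum(((t - o) * m) ** 2
                         for o, t, m in zip(original, transferred, mask)))

def identity_loss(triples):
    """Empirical mean of the masked L2 difference over (original, transferred, mask) triples."""
    return sum(masked_l2(o, t, m) for o, t, m in triples) / len(triples)

image = [0.2, 0.5, 0.9, 0.1]
mask = [0, 1, 1, 0]                       # 1 marks pedestrian foreground pixels
background_changed = [0.7, 0.5, 0.9, 0.6]  # only background pixels differ
foreground_changed = [0.2, 0.8, 0.5, 0.1]  # foreground pixels differ

loss_bg = masked_l2(image, background_changed, mask)
loss_fg = masked_l2(image, foreground_changed, mask)
loss_mean = identity_loss([(image, background_changed, mask),
                           (image, foreground_changed, mask)])
```

Changes confined to the background contribute nothing to the loss, while changes to the masked foreground are penalized. The generator is therefore free to transfer the background style but not to alter the pedestrian.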

Fig. 2 shows a comparison graph of real-time conversion effects of different pedestrian re-identification methods. The first row shows the pictures to be converted, the third row shows the Cycle-GAN conversion results, and the fourth row shows the PTGAN conversion results. The images generated by PTGAN are of higher quality than those of Cycle-GAN: the appearance of the person remains unchanged while the style is effectively transferred to that of another camera, and shadows, road markings and backgrounds are generated automatically, similar to what the other camera would capture. PTGAN also handles the noisy segmentation results produced by PSPNet well. Compared with the conventional cycle-consistent generative adversarial network (CycleGAN), the algorithm of the invention visibly better preserves the identity information of the pedestrian.

Step S110, inputting the query object and the candidate object after the PTGAN processing into a trained Bayesian model, calculating the real matching probability of each candidate object according to the image distance in the training data, and reordering the candidate objects.

In essence, the Bayesian model represents a match score distribution of true matches and false matches. The model is created on a training set and applied during testing to estimate the probability that the top image really matches the query.

For each image x in the rank list returned by the pedestrian re-identification system, there is a distance calculated with the learned metric. Since images with a smaller distance to the query are listed at the top, the question is whether the top images can be used to reorder the list and improve performance. Selecting the candidate images is important, because a false match has an adverse effect on performance. Given the distance d(x, q) between the query and an image, the present invention distinguishes candidates by distance, since images of the same identity and images of different identities typically fall into significantly different distance ranges. The invention adopts a Bayesian model to estimate the probability that an image in the rank list is a true match.

In particular, for a query q and a gallery image x, the probability that the two images belong to the same identity is computed from the distance between them, i.e.

$$P\left(l^x = l^q \mid d(x, q)\right)$$

According to Bayes' theorem, the probability can be rewritten as follows:

$$P\left(l^x = l^q \mid d(x, q)\right) = \frac{P\left(d(x, q) \mid l^x = l^q\right) P\left(l^x = l^q\right)}{P\left(d(x, q)\right)}$$

where P(d(x, q)) can be calculated by the following formula:

$$P\left(d(x, q)\right) = P\left(d(x, q) \mid l^x = l^q\right) P\left(l^x = l^q\right) + P\left(d(x, q) \mid l^x \neq l^q\right) P\left(l^x \neq l^q\right)$$

The present invention uses the training data to estimate these probabilities. The frequencies of true-match pairs and false-match pairs in the training data can be used directly as approximations of P(l^x = l^q) and P(l^x ≠ l^q). To calculate P(d(x, q) | l^x = l^q) and P(d(x, q) | l^x ≠ l^q), it is necessary to calculate the distance between each pair of images in the training data and to use a distance range instead of the exact value of the distance. The invention divides the range of distance values into M intervals and then counts the number of candidates in each interval. Suppose d(x, q) lies in the interval [0.2, 0.3]; then P(d(x, q) | l^x = l^q) can be estimated as the number of true-match pairs whose distance falls in this interval divided by the total number of true-match pairs. P(d(x, q) | l^x ≠ l^q) is calculated similarly. The number of intervals M is selected according to the size of the data set. In practice, if a distance in the test stage is above the upper limit (or below the lower limit) of the training phase, the result for the upper (or lower) limit is used.
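The interval-based estimate can be sketched as a small histogram model. This is an illustration under stated assumptions: the class name `BayesMatchModel`, the equal-width bins and the toy distances are hypothetical, and a posterior of 0 is returned for an interval that contains no training pairs at all.

```python
class BayesMatchModel:
    """Histogram estimate of P(match | distance) from labelled training pairs."""

    def __init__(self, match_dists, nonmatch_dists, num_bins, lo, hi):
        self.num_bins, self.lo, self.hi = num_bins, lo, hi
        total = len(match_dists) + len(nonmatch_dists)
        self.prior_match = len(match_dists) / total       # P(match)
        self.like_match = self._histogram(match_dists)     # P(bin | match)
        self.like_nonmatch = self._histogram(nonmatch_dists)

    def _bin(self, d):
        # Distances outside the training range are clamped to the first or last bin.
        idx = int((d - self.lo) / (self.hi - self.lo) * self.num_bins)
        return min(max(idx, 0), self.num_bins - 1)

    def _histogram(self, dists):
        counts = [0] * self.num_bins
        for d in dists:
            counts[self._bin(d)] += 1
        return [c / len(dists) for c in counts]

    def posterior(self, d):
        """Bayes' theorem applied to the interval that contains distance d."""
        b = self._bin(d)
        num = self.like_match[b] * self.prior_match
        den = num + self.like_nonmatch[b] * (1.0 - self.prior_match)
        return num / den if den > 0 else 0.0

# Toy training distances (assumption): matches are close, non-matches are far.
match = [0.05, 0.15, 0.15, 0.25]
nonmatch = [0.25, 0.35, 0.45, 0.45, 0.55, 0.55, 0.65, 0.65, 0.75, 0.85, 0.95, 0.95]
model = BayesMatchModel(match, nonmatch, num_bins=10, lo=0.0, hi=1.0)
p_close = model.posterior(0.15)
p_border = model.posterior(0.25)
p_far = model.posterior(2.0)   # beyond the training range, clamped to the last bin
```

With this toy data a distance of 0.15 is seen only among true matches, 0.25 is ambiguous, and any distance beyond the training range is treated like the largest training interval.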

For query expansion, a new query is constructed to reorder the candidates. Only the K candidates with the highest probability are merged into the new query, where K is less than or equal to the number of true matches and K < n. The value of K is adjusted according to actual conditions. Various feature pooling strategies are possible.

There are two simple strategies for query expansion: average query expansion (AQE) and maximum query expansion (MQE). These two methods fuse the features of the query image and of the top candidates using average pooling and max pooling, respectively. For AQE, the extended query is calculated as:

$$q_{new} = \frac{1}{K + 1}\left(q + \sum_{i=1}^{K} x_i\right)$$

where x_1, ..., x_K are the feature vectors of the top K images in the rank list.

A disadvantage of these strategies is that their effectiveness depends to a large extent on the quality of the initial rank list and on the value of the parameter K. When the initial rank list is unsatisfactory or the value of K is large, false matches are used to construct the new query, which reduces the accuracy of the query.

To overcome this deficiency, the present invention assigns a different weight to each candidate in feature pooling. An expanded query q_new for the initial query q is then computed by combining the top K images and the query q, with each image weighted by its probability. Here, the invention simply uses weighted average pooling, where the weights are the probabilities. The formula is as follows:

Finally, the distances are calculated using this new query and the initial rank list is rearranged. The result can then be subjected to further iterations: typically, the expanded query produces a better rank list, from which a still better query can be produced. The present invention may therefore repeat the process of generating a rank list, pooling features and expanding the query. Repeating BQE enhances the effect. T denotes the number of iterations.
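The three pooling strategies can be compared in a short sketch. AQE and MQE follow their usual definitions. The probability-weighted variant is only one plausible reading of the weighted average pooling described above: giving the original query a weight of 1 and normalizing by the sum of all weights are assumptions, as are the function names.

```python
def average_query_expansion(query, top_features):
    """AQE: element-wise mean of the query and the top-K candidate features."""
    vectors = [query] + list(top_features)
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def max_query_expansion(query, top_features):
    """MQE: element-wise maximum of the query and the top-K candidate features."""
    vectors = [query] + list(top_features)
    return [max(col) for col in zip(*vectors)]

def bayesian_query_expansion(query, top_features, probs):
    """Probability-weighted mean; the query itself is given weight 1 (assumption)."""
    total = 1.0 + sum(probs)
    pooled = list(query)
    for feat, p in zip(top_features, probs):
        for j, value in enumerate(feat):
            pooled[j] += p * value
    return [v / total for v in pooled]

query = [1.0, 0.0]
top = [[0.0, 1.0], [4.0, 4.0]]   # the second candidate plays the role of a false match
probs = [1.0, 0.0]               # the Bayesian model gives it probability 0

q_aqe = average_query_expansion(query, top)
q_mqe = max_query_expansion(query, top)
q_bqe = bayesian_query_expansion(query, top, probs)
```

In this toy case the false match pulls the AQE and MQE queries away from the original query, while the weighted version ignores it because its probability is 0.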

Assume that the sizes of the training set and the gallery set are M and N, respectively. The complexity of the Bayesian model is

and it is computed offline. For the query expansion process, probabilities need to be computed and new queries constructed. The time complexity of generating a new query is

where K is the number of pooled images. Since the parameter K is less than the number of true matches and K ≤ N, this complexity can be bounded by

Then, with complexity

the pairwise distances of the trust images are calculated. Thus, for a single query, the overall computational complexity is

Step S112, performing multi-dimensional feature extraction on the query object subjected to the PTGAN processing and the reordered candidate objects, and determining a reasoning clue model.

The present invention uses appearance, face, and possible-destination cues, and the features for each timestamp are extracted individually for all detections across cameras.

Appearance-based attributes are first extracted from the person detections; they capture an individual's traits and characteristics in the form of appearance. A common image representation is the convolutional neural network (CNN). The present invention uses an AlexNet model pre-trained on ImageNet as the extractor of appearance features. This is done by removing the top output layer and using the activations of the last fully connected layer as the feature (length 4096). The AlexNet architecture includes five convolutional layers, three fully connected layers, and three max-pooling layers immediately following the first, second, and fifth convolutional layers. The first convolutional layer has 96 filters of size 11 × 11. The second layer has 256 filters of size 5 × 5. The third, fourth, and fifth layers are connected to each other without any intervening pooling and have 384, 384, and 256 filters of size 3 × 3, respectively. A fully connected layer l learns a nonlinear function

where

is the hidden representation of the input data X_i, W and b are the corresponding weights and bias, and f is the rectified linear unit (ReLU) that activates the hidden layer. Following these steps, appearance features are extracted for the pedestrians in the consecutive video frames of each timestamp.
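As an illustration of the fully connected layer described here, the sketch below assumes the usual form f(W x + b) with f a rectified linear unit. The function name and the toy dimensions are assumptions; a real AlexNet layer of this kind would output a 4096-dimensional vector.

```python
import numpy as np

def fully_connected_relu(x, weights, bias):
    """One fully connected layer: affine map followed by ReLU activation."""
    return np.maximum(0.0, weights @ x + bias)

# Toy sizes for readability (2 inputs, 3 hidden units).
x = np.array([1.0, 2.0])
weights = np.array([[1.0, -1.0],
                    [0.0,  2.0],
                    [-1.0, -1.0]])
bias = np.array([0.0, 0.0, 1.0])

hidden = fully_connected_relu(x, weights, bias)
```

Negative pre-activations are clipped to zero, so only the second hidden unit is active for this input.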

Secondly, facial features are extracted. Face recognition is an established biometric technology for identity identification and verification. Facial morphology can be used for re-identification because it is essentially a non-contact biometric and can be captured remotely. The invention extracts facial features from the facial bounding box using a VGG-16 model pre-trained on ImageNet. This is done by removing the top output layer and using the activations of the last fully connected layer as the facial feature (length 4096). VGG-16 is a convolutional neural network composed of 13 convolutional layers and 3 fully connected layers, with a filter size of 3 × 3. Max pooling is applied between convolutional layers with a 2 × 2 pixel window and a stride of 2. Subtraction of the training-set mean is used as a pre-processing step.
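The pre-processing and pooling operations mentioned for VGG-16 can be sketched in NumPy. This is only an illustration of mean subtraction and 2 × 2, stride-2 max pooling on a single channel; the function names and the toy mean values are assumptions, and the sketch does not load any pre-trained model.

```python
import numpy as np

def subtract_mean(image, channel_mean):
    """Subtract the per-channel training-set mean from an H x W x C image."""
    return image - channel_mean  # broadcasts over height and width

def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a single-channel H x W map."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

toy_mean = np.array([10.0, 20.0, 30.0])      # hypothetical channel means
image = np.ones((2, 2, 3)) * toy_mean
centered = subtract_mean(image, toy_mean)

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
```

Each pooling step halves the spatial resolution, which is why the 4 × 4 map becomes 2 × 2.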

At the same time, the present invention describes a position constraint that is linear in nature and predicts the most likely path within a camera and between cameras. For re-identification and tracking across multiple cameras, knowledge about possible destinations is treated as a prior on a person appearing in another camera's field of view. Typically, the transition probability distribution is modeled by learning the repetitive patterns that occur in the camera network: a person who exits one camera view from a particular grid cell is likely to enter another camera view from another particular grid cell. The invention models the state transition probability distribution as a Markov chain. Each camera view is divided into n states, so with k cameras the total number of states is N = n × k. The Markov chain is described by an N × N transition probability matrix P, with each entry in the interval [0, 1] and the entries of each row summing to 1.
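A row-stochastic transition matrix of this kind can be sketched as below. The counting estimator, the uniform fallback for unobserved states, and the `state_index` helper are assumptions chosen for illustration; the patent's own estimator is not reproduced here.

```python
import numpy as np

def state_index(camera, cell, cells_per_camera):
    """Map (camera, grid cell) to a global state index in [0, n * k)."""
    return camera * cells_per_camera + cell

def estimate_transition_matrix(transitions, num_states):
    """Estimate P[i, j] by counting observed transitions i -> j and normalizing rows."""
    counts = np.zeros((num_states, num_states))
    for src, dst in transitions:
        counts[src, dst] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never observed as a source fall back to a uniform row (assumption).
    uniform = np.full_like(counts, 1.0 / num_states)
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), uniform)

# k = 2 cameras, n = 2 grid cells each, so 4 states in total.
observed = [(1, 2), (1, 2), (1, 3), (0, 1)]
P = estimate_transition_matrix(observed, num_states=4)
```

Every row of `P` sums to 1 and every entry lies in [0, 1], matching the constraints stated for the transition probability matrix.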

Thus, using the Markov property, the probability of a transition between states S_i and S_j is estimated as:

After the multi-scale feature extraction is carried out, the reasoning clue model is trained.

Step S114, adjusting the reasoning clue model by using a reasoning algorithm and determining the final reasoning clue model.

At each time step, the re-identification problem can be represented by a correlation matrix, in which each row represents a previously seen entity and each column represents a currently active entity. Based on the characteristics or attributes of the related entities, the task of optimally associating each row with a column can be expressed as a linear programming problem, as follows:

s.t. W ∈ [0, 1], W1 = 1, 1^T W = 1

where P is the correlation matrix, or probability matrix, that stores the matching probabilities of the associated entities, and W is the weight matrix to be optimized. Fig. 3 describes how the proposed inference algorithm works on the correlation matrix P. A match probability in the correlation matrix is either the cosine distance of the mid-level attributes and the facial features, calculated separately using the pre-trained AlexNet and VGG-16 models, or the location score, i.e., the transition probability model of the possible movement patterns between entities.
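One way to fill such a correlation matrix from feature vectors is sketched below. It uses cosine similarity (larger means a better match) rather than a distance, which is an assumption about how the score is oriented; the function name and the toy features are also hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(previous, current):
    """Rows: previously seen entities; columns: currently active entities."""
    p = previous / np.linalg.norm(previous, axis=1, keepdims=True)
    c = current / np.linalg.norm(current, axis=1, keepdims=True)
    return p @ c.T

# Toy 2-D features standing in for 4096-D appearance or face descriptors.
previous = np.array([[1.0, 0.0], [0.0, 2.0]])
current = np.array([[0.0, 3.0], [2.0, 0.0]])
scores = cosine_similarity_matrix(previous, current)
```

Here the first previous entity matches the second current entity and vice versa, so the matrix is anti-diagonal.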

The effect of the constraint W1 = 1 is to normalize the match probabilities across columns and force them to sum to 1 for each previous entity. From the expression of this constraint, it is clear that there is only one maximum in the set of association probabilities of each previous entity. This means that each previous entity can be associated with at most one current entity. Thus, selecting the values of the weight matrix W essentially reduces to assigning a value of 1 to the best association, and computing the best possible association is therefore equivalent to a greedy approach that selects the maximum match probabilities in order. Finally, the final reasoning clue model is determined by combining the constraints of each extracted feature.
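The greedy selection described above can be sketched as follows. The function name and the example matrix are assumptions for illustration; where an exact optimum is required, an assignment solver could be substituted for the greedy loop.

```python
import numpy as np

def greedy_association(P):
    """Pick the largest remaining match probability, then block its row and column."""
    P = np.asarray(P, dtype=float)
    W = np.zeros_like(P)
    work = P.copy()
    for _ in range(min(P.shape)):
        i, j = np.unravel_index(np.argmax(work), work.shape)
        W[i, j] = 1.0          # best association gets weight 1
        work[i, :] = -np.inf   # each previous entity is used at most once
        work[:, j] = -np.inf   # each current entity is used at most once
    return W

P = np.array([[0.9, 0.1, 0.2],
              [0.8, 0.7, 0.1],
              [0.2, 0.3, 0.6]])
W = greedy_association(P)
```

The resulting W is a 0/1 matrix whose rows and columns each sum to 1, which satisfies the constraints W1 = 1 and 1^T W = 1 for a square correlation matrix.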

The overall objective function can be expressed as:

where Θ denotes the parameters of the inference model; L₁, L₂, and L₃ denote the classification losses of the face, appearance, and localization branches, respectively; and λ₁, λ₂, and λ₃ denote the weights of the corresponding losses.
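Given these definitions, a natural reading is that the overall objective is a weighted sum of the three branch losses. The sketch below assumes that form; the function name and the numeric values are hypothetical.

```python
def total_loss(branch_losses, branch_weights):
    """Weighted sum of branch losses: sum over i of lambda_i * L_i (assumed form)."""
    if len(branch_losses) != len(branch_weights):
        raise ValueError("each loss needs exactly one weight")
    return sum(w * l for w, l in zip(branch_weights, branch_losses))

# Hypothetical face, appearance, and localization losses with their weights.
losses = [0.5, 1.0, 2.0]
weights = [1.0, 0.5, 0.25]
objective = total_loss(losses, weights)
```

Raising one weight makes the corresponding branch dominate the gradient with respect to Θ, so the weights control how strongly each cue shapes the inference model.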

Step S116, adjusting the parameter value of the target parameter of the pedestrian re-identification system model according to the reasoning clue model.

Step S118, inputting the image to be recognized into the trained pedestrian re-identification system model, and searching out the pedestrian image with the highest similarity.

Referring to fig. 3, fig. 3 is a block diagram of a pedestrian re-identification apparatus combining big data and Bayes according to an embodiment of the present invention. As shown in fig. 3, the pedestrian re-identification apparatus 20 combining big data and Bayes of the present embodiment includes a distributed training module 202, a ranking list obtaining module 204, a re-identification module 206, a PTGAN processing module 208, a training module 210, an inference cue module 212, an inference cue adjustment module 214, a model adjustment module 216, and a recognition module 218. The distributed training module 202, the ranking list obtaining module 204, the re-identification module 206, the PTGAN processing module 208, the training module 210, the inference cue module 212, the inference cue adjustment module 214, the model adjustment module 216, and the recognition module 218 are respectively configured to perform the specific methods in S102, S104, S106, S108, S110, S112, S114, S116, and S118 in fig. 1. Details can be found in the related description of fig. 1, so the modules are only briefly described here:

the distributed training module 202 is configured to perform distributed training on a pedestrian re-identification system model by using a pedestrian image database to obtain a trained pedestrian re-identification system model, where the pedestrian image database includes a plurality of matching image groups, and each matching image group includes at least two matching images;

the ranking list acquisition module 204 is used for inputting the query object into the pedestrian re-identification system model to obtain a ranking list of a plurality of candidate objects;

a re-identification module 206, configured to perform re-identification and re-ordering on the query object and the candidate objects in the ranking list based on bayesian query expansion;

the PTGAN processing module 208 is configured to perform PTGAN processing on the reordered query objects and candidate objects, so as to implement migration of a background difference region on the premise that a pedestrian foreground is not changed;

the training module 210 is configured to input the query object and the candidate objects subjected to the PTGAN processing into a trained bayesian model, calculate a true matching probability of each candidate object according to an image distance in training data, and reorder the candidate objects;

the inference cue module 212 is used for performing multi-dimensional feature extraction on the query object subjected to the PTGAN processing and the candidate object subjected to the reordering and determining a reasoning clue model;

an inference cue adjustment module 214, configured to adjust the reasoning clue model using a reasoning algorithm and determine a final reasoning clue model;

the model adjusting module 216 is configured to adjust a parameter value of a target parameter of the pedestrian re-identification system model according to the inference cue model;

and the recognition module 218 is used for searching out the pedestrian image with the highest similarity by inputting the image to be recognized into the trained pedestrian re-recognition system model.

Further, referring to fig. 4, the distributed training module 202 includes:

a processor adding module 2021 for iteratively training the pedestrian re-identification system model by increasing the batch size using a plurality of processors;

the batch algorithm module 2022 is configured to perform iterative training on the pedestrian re-identification system model according to a linear scaling and warm-up strategy;

a learning rate adjustment module 2023, configured to apply layer-wise adaptive rate scaling (LARS) so that a different learning rate is used for each layer of the network in the pedestrian re-identification system model.
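The three large-batch training techniques named in these submodules can be sketched as small helper functions. This is a hedged illustration: the function names, default coefficients, and toy numbers are assumptions, and the LARS expression follows the commonly published form (trust coefficient times weight norm over gradient norm plus weight decay times weight norm), not a formula taken from the patent.

```python
import numpy as np

def scaled_learning_rate(base_lr, base_batch, batch):
    """Linear scaling rule: grow the learning rate in proportion to the batch size."""
    return base_lr * batch / base_batch

def warmup_learning_rate(target_lr, step, warmup_steps):
    """Warm-up: ramp linearly up to the target rate over the first steps."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps

def lars_local_lr(weights, grads, trust_coefficient=0.001, weight_decay=0.0, eps=1e-9):
    """Layer-wise rate from the ratio of weight norm to gradient norm."""
    w_norm = np.linalg.norm(weights)
    g_norm = np.linalg.norm(grads)
    return trust_coefficient * w_norm / (g_norm + weight_decay * w_norm + eps)

target = scaled_learning_rate(base_lr=0.1, base_batch=256, batch=1024)
first_step = warmup_learning_rate(target, step=0, warmup_steps=5)
after_warmup = warmup_learning_rate(target, step=10, warmup_steps=5)
layer_lr = lars_local_lr(np.array([3.0, 4.0]), np.array([0.6, 0.8]))
```

In a full optimizer, the global rate from the first two helpers would be multiplied by the layer-wise rate for each layer, so layers with small gradients relative to their weights are not under-trained at large batch sizes.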

Further, referring to fig. 5, the re-identification module 206 includes:

a bayesian training module 2061, configured to train a bayesian model using the pedestrian image database to obtain the trained bayesian model;

a prediction module 2062, configured to predict, according to distances between the query object and the multiple candidate object images, a true matching probability of each candidate object through the trained bayesian model;

a query expansion module 2063, configured to perform query expansion according to the true matching probability of each candidate object, and generate a new ranking list through the query expansion.
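The prediction step of these submodules can be sketched with Bayes' rule applied to image distances. The sketch assumes Gaussian class-conditional distance distributions fitted on matching and non-matching training pairs; that modeling choice, the function names, and the toy distances are assumptions, not the patent's stated model.

```python
import numpy as np

def fit_gaussian(samples):
    """Fit mean and standard deviation of a set of training distances."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std() + 1e-6

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def match_probability(distance, match_dists, nonmatch_dists):
    """P(match | distance) via Bayes' rule with priors taken from pair counts."""
    prior = len(match_dists) / (len(match_dists) + len(nonmatch_dists))
    like_match = gaussian_pdf(distance, *fit_gaussian(match_dists)) * prior
    like_non = gaussian_pdf(distance, *fit_gaussian(nonmatch_dists)) * (1.0 - prior)
    return like_match / (like_match + like_non)

# Hypothetical training distances for matching and non-matching image pairs.
match_dists = [0.2, 0.3, 0.4]
nonmatch_dists = [0.8, 0.9, 1.0, 1.1]
p_near = match_probability(0.3, match_dists, nonmatch_dists)
p_far = match_probability(0.95, match_dists, nonmatch_dists)
```

The resulting probabilities can then serve as the pooling weights in query expansion, so that candidates at small distances dominate the new query.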

Further, referring to fig. 6, the inference cue module 212 includes:

the appearance extraction module 2121 is used for extracting appearance characteristics of pedestrians;

a face extraction module 2122 for extracting facial features of the pedestrian;

and the positioning branch module 2123 is used for constructing a positioning branch Markov chain according to the time and the positioning characteristics of the pedestrian at different cameras and training a reasoning clue model according to the positioning branch Markov chain.

Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 10 of this embodiment includes: a processor 100, a memory 101 and a computer program 102 stored in said memory 101 and executable on said processor 100, such as a program for pedestrian re-identification combining big data and Bayes. The processor 100, when executing the computer program 102, implements the steps in the above-described method embodiments, for example, the steps of S102, S104, S106, S108, S110, S112, S114, S116, S118 shown in fig. 1. Alternatively, the processor 100, when executing the computer program 102, implements the functions of the modules/units in the apparatus embodiments described above, such as the functions of the distributed training module 202, the ranking list obtaining module 204, the re-identification module 206, the PTGAN processing module 208, the training module 210, the inference cue module 212, the inference cue adjustment module 214, the model adjustment module 216, and the recognition module 218 shown in fig. 3.

Illustratively, the computer program 102 may be partitioned into one or more modules/units that are stored in the memory 101 and executed by the processor 100 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 102 in the terminal device 10. For example, the computer program 102 may be partitioned into the distributed training module 202, the ranking list obtaining module 204, the re-identification module 206, the PTGAN processing module 208, the training module 210, the inference cue module 212, the inference cue adjustment module 214, the model adjustment module 216, and the recognition module 218 (modules in the virtual device), whose specific functions are as described above.

The terminal device 10 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. Terminal device 10 may include, but is not limited to, a processor 100, a memory 101. Those skilled in the art will appreciate that fig. 7 is merely an example of a terminal device 10 and does not constitute a limitation of terminal device 10 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.

The processor 100 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 10. Further, the memory 101 may also include both an internal storage unit of the terminal device 10 and an external storage device. The memory 101 is used for storing the computer program and other programs and data required by the terminal device 10. The memory 101 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A pedestrian re-identification method combining big data and Bayes is characterized by comprising the following steps:

2. The big data and Bayes combined pedestrian re-identification method according to claim 1, wherein the step of performing distributed training on a pedestrian re-identification system model by using a pedestrian image database to obtain the trained pedestrian re-identification system model comprises:

3. The big data and Bayes combined pedestrian re-identification method according to claim 1, wherein re-identification re-ranking based on Bayesian query expansion is performed on the query object and the plurality of candidate objects in the ranking list, and comprises:

4. The big data and Bayes combined pedestrian re-identification method according to claim 3, wherein the performing multidimensional feature extraction on the query object subjected to the PTGAN processing and the candidate object subjected to the reordering and determining the inference cue model comprises:

extracting the appearance characteristics of the pedestrian;

extracting facial features of the pedestrian;

5. A pedestrian re-identification device combining big data and Bayes is characterized by comprising the following components:

6. The big data and Bayes combined pedestrian re-recognition apparatus as recited in claim 5, wherein the distributed training module comprises:

7. The big data and Bayes combined pedestrian re-identification device according to claim 5, wherein the re-identification module comprises:

8. The big data and Bayesian combined pedestrian re-identification device according to claim 5, wherein the inference clue module comprises:

9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-4 when executing the computer program.

10. A computer-readable medium, in which a computer program is stored which, when being processed and executed, carries out the steps of the method according to any one of claims 1 to 4.