CN103942563A - Multi-mode pedestrian re-identification technology

Info

Publication number: CN103942563A
Application number: CN201410125981.2A
Authority: CN
Inventors: 赵志诚; 刘凯; 苏菲; 赵衍运; 庄伯金
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2014-03-31
Filing date: 2014-03-31
Publication date: 2014-07-23

Abstract

The present application discloses a multi-modal pedestrian re-identification technique comprising the following steps: Step 1, cropping a foreground image containing a target from each of a first image and a second image captured by a first camera and a second camera respectively, wherein the second image corresponds to a known target; Step 2, extracting color features and texture features from each cropped foreground image, and concatenating the color features and texture features to form image features; Step 3, inputting the image features into a Hash projection model and calculating the similarity between the targets in the first image and the second image; Step 4, if the calculated similarity is greater than a predetermined threshold, determining that the target in the first image is the known target corresponding to the second image.

Description

A multi-modal pedestrian re-identification technique

Technical field

The invention belongs to the field of image pattern recognition, and in particular relates to a pedestrian re-identification technique based on anchor nodes and multi-modal Hash projection.

Background art

With the emergence of computer vision technology and the rapid growth of computing power, new video surveillance systems have developed rapidly. As surveillance applications diversify, a single camera with its limited field of view can no longer cover a large scene, so monitoring large scenes with multiple cameras has become an important trend in video surveillance. When the fields of view of the cameras overlap, camera calibration can be combined with the spatio-temporal information of the target. When the fields of view do not overlap, that is, when a pedestrian passes through "temporal blind areas" and "spatial blind areas" while moving between camera views, the identity consistency of targets in different views must be verified to keep pedestrian tracking continuous.

In a large-scale video surveillance system, verifying the identity of a pedestrian across different camera views at different times, or within the same camera view at different times, becomes an important problem to be solved. We call this the pedestrian re-identification problem in surveillance video. Current pedestrian re-identification techniques use the pedestrian's clothing appearance as the basis for judgment, assume that the clothing does not change during monitoring, and identify the pedestrian by appearance similarity matching.

At present, pedestrian re-identification mainly follows the following classes of technical schemes:

Technical scheme (1)

This scheme tries to extract from the original image features that are both stable and discriminative. Stability means that the feature of the same person should be the same at different times; discriminability means that the feature should differ between different people (at the same or at different times). Designing features that meet these requirements from the original image is the key problem of this scheme. A typical example is Symmetry-Driven Accumulation of Local Features [Reference 1]. This method detects the pedestrian's body in the image, divides it vertically into head, torso and legs, and divides it horizontally into left and right halves. After the head is removed, the body is divided into four parts: left torso, right torso, left leg and right leg. For each part, the "HSV color histogram", the "maximally stable color regions (MSCR)" and the "repeating image blocks" are extracted as image features. Finally, the three kinds of image feature vectors are concatenated to form the feature vector of the whole body. This feature extraction method incorporates spatial information.

Shortcomings of technical scheme (1)

First, ensuring that the extracted features are stable and discriminative requires manual experience and repeated trial in feature design.

Second, in practical pedestrian re-identification, different cameras have different parameter configurations, their fields of view have different illumination conditions, the same person is captured from different angles in different views, and the occlusion encountered may also differ. Under such complex shooting conditions it is difficult to design a feature with the stability and discriminability described above.

Technical scheme (2)

This technical scheme no longer focuses on the design of raw image features. Instead, it projects the raw features by metric learning so that the projected features are stable and discriminative. Suppose the raw features of two images are x ∈ R^d and y ∈ R^d. The direct (squared Euclidean) distance between the two raw features is

d_{E}(x, y) = \|x - y\|_{2}^{2} \qquad (1)

Metric learning tries to find a projection matrix L ∈ R^{r × d}, uses it to project the raw features, and then computes the Euclidean distance after projection:

d_{L}(x, y) = \|Lx - Ly\|_{2}^{2} \qquad (2)

How to obtain a good projection matrix L is the key issue of metric learning. [Reference 2] tries to maximize the probability that the distance between images of the same person is smaller than the distance between images of different people. [Reference 3] borrows the classical metric learning method large margin nearest neighbor and adapts it to the specific re-identification problem. Methods based on metric learning achieve better performance than technical scheme (1).
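Formulas (1) and (2) can be illustrated with a minimal sketch. The function names and the toy matrix are hypothetical, and the matrix is assumed to be stored as r rows of d entries so that the product Lx is defined; nothing here reflects how the cited methods learn L.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_euclidean(x: Vector, y: Vector) -> float:
    """Formula (1): squared Euclidean distance between two raw features."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def project(L: Sequence[Vector], x: Vector) -> List[float]:
    """Apply a projection matrix L (given as a list of rows) to a feature x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def projected_distance(L: Sequence[Vector], x: Vector, y: Vector) -> float:
    """Formula (2): squared Euclidean distance after projecting with L."""
    return squared_euclidean(project(L, x), project(L, y))


# Toy data: the two features differ only in their second coordinate.
x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 3.0]
# Hypothetical 2 x 3 projection that discards the second coordinate.
L = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]

print(squared_euclidean(x, y))      # 4.0
print(projected_distance(L, x, y))  # 0.0
```

In this toy case the projection removes the only coordinate in which the two features differ, which is the kind of effect a learned metric aims for on images of the same person.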

Shortcomings of technical scheme (2)

This scheme tries to linearly project the raw image features with a suitable projection matrix so that the projected features are stable and discriminative. However, the two images to be matched are taken by different cameras, so various differences exist between them (different camera parameter configurations, different illumination conditions in the fields of view, different shooting angles of the same person, and possibly different occlusion). Because of these differences, the two images can be regarded as lying in different modalities. In this case, a single metric matrix is not enough to project the features of the two modalities into one space for distance calculation.

Summary of the invention

In pedestrian re-identification, images from different cameras lie in different modal spaces, but existing technical schemes use only a single metric projection matrix and therefore cannot measure similarity across modalities. To overcome this problem, the invention proposes a multi-modal pedestrian re-identification technique based on anchor nodes and Hash projection.

The proposed technique belongs to the fields of pattern recognition and intelligent surveillance, and is applied to detecting and identifying specific pedestrian targets across cameras in a video surveillance network. The method combines anchor node dimensionality reduction, Hash projection and cross-modal techniques: it first reduces the feature dimension by anchor node projection, then uses different Hash functions to project the features of images from different cameras into the same Hamming space to form binary features, and finally measures similarity in the Hamming space. In a video surveillance network, camera parameters, illumination and external conditions such as occlusion all differ between cameras, so images of the same person taken by different cameras appear differently and lie in different modal spaces. The method effectively overcomes the problem that images from different cameras cannot be matched directly because of this difference in modal space. In addition, the XOR computation on binary features effectively improves the real-time performance of the pedestrian re-identification system. The method also uses anchor node projection for feature dimensionality reduction, which, compared with PCA, avoids the singular value decomposition step and reduces the computational cost.

In the invention, different Hash functions are used to project the features of images taken by different cameras, so that the raw features are projected from different modal spaces into a unified Hamming space, where the Hamming distance is then computed. This not only improves the recognition performance of pedestrian re-identification but also effectively reduces retrieval time and improves the practicality of the system.

According to an embodiment of the invention, a multi-modal target re-identification method is provided, comprising the following steps: Step 1, cropping a foreground image containing a target from each of a first image and a second image captured by a first camera and a second camera respectively, wherein the second image corresponds to a known target; Step 2, extracting color features and texture features from each cropped foreground image, and concatenating the color features and texture features to form image features; Step 3, inputting the image features into a Hash projection model and calculating the similarity between the targets in the first image and the second image; Step 4, if the calculated similarity is greater than a predetermined threshold, determining that the target in the first image is the known target corresponding to the second image.

The beneficial effects of the invention are mainly as follows:

1. Different Hash projection functions are used for the features of images from different cameras, so that features in different modalities are mapped into the same Hamming space before distances are computed, which improves the recognition performance of the pedestrian re-identification system.

2. After the raw features are projected into the Hamming space, vector distance computation in the real domain becomes a binary XOR operation in the Hamming space, which effectively improves the real-time performance of the pedestrian re-identification system.

3. Feature dimensionality reduction by anchor node projection avoids the singular value decomposition required by conventional PCA dimensionality reduction, which lowers the computational complexity of the pedestrian re-identification system.
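The XOR computation mentioned in item 2 can be sketched as follows. This is an illustration only: the helper names are hypothetical, and packing +1 as bit 1 and -1 as bit 0 is an assumption made for the example. The final assertion checks the standard identity that, for two codes of n bits in {-1, +1}, the inner product equals n minus twice the Hamming distance.

```python
from typing import Sequence


def pack_bits(code: Sequence[int]) -> int:
    """Pack a {-1, +1} code into an integer (assumed: +1 -> bit 1, -1 -> bit 0)."""
    value = 0
    for c in code:
        value = (value << 1) | (1 if c > 0 else 0)
    return value


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits with one XOR followed by a population count."""
    return bin(a ^ b).count("1")


hx = [+1, -1, +1, +1, -1]   # toy binary code of an image from one camera
hy = [+1, +1, +1, -1, -1]   # toy binary code of an image from another camera

d = hamming_distance(pack_bits(hx), pack_bits(hy))
inner = sum(u * v for u, v in zip(hx, hy))
print(d, inner)  # 2 1

# Inner product of the +/-1 codes and Hamming distance carry the same information.
assert inner == len(hx) - 2 * d
```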

Brief description of the drawings

Fig. 1 is a functional block diagram of a pedestrian re-identification system according to an embodiment of the invention;

Fig. 2 is a schematic diagram of image feature extraction according to an embodiment of the invention;

Fig. 3 is a structural block diagram of multi-modal Hash projection according to an embodiment of the invention.

Detailed description of the embodiments

The implementation of the technical scheme is described in further detail below with reference to the drawings.

Fig. 1 is a functional block diagram of a pedestrian re-identification system according to an embodiment of the invention. The pedestrian re-identification method according to an embodiment of the invention is described below with reference to the drawings. The method mainly comprises the following steps:

Step 1: locating the pedestrian in the camera image

Pedestrian localization means determining the position of the pedestrian in the whole surveillance image; its accuracy has an important impact on the performance of the whole system. The invention uses foreground-background separation for pedestrian localization. First a Gaussian mixture model is used for background modeling, then background subtraction is used to locate the moving foreground target (the pedestrian). Finally a rectangular box containing the pedestrian target is obtained as the pedestrian detection result, and subsequent feature extraction is performed within this box.

Step 2: extracting the raw feature of the pedestrian image

Feature extraction is performed on the rectangular image containing the pedestrian target obtained by pedestrian localization, finally yielding a 5895-dimensional image feature. The specific feature extraction method is as follows.

An existing method may be used to extract the raw feature of the pedestrian image (the rectangular box). For example, bilinear interpolation is used to normalize the rectangular image containing the pedestrian to 128×48 pixels, and the normalized image is divided into image blocks of 16×24 pixels, with an overlap of 8 pixels between horizontally adjacent blocks and an overlap of 12 pixels between vertically adjacent blocks. In this way, the image is divided into 45 (15×3) image blocks in total.
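The block count can be checked with a short sketch. It assumes that the 16-pixel block side and the 8-pixel overlap run along the 128-pixel dimension, and the 24-pixel side with the 12-pixel overlap along the 48-pixel dimension; this pairing is an assumption chosen because it reproduces the stated 15×3 = 45 blocks. The function name is hypothetical.

```python
from typing import List, Tuple


def block_origins(length: int, block: int, overlap: int) -> List[int]:
    """Start offsets of overlapping blocks along one image dimension."""
    stride = block - overlap
    return list(range(0, length - block + 1, stride))


# Assumed pairing of block sides and overlaps with image dimensions (see lead-in).
origins_long = block_origins(length=128, block=16, overlap=8)
origins_short = block_origins(length=48, block=24, overlap=12)

blocks: List[Tuple[int, int]] = [(r, c) for r in origins_long for c in origins_short]
print(len(origins_long), len(origins_short), len(blocks))  # 15 3 45
```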

Color features and texture features are extracted from each image block. The color feature covers 9 channels in total from RGB, HSV and YCbCr. The color feature of each channel is quantized into an 8-dimensional color histogram representing the distribution of the image block in color space, giving 9 histograms in total. The texture feature uses local binary patterns (LBP) to form a 59-dimensional texture histogram; LBP offers notable advantages such as rotation invariance and gray-scale invariance, and describes local image texture well. For the LBP computation, see [Reference 4]. The feature dimension of each image block is therefore 8×9+59=131. Finally, the color and texture features of all image blocks are concatenated, so the feature dimension of the whole image is 131×45=5895. Fig. 2 is a schematic diagram of color and texture feature extraction.
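The dimension bookkeeping above can be sketched as follows. The sketch makes several assumptions that are not in the text: every channel is treated as 8-bit (values 0 to 255), histograms are normalized to sum to 1, and the 59-bin LBP histogram is passed in as a placeholder rather than computed. All names are hypothetical.

```python
from typing import List, Sequence

BINS_PER_CHANNEL = 8
NUM_CHANNELS = 9   # R, G, B, H, S, V, Y, Cb, Cr
LBP_BINS = 59
NUM_BLOCKS = 45


def channel_histogram(values: Sequence[int], bins: int = BINS_PER_CHANNEL,
                      max_value: int = 256) -> List[float]:
    """Quantize channel values in [0, max_value) into equal-width bins."""
    hist = [0.0] * bins
    for v in values:
        hist[v * bins // max_value] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]   # normalization is an assumption


def block_feature(channels: Sequence[Sequence[int]],
                  lbp_hist: Sequence[float]) -> List[float]:
    """Concatenate the 9 color histograms and the LBP histogram of one block."""
    feature: List[float] = []
    for ch in channels:
        feature.extend(channel_histogram(ch))
    feature.extend(lbp_hist)
    return feature


# Toy block: each of the 9 channels holds the same four pixel values.
toy_channels = [[0, 31, 32, 255]] * NUM_CHANNELS
toy_lbp = [1.0 / LBP_BINS] * LBP_BINS   # placeholder, not a real LBP histogram

f = block_feature(toy_channels, toy_lbp)
print(len(f), len(f) * NUM_BLOCKS)  # 131 5895
```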

Step 3: feature dimensionality reduction by anchor node projection

Because the raw image feature has too high a dimension (5895), using it directly would make the subsequent operations consume a large amount of computing time. Therefore, before being sent to the pedestrian re-identification system, the raw image feature is reduced to a 150-dimensional feature. The invention uses the anchor node projection technique [Reference 7] for this dimensionality reduction. The mathematical expression of anchor node projection is:

z(x) = \frac{[\exp(-\frac{D^{2}(x, u_{1})}{t}), \ldots, \exp(-\frac{D^{2}(x, u_{m})}{t})]}{\sum_{j=1}^{m} \exp(-\frac{D^{2}(x, u_{j})}{t})} \qquad (3)

where x ∈ R^{5895} is the 5895-dimensional raw image feature, {u_j}_{j=1}^{m} (m = 150) are the 150 anchor nodes, D(·,·) is the Euclidean distance, and t is a normalization constant. z(x) ∈ R^{150} is the 150-dimensional feature obtained by projecting the raw feature x, called the anchor node feature. Anchor node projection thus maps the 5895-dimensional raw feature to a 150-dimensional anchor node feature; since 150 is much smaller than 5895, it achieves feature dimensionality reduction. Whether the anchor nodes are chosen reasonably directly affects the quality of the dimensionality reduction. In the invention, K-means clustering with 150 cluster centers is performed on all raw features, and the 150 cluster centers are used as the anchor nodes. In this way the anchor nodes are distributed relatively evenly over the whole raw feature space, which makes the dimensionality reduction robust.
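A minimal sketch of formula (3) follows. The function name and the toy anchors are hypothetical; in the described method the anchors would be K-means cluster centers, which this sketch does not compute. Subtracting the smallest squared distance before exponentiating is a numerical-stability choice added here; it cancels in the ratio and does not change the result.

```python
import math
from typing import List, Sequence


def anchor_projection(x: Sequence[float], anchors: Sequence[Sequence[float]],
                      t: float) -> List[float]:
    """Formula (3): normalized exp(-D^2(x, u_j) / t) over all anchors u_j."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, u)) for u in anchors]
    d_min = min(sq_dists)  # shift for stability; cancels after normalization
    weights = [math.exp(-(d - d_min) / t) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: a 2-dimensional feature and 3 anchors instead of 5895 and 150.
anchors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
z = anchor_projection([0.0, 0.0], anchors, t=1.0)
print(len(z), round(sum(z), 12))  # 3 1.0
```

The output has one entry per anchor, sums to 1, and is largest for the anchor closest to the input.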

Step 4: measuring the similarity between features

To judge whether two images belong to the same person, the two low-dimensional features obtained in Step 3 are fed into the Hash projection model (whose training process is described later) to obtain the similarity of the two images, and on this basis it is judged whether the pedestrians in the two images have the same identity.

Specifically, the invention considers that two images from different cameras lie in different modal spaces. Therefore the two low-dimensional features are first projected into a unified Hamming space to form two binary features, then the Hamming distance between the two binary features is computed, and finally the similarity between the two features is computed on the basis of the Hamming distance. Fig. 3 is a structural block diagram of this method.

The similarity measurement process is described in detail below.

Hash projection is a technique that uses Hash functions to project raw data into a Hamming space [Reference 5]. Suppose the raw data space is X ⊆ R^d, x ∈ X is a data point in X, H = {-1, +1} is the Hamming space, and h(x) ∈ H is the result of Hash projection of x. The Hash function is defined as

h(x) = \mathrm{sgn}(p^{T} x + a) \in \{-1, +1\} \qquad (4)

where sgn(·) ∈ {-1, +1} is the sign function, p^T is the transpose of the projection vector p, and a is the offset (a scalar).

Because the result of Hash projection is the binary value -1 or +1, the similarity function of data x and y under the Hash function h(·) can be defined as

s(x, y) = h(x) h(y) \in \{-1, +1\} \qquad (5)

The above definition of the similarity function s(x, y) assumes that the two images to be compared lie in the same modal space. In pedestrian re-identification, however, images taken by different cameras lie in different modal spaces, so this similarity function cannot be used directly for similarity measurement in pedestrian re-identification.

To overcome this problem, the invention proposes a cross-modal Hash projection and similarity function. Suppose the image features x and y from the two cameras lie in spaces X and Y respectively. Two different Hash functions h_X(x) and h_Y(y) perform Hash projection on the features in these two spaces:

h_{X}(x) = \mathrm{sgn}(p^{T} x + a) \in \{-1, +1\}

h_{Y}(y) = \mathrm{sgn}(q^{T} y + b) \in \{-1, +1\} \qquad (6)

where p^T and q^T are the transposes of the projection vectors p and q, and a and b are offsets (scalars).

Through Hash projection, the image features x and y are projected into the same Hamming space, and the corresponding similarity function is rewritten as

s(x, y) = h_{X}(x) h_{Y}(y) \qquad (7)

A single pair of Hash functions h_X(·) and h_Y(·) can express only two similarity values (s(x, y) = +1 means x and y are similar; s(x, y) = -1 means they are dissimilar). To characterize the degree of similarity between x and y better, the invention may, as an example, introduce 50 pairs of Hash functions (each pair with its own projection vectors p, q and offsets a, b), and the similarity function of x and y is rewritten as:

s(x, y) = \sum_{l=1}^{50} h_{Xl}(z(x)) h_{Yl}(z(y)) = \sum_{l=1}^{50} \mathrm{sgn}(p_{l}^{T} z(x) + a_{l}) \, \mathrm{sgn}(q_{l}^{T} z(y) + b_{l}) \qquad (8)

In addition, since different Hash functions contribute differently to the similarity measurement, a weight α_l is set for each pair of Hash functions, and formula (8) is further rewritten as:

s(x, y) = \sum_{l=1}^{50} α_{l} h_{Xl}(z(x)) h_{Yl}(z(y)) = \sum_{l=1}^{50} α_{l} \, \mathrm{sgn}(p_{l}^{T} z(x) + a_{l}) \, \mathrm{sgn}(q_{l}^{T} z(y) + b_{l}) \qquad (9)
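The weighted cross-modal similarity of formula (9) can be sketched as below. The function and variable names are hypothetical, the toy parameters are made up rather than trained, and taking sgn(0) = +1 is an assumption, since the text does not say how a zero argument is handled.

```python
from typing import Sequence


def sgn(v: float) -> int:
    """Sign function with values in {-1, +1}; sgn(0) = +1 is an assumption."""
    return 1 if v >= 0 else -1


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def similarity(zx: Sequence[float], zy: Sequence[float],
               P: Sequence[Sequence[float]], Q: Sequence[Sequence[float]],
               a: Sequence[float], b: Sequence[float],
               alpha: Sequence[float]) -> float:
    """Formula (9): weighted sum over Hash function pairs (p_l, a_l), (q_l, b_l)."""
    total = 0.0
    for p_l, q_l, a_l, b_l, alpha_l in zip(P, Q, a, b, alpha):
        h_x = sgn(dot(p_l, zx) + a_l)   # Hash bit for the first camera's feature
        h_y = sgn(dot(q_l, zy) + b_l)   # Hash bit for the second camera's feature
        total += alpha_l * h_x * h_y
    return total


# Toy model with 2 Hash pairs on 2-dimensional features (the text uses 50 and 150).
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, -1.0]]
a = [0.0, -0.5]
b = [0.0, 0.5]
alpha = [0.5, 0.25]

print(similarity([0.2, 0.8], [0.6, 0.4], P, Q, a, b, alpha))   # 0.75
print(similarity([0.2, 0.8], [-0.6, 0.4], P, Q, a, b, alpha))  # -0.25
```

Because each camera has its own projection vectors and offsets, the two features are never compared in their original spaces, only through their Hash bits.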

The above process is summarized below.

Suppose an image Q is captured by camera A and N images are captured by camera B, each of which corresponds to a target category (for example, the identity of a certain pedestrian). The goal is to find the image G from camera B that is most similar to Q, and thereby determine the identity of the pedestrian (or other target) appearing in Q. Steps 1 to 3 are used to obtain the feature x of Q and the features y_i (i = 1, ..., N) of the images from camera B, formula (9) is used to compute the similarities s(x, y_i), and formula (10) is used to find the y^* with the greatest similarity to x:

y^{*} = \arg\max_{y_{i}} s(x, y_{i}) \qquad (10)

Then the category information of the image (from camera B) corresponding to y^* is taken as the recognition result.
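The selection in formula (10), combined with the threshold test of Step 4, can be sketched as follows. The function name, the labels and the similarity values are hypothetical; the similarities would come from formula (9).

```python
from typing import Optional, Sequence


def identify(scores: Sequence[float], labels: Sequence[str],
             threshold: float) -> Optional[str]:
    """Return the label of the most similar gallery image, or None if the
    best similarity does not exceed the threshold."""
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])  # formula (10)
    return labels[best] if scores[best] > threshold else None


# Toy similarities s(x, y_i) between the query and three images from camera B.
scores = [-0.4, 1.3, 0.2]
labels = ["person_17", "person_42", "person_08"]

print(identify(scores, labels, threshold=0.5))  # person_42
print(identify(scores, labels, threshold=2.0))  # None
```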

Next, the parameter training process of the Hash projection model is described.

To ensure that formula (9) gives a reasonable similarity measure, its parameters (the projection vectors, offsets and weights) must be set reasonably. Therefore, training images (in which the pedestrian identities are known) are used to learn the parameters. Suppose there are K = 316 pairs of training samples (x_k, y_k), k = 1, ..., K, with labels s(x_k, y_k) ∈ {-1, +1} indicating that x_k and y_k belong to the same person (s(x_k, y_k) = +1) or to different people (s(x_k, y_k) = -1). A reasonable cross-modal Hash projection function should have the following properties:

1) after projection, features belonging to different people (with different clothing appearance) are far apart;

2) after projection, features belonging to the same person (with the same clothing appearance) are close together.

An AdaBoost-based method is used to train the parameters of the 50 pairs of Hash functions. The inputs of training are the 316 pairs of training samples with their labels and the 150 anchor nodes. The whole training process goes through 50 iterations. In each iteration, the optimal projection vectors p_l, q_l and offsets a_l, b_l are determined first, then the weight of the Hash function pair is computed, and finally the sample weights are updated (in preparation for the next iteration). The outputs of training are the projection vectors and offsets of the 50 pairs of Hash functions and the corresponding pair weights. The l-th iteration is described as follows.

The objective function is shown in formula (11); the optimal projection vectors p_l, q_l and offsets a_l, b_l are obtained by maximizing it.

Φ_{l} = \sum_{k=1}^{K} s(x_{k}, y_{k}) h_{Xl}(z(x_{k})) h_{Yl}(z(y_{k})) = \sum_{k=1}^{K} s(x_{k}, y_{k}) \, \mathrm{sgn}(p_{l}^{T} z(x_{k}) + a_{l}) \, \mathrm{sgn}(q_{l}^{T} z(y_{k}) + b_{l}) \qquad (11)

(1) Training {p_l, q_l}

During optimization, to overcome the difficulty caused by the sign function, formula (11) is simplified by dropping the sign functions:

\begin{aligned}
Φ_{l} &= \sum_{k=1}^{K} s(x_{k}, y_{k}) \, \mathrm{sgn}(p_{l}^{T} z(x_{k}) + a_{l}) \, \mathrm{sgn}(q_{l}^{T} z(y_{k}) + b_{l}) \\
&\approx \sum_{k=1}^{K} s(x_{k}, y_{k}) (p_{l}^{T} z(x_{k}) + a_{l}) (q_{l}^{T} z(y_{k}) + b_{l}) \\
&= \sum_{k=1}^{K} s(x_{k}, y_{k}) (p_{l}^{T} \bar{z}(x_{k})) (q_{l}^{T} \bar{z}(y_{k})) \\
&= \sum_{k=1}^{K} ε_{lk} (p_{l}^{T} \bar{z}(x_{k})) (q_{l}^{T} \bar{z}(y_{k})) \\
&= p_{l}^{T} \left( \sum_{k=1}^{K} ε_{lk} \bar{z}(x_{k}) \bar{z}^{T}(y_{k}) \right) q_{l} \\
&= p_{l}^{T} Σ_{l} q_{l} = \hat{Φ}_{l}
\end{aligned} \qquad (12)

where ε_{lk} = s(x_k, y_k), \bar{z}(x_k) is the centered z(x_k), \bar{z}(y_k) is the centered z(y_k), and Σ_l = \sum_{k=1}^{K} ε_{lk} \bar{z}(x_k) \bar{z}^{T}(y_k). According to [Reference 6], p_l and q_l should lie in the eigenspace of the matrix Σ_l. Suppose {u_m}_{m=1}^{50} and {v_m}_{m=1}^{50} are the first 50 left eigenvectors and the first 50 right eigenvectors of Σ_l, respectively; then p_l and q_l can be approximated by linear combinations of u_m and v_m:

p_{l} = \sum_{m=1}^{50} ζ_{m} u_{m}, \quad q_{l} = \sum_{m=1}^{50} ξ_{m} v_{m} \qquad (13)

where ζ_m and ξ_m are the linear coefficients of u_m and v_m, respectively.

To reduce the computational complexity, 3000 pairs of 50-dimensional projection weights {ζ, ξ} are selected at random to obtain the corresponding candidate projection vector pairs {p_l, q_l}, and the pair that maximizes the objective function is then selected as the optimal result:

\{p_{l}^{*}, q_{l}^{*}\} = \arg\max_{\{p_{l}, q_{l}\}} \hat{Φ}_{l} \qquad (14)
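The random search over projection weights can be sketched as below. Several parts are assumptions made for illustration and are not stated in the text: the coefficients are drawn uniformly from [-1, 1], each candidate vector is scaled to unit length so that candidates are comparable, and the basis vectors are simply passed in (the sketch does not compute the eigenvectors of the matrix). All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cross_matrix(eps: Sequence[float], zx: Sequence[Vector],
                 zy: Sequence[Vector]) -> Matrix:
    """Sum over k of eps_k * zbar(x_k) * zbar(y_k)^T, centering both feature sets."""
    n, d = len(zx), len(zx[0])
    mx = [sum(v[i] for v in zx) / n for i in range(d)]
    my = [sum(v[i] for v in zy) / n for i in range(d)]
    sigma = [[0.0] * d for _ in range(d)]
    for e, vx, vy in zip(eps, zx, zy):
        for i in range(d):
            for j in range(d):
                sigma[i][j] += e * (vx[i] - mx[i]) * (vy[j] - my[j])
    return sigma


def combine(coeffs: Sequence[float], basis: Sequence[Vector]) -> Vector:
    """Formula (13): linear combination of basis vectors, scaled to unit length
    (the scaling is an assumption of this sketch)."""
    v = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(len(basis[0]))]
    norm = math.sqrt(sum(t * t for t in v)) or 1.0
    return [t / norm for t in v]


def relaxed_objective(p: Vector, sigma: Matrix, q: Vector) -> float:
    """Relaxed objective p^T Sigma q from formula (12)."""
    return sum(p[i] * sigma[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


def random_search(sigma: Matrix, U: Sequence[Vector], V: Sequence[Vector],
                  trials: int = 3000, seed: int = 0) -> Tuple[float, Vector, Vector]:
    """Formula (14) by random sampling: keep the best of `trials` candidates."""
    rng = random.Random(seed)
    best: Tuple[float, Vector, Vector] = (-math.inf, [], [])
    for _ in range(trials):
        p = combine([rng.uniform(-1.0, 1.0) for _ in U], U)
        q = combine([rng.uniform(-1.0, 1.0) for _ in V], V)
        value = relaxed_objective(p, sigma, q)
        if value > best[0]:
            best = (value, p, q)
    return best


# Toy 2 x 2 matrix whose best unit-vector objective value is 2.
sigma = [[2.0, 0.0], [0.0, 1.0]]
U = V = [[1.0, 0.0], [0.0, 1.0]]
value, p_star, q_star = random_search(sigma, U, V)
print(round(value, 2))
```

On this toy matrix the search should return a value close to, and never above, the largest attainable value of 2.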

(2) Training {a_l, b_l}

After the projection vector pair {p_l^*, q_l^*} is obtained, the objective function becomes

\bar{Φ}_{l} = \sum_{k=1}^{K} s(x_{k}, y_{k}) \, \mathrm{sgn}(p_{l}^{*T} z(x_{k}) + a_{l}) \, \mathrm{sgn}(q_{l}^{*T} z(y_{k}) + b_{l}) \qquad (15)

Next, the (a, b) combination that maximizes the objective function is sought as the optimal offset pair. Specifically, the two-dimensional (a, b) space is uniformly divided into a 100×100 grid, producing 10000 (a, b) combinations in total; the objective function is computed for each combination, and the combination that maximizes it is selected as the optimal offset pair:

\{a_{l}^{*}, b_{l}^{*}\} = \arg\max_{\{a_{l}, b_{l}\}} \bar{Φ}_{l} \qquad (16)
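The grid search over offsets can be sketched as follows. The text does not state the range of the grid; this sketch assumes each offset ranges over the negated minimum and maximum of the corresponding projected values, so that every threshold position between samples is reachable. The inputs `px` and `qy` stand for the already-projected scalars p^T z(x_k) and q^T z(y_k); all names are hypothetical and sgn(0) = +1 is an assumption.

```python
from typing import List, Sequence, Tuple


def sgn(v: float) -> int:
    """Sign function with values in {-1, +1}; sgn(0) = +1 is an assumption."""
    return 1 if v >= 0 else -1


def linspace(lo: float, hi: float, steps: int) -> List[float]:
    """`steps` evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def objective(px: Sequence[float], qy: Sequence[float],
              labels: Sequence[int], a: float, b: float) -> int:
    """Formula (15) evaluated on precomputed projections."""
    return sum(s * sgn(u + a) * sgn(v + b) for s, u, v in zip(labels, px, qy))


def grid_search(px: Sequence[float], qy: Sequence[float], labels: Sequence[int],
                steps: int = 100) -> Tuple[int, float, float]:
    """Formula (16): evaluate a steps x steps grid and keep the best (a, b)."""
    a_grid = linspace(-max(px), -min(px), steps)   # assumed search range
    b_grid = linspace(-max(qy), -min(qy), steps)   # assumed search range
    best = (objective(px, qy, labels, a_grid[0], b_grid[0]), a_grid[0], b_grid[0])
    for a in a_grid:
        for b in b_grid:
            value = objective(px, qy, labels, a, b)
            if value > best[0]:
                best = (value, a, b)
    return best


# Toy data: 4 training pairs with labels +1 (same person) and -1 (different people).
px = [1.0, 2.0, 3.0, 4.0]
qy = [1.0, 2.0, 3.0, 4.0]
labels = [+1, -1, -1, +1]

value, a_star, b_star = grid_search(px, qy, labels)
print(value)  # 4
```

With 100 steps per axis the search evaluates 10000 combinations, matching the count given in the text.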

The above describes the training process of the l-th pair of Hash projection functions. All 50 pairs of Hash functions are trained jointly by the AdaBoost method. Throughout the process, for each pair of Hash projection functions, sample weights are added to the objective function:

Φ_{l} = \sum_{k=1}^{K} ω_{l}(x_{k}, y_{k}) s(x_{k}, y_{k}) h_{Xl}(z(x_{k})) h_{Yl}(z(y_{k})) = \sum_{k=1}^{K} ω_{l}(x_{k}, y_{k}) s(x_{k}, y_{k}) \, \mathrm{sgn}(p_{l}^{T} z(x_{k}) + a_{l}) \, \mathrm{sgn}(q_{l}^{T} z(y_{k}) + b_{l}) \qquad (17)

where ω_l(x_k, y_k) is the weight of the k-th pair of samples.

(3) Training {α_l}

The weight of each Hash function pair is computed as

α_{l} = \frac{1}{2} \ln(1 + Φ_{l}) - \frac{1}{2} \ln(1 - Φ_{l}) \qquad (18)
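Formulas (17) and (18) can be sketched together. The sketch assumes the sample weights are normalized to sum to 1, so that the weighted objective lies in [-1, 1], and that it is strictly inside that interval so the logarithms are finite; neither condition is stated in the text. The function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def weighted_objective(weights: Sequence[float], labels: Sequence[int],
                       hx: Sequence[int], hy: Sequence[int]) -> float:
    """Formula (17): weighted agreement between labels and Hash bit products."""
    return sum(w * s * u * v for w, s, u, v in zip(weights, labels, hx, hy))


def pair_weight(phi: float) -> float:
    """Formula (18): weight of a Hash function pair, defined for -1 < phi < 1."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    return 0.5 * math.log(1.0 + phi) - 0.5 * math.log(1.0 - phi)


# Toy data: 4 training pairs with uniform weights (assumed to sum to 1).
weights = [0.25, 0.25, 0.25, 0.25]
labels = [+1, +1, -1, -1]
hx = [+1, +1, +1, -1]   # Hash bits of the first camera's features
hy = [+1, -1, -1, +1]   # Hash bits of the second camera's features

phi = weighted_objective(weights, labels, hx, hy)
alpha = pair_weight(phi)
print(phi)  # 0.5
```

Three of the four toy pairs are handled correctly, giving a positive objective and therefore a positive pair weight; a pair of Hash functions that agrees with the labels more often receives a larger weight.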

List of references

1. Michela Farenzena, Loris Bazzani, Alessandro Perina, Vittorio Murino, and Marco Cristani, "Person re-identification by symmetry-driven accumulation of local features," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 2360–2367.

2. Wei-Shi Zheng, Shaogang Gong, and Tao Xiang, "Person reidentification by probabilistic relative distance comparison," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 649–656.

3. Mert Dikmen, Emre Akbas, Thomas S. Huang, and Narendra Ahuja, "Pedestrian recognition with a learned metric," in Computer Vision – ACCV 2010, pp. 501–512. Springer, 2011.

4. T. Ojala, M. Pietikäinen, and D. Harwood (1994), "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions," Proceedings of the 12th IAPR International Conference on Pattern Recognition (ICPR 1994), vol. 1, pp. 582–585.

5. A. Torralba, R. Fergus, et al., "Small codes and large image databases for recognition," in Computer Vision and Pattern Recognition (CVPR), 2008 IEEE Conference on. IEEE, 2008.

6. A. M. Bronstein, M. M. Bronstein, et al., "The video genome," arXiv preprint arXiv:1003.5320, 2010.

7. Liu W, Wang J, Ji R, et al., "Supervised hashing with kernels," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 2074–2081.

To keep this description from becoming overly lengthy, some technical details available in the above references or other prior art may have been omitted, simplified or adapted in this description. This will be understood by those skilled in the art and does not affect the sufficiency of disclosure of this specification. The above references are incorporated herein by reference in their entirety.

In summary, those skilled in the art will appreciate that various modifications, variations and substitutions can be made to the above embodiments of the invention, all of which fall within the scope of protection of the invention as defined by the claims.

Claims

1. A multi-modal target re-identification method, comprising the following steps:

Step 1: cropping a foreground image containing a target from each of a first image and a second image captured by a first camera and a second camera respectively, wherein the second image corresponds to a known target;

Step 2: extracting color features and texture features from each cropped foreground image, and concatenating the color features and texture features to form image features;

Step 3: inputting the image features into a Hash projection model, and calculating the similarity between the targets in the first image and the second image;

Step 4: if the calculated similarity is greater than a predetermined threshold, determining that the target in the first image is the known target corresponding to the second image.

2. The target re-identification method according to claim 1, wherein the foreground image containing the target is defined by a bounding rectangle of the target.

3. The target re-identification method according to claim 2, wherein the color features include 9 color histograms corresponding to the 9 channels of RGB, HSV and YCbCr, which represent the distribution of the pixel values of the foreground image under each channel,

wherein the value range of each channel is quantized into 8 values, thereby forming 9 8-dimensional color histograms as the color features,

wherein local binary patterns are computed on the foreground image to obtain a 59-dimensional texture histogram as the texture features.

4. The target re-identification method according to claim 3, further comprising, after step 2:

Step 21: performing dimensionality reduction on the color features and the texture features by anchor node projection, according to the following formula:

z(x) = \frac{[\exp(-\frac{D^{2}(x, u_{1})}{t}), \ldots, \exp(-\frac{D^{2}(x, u_{m})}{t})]}{\sum_{j=1}^{m} \exp(-\frac{D^{2}(x, u_{j})}{t})} \qquad (3)

wherein x represents the original image feature obtained by concatenating the color features and the texture features, {u_j}_{j=1}^{m} represents m anchor nodes, D(·,·) represents the Euclidean distance, t represents a normalization constant, and z(x) represents the low-dimensional feature obtained by projecting the original image feature x through the anchor nodes.

5. The target re-identification method according to claim 4, wherein said step 3 comprises:

Step 31: calculating the similarity s(x, y) of the respective targets of the first and second cameras by the following formula:

s(x, y) = h_{X}(x) h_{Y}(y) \qquad (7)

wherein

h_{X}(x) = \mathrm{sgn}(p^{T} x + a) \in \{-1, +1\}

h_{Y}(y) = \mathrm{sgn}(q^{T} y + b) \in \{-1, +1\} \qquad (6)

wherein p^T and q^T respectively represent the transposes of the projection vectors p and q in the Hash projection model, a and b represent the offsets in the Hash projection model, and sgn(·) is the sign function.

6. The target re-identification method according to claim 4, wherein said step 3 comprises calculating the similarity s(x, y) by the following formula:

s(x, y) = \sum_{l=1}^{50} α_{l} \, \mathrm{sgn}(p_{l}^{T} z(x) + a_{l}) \, \mathrm{sgn}(q_{l}^{T} z(y) + b_{l}) \qquad (9)

wherein p_l^T and q_l^T respectively represent the transposes of the projection vectors p_l and q_l in the 50 Hash projection models, a_l and b_l represent the offsets in the 50 Hash projection models, sgn(·) is the sign function, and α_l represents the weight of each Hash projection model.

7. The target re-identification method according to claim 5 or 6, wherein the second camera is one or more second cameras, and the second image is a plurality of second images captured by the one or more second cameras, each of the plurality of second images corresponding to a different target,

wherein, in step 3, calculating the similarity between the targets in the first image and the second image includes: calculating the similarity between the target in the first image and the target in each second image respectively, to obtain a plurality of similarities,

and said step 4 includes:

Step 41: if the largest similarity among the plurality of similarities is greater than the predetermined threshold, determining the known target corresponding to the second image with the largest similarity as the target corresponding to the first image.

8. The target re-identification method according to claim 6, wherein the Hash projection model is obtained by training through the following steps:

Step 51: obtaining p_l and q_l by training through the following formula:

\{p_{l}^{*}, q_{l}^{*}\} = \arg\max_{\{p_{l}, q_{l}\}} \hat{Φ}_{l},

wherein

\hat{Φ}_{l} = p_{l}^{T} Σ_{l} q_{l},

Step 52: obtaining a_l and b_l by training through the following formula:

\{a_{l}^{*}, b_{l}^{*}\} = \arg\max_{\{a_{l}, b_{l}\}} \bar{Φ}_{l} \qquad (16)

wherein

\bar{Φ}_{l} = \sum_{k=1}^{K} s(x_{k}, y_{k}) \, \mathrm{sgn}(p_{l}^{*T} z(x_{k}) + a_{l}) \, \mathrm{sgn}(q_{l}^{*T} z(y_{k}) + b_{l}) \qquad (15)

Step 53: obtaining α_l by training through the following formula:

α_{l} = \frac{1}{2} \ln(1 + Φ_{l}) - \frac{1}{2} \ln(1 - Φ_{l}) \qquad (18).