Summary of the invention
The present invention addresses the limitations and deficiencies of existing re-identification methods when applied to targets with similar appearance, and proposes a cross-camera re-identification fusion method and system for appearance-similar targets. The method and system learn target representations by fusing view-angle information and positional constraint relationships with appearance information: when learning a target's feature representation vector, not only is the target's appearance information used, but its position and view-angle information are incorporated alongside the appearance information and fed into the neural network together, so that the network learns not only the appearance similarity between targets but also the correlations of position and view angle; learning both similarity and correlation yields target representation vectors with greater discriminative power. The network is trained by stochastic gradient descent so that the distance between representation vectors of targets sharing the same identity is smaller than the distance between representation vectors of targets with different identities, i.e., the intra-class distance is smaller than the inter-class distance. During training, a combination of offline mining and online mining is used to generate and update the triplet data set, which improves training efficiency and convergence speed and prevents the model from falling into a local optimum. Once the target representation vectors are obtained, distance-based hierarchical clustering is applied to the representation vectors from multiple cameras, and targets clustered into one class are considered to share the same identity, thereby achieving cross-camera re-identification. Hierarchical clustering is based on the distances between vectors and merges vectors that are close to each other; to avoid clustering targets from the same camera into one identity, the present invention modifies the vector-distance computation in hierarchical clustering so that the distance between representation vectors from the same camera is infinite, guaranteeing that targets observed under the same camera view can never be merged into one class and improving re-identification accuracy.
The present invention is achieved by the following technical solutions.
According to one aspect of the present invention, there is provided a cross-camera re-identification fusion method for appearance-similar targets, comprising the following steps:
collecting scene images synchronously with multiple cameras from different fixed angles, so that observation information of each target is obtained at different positions;
detecting the targets in each image with an object detector based on a deep convolutional network, and outputting target detection results;
extracting a global feature map of each image with a deep convolutional neural network, extracting a local feature map at each target's position on the global feature map according to the detection results, and obtaining the target's appearance vector; encoding the cameras to generate view-angle vectors containing the observation view-angle information; generating each target's position vector in image coordinates from the position of its detection box;
fusing the appearance vector, view-angle vector and position vector, and generating the target representation vector after transformation;
training the deep convolutional neural network on a triplet data set to learn the target representation vectors used for re-identification, wherein during training the triplet data set is generated and updated by a combination of offline mining and online mining;
clustering the learned target representation vectors of the targets in all images with a constrained hierarchical clustering method, thereby achieving cross-camera target re-identification.
Preferably, the view-angle vector assigns a vector of fixed dimension to each observation view angle; the vector is generated by random initialization and continuously optimized during training by gradient descent.
Preferably, the position vector is generated by normalizing the horizontal and vertical coordinates of the top-left and bottom-right corners of the target detection box by the width and height of the image, respectively, and arranging them in order.
Preferably, the x- and y-coordinates of the top-left vertex (x1, y1) and bottom-right vertex (x2, y2) of the target detection box are divided by the width w and height h of the image, respectively, giving the normalized vertex coordinates (x'i, y'i):
x'i = xi / w
y'i = yi / h
where i = 1, 2, so that every coordinate value lies between 0 and 1. The normalized top-left and bottom-right vertex coordinates (x'1, y'1), (x'2, y'2) are then arranged in order to obtain the position vector b = [x'1, y'1, x'2, y'2].
Preferably, the vector fusion method is as follows: the view-angle vector, position vector and appearance vector are first concatenated, the concatenation order being arbitrary; the concatenated vector then passes through a fully connected network and is normalized to output the final target representation vector.
Preferably, the combination of offline mining and online mining is as follows: triplets are constructed from the targets collected by multiple cameras at the same moment. For a given target observed under one camera, a target observed by another camera with the same identity as the given target is a positive sample, and a target with a different identity is a negative sample. Offline mining first generates all triplets as the initial data set; after initial training of the deep convolutional neural network, online mining continuously evaluates and removes the easy samples from the data set, shrinking the training set and completing the generation and updating of the data set.
Preferably, an easy sample is a sample whose triplet loss function equals 0.
Preferably, the constrained hierarchical clustering method is based on the following computations of the distance between target representation vectors and the distance between clusters, in which:
the distance between target representation vectors is: let e_i^c be the representation vector of the i-th target observed by the camera numbered c; then the distance between representation vectors is
d(e_i^c, e_j^c') = ||e_i^c - e_j^c'||_2 if c' ≠ c, and +∞ if c' = c;
the distance between clusters is: the distance between the two farthest target representation vectors in the two clusters.
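The constrained distance above can be sketched as a small helper function (a minimal illustration only; the function name and the use of plain Python lists as vectors are my own, not taken from the patent):

```python
import math

def pairwise_distance(e_i, e_j, cam_i, cam_j):
    """Distance between two representation vectors: Euclidean when the
    observations come from different cameras, +infinity when they share
    a camera, so same-camera targets can never be merged into one class."""
    if cam_i == cam_j:
        return math.inf
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e_i, e_j)))
```

Because the same-camera distance is infinite, any linkage rule that takes the maximum over member pairs inherits the constraint automatically.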
According to another aspect of the present invention, there is provided a cross-camera re-identification fusion system for appearance-similar targets, comprising:
an image acquisition module: comprising multiple cameras that synchronously capture scene images from different fixed angles, the fields of view of the cameras covering one another;
a target detection module: detecting the targets in the image captured by each camera with an object detector based on a deep convolutional network, and outputting target detection boxes;
a detection-box normalization module: dividing the horizontal and vertical coordinates of the top-left and bottom-right corners of each target detection box by the width and height of the image, respectively, to obtain a normalized detection box whose vertex coordinates are dimensionless decimals between 0 and 1;
an appearance vector generation module: obtaining the whole-image feature map of each camera's image with a deep convolutional network, then extracting the local feature map on the whole-image feature map according to the normalized detection-box coordinates;
a position vector generation module: generating each target's position vector in image coordinates from the position of its detection box;
a view-angle vector generation module: encoding the cameras to generate view-angle vectors containing the observation view-angle information;
a vector fusion module: concatenating the appearance vector, position vector and view-angle vector; the concatenated vector passes through a fully connected network and is normalized to output the final target representation vector;
a triplet generation module: generating and updating the triplet data set by a combination of offline mining and online mining;
a network training module: training the deep convolutional neural network on the triplet data set to learn the target representation vectors used for re-identification;
a cluster analysis module: clustering the learned target representation vectors of the targets in all images to achieve cross-camera target re-identification.
Preferably, the cross-camera re-identification fusion system for appearance-similar targets further includes any one or more of the following features:
the image acquisition module includes 4 cameras, arranged at the corner positions of the scene, so that a target appearing in the field of view can be captured by all 4 cameras simultaneously;
the target detection box is the minimum horizontal bounding rectangle of the target in image coordinates, with the vertex coordinates expressed in pixels; the detection box marks the target's RoI region in image coordinates, i.e., when the top-left vertex of the detection box is (x1, y1) and the bottom-right vertex is (x2, y2), the RoI region is (x1, y1)-(x2, y2).
Compared with the prior art, the present invention has the following beneficial effects:
in the cross-camera re-identification fusion method and system for appearance-similar targets provided by the present invention, the target's view-angle information and position information are fed into the deep network together with its appearance information for feature extraction, which on the one hand improves the robustness of the features and adds information sources for re-identification, and on the other hand reduces the dependence of the target representation vector on appearance information, so that the re-identification technique can be extended to applications with appearance-similar targets. During learning, the combination of offline triplet mining and online triplet mining improves training efficiency while preventing the model from falling into a local optimum. In addition, the present invention adds a constraint to ordinary hierarchical clustering, avoiding the error of merging observations under the same view angle into one class, improving clustering accuracy and thus re-identification performance.
Embodiment
As shown in Figure 1, this embodiment provides a cross-camera re-identification fusion method for appearance-similar targets, comprising the steps of image acquisition, target detection, detection-box normalization, appearance vector generation, position vector generation, view-angle vector generation, vector fusion, triplet generation, network training and cluster analysis.
The details are as follows:
(1) Image acquisition: multiple cameras synchronously capture scene images from different fixed angles, with the cameras' fields of view covering one another, so that a target can appear in the views of multiple cameras at the same time; relatively complete observation information of the target is thus obtained at different positions, while the target is kept as close as possible to the optimal position of each camera.
In this step, the layout shown in Fig. 2(a) can be used, in which 4 cameras are arranged at the corner positions of the scene; 8 targets appear in the field of view and can be captured by all 4 cameras simultaneously.
(2) Target detection: the targets in each camera image are detected with an object detector based on a deep convolutional network, and the detection box of each target is output. The detection box is the minimum horizontal bounding rectangle of the target in image coordinates, with vertex coordinates expressed in pixels. For example, in the scene of Fig. 2(a), a target captured by one camera yields the detection box shown in Fig. 2(b), with top-left vertex (x1, y1) and bottom-right vertex (x2, y2). This detection box marks out the target's RoI (Region of Interest) in image coordinates, which can be taken as (x1, y1)-(x2, y2).
(3) Detection-box normalization: the horizontal and vertical coordinates of the top-left and bottom-right corners of the target detection box are divided by the width and height of the image, respectively, producing a new normalized detection box whose vertex coordinates are dimensionless decimals between 0 and 1.
Let w and h be the width (x direction) and height (y direction) of the image; then the vertex coordinates of the normalized detection box are (x'1, y'1), (x'2, y'2), where:
x'i = xi / w
y'i = yi / h
with i = 1, 2, and each of the coordinate values x'1, x'2, y'1, y'2 lies between 0 and 1.
(4) Appearance vector generation: a deep convolutional network (e.g., VGGNet, ResNet) is first used as the feature extractor: a series of convolution and pooling operations is applied to each camera's image to obtain the deep feature map of a convolutional layer, which carries the image's global semantic information. Then, according to the detection box output by target detection, the local feature map at the target's position is extracted from the deep feature map; its scale varies with the size of the detection box. To obtain target appearance vectors of uniform fixed length, convenient for subsequent inter-vector operations and comparisons, a RoIPooling operation is applied to the local feature map, outputting a normalized feature vector of uniform fixed length; this feature vector is then mapped through the subsequent fully connected layers to produce the final appearance vector. In this step, when extracting the local feature map at a target's position, the feature map of the whole image is computed first, and the local feature map is then extracted on it according to the normalized detection-box coordinates.
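A minimal sketch of extracting a fixed-length local feature from a global feature map using a normalized detection box (my own simplified stand-in for RoIPooling, using nearest-neighbour sampling rather than a learned framework's pooling op; the function name and shapes are assumptions):

```python
import numpy as np

def extract_local_feature(feature_map, norm_box, out_size=4):
    """Crop the region of a global feature map selected by a normalized
    detection box and pool it to a fixed out_size x out_size grid, so
    every target yields an appearance feature of identical length.
    feature_map: (H, W, C) array; norm_box: (x1', y1', x2', y2') in [0, 1]."""
    h, w, c = feature_map.shape
    x1, y1, x2, y2 = norm_box
    # Sample an out_size x out_size grid of points evenly inside the box.
    ys = np.clip((np.linspace(y1, y2, out_size) * (h - 1)).round().astype(int), 0, h - 1)
    xs = np.clip((np.linspace(x1, x2, out_size) * (w - 1)).round().astype(int), 0, w - 1)
    pooled = feature_map[np.ix_(ys, xs)]   # (out_size, out_size, C)
    return pooled.reshape(-1)              # flattened fixed-length feature
```

In a real pipeline this fixed-length feature would then pass through the fully connected layers to yield the appearance vector.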
(5) Position vector generation: with the cameras fixed and the targets moving on the ground, the positions at which the same target appears in different camera images have a determined relationship, and this relationship can serve as one basis for judging whether two observed targets share the same identity. The present invention introduces the new concept of a position vector, using the target's position in the image as one of its features for re-identification. The target's position vector is defined as the arrangement of the top-left and bottom-right vertex coordinates of the target's normalized detection box in image coordinates.
The horizontal and vertical coordinates of the top-left and bottom-right corners of the normalized detection box are arranged in order to generate the position vector b = [x'1, y'1, x'2, y'2].
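The normalization and arrangement can be sketched in a few lines (an illustration only; the function name is my own):

```python
def position_vector(box, img_w, img_h):
    """Normalize a detection box (pixel coordinates of the top-left and
    bottom-right vertices) by the image width and height, then arrange
    the four values into the position vector b = [x1', y1', x2', y2']."""
    x1, y1, x2, y2 = box
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
```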
(6) View-angle vector generation: when determining whether the targets in two images share the same identity, the angle from which a target is observed (the view angle) is also a very important factor. For example, the same target often looks different from the front and from the back, but once the view angle is fixed, the features observed from each angle have a determined relationship, and this relationship is naturally related to the view angle. This embodiment introduces the new concept of a view-angle vector: a two-dimensional matrix is first designed whose number of columns equals the number of cameras and whose number of rows is typically 4, 8, 16, etc.; each element of the matrix is filled with a random number, which may follow a Gaussian distribution or be uniformly distributed between 0 and 1. Each column of the matrix is the encoding of the corresponding camera and is defined as that camera's view-angle vector; the view-angle vector is continuously optimized during training by gradient descent.
For example, if the system has M cameras numbered 0, 1, ..., M-1, the one-hot code c_i of the i-th camera is an M-dimensional vector whose i-th component is 1 and whose other components are 0. Define a camera matrix V and randomly initialize it (e.g., from the standard normal distribution); the i-th column v_i of V is the encoding of the i-th camera, so v_i = V c_i, where v_i is the generated view-angle vector, which is optimized during network training.
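The one-hot selection v_i = V c_i can be sketched as follows (a minimal illustration assuming 4 cameras and an 8-dimensional encoding; in training the columns of V would be updated by gradient descent like any other weight):

```python
import numpy as np

rng = np.random.default_rng(0)

M = 4           # number of cameras in the system (assumed)
D = 8           # dimensionality of each view-angle vector (assumed)

# Camera matrix V: one randomly initialized column per camera.
V = rng.standard_normal((D, M))

def view_vector(cam_id):
    """Select camera cam_id's column via its one-hot code: v = V @ c."""
    c = np.zeros(M)
    c[cam_id] = 1.0
    return V @ c
```

Multiplying by the one-hot code is equivalent to simply indexing the column, which is how embedding lookups are usually implemented.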
(7) Vector fusion: the appearance vector, position vector and view-angle vector are fused to generate the final target representation vector. The vectors are first concatenated, as shown in Figure 4; the concatenated vector then passes through a fully connected network and is normalized to output the final target representation vector.
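The fusion step can be sketched as concatenation, one fully connected layer and L2 normalization (a minimal sketch; the single-layer weight/bias stand-ins and the choice of L2 normalization are my assumptions, since the patent only specifies "fully connected network" and "normalized output"):

```python
import numpy as np

def fuse(appearance, position, view, weight, bias):
    """Concatenate the three vectors, pass them through one fully
    connected layer, and L2-normalize the result into the final
    target representation vector."""
    x = np.concatenate([appearance, position, view])  # splice order is arbitrary
    y = weight @ x + bias                             # stand-in for the FC network
    return y / np.linalg.norm(y)
```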
(8) Triplet generation: to learn target representation vectors with stronger characterization ability, the feature extraction network must be trained. For a given target observed under one camera, a target observed by another camera with the same identity is a positive sample, and a target with a different identity is a negative sample; the triplet data set is generated accordingly. This embodiment generates the training triplets by a combination of offline mining and online mining; the composition of the triplets is vital to the convergence of network training. Since the network in this embodiment must learn the relative positional relationships of targets between cameras, triplets are constructed from the images captured by multiple cameras at the same moment.
Specifically, as shown in Figure 5, let O_{t,c,l} denote the target with identity l observed by camera c at time t, and choose it as the reference sample. For another observed target O_{t',c',l'}: when t' = t, c' ≠ c and l' = l, O_{t',c',l'} is a positive sample; when t' = t, c' ≠ c and l' ≠ l, O_{t',c',l'} is a negative sample. All triplets at each moment are first generated by this rule, and the triplets of different moments are combined into the training set. At the beginning of training, to prevent the model from falling into a locally optimal solution, the model is trained on all samples in the training set; since the training set is large, convergence is slow at this stage. After several epochs (an epoch is one traversal of all samples in the training set), online triplet mining is adopted: after each mini-batch finishes training, the easy samples in the batch (those with loss 0) are removed from the training set, guaranteeing that they will not be drawn in subsequent sampling. In this way, as training proceeds, the proportion of hard and semi-hard samples in the training set keeps increasing, improving training efficiency and convergence speed, while the training set still contains some easy samples, which prevents model collapse.
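The offline enumeration rule for one moment can be sketched as follows (an illustration under my own representation: each synchronized observation is a (camera, identity) pair and the function returns index triples; the patent does not prescribe this data layout):

```python
def triplets_at_moment(observations):
    """Enumerate all (anchor, positive, negative) index triples for the
    synchronized observations of one moment. Positives share the anchor's
    identity but come from a different camera; negatives come from a
    different camera and carry a different identity."""
    trips = []
    for a, (ca, la) in enumerate(observations):
        for p, (cp, lp) in enumerate(observations):
            if cp == ca or lp != la:
                continue
            for n, (cn, ln) in enumerate(observations):
                if cn == ca or ln == la:
                    continue
                trips.append((a, p, n))
    return trips
```

The triplets of all moments are then pooled into the initial training set, which online mining later prunes.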
(9) Network training: to learn target representation vectors with strong characterization ability, the feature extraction network is trained on the triplet data set generated above.
For example, with the sample labeling software shown in Fig. 6, given an image I, the detection box b of a target in the image, and the camera number c of the image, the network predicts the target representation vector e = f(I, b, c) ∈ R^d, where f(·) is the transformation implemented by the feature extraction network. Given the reference sample e_a^i observed on a specific image, an observed target with the same identity (positive sample) e_p^i, and an observed target with a different identity (negative sample) e_n^i, the triplet loss function is
L = Σ_i [ ||e_a^i - e_p^i||^2 - ||e_a^i - e_n^i||^2 + m ]_+
where i is the triplet index, [·]_+ = max(·, 0), the subscripts a, p, n denote anchor, positive and negative respectively, and m is the margin between positive and negative samples. The triplet loss drives the distance between representation vectors of targets with different identities to exceed the distance between vectors of targets with the same identity by at least the margin.
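A per-triplet version of this loss can be sketched directly from the formula (a minimal illustration; the margin value 0.2 is an arbitrary example, not from the patent):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss [ d(a,p)^2 - d(a,n)^2 + m ]_+ : pushes the squared
    anchor-negative distance to exceed the squared anchor-positive
    distance by at least the margin m. A sample whose loss is 0 is an
    "easy" sample, which online mining removes from the training set."""
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)
```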
(10) Cluster analysis: hierarchical clustering repeatedly merges and splits clusters according to the distances between them. The distance between the vectors of different targets seen by the same camera is sometimes small, and without a constraint they might erroneously be merged into one class. To solve this problem, this embodiment clusters the learned target representation vectors with a constrained hierarchical clustering method, thereby achieving cross-camera target identity recognition. The present invention defines the distance between vectors as follows: when two target representation vectors come from different cameras, their distance is the Euclidean distance between them; when they come from the same camera, their distance is positive infinity. In addition, the present invention defines the distance between clusters as the distance between the two farthest points of the clusters, so that whenever vectors from the same camera would be merged into one class the distance is positive infinity; this constraint reduces the occurrence of clustering errors to a certain extent.
For example, let e_i^c be the representation vector of the i-th target observed by the camera numbered c; the distance between vectors is defined as
d(e_i^c, e_j^c') = ||e_i^c - e_j^c'||_2 if c' ≠ c, and +∞ if c' = c.
Hierarchical clustering: after training is complete, the representation vectors of all targets detected by all cameras are computed with the trained network model. The representation vectors are clustered by agglomerative hierarchical clustering, with the distance between vectors computed by the formula above and the inter-cluster distance taken as the distance between the two farthest points of the clusters. Targets clustered into the same class are considered to share the same identity, completing re-identification.
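A compact sketch of the constrained agglomerative clustering (an illustration only, assuming farthest-pair linkage and a stopping threshold as described above; the function name, list-based data layout and the O(n^3) loop are my own simplifications, not an optimized implementation):

```python
import math

def constrained_hierarchical_cluster(vectors, cameras, threshold):
    """Farthest-pair (complete-linkage) agglomerative clustering with
    the camera constraint: the distance between two observations from
    the same camera is +infinity, so one camera's targets never share
    a cluster. Returns clusters as lists of observation indices."""
    def dist(i, j):
        if cameras[i] == cameras[j]:
            return math.inf
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(vectors[i], vectors[j])))

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        # Inter-cluster distance: farthest pair of member vectors.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break                      # no mergeable pair remains
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

With two cameras each seeing two targets, the two views of each identity merge while the same-camera pairs stay apart.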
In the present embodiment:
The main idea of the provided cross-camera re-identification fusion method for appearance-similar targets is to combine the target's observation view angle and positional relationship with its appearance information to learn target representation vectors; to generate and update the data set during training by a combination of offline and online triplet mining; and to cluster the target representation vectors from different cameras with constrained hierarchical clustering, thereby achieving cross-camera target re-identification.
The method of learning target representation vectors introduces the view-angle vector that encodes the view-angle information and the position vector that encodes the target's position in the image, and fuses the view-angle vector and position vector with the appearance vector that encodes the target's appearance information to generate the target representation vector.
The view-angle vector assigns a vector of fixed dimension to each observation view angle; the vector is generated by random initialization and continuously optimized during training by gradient descent.
The position vector is generated by normalizing the horizontal and vertical coordinates of the top-left and bottom-right corners of the detection box by the width and height of the image, respectively, and arranging them in order. Specifically, the x- and y-coordinates of the detection box's top-left vertex (x1, y1) and bottom-right vertex (x2, y2) are divided by the image width w and height h:
x'i = xi / w
y'i = yi / h
where i = 1, 2, so that every coordinate value lies between 0 and 1, giving the normalized vertex coordinates (x'1, y'1), (x'2, y'2); these are arranged in order, so the position vector is b = [x'1, y'1, x'2, y'2].
In the vector fusion method, the view-angle vector, position vector and appearance vector are first concatenated, as shown in Figure 4; the concatenation order is arbitrary. The concatenated vector passes through a fully connected network and is normalized to output the final target representation vector.
The triplet mining method constructs triplets from the targets collected by multiple cameras at the same moment and continuously updates the data set during training. For a given target observed under one camera, a target observed by another camera with the same identity is a positive sample, and a target with a different identity is a negative sample. The offline method first generates all triplets as the initial data set; after initial training of the model, the online mining method continuously evaluates and removes the easy samples (those with loss 0) from the training set, shrinking the training set.
The constrained hierarchical clustering method is based on the modified computations of the distance between target vectors and the distance between clusters. The distance between target vectors is computed as follows: let e_i^c be the representation vector of the i-th target observed by the camera numbered c; then the distance between vectors is
d(e_i^c, e_j^c') = ||e_i^c - e_j^c'||_2 if c' ≠ c, and +∞ if c' = c.
The distance between clusters is the distance between the two farthest vectors in the two clusters.
The cross-camera re-identification fusion method for appearance-similar targets provided by the above embodiment of the present invention, and the technical solution involved, are diverse and versatile. The above embodiment fuses the target's observation view angle and positional relationship with its appearance information to extract feature vectors: the global feature map of the image is extracted with a deep convolutional neural network, and the target's appearance vector is extracted on the global feature map according to the target detection result; the cameras are encoded to generate view-angle vectors containing the observation view-angle information; the target's position vector in image coordinates is generated from the position of its detection box. These three vectors are fused and transformed to generate the target representation vector. The network is trained by optimizing the triplet loss function to learn the representation vectors used for re-identification, with the triplet data set generated and updated during training by a combination of offline mining and online mining. Finally, the representation vectors of the targets in different cameras are clustered with the constrained hierarchical clustering algorithm, achieving cross-camera target re-identification.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the above particular implementations; those skilled in the art may make various variations or modifications within the scope of the claims, and these do not affect the substantive content of the invention.