CN109800794A - Cross-camera re-identification fusion method and system for similar-looking targets - Google Patents

Cross-camera re-identification fusion method and system for similar-looking targets

Info

Publication number
CN109800794A
Authority
CN
China
Prior art keywords
target
vector
camera
image
appearance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811630037.7A
Other languages
Chinese (zh)
Other versions
CN109800794B (en)
Inventor
赵辉
陶卫
吕娜
何旺贵
许凌志
符钦伟
刘沅秩
郑超
冯宇
冯哲明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University
Priority to CN201811630037.7A
Publication of CN109800794A
Application granted
Publication of CN109800794B
Active legal status
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a cross-camera re-identification fusion method for similar-looking targets. A deep convolutional neural network is used to extract a global feature map of the image, and the target's appearance vector is extracted from the global feature map according to the target detection result; the cameras are encoded to generate a view vector containing observation viewing-angle information; and the target's position vector is generated from the position of its detection box in the image coordinate system. The three vectors are fused and transformed to produce the target representation vector. The network is trained by optimizing a triplet loss function to learn representation vectors for re-identification; during training, a combination of offline mining and online mining generates and updates the triplet dataset. Finally, a constrained hierarchical clustering algorithm clusters the representation vectors of targets from different cameras, realizing cross-camera target re-identification. A cross-camera re-identification fusion system for similar-looking targets is also provided. The invention improves re-identification accuracy.

Description

Cross-camera re-identification fusion method and system for similar-looking targets
Technical field
The present invention relates to re-identification technology in the field of pattern recognition, and in particular to a cross-camera re-identification fusion method and system for similar-looking targets.
Background art
Target re-identification refers to determining, for one or more given targets, whether a target with the same identity exists at other times or under other viewpoints. Re-identification was first proposed in the field of multi-camera target tracking and is one of the key technologies of multi-camera tracking. Specifically, when the same target appears in the views of multiple cameras, the uniqueness of the target identity observed by the multiple observers must be determined. In the temporal dimension, re-identification technology is also required to maintain a target's identity across different moments. Target re-identification, and pedestrian re-identification in particular, has attracted considerable attention and research in recent years, not only because of its wide range of applications, such as surveillance, scheduling, and security, but also because the technology involved is still relatively immature and needs further in-depth study. In recent years, re-identification technology has also been applied in fields such as biometrics and image retrieval, for example face verification.
Due to the complexity of application scenarios, the same target may appear under different illumination conditions, and may also show large differences because of differences in acquisition devices and acquisition angles. In addition, severe occlusion in dense scenes leaves target information incomplete. In practical applications, targets that are similar or share the same identity often exhibit large variation and difference, while targets that are dissimilar or have different identities may resemble each other in certain features and be hard to distinguish. All these phenomena cause great difficulty for target re-identification. The keys to re-identification are to extract target representations with strong characterization ability and to measure the similarity or distance between target representations. The target representation should be highly discriminative and able to express the target's distinctive features so that different targets can be distinguished; the metric should accurately capture the similarity or distance between targets so as to achieve reliable verification and similarity ranking.
Current mainstream re-identification methods rely mainly on the appearance information of the target (including color, texture, shape, etc.), which can express the target's characteristics to a large extent. However, the appearance of a target is usually affected by many external factors, so such methods suffer from great instability. Moreover, for some targets (such as vehicles of the same type), the appearance of different individuals is quite similar, with very small differences; appearance information then often fails to distinguish them, making re-identification through appearance models difficult or simply impossible. Research on re-identification of such similar-looking targets is currently scarce; it remains a blank field.
Summary of the invention
Aiming at the limitations and deficiencies of existing re-identification methods for similar targets, the present invention proposes a cross-camera re-identification fusion method and system for similar-looking targets. The method and system constitute a target representation learning approach that fuses viewing-angle information and position constraint relationships with appearance information: when learning the target feature representation vector, not only is the appearance information of the target used, but the position and viewing-angle information of the target are also incorporated into the appearance information and fed into the neural network together, so that the network learns not only the appearance similarity between targets but also the relevance of position and viewing angle, obtaining a more discriminative target representation vector by learning both similarity and relevance. The network is trained with stochastic gradient descent so that the distance between the learned representation vectors of targets with the same identity is smaller than the distance between those of targets with different identities, i.e., the intra-class distance is smaller than the inter-class distance. During training, a combination of offline mining and online mining is used to generate and update the triplet dataset, improving training efficiency and convergence speed and preventing the model from falling into a local optimum. Once the target representation vectors are obtained, distance-based hierarchical clustering is applied to the representation vectors from multiple cameras, and targets clustered into one class are considered to share the same identity, thereby realizing cross-camera re-identification. Hierarchical clustering is based on the distance between vectors: vectors that are close to each other are clustered into one class. To avoid clustering targets from the same camera into the same identity, the present invention improves the vector distance computation in hierarchical clustering so that the distance between representation vectors from the same camera is infinite, which guarantees that targets under the same camera view cannot be clustered into one class and improves re-identification accuracy.
The present invention is achieved by the following technical solutions.
According to one aspect of the invention, a cross-camera re-identification fusion method for similar-looking targets is provided, comprising the following steps:
using multiple cameras to synchronously acquire scene images from different fixed angles, obtaining observation information of the target at different positions;
detecting the target in each image using an object detector based on a deep convolutional network, and outputting a target detection result;
extracting the global feature map of each image using a deep convolutional neural network, and extracting the local feature map at the target's position on the global feature map according to the target detection result to obtain the target's appearance vector; encoding the cameras to generate a view vector containing observation viewing-angle information; generating the target's position vector according to the position of the target's detection box in the image coordinate system;
fusing the appearance vector, view vector, and position vector, and generating the target representation vector after transformation;
training the deep convolutional neural network using the triplet dataset to learn target representation vectors for re-identification; wherein, during training, a combination of offline mining and online mining is used to generate and update the triplet dataset;
clustering the learned target representation vectors corresponding to the targets in each image using a constrained hierarchical clustering method, realizing cross-camera target re-identification.
Preferably, the view vector assigns a fixed-dimension vector to each observation view; the vector is generated by random initialization and continuously optimized during training by gradient descent.
Preferably, the position vector is generated by normalizing the horizontal and vertical coordinates of the top-left and bottom-right corners of the target detection box by the width and height of the image respectively, and arranging them in order.
Preferably, the x and y coordinates of the target detection box's top-left vertex (x1, y1) and bottom-right vertex (x2, y2) are divided by the image width w and height h respectively, giving the normalized vertex coordinates (x'_i, y'_i):

x'_i = x_i / w
y'_i = y_i / h

where i = 1, 2, so that each coordinate value lies between 0 and 1; the normalized top-left and bottom-right vertex coordinates (x'_1, y'_1), (x'_2, y'_2) are then arranged in order to obtain the position vector b = [x'_1, y'_1, x'_2, y'_2].
Preferably, the vector fusion method is: first concatenating the view vector, position vector, and appearance vector, in any order; the concatenated vector passes through a fully connected network and is normalized to output the final target representation vector.
Preferably, the combination of offline mining and online mining is: constructing triplets from the targets acquired by multiple cameras at the same moment; for a target observed under a given camera, targets observed by other cameras with the same identity as that target are positive samples, and targets with different identities are negative samples; first generating all triplets by offline mining as the initial dataset, and after initially training the deep convolutional neural network, using online mining to continuously evaluate and remove the easy samples in the dataset, shrinking the training set and completing the generation and update of the dataset.
Preferably, the easy samples are samples whose triplet loss is 0.
Preferably, the constrained hierarchical clustering method is based on the following computation of the distance between target representation vectors and the distance between class clusters, in which:

the distance between target representation vectors is: let e_i^c be the representation vector of the i-th target in the camera numbered c; then the distance between target representation vectors is

d(e_i^c, e_j^{c'}) = ||e_i^c − e_j^{c'}||_2 if c ≠ c', and +∞ if c = c';

the distance between class clusters is: the distance between the two farthest target representation vectors in the two clusters.
According to another aspect of the invention, a cross-camera re-identification fusion system for similar-looking targets is provided, comprising:
an image acquisition module: comprising multiple cameras that synchronously acquire scene images from different fixed angles, with the camera fields of view covering each other;
a target detection module: detecting the targets in the image acquired by each camera using an object detector based on a deep convolutional network, and outputting target detection boxes;
a detection box normalization module: dividing the abscissas and ordinates of the top-left and bottom-right corners of the target detection box by the width and height of the image respectively to obtain a normalized detection box whose vertex abscissas and ordinates are dimensionless decimals between 0 and 1;
an appearance vector generation module: obtaining the whole-image feature map of each camera's image using a deep convolutional network, then extracting the local feature map on the whole-image feature map according to the normalized detection box coordinates;
a position vector generation module: generating the target's position vector according to the position of the target's detection box in the image coordinate system;
a view vector generation module: encoding the cameras and generating view vectors containing observation viewing-angle information;
a vector fusion module: concatenating the appearance vector, position vector, and view vector; the concatenated vector passes through a fully connected network and is normalized to output the final target representation vector;
a triplet generation module: generating and updating the triplet dataset using a combination of offline mining and online mining;
a network training module: training the deep convolutional neural network using the triplet dataset to learn target representation vectors for re-identification;
a cluster analysis module: clustering the learned target representation vectors corresponding to the targets in each image, realizing cross-camera target re-identification.
Preferably, the cross-camera re-identification fusion system for similar-looking targets further includes any one or more of the following features:
the image acquisition module includes 4 cameras, arranged at the corner positions of the scene; a target appearing in the field of view can be captured by the 4 cameras simultaneously;
the target detection box is the minimum horizontal bounding rectangle of the target in the image coordinate system, with vertex abscissa and ordinate in pixels; the target's RoI region in the image coordinate system is marked according to the target detection box, i.e., when the top-left vertex coordinates of the target detection box are (x1, y1) and the bottom-right vertex coordinates are (x2, y2), the RoI region is (x1, y1)-(x2, y2).
Compared with the prior art, the invention has the following beneficial effects:
The cross-camera re-identification fusion method and system for similar-looking targets provided by the invention feed the target's viewing-angle information and position information together with its appearance information into the deep network for feature extraction. On the one hand, this improves the robustness of the features and adds information sources for re-identification; on the other hand, it reduces the dependence of the target representation vector on appearance information, so that re-identification technology can be extended in application to targets with similar appearance. During learning, the combination of offline triplet mining and online triplet mining improves training efficiency while preventing the model from falling into a local optimum. In addition, the invention adds a constraint to ordinary hierarchical clustering, avoiding the error of clustering observed targets under the same view into one class; this improves clustering accuracy and, in turn, re-identification performance.
Brief description of the drawings
Other features, objects, and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a workflow diagram of a cross-camera re-identification fusion method for similar-looking targets provided by an embodiment of the invention;
Fig. 2(a) is a schematic diagram of the multi-target multi-camera cross-camera re-identification fusion system for similar-looking targets in an embodiment of the invention;
Fig. 2(b) is a schematic diagram of a target detection box formed by the target detection module in an embodiment of the invention;
Fig. 3 is a schematic diagram of the re-identification process in an embodiment of the invention;
Fig. 4 is a schematic diagram of the concatenation of the vectors in an embodiment of the invention;
Fig. 5 is a schematic diagram of triplet composition in an embodiment of the invention;
Fig. 6 is an interface diagram of the sample labeling software developed for the labeled dataset in an embodiment of the invention.
Specific embodiment
The embodiments of the invention are elaborated below: this embodiment is implemented on the premise of the technical solution of the invention, and detailed implementation methods and specific operation processes are given. It should be pointed out that those skilled in the art can make various variations and improvements without departing from the inventive concept, and these all belong to the protection scope of the invention.
Embodiment
As shown in Fig. 1, this embodiment provides a cross-camera re-identification fusion method for similar-looking targets, comprising the steps of image acquisition, target detection, detection box normalization, appearance vector generation, position vector generation, view vector generation, vector fusion, triplet generation, network training, and cluster analysis.
The steps are as follows:
(1) Image acquisition: multiple cameras synchronously acquire scene images from different fixed angles; the camera fields of view cover each other, so a target can appear in the views of several cameras simultaneously, yielding relatively complete observation information of the target at different positions, while making targets appear at favorable camera positions as far as possible.
In this step, the system shown in Fig. 2(a) can be used, which has 4 cameras in total, arranged at the corner positions of the scene; 8 targets appear in the field of view and can be captured by the 4 cameras simultaneously.
(2) Target detection: the targets in each camera image are detected using an object detector based on a deep convolutional network, and the detection box of each target is output. The detection box is the minimum horizontal bounding rectangle of the target in the image coordinate system, with vertex abscissa and ordinate in pixels. For example, in the system shown in Fig. 2(a), a target under a certain camera forms the detection box shown in Fig. 2(b), with top-left vertex (x1, y1) and bottom-right vertex (x2, y2). This detection box marks out the target's RoI (Region of Interest) in the image coordinate system.
As shown in Fig. 2(b), for the image of a target captured by some camera, the output horizontal detection box has top-left vertex coordinates (x1, y1) and bottom-right vertex coordinates (x2, y2); the RoI region can be taken as (x1, y1)-(x2, y2).
(3) Detection box normalization: the abscissas and ordinates of the top-left and bottom-right corners of the target detection box are divided by the width and height of the image respectively, giving a new normalized detection box whose vertex abscissas and ordinates are dimensionless decimals between 0 and 1.

If the x-direction width and y-direction height of the image are w and h respectively, the vertex coordinates of the normalized detection box are (x'_1, y'_1), (x'_2, y'_2), where:

x'_i = x_i / w
y'_i = y_i / h

with i = 1, 2; the coordinate values x'_1, x'_2, y'_1, y'_2 all lie between 0 and 1.
(4) Generate appearance vector: a deep convolutional network (such as VGGNet, ResNet, etc.) is first used as the feature extractor; a series of convolution and pooling operations are applied to each camera's image to obtain the deep feature map of the last convolutional layer, which carries global semantic information of the image. Then, according to the detection box output by target detection, the local feature map at the target's position is extracted from the deep feature map; this local feature map has different scales depending on the size of the detection box. To obtain target appearance vectors of a uniform fixed length, convenient for subsequent vector operations and comparison, an RoIPooling operation is applied to the local feature map, outputting a normalized feature vector of uniform fixed length; this feature vector is then mapped by a subsequent fully connected layer to output the final appearance vector. In this step, when extracting the local feature map at the target's position, the feature map of the whole image is computed first, and the local feature map is then extracted on it according to the normalized detection box coordinates.
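To make this step concrete, below is a minimal sketch assuming a PyTorch backbone; the names `backbone` and `fc_appearance`, the batch layout, and the 7×7 pooling size are illustrative assumptions rather than details fixed by the patent:

```python
import torch
import torchvision

def appearance_vectors(backbone, fc_appearance, images, boxes_per_image):
    """images: (N, 3, H, W) batch; boxes_per_image: list of (K_i, 4) tensors of
    detection boxes in pixel coordinates (x1, y1, x2, y2), one tensor per image."""
    feat = backbone(images)                    # deep global feature map (N, C, H', W')
    scale = feat.shape[-1] / images.shape[-1]  # image-to-feature-map scale factor
    local = torchvision.ops.roi_pool(          # RoIPooling: fixed-size local feature maps
        feat, boxes_per_image, output_size=(7, 7), spatial_scale=scale)
    return fc_appearance(local.flatten(start_dim=1))  # map to final appearance vectors
```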
(5) Generate position vector: with the cameras fixed and the target moving on the ground, the position vectors of the same target appearing in different camera images have a determined relationship, and this relationship can serve as one basis for judging whether two observed targets have the same identity. The invention introduces the new concept of a position vector, using the target's position in the image as one feature for re-identification. The position vector of a target is defined as the arrangement of the coordinates of the top-left and bottom-right vertices of the target's normalized detection box in the image coordinate system.
The horizontal and vertical coordinates of the top-left and bottom-right corners of the normalized detection box are arranged in order, generating the position vector b = [x'_1, y'_1, x'_2, y'_2].
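As a concrete illustration, a minimal sketch of this computation follows; the function name and the plain-list output are illustrative choices, not part of the patent:

```python
def position_vector(box, img_w, img_h):
    """Build b = [x1/w, y1/h, x2/w, y2/h] from a detection box given in pixel
    coordinates; every component lies between 0 and 1."""
    x1, y1, x2, y2 = box
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]

# e.g. a 1920x1080 image with a box at (480, 270)-(960, 810)
# yields b = [0.25, 0.25, 0.5, 0.75]
```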
(6) Generate view vector: when determining whether the targets in two images have the same identity, the observation angle (viewpoint) is also a very important factor. For example, the same target often looks different from the front and from the back, but once the viewpoint is fixed, the features observed from each angle have a determined relationship, and this relationship is naturally tied to the viewpoint. This embodiment introduces the new concept of a view vector: first a two-dimensional matrix is built whose number of columns equals the number of cameras and whose number of rows is typically 4, 8, 16, etc.; each element of the matrix is filled with a random number, drawn from a Gaussian distribution or uniformly from 0 to 1. Each column of the matrix corresponds to the encoding of one camera and is defined as that camera's view vector; the view vector is continuously optimized by gradient descent during training.
For example, if there are M cameras in the system, numbered 0, 1, ..., M−1, then for camera i the one-hot encoding c_i is an M-dimensional vector whose i-th dimension is 1 and whose other dimensions are 0. Define a camera matrix V and randomly initialize it (e.g. from the standard normal distribution); the i-th column v_i of the matrix represents the encoding of camera i, so v_i = V c_i, where v_i is the generated view vector, which is optimized during network training.
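A minimal sketch of this camera encoding, assuming PyTorch, follows; the sizes (4 cameras, 8-dimensional view vectors) mirror the example above, and using nn.Embedding to hold the learnable matrix V is an implementation assumption:

```python
import torch
import torch.nn as nn

num_cameras, view_dim = 4, 8                          # M cameras, matrix row count
view_embedding = nn.Embedding(num_cameras, view_dim)  # rows play the role of V's columns
nn.init.normal_(view_embedding.weight)                # random (standard normal) initialization

camera_ids = torch.tensor([0, 2, 3])       # cameras that observed a batch of targets
view_vectors = view_embedding(camera_ids)  # (3, view_dim); equivalent to v_i = V c_i
# view_embedding.weight is a learnable parameter, optimized by gradient descent.
```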
(7) Vector fusion: the appearance vector, position vector, and view vector are fused to generate the final target representation vector. The vectors are first concatenated, as shown in Fig. 4. The concatenated vector passes through a fully connected network and is normalized to output the final target representation vector.
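A minimal sketch of the fusion step, assuming PyTorch; the two-layer structure and layer sizes of the fully connected network are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Concatenate appearance, position, and view vectors (order is arbitrary),
    pass them through a fully connected network, and normalize the output."""
    def __init__(self, app_dim, pos_dim, view_dim, out_dim):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(app_dim + pos_dim + view_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim))

    def forward(self, app, pos, view):
        fused = torch.cat([app, pos, view], dim=1)  # splice the three vectors
        return F.normalize(self.fc(fused), dim=1)   # L2-normalized representation vector e
```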
(8) Generate triplets: in order to learn target representation vectors with strong characterization ability, the feature extraction network must be trained. For a target observed under a given camera, targets observed by other cameras with the same identity are positive samples and targets with different identities are negative samples; the triplet dataset is generated accordingly. This embodiment uses a combination of offline mining and online mining to generate the triplets for training the network; the composition of the triplets is crucial to training convergence. The network in this embodiment needs to learn the relative positional relationships of targets between cameras, so triplets are constructed from targets in images acquired by multiple cameras at the same moment.
Specifically, as shown in Fig. 5, suppose camera c observes at time t a target whose identity is l, denoted O_{t,c,l}. If it is chosen as the anchor (reference sample), then for another observed target O_{t',c',l'}: when t' = t, c' ≠ c, and l' = l, O_{t',c',l'} is a positive sample; when t' = t, c' ≠ c, and l' ≠ l, O_{t',c',l'} is a negative sample. All triplets at each moment are first generated according to this rule, and the triplets of different moments are put together to compose the training set. At the beginning of training, to prevent the model from falling into a local optimum, the model is first trained with all samples in the training set; since the training set is large, convergence is slow at this stage. After several epochs (one epoch means traversing all samples of the training set), the online triplet mining method is adopted: after each mini-batch finishes training, the easy samples (those whose loss is 0) in that batch are removed from the training set, guaranteeing they will not be drawn in subsequent sampling. In this way, as training proceeds, the proportion of hard and semi-hard samples in the training set keeps increasing, improving training efficiency and convergence speed, while the training set still contains some easy samples, which avoids model collapse.
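A minimal sketch of the pruning performed after a mini-batch, assuming PyTorch tensors and squared Euclidean distances; the function and variable names are illustrative:

```python
import torch

def prune_easy_triplets(triplets, embeddings, margin):
    """Drop easy triplets (loss == 0) from the training set; hard and semi-hard
    triplets (loss > 0) are kept. triplets: list of (anchor, positive, negative)
    index tuples into embeddings, an (N, d) tensor of representation vectors."""
    kept = []
    for a, p, n in triplets:
        d_ap = ((embeddings[a] - embeddings[p]) ** 2).sum()
        d_an = ((embeddings[a] - embeddings[n]) ** 2).sum()
        if d_ap - d_an + margin > 0:  # non-zero triplet loss: keep the triplet
            kept.append((a, p, n))
    return kept
```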
(9) Network training: in order to learn target representation vectors with strong characterization ability, the feature extraction network is trained. For a target observed under a given camera, targets observed by other cameras with the same identity are positive samples, and targets with different identities are negative samples; the triplet dataset is generated accordingly.
For example, the sample labeling software is shown in Fig. 6. Given an image I, the detection box b of a target in it, and the camera number c of the image, the network predicts the target representation vector e = f(I, b, c) ∈ R^d, where f(·) is the transformation implemented by the feature extraction network. Given a target observed in a specific image (the anchor) with representation e_i^a, select another observed target with the same identity (positive) with representation e_i^p, and an observed target with a different identity (negative) with representation e_i^n. The triplet loss function is then:

L = Σ_i [ ||e_i^a − e_i^p||_2^2 − ||e_i^a − e_i^n||_2^2 + m ]_+

where i is the triplet index, [·]_+ = max(0, ·), a, p, n denote anchor, positive, and negative respectively, and m is the margin between positive and negative samples. The triplet loss makes the distance between representation vectors of targets with different identities exceed the distance between vectors of targets with the same identity by at least the margin.
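A minimal sketch of this loss over a batch, assuming PyTorch and squared Euclidean distances; the batched tensor layout is an implementation assumption:

```python
import torch

def triplet_loss(anchor, positive, negative, margin):
    """L = sum_i [ ||e_i^a - e_i^p||^2 - ||e_i^a - e_i^n||^2 + m ]_+ over a
    batch of representation vectors of shape (B, d)."""
    d_ap = ((anchor - positive) ** 2).sum(dim=1)  # squared distance anchor-positive
    d_an = ((anchor - negative) ** 2).sum(dim=1)  # squared distance anchor-negative
    return torch.clamp(d_ap - d_an + margin, min=0).sum()
```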
(10) Cluster analysis: hierarchical clustering repeatedly merges and splits class clusters according to the distance between them. The distance between the vectors of different targets in the same camera is sometimes small; without a constraint, they might be wrongly clustered into one class. To solve this problem, this embodiment uses a constrained hierarchical clustering method to cluster the learned target representation vectors, realizing cross-camera target identity recognition. The invention defines the distance between vectors as follows: when two target representation vectors come from different cameras, their distance is the Euclidean distance between the vectors; when two target representation vectors come from the same camera, their distance is positive infinity.
For example, let e_i^c be the representation vector of the i-th target in the camera numbered c. The distance between vectors is defined as

d(e_i^c, e_j^{c'}) = ||e_i^c − e_j^{c'}||_2 if c ≠ c', and +∞ if c = c'.

In addition, the distance between class clusters is defined as the distance between the two farthest points of the two clusters; since vectors from the same camera are at infinite distance, they can never be clustered into one class, and this constraint reduces clustering errors to a certain extent.

Hierarchical clustering: after training, the trained network model computes the representation vector of every target detected by all cameras. Agglomerative hierarchical clustering is then applied to the representation vectors, with the vector distance computed by the formula above and the cluster distance taken as the distance between the two farthest points of the two clusters. Targets clustered into one class are considered to have the same identity, completing re-identification.
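A minimal sketch of this constrained clustering, assuming NumPy/SciPy; a very large finite constant stands in for the positive-infinite same-camera distance, and the distance-threshold cut is an illustrative stopping rule:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cross_camera_clusters(vectors, camera_ids, dist_threshold):
    """vectors: (N, d) array of representation vectors; camera_ids: (N,) array of
    camera numbers. Returns one cluster label per target (one cluster = one identity)."""
    dist = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=2)  # Euclidean
    dist[camera_ids[:, None] == camera_ids[None, :]] = 1e12  # "infinite" same-camera distance
    np.fill_diagonal(dist, 0.0)                       # squareform needs a zero diagonal
    Z = linkage(squareform(dist), method='complete')  # cluster distance = farthest pair
    return fcluster(Z, t=dist_threshold, criterion='distance')
```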
In this embodiment:
The main idea of the provided cross-camera re-identification fusion method for similar-looking targets is to combine the target's observation viewpoint and positional relationship with appearance information for learning target representation vectors, to generate and update the dataset during training using a combination of offline and online triplet mining, and to cluster the target representation vectors from different cameras using constrained hierarchical clustering, thereby realizing cross-camera target re-identification.
The method for learning target representation vectors introduces a view vector encoding viewing-angle information and a position vector encoding the target's position in the image, and fuses the view vector and position vector with the appearance vector encoding the target's appearance information to generate the target representation vector.
The view vector assigns a fixed-dimension vector to each observation view; the vector is generated by random initialization and continuously optimized during training by gradient descent.
The position vector is generated by normalizing the horizontal and vertical coordinates of the top-left and bottom-right corners of the detection box by the width and height of the image respectively and arranging them in order. Specifically, the x and y coordinates of the detection box's top-left vertex (x1, y1) and bottom-right vertex (x2, y2) are divided by the image width w and height h respectively:

x'_i = x_i / w
y'_i = y_i / h

where i = 1, 2, so that each coordinate value lies between 0 and 1, giving the normalized vertex coordinates (x'_1, y'_1), (x'_2, y'_2), which are arranged in order to form the position vector b = [x'_1, y'_1, x'_2, y'_2].
The vector fusion method first concatenates the view vector, position vector, and appearance vector, as shown in Fig. 4, in any order. The concatenated vector passes through a fully connected network and is normalized to output the final target representation vector.
The triplet mining method constructs triplets from the targets acquired by multiple cameras at the same moment and continuously updates the dataset during training. For a target observed under a given camera, targets observed by other cameras with the same identity are positive samples and targets with different identities are negative samples. All triplets are first generated by offline mining as the initial dataset; after the model is initially trained, online mining is used, i.e., the easy samples (those whose loss is 0) in the training set are continuously evaluated and removed, shrinking the training set.
The constrained hierarchical clustering method is based on improved computation of the distances between target vectors and between class clusters. The distance between target vectors is computed as follows: let e_i^c be the representation vector of the i-th target in the camera numbered c; then the distance between vectors is

d(e_i^c, e_j^{c'}) = ||e_i^c − e_j^{c'}||_2 if c ≠ c', and +∞ if c = c'.

The distance between class clusters is the distance between the two farthest vectors in the two clusters.
The cross-camera re-identification fusion method for similar-looking targets provided by the above embodiment of the invention involves technical solutions that are diverse and versatile. The above embodiment fuses the target's observation viewpoint and positional relationship with appearance information to extract feature vectors. A deep convolutional neural network extracts the global feature map of the image, and the target's appearance vector is extracted on the global feature map according to the target detection result; the cameras are encoded to generate view vectors containing observation viewing-angle information; the target's position vector is generated from its detection box position in the image coordinate system. These three vectors are fused and transformed to generate the target representation vector. The network is trained by optimizing the triplet loss function to learn representation vectors for re-identification; during training, a combination of offline mining and online mining generates and updates the triplet dataset. Finally, a constrained hierarchical clustering algorithm clusters the representation vectors of targets in different cameras, realizing cross-camera target re-identification.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the above particular implementations; those skilled in the art can make various variations or modifications within the scope of the claims, and these do not affect the substance of the invention.

Claims (10)

1. A cross-camera re-identification fusion method for similar-looking targets, characterized by comprising:
using multiple cameras to synchronously acquire scene images from different fixed angles, obtaining observation information of targets at different positions;
detecting the target in each image using an object detector based on a deep convolutional network, and outputting a target detection result;
extracting the global feature map of each image using a deep convolutional neural network, and extracting the local feature map at the target's position on the global feature map according to the target detection result to obtain the target's appearance vector; encoding the cameras to generate a view vector containing observation viewing-angle information; generating the target's position vector according to the position of the target's detection box in the image coordinate system;
fusing the appearance vector, view vector, and position vector, and generating the target representation vector after transformation;
training the deep convolutional neural network using a triplet dataset to learn target representation vectors for re-identification; wherein, during training, a combination of offline mining and online mining is used to generate and update the triplet dataset;
clustering the learned target representation vectors corresponding to the targets in each image using a constrained hierarchical clustering method, realizing cross-camera target re-identification.
2. The cross-camera re-identification fusion method for similar-looking targets according to claim 1, characterized in that the view vector assigns a fixed-dimension vector to each observation view; the vector is generated by random initialization and continuously optimized during training by gradient descent.
3. The cross-camera re-identification fusion method for similar-looking targets according to claim 1, characterized in that the position vector is generated by normalizing the horizontal and vertical coordinates of the top-left and bottom-right corners of the target detection box by the width and height of the image respectively, and arranging them in order.
4. The cross-camera re-identification fusion method for similar-looking targets according to claim 3, characterized in that the x and y coordinates of the target detection box's top-left vertex (x1, y1) and bottom-right vertex (x2, y2) are divided by the image width w and height h respectively, giving the normalized vertex coordinates (x'_i, y'_i):
x'_i = x_i / w
y'_i = y_i / h
where i = 1, 2, so that each coordinate value lies between 0 and 1; the normalized top-left and bottom-right vertex coordinates (x'_1, y'_1), (x'_2, y'_2) are then arranged in order to obtain the position vector b = [x'_1, y'_1, x'_2, y'_2].
5. The cross-camera re-identification fusion method for similar-looking targets according to claim 1, characterized in that the vector fusion method is: first concatenating the view vector, position vector, and appearance vector, in any order; the concatenated vector passes through a fully connected network and is normalized to output the final target representation vector.
6. The cross-camera re-identification fusion method for similar-looking targets according to claim 1, characterized in that the combination of offline mining and online mining is: constructing triplets from the targets acquired by multiple cameras at the same moment; for a target observed under a given camera, targets observed by other cameras with the same identity as that target are positive samples, and targets with different identities are negative samples; first generating all triplets by offline mining as the initial dataset, and after initially training the deep convolutional neural network, using online mining to continuously evaluate and remove the easy samples in the dataset, shrinking the training set and completing the generation and update of the dataset.
7. The cross-camera re-identification fusion method for similar-looking targets according to claim 6, characterized in that the easy samples are samples whose triplet loss is 0.
8. The cross-camera re-identification fusion method for similar-looking targets according to claim 1, characterized in that the constrained hierarchical clustering method is based on the following computation of the distance between target representation vectors and the distance between class clusters, in which:
the distance between target representation vectors is: let e_i^c be the representation vector of the i-th target in the camera numbered c; then the distance between target representation vectors is
d(e_i^c, e_j^{c'}) = ||e_i^c − e_j^{c'}||_2 if c ≠ c', and +∞ if c = c';
the distance between class clusters is: the distance between the two farthest target representation vectors in the two clusters.
9. A cross-camera re-identification fusion system for similar-looking targets, characterized by comprising:
an image acquisition module: comprising multiple cameras that synchronously acquire scene images from different fixed angles, with the camera fields of view covering each other;
a target detection module: detecting the targets in the image acquired by each camera using an object detector based on a deep convolutional network, and outputting target detection boxes;
a detection box normalization module: dividing the abscissas and ordinates of the top-left and bottom-right corners of the target detection box by the width and height of the image respectively to obtain a normalized detection box whose vertex abscissas and ordinates are dimensionless decimals between 0 and 1;
an appearance vector generation module: obtaining the whole-image feature map of each camera's image using a deep convolutional network, then extracting the local feature map on the whole-image feature map according to the normalized detection box coordinates;
a position vector generation module: generating the target's position vector according to the position of the target's detection box in the image coordinate system;
a view vector generation module: encoding the cameras and generating view vectors containing observation viewing-angle information;
a vector fusion module: concatenating the appearance vector, position vector, and view vector; the concatenated vector passes through a fully connected network and is normalized to output the final target representation vector;
a triplet generation module: generating and updating the triplet dataset using a combination of offline mining and online mining;
a network training module: training the deep convolutional neural network using the triplet dataset to learn target representation vectors for re-identification;
a cluster analysis module: clustering the learned target representation vectors corresponding to the targets in each image, realizing cross-camera target re-identification.
10. The cross-camera re-identification fusion system for similar-looking targets according to claim 9, characterized by further including any one or more of the following features:
the image acquisition module includes 4 cameras, arranged at the corner positions of the scene; a target appearing in the field of view can be captured by the 4 cameras simultaneously;
the target detection box is the minimum horizontal bounding rectangle of the target in the image coordinate system, with vertex abscissa and ordinate in pixels; the target's RoI region in the image coordinate system is marked according to the target detection box, i.e., when the top-left vertex coordinates of the target detection box are (x1, y1) and the bottom-right vertex coordinates are (x2, y2), the RoI region is (x1, y1)-(x2, y2).
CN201811630037.7A 2018-12-27 2018-12-27 A cross-camera re-identification fusion method and system for similar-looking targets Active CN109800794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811630037.7A CN109800794B (en) 2018-12-27 2018-12-27 A cross-camera re-identification fusion method and system for similar-looking targets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811630037.7A CN109800794B (en) 2018-12-27 2018-12-27 A cross-camera re-identification fusion method and system for similar-looking targets

Publications (2)

Publication Number Publication Date
CN109800794A true CN109800794A (en) 2019-05-24
CN109800794B CN109800794B (en) 2021-10-22

Family

ID=66558020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811630037.7A Active CN109800794B (en) 2018-12-27 2018-12-27 A cross-camera re-identification fusion method and system for similar-looking targets

Country Status (1)

Country Link
CN (1) CN109800794B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232717A (en) * 2019-06-10 2019-09-13 北京壹氢科技有限公司 A kind of target identity recognition methods suitable for multipair multi-targets recognition
CN110688976A (en) * 2019-10-09 2020-01-14 创新奇智(北京)科技有限公司 Store comparison method based on image identification
CN110866478A (en) * 2019-11-06 2020-03-06 支付宝(杭州)信息技术有限公司 Method, device and equipment for identifying object in image
CN111462240A (en) * 2020-04-08 2020-07-28 北京理工大学 A target localization method based on multi-monocular vision fusion
CN112446270A (en) * 2019-09-05 2021-03-05 华为技术有限公司 Training method of pedestrian re-identification network, and pedestrian re-identification method and device
CN112818837A (en) * 2021-01-29 2021-05-18 山东大学 Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
CN113011440A (en) * 2021-03-19 2021-06-22 中联煤层气有限责任公司 Coal bed gas well field monitoring heavy identification technology
CN113516146A (en) * 2020-12-21 2021-10-19 腾讯科技(深圳)有限公司 Data classification method, computer and readable storage medium
CN113515995A (en) * 2020-12-15 2021-10-19 阿里巴巴集团控股有限公司 Re-identification of moving objects, model training method, equipment and storage medium
CN113536946A (en) * 2021-06-21 2021-10-22 清华大学 A self-supervised pedestrian re-identification method based on camera relationship
CN114898314A (en) * 2022-04-29 2022-08-12 广州文远知行科技有限公司 Target detection method, device and equipment for driving scene and storage medium
CN115147453A (en) * 2021-03-30 2022-10-04 阿里巴巴新加坡控股有限公司 Model training method, device and moving object re-identification method
CN115641559A (en) * 2022-12-23 2023-01-24 深圳佑驾创新科技有限公司 Target matching method and device for panoramic camera group and storage medium
CN115661780A (en) * 2022-12-23 2023-01-31 深圳佑驾创新科技有限公司 Camera target matching method and device under cross view angle and storage medium
CN113515995B (en) * 2020-12-15 2025-02-18 阿里巴巴集团控股有限公司 Mobile object re-identification, model training method, device and storage medium


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160196652A1 (en) * 2013-09-04 2016-07-07 Menglong YE Method and apparatus
CN106599795A (en) * 2016-11-24 2017-04-26 武汉大学 Dynamic low-resolution pedestrian re-identification method based on scale distance gradient function interface learning
CN106803063A (en) * 2016-12-21 2017-06-06 华中科技大学 A kind of metric learning method that pedestrian recognizes again
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information
CN108121970A (en) * 2017-12-25 2018-06-05 武汉大学 A kind of recognition methods again of the pedestrian based on difference matrix and matrix measures
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
侯丽: "Research on key technologies of multi-feature fusion for pedestrian tracking in camera networks", China Doctoral Dissertations Full-text Database, Information Science and Technology *
李耀斌: "Research on pedestrian detection technology for real scenes", China Masters' Theses Full-text Database, Information Science and Technology *
罗建豪等: "A survey of fine-grained image classification based on deep convolutional features", Acta Automatica Sinica *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232717A (en) * 2019-06-10 2019-09-13 北京壹氢科技有限公司 A kind of target identity recognition methods suitable for multipair multi-targets recognition
CN112446270A (en) * 2019-09-05 2021-03-05 华为技术有限公司 Training method of pedestrian re-identification network, and pedestrian re-identification method and device
CN112446270B (en) * 2019-09-05 2024-05-14 华为云计算技术有限公司 Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN110688976A (en) * 2019-10-09 2020-01-14 创新奇智(北京)科技有限公司 Store comparison method based on image identification
CN110866478A (en) * 2019-11-06 2020-03-06 支付宝(杭州)信息技术有限公司 Method, device and equipment for identifying object in image
CN110866478B (en) * 2019-11-06 2022-04-29 支付宝(杭州)信息技术有限公司 Method, device and equipment for identifying object in image
CN111462240B (en) * 2020-04-08 2023-05-30 北京理工大学 A target localization method based on multi-monocular vision fusion
CN111462240A (en) * 2020-04-08 2020-07-28 北京理工大学 A target localization method based on multi-monocular vision fusion
CN113515995B (en) * 2020-12-15 2025-02-18 阿里巴巴集团控股有限公司 Mobile object re-identification, model training method, device and storage medium
CN113515995A (en) * 2020-12-15 2021-10-19 阿里巴巴集团控股有限公司 Re-identification of moving objects, model training method, equipment and storage medium
CN113516146A (en) * 2020-12-21 2021-10-19 腾讯科技(深圳)有限公司 Data classification method, computer and readable storage medium
CN113516146B (en) * 2020-12-21 2025-02-07 腾讯科技(深圳)有限公司 A data classification method, computer and readable storage medium
CN112818837A (en) * 2021-01-29 2021-05-18 山东大学 Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
CN112818837B (en) * 2021-01-29 2022-11-11 山东大学 Aerial photography vehicle weight recognition method based on attitude correction and difficult sample perception
CN113011440B (en) * 2021-03-19 2023-11-28 中联煤层气有限责任公司 A coalbed methane well site monitoring and re-identification technology
CN113011440A (en) * 2021-03-19 2021-06-22 中联煤层气有限责任公司 Coal bed gas well field monitoring heavy identification technology
CN115147453A (en) * 2021-03-30 2022-10-04 阿里巴巴新加坡控股有限公司 Model training method, device and moving object re-identification method
CN113536946B (en) * 2021-06-21 2024-04-19 清华大学 Self-supervision pedestrian re-identification method based on camera relationship
CN113536946A (en) * 2021-06-21 2021-10-22 清华大学 A self-supervised pedestrian re-identification method based on camera relationship
CN114898314A (en) * 2022-04-29 2022-08-12 广州文远知行科技有限公司 Target detection method, device and equipment for driving scene and storage medium
CN114898314B (en) * 2022-04-29 2024-08-16 广州文远知行科技有限公司 Method, device, equipment and storage medium for detecting target of driving scene
CN115641559A (en) * 2022-12-23 2023-01-24 深圳佑驾创新科技有限公司 Target matching method and device for panoramic camera group and storage medium
CN115661780A (en) * 2022-12-23 2023-01-31 深圳佑驾创新科技有限公司 Camera target matching method and device under cross view angle and storage medium

Also Published As

Publication number Publication date
CN109800794B (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN109800794A (en) A kind of appearance similar purpose identifies fusion method and system across camera again
Li et al. Analyzing growing plants from 4D point cloud data
CN113221625B (en) Method for re-identifying pedestrians by utilizing local features of deep learning
Liu et al. A contrario comparison of local descriptors for change detection in very high spatial resolution satellite images of urban areas
CN109166094A An insulator fault localization and identification method based on deep learning
CN110378909A Individual tree segmentation method for laser point clouds based on Faster R-CNN
CN110263845A SAR image change detection based on semi-supervised adversarial deep network
WO2009123354A1 (en) Method, apparatus, and program for detecting object
Xu et al. Feature-based constraint deep CNN method for mapping rainfall-induced landslides in remote regions with mountainous terrain: An application to Brazil
CN107944416A A method for live-person verification via video
CN109344842A (en) A Pedestrian Re-identification Method Based on Semantic Region Representation
Liu et al. Land use and land cover mapping in China using multimodal fine-grained dual network
CN113436229A (en) Multi-target cross-camera pedestrian trajectory path generation method
CN113792584B (en) Wearing detection method and system for safety protection tool
CN107798308A A face recognition method based on short-video training
Xu et al. Multiscale edge-guided network for accurate cultivated land parcel boundary extraction from remote sensing images
Zhao et al. Landsat time series clustering under modified Dynamic Time Warping
Sun et al. An improved YOLO V5-based algorithm of safety helmet wearing detection
Tosawadi et al. On the use of class activation map on rice blast disease identification and localization
Li et al. IDA-SiamNet: Interactive-and dynamic-aware Siamese network for building change detection
CN114494586B (en) Lattice projection deep learning network broadleaf branch and leaf separation and skeleton reconstruction method
CN112115838A (en) Thermal infrared image spectrum fusion human face classification method
CN108564043B (en) A Human Behavior Recognition Method Based on Spatio-temporal Distribution Map
Tu et al. Detecting facade damage on moderate damaged type from high-resolution oblique aerial images
Huang et al. Detection and instance segmentation of grape clusters in orchard environments using an improved mask R-CNN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant