CN109948425A - Pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching - Google Patents

Pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching

Info

Publication number
CN109948425A
Authority
CN
China
Prior art keywords
pedestrian
feature
frame
network
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910061943.8A
Other languages
Chinese (zh)
Other versions
CN109948425B (en)
Inventor
姚睿
高存远
赵佳琦
周勇
夏士雄
王重秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT
Priority to CN201910061943.8A
Publication of CN109948425A
Application granted
Publication of CN109948425B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

The invention discloses a pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching, belonging to the field of computer vision. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation. Structure-aware anchors, designed for pedestrians as a special object class, improve the performance of the detection framework. The detected pedestrian boxes are pooled to the same size and fed into the re-identification network, which is trained to save, optimize, and update the labeled pedestrian features. In the model testing stage, the trained non-local convolutional neural network performs pedestrian detection on the input scene image; once pedestrian boxes are detected, their features are matched against the target pedestrian image by feature similarity, then ranked and retrieved. The invention performs pedestrian detection and re-identification simultaneously on large-scale real-world scene images and is valuable in security applications such as urban surveillance.

Description

Pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching
Technical field
The invention belongs to the field of computer vision and, more specifically, relates to a pedestrian search method based on structure-aware self-attention and online instance aggregation matching, within the fields of object detection and object retrieval.
Background
The document "Joint detection and identification feature learning for person search, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017: 3376-3385" discloses a new person search framework that integrates pedestrian detection and person re-identification. Current person re-identification benchmarks and methods mainly match pre-cropped pedestrian images, but real-world scenes are not this ideal: a pedestrian search must first locate pedestrians with a pedestrian detection method and then find the specific person with a re-identification method.
That document proposes a new deep learning framework for pedestrian search that integrates pedestrian detection and re-identification into a single convolutional neural network and trains the network with an online instance matching loss function, which adapts well to datasets with a large number of identities. However, the method inevitably produces false positives, missed detections, and misaligned bounding boxes when detecting pedestrians, all of which degrade the search results. In addition, the limitations of convolutional neural networks prevent the model from learning globally distributed information, so it cannot localize densely crowded regions well. When the dataset is small overall, the image information such as pose and behavior is not rich enough, and few samples share the same label, a single online instance matching loss function cannot make the model learn strongly discriminative features.
Summary of the invention
The invention aims to reduce the false positives, missed detections, and misaligned bounding boxes produced by the detection part of pedestrian search; to let the pedestrian search model integrate global information, attend more to densely crowded regions, and localize pedestrians accurately; to learn robust feature representations while limiting the network's tendency to overfit; and to promote the use of pedestrian search in practical applications. To these ends, the invention proposes a pedestrian search method based on structure-aware self-attention and online instance aggregation matching. Structure-aware anchors improve both the precision and the efficiency of pedestrian detection. A non-local connection operation is introduced into the pedestrian search network at a small extra computational cost. It connects each output feature to distant pixels of the same image, which helps the deep network fuse non-local information, concentrates the weights of the output features on densely crowded regions, and further improves model accuracy. In addition, the method combines online instance matching with the center loss function to form an online instance aggregation matching cost function, which better separates images of the same class from images of different classes. The pedestrian search network thereby learns diverse and discriminative features, which effectively mitigates the impact of datasets that contain few, insufficiently diverse images per class.
The technical solution adopted by the invention to solve this technical problem is:
A pedestrian search method based on structure-aware self-attention and online instance aggregation matching,
comprising the following steps:
1. Construct the pedestrian search model
(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pretrained on the ImageNet dataset are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part;
(b) A pedestrian detection framework containing structure-aware anchors is placed on top of the feature map and is responsible for detecting pedestrian boxes; the features of the detected pedestrian boxes enter a pooling layer;
(c) A re-identification network is placed after the pooling layer. During training it saves, optimizes, and updates the labeled pedestrian features; during model testing it retrieves the target pedestrian;
2. Construct the training dataset and set the training hyperparameters
When constructing the training set, the order of the images in the training dataset is shuffled and training data groups are generated. Each group contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrians' labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized with batch gradient descent;
The center loss function is given its own learning rate and momentum, and the four loss functions are summed with given weights;
The four loss functions are the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function. The smooth L1 loss function and the cross-entropy loss function are each used twice, and all tasks are optimized simultaneously;
3. Train the pedestrian search model
(d) Scene image feature extraction: an entire image is input, and the scene image feature f1 is obtained through the head of the deep convolutional network and the non-local layer of formula (1), so that the feature fuses global information and the model can attend to the densely crowded regions of the image:
$$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \qquad (1)$$
where i is the index of the output position, j enumerates all possible positions, f is the similarity function, g is the input scaling function, and C(x) is the normalization factor;
The obtained scene image feature vector f1 is passed through the Faster-RCNN pedestrian detector to obtain the pedestrian candidate box feature f2, and structure-aware anchors are proposed for pedestrians as a special object class; the anchor formula is given in (3),
The improved anchor strategy is given in (4):
where A denotes the anchors, S the sizes, R the ratios, and the operator in the formula denotes traversal multiplication, i.e. every size is combined with every ratio;
(e) Candidate pedestrian box detection: after the candidate pedestrian box feature f2 is extracted, the smooth L1 loss function of Faster-RCNN is used to regress the position and size of the pedestrian candidate boxes accurately, and the cross-entropy loss function supervises their classification. The pedestrian boxes obtained from the feature vector are pooled to the same size and fed into the tail of the deep convolutional network and then into the re-identification network, which extracts the L2-normalized feature of each pedestrian box;
(f) Pedestrian feature matching: after the L2-normalized pedestrian box features are extracted, online instance matching is used to save, optimize, and update the features with labeled identities and the features with unlabeled identities. In forward propagation, a lookup table is set up and the cosine similarity between each sample in the mini-batch and every labeled identity is computed;
In backpropagation, if the class label of the target pedestrian is t, the t-th column of the lookup table is updated with the following formula, so that the lookup table retains features of the same target pedestrian under multiple poses and viewing angles,
$$V_t \leftarrow \gamma V_t + (1 - \gamma)\, x \qquad (5)$$
where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the lookup table, and γ is the update weight, which takes a value in the interval (0,1); this method uses γ=0.5;
The pedestrian box features with unlabeled identities that appear in the scene images serve as negative samples and are therefore also valuable for learning the feature representation. These unlabeled-identity features are stored in a circular queue, denoted U ∈ R^{D×Q}, a D × Q matrix, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, set according to the actual scene. The cosine similarity U^T x between U and each sample x in the mini-batch is also computed. After each iteration, the new feature vectors are pushed into the queue and the out-of-date feature vectors are removed, which forms a circular process;
The center loss function of formula (6) is introduced to constrain the features with labeled identities, and model training is optimized by reducing the intra-class loss. The center loss function trains only the labeled pedestrian features, so that the model minimizes the feature variation within the same pedestrian,
$$L_c = \frac{1}{2} \sum_{i=1}^{m} \lVert X_i - c_{y_i} \rVert_2^2 \qquad (6)$$
where X_i ∈ R^d is the feature of pedestrian box i, which belongs to identity class y_i, c_{y_i} is the center feature of identity class y_i, and m is the number of pedestrian classes;
Meanwhile, the smooth L1 loss function of Faster-RCNN is again used to regress the position and size of the pedestrian candidate boxes accurately, and the cross-entropy loss function supervises their classification, yielding the final pedestrian search model;
4. Test the pedestrian search model.
In the scene image feature extraction of step 3, the similarity function f is the embedded Gaussian function.
In the scene image feature extraction of step 3, the number of anchors is increased from the original 9 to 72, and the sizes and ratios of the anchors change accordingly. Specifically, in the intervals where the sizes and ratios of the annotated pedestrian boxes in the dataset are concentrated, the anchor sizes and ratios are sampled densely; in the intervals where they are not concentrated, the anchor sizes and ratios are sampled at larger spacing.
In the candidate pedestrian box detection of step 3, the L2-normalized feature extracted for each pedestrian box has 256 dimensions.
Step 4 is specifically: for each gallery image, the features of all pedestrian candidate boxes are computed by forward propagation through the network; for the query image, the pedestrian candidate boxes are replaced by the single given bounding box, and forward propagation yields its feature vector; finally, the pairwise cosine similarity between the query image feature and the gallery candidate pedestrian box features is computed, the candidates are ranked by cosine similarity, and the retrieved target pedestrian images are output.
A pedestrian search device based on structure-aware self-attention and online instance aggregation matching, comprising:
A pedestrian search model construction module, responsible for constructing the pedestrian search network based on the attention mechanism and instance aggregation. After the first part of the convolutional neural network, a non-local layer from the attention mechanism is added to fuse the global information of the image and thereby attend to densely crowded regions; the pedestrian detection part constructs the structure-aware anchors; the re-identification part sets the online instance aggregation function to supervise the pedestrian box features;
A network training module, which uses the constructed training dataset and the batch gradient descent algorithm to train the parameters of the constructed pedestrian search network. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation; structure-aware anchors designed for pedestrians as a special object class improve the performance of the detection framework; the detected pedestrian boxes are pooled to the same size and fed into the re-identification network, where the center loss function and the lookup table of online instance matching are used to train, save, optimize, and update the labeled pedestrian features, and the circular queue of online instance matching is used to evict and update the unlabeled pedestrian features and some background information; finally, the trained pedestrian search network is taken as the pedestrian search network for testing;
A pedestrian search model testing module, for constructing test samples and feeding them into the trained pedestrian search network, which performs pedestrian detection on the input test scene image, detects the pedestrian box positions, and obtains their features; the target pedestrian image is then input and its feature obtained, and feature similarity matching against the pedestrian box features is used to rank the candidates, retrieve the identity, and determine its position in the scene image.
The beneficial effects of the invention are:
First, the non-local module, a relatively recent technique among self-attention mechanisms, is introduced. It fuses global information effectively and addresses the problem of inflexible non-local features, so that the model attends more to the regions of the scene image where crowds gather.
Second, in the pedestrian detection stage, the proposed structure-aware anchors improve the performance of the detection framework: even before the classification and regression of the anchors are completed, the anchors are already very close to the ground-truth pedestrian boxes, so the model converges faster and pedestrian detection becomes more efficient.
Third, the online instance aggregation matching cost function solves the problem that, when few samples share the same pedestrian label, a single online instance matching loss function cannot learn strongly discriminative instance features. The features learned by the model are therefore more robust and can cope with the more challenging datasets of real-world scenes.
Description of the drawings
Fig. 1 is a flowchart of the training strategy for pedestrian box features in the re-identification part of the pedestrian search framework of the invention.
Fig. 2 is a network diagram of the pedestrian search network based on structure-aware self-attention and online instance aggregation matching of the invention.
Detailed description of the embodiments
To make the above objects, features, and advantages of the invention clearer and easier to understand, the invention is further described below through specific embodiments and the drawings.
As shown in Fig. 1, the flowchart of the training strategy for pedestrian box features in the re-identification part of the pedestrian search framework of the invention comprises the following steps:
1. Construct the pedestrian search model
(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pretrained on the ImageNet dataset are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part.
(b) A pedestrian detection framework containing structure-aware anchors is placed on top of the feature map and is responsible for detecting pedestrian boxes; the features of the detected pedestrian boxes enter a pooling layer.
(c) A re-identification network is placed after the pooling layer. During training it saves, optimizes, and updates the labeled pedestrian features; during model testing it retrieves the target pedestrian.
The structural parameters of the pedestrian search network described in step 1 are as follows:
For the first layer, the input layer, the number of feature maps is set to 3, i.e. the three color channels of the image;
For the second layer, a convolutional layer, the number of feature maps is set to 64;
For the third layer, the first residual block of 9 layers, the number of feature maps is set to 256;
For the fourth layer, the second residual block of 12 layers, the number of feature maps is set to 512;
For the fifth layer, the third residual block of 9 layers, the number of feature maps is set to 1024;
For the sixth layer, the non-local layer, the number of feature maps is kept unchanged at 1024;
For the seventh layer, a convolutional layer, the number of feature maps is set to 512. After the seventh layer, one convolutional layer is set whose number of feature maps is the number of anchors × 2, to distinguish candidate box foreground from background;
Another convolutional layer is set whose number of feature maps is the number of anchors × 4, to regress the candidate box position and size.
For the eighth layer, the fourth residual block of 9 layers, the number of feature maps is set to 1024;
For the ninth layer, the fifth residual block of 9 layers, the number of feature maps is set to 2048;
For the tenth layer, three fully connected layers are set, with 2, 8, and 256 outputs respectively, corresponding to distinguishing candidate box foreground from background, regressing the candidate box position and size, and extracting the pedestrian feature of the candidate box.
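As a quick check of the detection-head widths listed above, the following minimal sketch computes the output channels of the two convolutional layers that follow the seventh layer. The function name and dictionary keys are hypothetical; only the ×2 and ×4 factors and the anchor counts 9 and 72 come from the text.

```python
def rpn_head_channels(num_anchors: int) -> dict:
    """Output channels of the two convolutional layers after the seventh layer."""
    return {
        # two scores per anchor: foreground vs. background
        "foreground_background": num_anchors * 2,
        # four values per anchor: box position and size
        "box_regression": num_anchors * 4,
    }

original = rpn_head_channels(9)          # the original 9 anchors
structure_aware = rpn_head_channels(72)  # the 72 structure-aware anchors
```

Increasing the anchor count from 9 to 72 therefore widens only these two small layers, which is consistent with the text's claim of a modest extra cost.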
2. Construct the training dataset and set the training hyperparameters
When constructing the training set, the order of the images in the training dataset is shuffled and training data groups are generated. Each group contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrians' labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized with batch gradient descent; the center loss function is given its own learning rate and momentum; and the 6 loss terms are summed with given weights (namely the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function, where the smooth L1 loss function and the cross-entropy loss function are each used twice, as shown in Fig. 1), so that all tasks are optimized simultaneously.
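The weighted multi-task objective can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says the terms are summed "with given weights", so every weight and loss value below is hypothetical, and the split of the two repeated losses into a proposal stage and a refinement stage is an assumption based on the usual Faster-RCNN layout.

```python
def total_loss(terms: dict, weights: dict) -> float:
    """Weighted sum of the six loss terms that are optimized jointly."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())

# Hypothetical per-term loss values for one mini-batch.
terms = {
    "proposal_cross_entropy": 0.40,    # cross-entropy, first use
    "proposal_smooth_l1": 0.20,        # smooth L1, first use
    "refinement_cross_entropy": 0.30,  # cross-entropy, second use
    "refinement_smooth_l1": 0.10,      # smooth L1, second use
    "online_instance_matching": 2.00,
    "center": 1.50,
}

# Hypothetical weights: the center loss is down-weighted here for illustration.
weights = {name: 1.0 for name in terms}
weights["center"] = 0.01

loss = total_loss(terms, weights)
```

In a real training loop the center loss would additionally get its own learning rate and momentum, as the text states; that optimizer detail is outside this sketch.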
3. Train the pedestrian search model
(a) Scene image feature extraction. An entire image is input, and the feature f1 is obtained through the head of the deep convolutional network and the non-local layer. The non-local layer is given by formula (1):
$$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \qquad (1)$$
where i is the index of the output position, j enumerates all possible positions, f is the similarity function, g is the input scaling function, and C(x) is the normalization factor. The similarity function f in the invention is the embedded Gaussian.
Formula (2) is as follows:
$$f(x_i, x_j) = e^{\theta(x_i)^{T} \eta(x_j)} \qquad (2)$$
where θ(x_i) = W_θ x_i and η(x_j) = W_η x_j are two embedding functions.
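The non-local operation of formulas (1) and (2) can be illustrated with a small pure-Python sketch. It assumes the standard embedded-Gaussian form, y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j) with f(x_i, x_j) = exp(θ(x_i)ᵀ η(x_j)), and it further assumes that g is a linear embedding W_g x_j and that C(x) = Σ_j f(x_i, x_j); neither of those two choices is stated in the text. All function names are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def non_local(xs, W_theta, W_eta, W_g):
    """Apply y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j) to a list of position features."""
    thetas = [matvec(W_theta, x) for x in xs]
    etas = [matvec(W_eta, x) for x in xs]
    gs = [matvec(W_g, x) for x in xs]            # assumption: g is a linear embedding
    ys = []
    for theta_i in thetas:
        scores = [dot(theta_i, eta_j) for eta_j in etas]
        top = max(scores)                         # shift for numerical stability
        f = [math.exp(s - top) for s in scores]   # embedded Gaussian up to a common factor
        C = sum(f)                                # assumption: C(x) = sum_j f(x_i, x_j)
        ys.append([sum(fj * g[k] for fj, g in zip(f, gs)) / C
                   for k in range(len(gs[0]))])
    return ys

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
xs = [[1.0, 0.0], [0.0, 1.0]]                     # two positions, two channels

attended = non_local(xs, identity, identity, identity)
uniform = non_local(xs, zeros, zeros, identity)   # equal similarities: plain average
```

Every output position is a weighted average over all input positions, which is how the layer brings global information into each feature. A practical layer would operate on a full feature map and typically add the result back to the input; those details are omitted here.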
After the feature vector is obtained, it is passed through the Faster-RCNN pedestrian detector, and structure-aware anchors are proposed for pedestrians as a special object class; formulas (3) and (4) are as follows:
where A denotes the anchors, S the sizes, R the ratios, and the operator in the formula denotes traversal multiplication, i.e. every size is combined with every ratio. The number of anchors thus increases from the original 9 to 72, and the sizes and ratios of the anchors change accordingly. Specifically, in the intervals where the sizes and ratios of the annotated pedestrian boxes in the dataset are concentrated, the anchor sizes and ratios are sampled densely; in the intervals where they are not concentrated, the anchor sizes and ratios are sampled at larger spacing.
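The traversal product of sizes and ratios can be sketched as below. The text gives only the anchor counts (9 and 72), so the concrete size and ratio values are hypothetical: 8 sizes and 9 ratios are one way to reach 72, with ratios sampled more densely around tall, narrow boxes. The convention ratio = height / width with the area fixed at size² is also an assumption.

```python
from itertools import product

def make_anchors(sizes, ratios):
    """Return (width, height) for every size/ratio pair (the traversal product)."""
    anchors = []
    for size, ratio in product(sizes, ratios):
        # assumption: ratio = height / width, area kept at size ** 2
        width = size / ratio ** 0.5
        height = size * ratio ** 0.5
        anchors.append((width, height))
    return anchors

# Common generic-detector default: 3 sizes x 3 ratios = 9 anchors.
baseline = make_anchors([128, 256, 512], [0.5, 1.0, 2.0])

# Hypothetical structure-aware setting: 8 sizes x 9 ratios = 72 anchors,
# with ratios packed densely between 2.0 and 3.0 and sparse elsewhere.
sizes = [32, 64, 96, 128, 160, 192, 256, 512]
ratios = [0.5, 1.0, 1.5, 2.0, 2.25, 2.5, 2.75, 3.0, 4.0]
structure_aware = make_anchors(sizes, ratios)
```

In practice the dense and sparse intervals would be chosen from the distribution of annotated pedestrian boxes in the training set, as the paragraph above describes.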
(b) Candidate pedestrian box detection. After the candidate pedestrian boxes are extracted from the feature vector, the smooth L1 loss function and the cross-entropy loss function supervise the position and size of the candidate boxes and their classification, respectively. The pedestrian boxes obtained from the feature vector are pooled to the same size, 7×7, fed into the tail of the deep convolutional network, and then passed to the re-identification network; the L2-normalized feature extracted for each pedestrian box has 256 dimensions.
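For reference, the two detection losses named above can be written out in a few lines. This is a minimal sketch of the standard definitions (smooth L1 is the "smooth absolute" loss used for box regression in Faster-RCNN); the `beta` threshold, the summation over coordinates, and the function names are assumptions, since the text does not give these details.

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over box coordinates: quadratic near zero, linear elsewhere."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def cross_entropy(logits, label):
    """Softmax cross-entropy for one candidate box (e.g. foreground vs. background)."""
    top = max(logits)  # shift for numerical stability
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]

small_error = smooth_l1([0.5], [0.0])      # inside the quadratic region
large_error = smooth_l1([2.0], [0.0])      # inside the linear region
undecided = cross_entropy([0.0, 0.0], 0)   # equal scores for both classes
```

The linear tail makes the regression loss less sensitive to badly misaligned boxes than a squared error would be.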
(c) Pedestrian feature matching. After the L2-normalized pedestrian box features (256 dimensions) are extracted, online instance matching is used to save, optimize, and update the features with labeled identities and the features with unlabeled identities. In forward propagation, a lookup table is set up and the cosine similarity between each sample in the mini-batch and every labeled identity is computed. In backpropagation, if the class label of the target pedestrian is t, the t-th column of the lookup table is updated with formula (5):
$$V_t \leftarrow \gamma V_t + (1 - \gamma)\, x \qquad (5)$$
where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the lookup table, and γ is the update weight, which takes a value in the interval (0,1); this method uses γ=0.5;
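The lookup-table update of formula (5) is a simple running average per identity. The sketch below stores the table as a list of feature vectors, one per labeled identity; that storage layout and the function name are illustrative choices, not taken from the text.

```python
def update_lookup_table(table, t, x, gamma=0.5):
    """Apply V_t <- gamma * V_t + (1 - gamma) * x to the entry of identity t, in place."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    table[t] = [gamma * v + (1.0 - gamma) * xi for v, xi in zip(table[t], x)]

# Two labeled identities with 2-dimensional features (256 dimensions in the text).
table = [[1.0, 0.0], [0.0, 1.0]]
update_lookup_table(table, t=1, x=[1.0, 0.0])   # new observation of identity 1
```

With γ=0.5 the stored entry moves halfway toward each new feature, so it blends observations of the same pedestrian under different poses and viewing angles.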
The pedestrian box features with unlabeled identities that appear in the scene images serve as negative samples and are therefore also valuable for learning the feature representation. These unlabeled-identity features are stored in a circular queue, denoted U ∈ R^{D×Q}, a D × Q matrix, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, set according to the actual scene. The cosine similarity U^T x between U and each sample x in the mini-batch is also computed. After each iteration, the new feature vectors are pushed into the queue and the out-of-date feature vectors are removed, which forms a circular process;
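A minimal sketch of how the lookup table and the circular queue might be combined into an online instance matching loss is shown below. The text does not spell out the loss itself, so the softmax form and the `temperature` parameter follow the cited CVPR 2017 paper and should be read as assumptions; `collections.deque` with `maxlen` is used only as a convenient stand-in for the circular queue.

```python
import math
from collections import deque

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def oim_loss(x, t, lookup_table, queue, temperature=0.1):
    """Negative log-probability that L2-normalized feature x matches labeled identity t."""
    # Cosine similarities to labeled identities, then to unlabeled queue entries.
    logits = [dot(v, x) / temperature for v in lookup_table]
    logits += [dot(u, x) / temperature for u in queue]
    top = max(logits)  # shift for numerical stability
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[t]

Q = 2                                    # queue size, set per scene in the text
queue = deque(maxlen=Q)
for feature in ([0.0, 1.0], [0.6, 0.8], [0.8, 0.6]):
    queue.append(feature)                # the oldest entry is dropped once the queue is full

lookup_table = [[1.0, 0.0], [0.0, 1.0]]  # two labeled identities
x = [1.0, 0.0]
loss_correct = oim_loss(x, 0, lookup_table, queue)
loss_wrong = oim_loss(x, 1, lookup_table, queue)
```

Because all features are unit length, each dot product is a cosine similarity, and the queue entries act purely as negatives that push labeled features away from unlabeled pedestrians and background.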
Because the input during training is a whole image, and the labeled pedestrians appearing in each image are random, sparse, and unbalanced, it is difficult to organize comparable numbers of positive and negative sample pairs, so a verification (contrastive) loss term cannot be introduced directly into the Faster-RCNN framework. The center loss function is therefore introduced to impose a constraint: model training is optimized by reducing the intra-class loss, and the center loss function trains only the labeled pedestrian features.
The formula is given in (6):
$$L_c = \frac{1}{2} \sum_{i=1}^{m} \lVert X_i - c_{y_i} \rVert_2^2 \qquad (6)$$
where X_i ∈ R^d is the feature of pedestrian box i, which belongs to identity class y_i, c_{y_i} is the center feature of identity class y_i, and m is the number of pedestrian classes.
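The center loss of formula (6) can be sketched as follows, assuming the standard form, half the squared distance between each labeled feature and the center of its identity class. Representing unlabeled boxes with `None` and storing centers in a dictionary are illustrative choices; how the centers themselves are updated is not covered here.

```python
def center_loss(features, labels, centers):
    """Half the summed squared distance between labeled features and their class centers."""
    total = 0.0
    for x, y in zip(features, labels):
        if y is None:          # unlabeled pedestrian boxes do not contribute
            continue
        c = centers[y]
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x, c))
    return total

features = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
labels = [0, 0, None]          # the third box has no identity label
centers = {0: [0.5, 0.5]}      # hypothetical center for identity 0
loss = center_loss(features, labels, centers)
```

Minimizing this term pulls features of the same pedestrian toward a shared center, which complements the between-identity separation encouraged by online instance matching.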
Meanwhile again using in Faster-RCNN smooth absolute loss function and cross entropy loss function to candidate The classification of frame and position dimension size carry out further fine.
4. Test the pedestrian search model
For each gallery image, we compute the features of all pedestrian candidate boxes by forward propagation through the network. For the query image, we replace the pedestrian candidate boxes with the single given bounding box and obtain its feature vector by forward propagation. Finally, we compute the pairwise cosine similarity between the query image feature and the gallery candidate pedestrian box features, rank the candidates by cosine similarity, and output the retrieved target pedestrian images.
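The ranking step at test time can be sketched as below. The gallery layout (image identifier, box, feature), the box coordinates, and the function names are all hypothetical; only the use of cosine similarity between the query feature and the gallery box features comes from the text.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm

def rank_gallery(query_feature, gallery):
    """Sort gallery boxes by descending cosine similarity to the query feature."""
    scored = [(cosine_similarity(query_feature, feature), image_id, box)
              for image_id, box, feature in gallery]
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Hypothetical gallery: (image id, box as (x, y, width, height), box feature).
gallery = [
    ("scene_1", (10, 20, 50, 120), [0.0, 1.0]),
    ("scene_2", (30, 40, 45, 110), [1.0, 0.1]),
    ("scene_2", (80, 35, 40, 100), [0.7, 0.7]),
]
ranking = rank_gallery([1.0, 0.0], gallery)
best_score, best_image, best_box = ranking[0]
```

Each ranked entry carries both the scene image and the box, so the result gives the retrieved identity and its position in the scene at the same time.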
The invention further discloses a pedestrian search device based on structure-aware self-attention and online instance aggregation matching, comprising:
A pedestrian search model construction module, responsible for constructing the pedestrian search network based on the attention mechanism and instance aggregation. After the first part of the convolutional neural network, a non-local layer from the attention mechanism is added to fuse the global information of the image and thereby attend to densely crowded regions; the pedestrian detection part constructs the structure-aware anchors; the re-identification part sets the online instance aggregation function to supervise the pedestrian box features;
A network training module, which uses the constructed training dataset and the batch gradient descent algorithm to train the parameters of the constructed pedestrian search network. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation; structure-aware anchors designed for pedestrians as a special object class improve the performance of the detection framework; the detected pedestrian boxes are pooled to the same size and fed into the re-identification network, where the center loss function and the lookup table of online instance matching are used to train, save, optimize, and update the labeled pedestrian features, and the circular queue of online instance matching is used to evict and update the unlabeled pedestrian features and some background information; finally, the trained pedestrian search network is taken as the pedestrian search network for testing;
A pedestrian search model testing module, for constructing test samples and feeding them into the trained pedestrian search network, which performs pedestrian detection on the input test scene image, detects the pedestrian box positions, and obtains their features; the target pedestrian image is then input and its feature obtained, and feature similarity matching against the pedestrian box features is used to rank the candidates, retrieve the identity, and determine its position in the scene image.
The above embodiments merely illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention shall be as defined in the claims.

Claims (6)

1. A pedestrian search method based on structure-aware self-attention and online instance aggregation matching, characterized in that it
comprises the following steps:
1. Construct the pedestrian search model
(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pretrained on the ImageNet dataset are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part;
(b) A pedestrian detection framework containing structure-aware anchors is placed on top of the feature map and is responsible for detecting pedestrian boxes; the features of the detected pedestrian boxes enter a pooling layer;
(c) A re-identification network is placed after the pooling layer. During training it saves, optimizes, and updates the labeled pedestrian features; during model testing it retrieves the target pedestrian;
Step 2: Construct the training dataset and set the training hyperparameters
When the training set is constructed, the order of the images in the training data is shuffled and training data groups are generated. One group of data contains a panoramic image, the positions of the pedestrian boxes contained in the image, and the pedestrian labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized using batch gradient descent;
The learning rate and momentum of the center loss function are set separately, and the four loss functions are summed with certain weights;
The four loss functions are the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function. The smooth L1 loss function and the cross-entropy loss function are each used twice, and the multiple tasks are optimized simultaneously;
Step 3: Train the pedestrian search model
(d) Scene image feature extraction: an entire image is input, and the scene image feature f1 is obtained through the head of the deep convolutional network and the non-local layer shown in formula (1), so that the features incorporate global information and the model attends to the regions of the image where pedestrians are dense:
where i denotes the output position, j enumerates all possible positions, f denotes the similarity function, g denotes the input scaling function, and C(x) is the normalization factor;
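Formula (1) itself is not reproduced in this text. A common non-local form that is consistent with the symbol definitions above is y_i = (1 / C(x)) Σ_j f(x_i, x_j) g(x_j). The sketch below illustrates that form under stated assumptions: f is the embedded Gaussian exp(θ(x_i) · φ(x_j)), C(x) is the sum of f over j, and the embeddings `theta`, `phi`, and `g` are hypothetical identity stand-ins for learned projections. It is an illustration, not the patented implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def non_local(x, theta=lambda v: v, phi=lambda v: v, g=lambda v: v):
    """Apply y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j) to a list of vectors.

    Assumptions (not taken from the source): f is the embedded Gaussian
    exp(theta(x_i) . phi(x_j)), C(x) is the sum of f over all j, and the
    default embeddings are identity functions standing in for learned ones.
    """
    embedded_g = [g(xj) for xj in x]
    dim = len(embedded_g[0])
    out = []
    for xi in x:
        # Similarity of position i to every position j.
        weights = [math.exp(dot(theta(xi), phi(xj))) for xj in x]
        c = sum(weights)  # normalization factor C(x)
        # Weighted average of g(x_j) over all positions j.
        yi = [sum(w * v[k] for w, v in zip(weights, embedded_g)) / c
              for k in range(dim)]
        out.append(yi)
    return out


y = non_local([[1.0, 0.0], [0.0, 1.0]])
```

Every output position is a weighted average over all input positions, which is how global context enters the feature at each location.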
The obtained scene image feature vector f1 is passed through the faster region-based convolutional neural network for pedestrian detection to obtain the candidate pedestrian box feature f2. Structure-aware anchors are proposed for pedestrians as a specific object class; the anchor formula is shown in (3),
and the improved anchor strategy is shown in (4):
where A denotes the anchors, S denotes the sizes, R denotes the ratios, and the operator in the formulas denotes traversal multiplication;
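Formulas (3) and (4) are not reproduced in this text. As a hedged illustration of what traversal multiplication of sizes and ratios usually means in anchor-based detectors, the sketch below pairs every size with every ratio. The function name `make_anchors`, the convention ratio = height / width, and all numeric values are assumptions made for illustration only; the source does not state how the anchors are split between sizes and ratios.

```python
import math


def make_anchors(sizes, ratios):
    """Return a (width, height) pair for every size/ratio combination.

    Assumed convention: each anchor has area size**2 and ratio = height / width.
    """
    anchors = []
    for s in sizes:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((w, h))
    return anchors


# Three sizes and three ratios give the familiar 9 anchors.
default_anchors = make_anchors([128, 256, 512], [0.5, 1.0, 2.0])

# Hypothetical denser grid: 8 sizes x 9 ratios gives 72 anchors. This split
# is only one way to reach 72 and is not taken from the source.
dense_anchors = make_anchors([32 * k for k in range(1, 9)],
                             [1.0 + 0.25 * k for k in range(9)])
```

A structure-aware design would place more of these size and ratio values where annotated pedestrian boxes are concentrated, for example tall and narrow shapes.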
(e) Candidate pedestrian box detection: after the candidate pedestrian box feature f2 is extracted, the smooth L1 loss function of the faster region-based convolutional neural network is used to accurately regress the position and size of the candidate pedestrian boxes, and the cross-entropy loss function is used to supervise their classification. The resulting pedestrian boxes are pooled to the same size on the basis of the feature vector and fed into the tail of the deep convolutional network. They then enter the re-identification network, which extracts the L2-normalized feature of each pedestrian box;
(f) Pedestrian feature matching: after the L2-normalized pedestrian box features are extracted, online instance matching is used to save, optimize, and update the features of labeled identities and of unlabeled identities. In forward propagation, a lookup table is set up, and the cosine similarity between the samples in the mini-batch and all labeled identities is computed;
In backward propagation, if the class label of the target pedestrian is t, the t-th column of the lookup table is updated using the following formula, so that the lookup table can save the features of the same target pedestrian under various poses and viewing angles,
V_t ← γV_t + (1 − γ)x, (5)
where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the updated lookup table, and γ is the update weight, which may take a value in the interval (0, 1); this method takes γ = 0.5;
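The lookup-table step can be sketched as follows. This is a minimal illustration of formula (5) and of the similarity computation in forward propagation, assuming the table is stored as a list with one feature vector per labeled identity and that features are already L2-normalized, so the inner product equals the cosine similarity. The function names are hypothetical.

```python
def table_similarities(table, x):
    """Inner product of feature x with every identity vector in the table.

    For L2-normalized vectors this equals the cosine similarity.
    """
    return [sum(a * b for a, b in zip(column, x)) for column in table]


def update_lookup_table(table, t, x, gamma=0.5):
    """Apply V_t <- gamma * V_t + (1 - gamma) * x to column t, in place."""
    table[t] = [gamma * v + (1.0 - gamma) * c for v, c in zip(table[t], x)]


# Two labeled identities with 2-dimensional features (toy values).
table = [[1.0, 0.0], [0.0, 1.0]]
update_lookup_table(table, 0, [0.0, 1.0], gamma=0.5)
scores = table_similarities(table, [0.0, 1.0])
```

With γ = 0.5 the stored vector moves halfway toward each new observation of the identity, so it blends features seen under different poses and angles.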
The features of pedestrian boxes without identity labels that appear in the scene images are also valuable as negative samples for learning the feature representation. These unlabeled-identity features are saved in a circular queue, denoted U ∈ R^{D×Q}, a D × Q matrix, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, which is set according to the actual scene. The cosine similarity U^T x between U and a sample x in the mini-batch is also computed. After each iteration, the new feature vectors are pushed into the queue and the out-of-date feature vectors are removed, which forms a circular process;
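A fixed-size circular queue of this kind can be sketched with a bounded deque from the Python standard library. The class name `UnlabeledQueue` and the toy values are assumptions for illustration; features are assumed to be L2-normalized so that the inner product equals the cosine similarity.

```python
from collections import deque


class UnlabeledQueue:
    """Holds the most recent Q features of unlabeled identities."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest item when a new one is
        # appended to a full queue.
        self.items = deque(maxlen=size)

    def push(self, feature):
        self.items.append(list(feature))

    def similarities(self, x):
        """Inner products U^T x between every stored feature and x."""
        return [sum(a * b for a, b in zip(u, x)) for u in self.items]


queue = UnlabeledQueue(size=2)
for feature in ([1.0, 0.0], [0.0, 1.0], [0.6, 0.8]):
    queue.push(feature)
# The oldest feature [1.0, 0.0] has been dropped at this point.
sims = queue.similarities([0.0, 1.0])
```

Because the queue is bounded, memory stays constant while the stored negatives track the current state of the network.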
The center loss function shown in formula (6) is introduced to constrain the features of labeled identities, and model training is optimized by reducing the intra-class loss. The center loss function is trained only on labeled pedestrian features and makes the model minimize the feature variation within the same pedestrian,
where X_i ∈ R^d denotes the feature of pedestrian box i, which belongs to the class with identity label y_i, the center term in the formula denotes the central feature of the class with identity label y_i, and m denotes the number of pedestrian classes;
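Formula (6) is not reproduced in this text. The sketch below uses the commonly published form of the center loss, one half of the sum of squared distances between each labeled feature and the center of its class. That form is an assumption here and may differ in detail from the patent's formula. Unlabeled boxes are marked with `None` and skipped, in line with the statement that only labeled pedestrian features are trained.

```python
def center_loss(features, labels, centers):
    """Return 0.5 * sum_i ||x_i - c_{y_i}||^2 over the labeled samples.

    features: list of feature vectors
    labels:   list of identity labels, with None for unlabeled boxes
    centers:  dict mapping an identity label to its center vector
    """
    total = 0.0
    for x, y in zip(features, labels):
        if y is None:
            continue  # unlabeled features do not contribute
        c = centers[y]
        total += sum((a - b) ** 2 for a, b in zip(x, c))
    return 0.5 * total


loss = center_loss(
    features=[[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]],
    labels=[0, 0, None],
    centers={0: [0.5, 0.5]},
)
```

Minimizing this quantity pulls the features of one identity toward a shared center, which reduces the variation within a class.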
Meanwhile pedestrian candidate is accurately returned using the smooth absolute loss function in the convolutional neural networks of fast area again The positions and dimensions of frame simultaneously exercise supervision to the classification of pedestrian candidate frame with cross entropy loss function, obtain final pedestrian's search Model;
4, pedestrian's search model is tested.
2. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that, in the scene image feature extraction of step 3, the similarity function f is an embedded Gaussian function.
3. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that, in the scene image feature extraction of step 3, the number of anchors is changed from the original 9 to 72, and the size and ratio of each anchor change accordingly. Specifically, in the intervals where the sizes and ratios of the annotated pedestrian boxes in the dataset are concentrated, the anchors are set with densely spaced size and ratio values; in the intervals where the sizes and ratios are not concentrated, the spacing between the anchor size and ratio values is larger.
4. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that, in the candidate pedestrian box detection of step 3, the extracted L2-normalized feature of each pedestrian box has 256 dimensions.
5. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that step 4 is specifically:
For each gallery image, the features of all candidate pedestrian boxes are computed by forward propagation through the network. For the query image, the candidate pedestrian boxes are replaced with the single given bounding box, and forward propagation is then performed to obtain its feature vector. Finally, the pairwise cosine similarity between the query image feature and the gallery candidate pedestrian box features is computed, the similarity level is evaluated by ranking on the cosine similarity, and the retrieved target pedestrian image is output.
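The ranking step of the test procedure can be sketched as below, assuming the network has already produced one feature vector for the query and one per candidate box in the gallery. The tuple layout (image id, box, feature) and the toy values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return sum(p * q for p, q in zip(a, b)) / (norm_a * norm_b)


def rank_gallery(query_feature, gallery):
    """Sort gallery boxes by cosine similarity to the query, best first.

    gallery: list of (image_id, box, feature) tuples.
    Returns a list of (similarity, image_id, box) tuples.
    """
    scored = [(cosine(query_feature, feature), image_id, box)
              for image_id, box, feature in gallery]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


gallery = [
    ("img1", (0, 0, 10, 20), [1.0, 0.0]),
    ("img2", (5, 5, 10, 20), [0.6, 0.8]),
    ("img1", (30, 0, 10, 20), [0.0, 1.0]),
]
ranked = rank_gallery([0.0, 1.0], gallery)
```

The top entry gives both the identity match and its box position, which corresponds to retrieving the target and locating it in the scene image.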
6. A pedestrian search device based on structure-aware self-attention and online instance aggregation matching, characterized by comprising:
Pedestrian search model construction module: responsible for constructing the pedestrian search network based on the attention mechanism and instance aggregation. After the first part of the convolutional neural network, the non-local layer of the attention mechanism is added to fuse global image information so as to attend to the regions where pedestrians are dense;
the pedestrian detection part constructs structure-aware anchors;
the re-identification part sets an online instance aggregation function to supervise the pedestrian box features;
Network training module searches the pedestrian built using batch gradient descent algorithm using the training dataset constructed Rope network carries out parameter training;In the training stage, pass through the combination of a convolutional neural networks and non local layer first, to input Entire scene image carry out feature extraction, its character representation is obtained, for this special object design structure sense of pedestrian The anchor point known promotes detection framework performance, and feeding pedestrian identifies network again after the pedestrian's frame pond that will test out is melted into identical size, It using the training of the inquiry table of center loss function and online example match, saves, the pedestrian's feature of optimization and update with label, And the pedestrian's feature and some background informations that do not have label are rejected more using the round-robin queue of online example match Newly, trained pedestrian is finally searched for pedestrian when network takes out as test and searches for network;
Pedestrian's search model test module, for constructing test sample;And test sample is sent into trained pedestrian's dragnet Network carries out pedestrian detection to the test sample scene image of input, detects pedestrian's frame position later and obtain its feature, then is defeated Enter target pedestrian image and obtain its feature, carries out characteristic similarity with pedestrian's frame feature and match sequence and retrieving identity, and really Its fixed position in scene image.
CN201910061943.8A 2019-01-22 2019-01-22 Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching Active CN109948425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910061943.8A CN109948425B (en) 2019-01-22 2019-01-22 Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910061943.8A CN109948425B (en) 2019-01-22 2019-01-22 Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching

Publications (2)

Publication Number Publication Date
CN109948425A true CN109948425A (en) 2019-06-28
CN109948425B CN109948425B (en) 2023-06-09

Family

ID=67007387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910061943.8A Active CN109948425B (en) 2019-01-22 2019-01-22 Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching

Country Status (1)

Country Link
CN (1) CN109948425B (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348014A (en) * 2019-07-10 2019-10-18 电子科技大学 A kind of semantic similarity calculation method based on deep learning
CN110555420A (en) * 2019-09-09 2019-12-10 电子科技大学 fusion model network and method based on pedestrian regional feature extraction and re-identification
CN110569738A (en) * 2019-08-15 2019-12-13 杨春立 natural scene text detection method, equipment and medium based on dense connection network
CN110647816A (en) * 2019-08-26 2020-01-03 合肥工业大学 Target detection method for real-time monitoring of goods shelf medicines
CN110659721A (en) * 2019-08-02 2020-01-07 浙江省北大信息技术高等研究院 Method and system for constructing target detection network
CN110765880A (en) * 2019-09-24 2020-02-07 中国矿业大学 Light-weight video pedestrian heavy identification method
CN111027397A (en) * 2019-11-14 2020-04-17 上海交通大学 Method, system, medium and device for detecting comprehensive characteristic target in intelligent monitoring network
CN111241944A (en) * 2019-12-31 2020-06-05 浙江大学 Scene recognition and loopback detection method based on background target detection and background feature similarity matching
CN111401286A (en) * 2020-03-24 2020-07-10 武汉大学 Pedestrian retrieval method based on component weight generation network
CN111539257A (en) * 2020-03-31 2020-08-14 苏州科达科技股份有限公司 Personnel re-identification method, device and storage medium
CN111582225A (en) * 2020-05-19 2020-08-25 长沙理工大学 Remote sensing image scene classification method and device
CN111695526A (en) * 2020-06-15 2020-09-22 北京爱笔科技有限公司 Network model generation method, pedestrian re-identification method and device
CN111695470A (en) * 2020-06-02 2020-09-22 中山大学 Visible light-near infrared pedestrian re-identification method based on depth feature orthogonal decomposition
CN111709311A (en) * 2020-05-27 2020-09-25 西安理工大学 Pedestrian re-identification method based on multi-scale convolution feature fusion
CN111723728A (en) * 2020-06-18 2020-09-29 中国科学院自动化研究所 Pedestrian searching method, system and device based on bidirectional interactive network
CN111723719A (en) * 2020-06-12 2020-09-29 中国科学院自动化研究所 Video target detection method, system and device based on category external memory
CN111814845A (en) * 2020-03-26 2020-10-23 同济大学 Pedestrian re-identification method based on multi-branch flow fusion model
CN111914107A (en) * 2020-07-29 2020-11-10 厦门大学 Instance retrieval method based on multi-channel attention area expansion
CN112016591A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Training method of image recognition model and image recognition method
CN112241682A (en) * 2020-09-14 2021-01-19 同济大学 End-to-end pedestrian searching method based on blocking and multi-layer information fusion
CN112464730A (en) * 2020-11-03 2021-03-09 南京理工大学 Pedestrian re-identification method based on domain-independent foreground feature learning
CN112597956A (en) * 2020-12-30 2021-04-02 华侨大学 Multi-person attitude estimation method based on human body anchor point set and perception enhancement network
CN113076861A (en) * 2021-03-30 2021-07-06 南京大学环境规划设计研究院集团股份公司 Bird fine-granularity identification method based on second-order features
CN113095106A (en) * 2019-12-23 2021-07-09 华为数字技术(苏州)有限公司 Human body posture estimation method and device
CN113627383A (en) * 2021-08-25 2021-11-09 中国矿业大学 Pedestrian loitering re-identification method for panoramic intelligent security
CN113743251A (en) * 2021-08-17 2021-12-03 华中科技大学 Target searching method and device based on weak supervision scene
CN113920470A (en) * 2021-10-12 2022-01-11 中国电子科技集团公司第二十八研究所 Pedestrian retrieval method based on self-attention mechanism
CN113936301A (en) * 2021-07-02 2022-01-14 西北工业大学 Target re-identification method based on central point prediction loss function
CN114049609A (en) * 2021-11-24 2022-02-15 大连理工大学 Multilevel aggregation pedestrian re-identification method based on neural architecture search
US20220058396A1 (en) * 2019-11-19 2022-02-24 Tencent Technology (Shenzhen) Company Limited Video Classification Model Construction Method and Apparatus, Video Classification Method and Apparatus, Device, and Medium
CN115731588A (en) * 2021-08-27 2023-03-03 腾讯科技(深圳)有限公司 Model processing method and device
CN117456560A (en) * 2023-12-22 2024-01-26 华侨大学 Pedestrian re-identification method based on foreground perception dynamic part learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197584A (en) * 2018-01-12 2018-06-22 武汉大学 A kind of recognition methods again of the pedestrian based on triple deep neural network
CN109165540A (en) * 2018-06-13 2019-01-08 深圳市感动智能科技有限公司 A kind of pedestrian's searching method and device based on priori candidate frame selection strategy

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348014A (en) * 2019-07-10 2019-10-18 电子科技大学 A kind of semantic similarity calculation method based on deep learning
CN110659721A (en) * 2019-08-02 2020-01-07 浙江省北大信息技术高等研究院 Method and system for constructing target detection network
CN110659721B (en) * 2019-08-02 2022-07-22 杭州未名信科科技有限公司 Method and system for constructing target detection network
CN110569738A (en) * 2019-08-15 2019-12-13 杨春立 natural scene text detection method, equipment and medium based on dense connection network
CN110569738B (en) * 2019-08-15 2023-06-06 杨春立 Natural scene text detection method, equipment and medium based on densely connected network
CN110647816A (en) * 2019-08-26 2020-01-03 合肥工业大学 Target detection method for real-time monitoring of goods shelf medicines
CN110647816B (en) * 2019-08-26 2022-11-22 合肥工业大学 Target detection method for real-time monitoring of goods shelf medicines
CN110555420A (en) * 2019-09-09 2019-12-10 电子科技大学 fusion model network and method based on pedestrian regional feature extraction and re-identification
CN110555420B (en) * 2019-09-09 2022-04-12 电子科技大学 Fusion model network and method based on pedestrian regional feature extraction and re-identification
CN110765880B (en) * 2019-09-24 2023-04-18 中国矿业大学 Light-weight video pedestrian heavy identification method
CN110765880A (en) * 2019-09-24 2020-02-07 中国矿业大学 Light-weight video pedestrian heavy identification method
CN111027397B (en) * 2019-11-14 2023-05-12 上海交通大学 Comprehensive feature target detection method, system, medium and equipment suitable for intelligent monitoring network
CN111027397A (en) * 2019-11-14 2020-04-17 上海交通大学 Method, system, medium and device for detecting comprehensive characteristic target in intelligent monitoring network
US20220058396A1 (en) * 2019-11-19 2022-02-24 Tencent Technology (Shenzhen) Company Limited Video Classification Model Construction Method and Apparatus, Video Classification Method and Apparatus, Device, and Medium
US11967152B2 (en) * 2019-11-19 2024-04-23 Tencent Technology (Shenzhen) Company Limited Video classification model construction method and apparatus, video classification method and apparatus, device, and medium
CN113095106A (en) * 2019-12-23 2021-07-09 华为数字技术(苏州)有限公司 Human body posture estimation method and device
CN111241944B (en) * 2019-12-31 2023-05-26 浙江大学 Scene recognition and loop detection method based on background target and background feature matching
CN111241944A (en) * 2019-12-31 2020-06-05 浙江大学 Scene recognition and loopback detection method based on background target detection and background feature similarity matching
CN111401286A (en) * 2020-03-24 2020-07-10 武汉大学 Pedestrian retrieval method based on component weight generation network
CN111401286B (en) * 2020-03-24 2022-03-04 武汉大学 Pedestrian retrieval method based on component weight generation network
CN111814845A (en) * 2020-03-26 2020-10-23 同济大学 Pedestrian re-identification method based on multi-branch flow fusion model
CN111814845B (en) * 2020-03-26 2022-09-20 同济大学 Pedestrian re-identification method based on multi-branch flow fusion model
CN111539257B (en) * 2020-03-31 2022-07-26 苏州科达科技股份有限公司 Person re-identification method, device and storage medium
CN111539257A (en) * 2020-03-31 2020-08-14 苏州科达科技股份有限公司 Personnel re-identification method, device and storage medium
CN111582225A (en) * 2020-05-19 2020-08-25 长沙理工大学 Remote sensing image scene classification method and device
CN111709311A (en) * 2020-05-27 2020-09-25 西安理工大学 Pedestrian re-identification method based on multi-scale convolution feature fusion
CN111695470A (en) * 2020-06-02 2020-09-22 中山大学 Visible light-near infrared pedestrian re-identification method based on depth feature orthogonal decomposition
CN111695470B (en) * 2020-06-02 2023-05-12 中山大学 Visible light-near infrared pedestrian re-identification method based on depth feature orthogonal decomposition
CN111723719A (en) * 2020-06-12 2020-09-29 中国科学院自动化研究所 Video target detection method, system and device based on category external memory
CN111695526A (en) * 2020-06-15 2020-09-22 北京爱笔科技有限公司 Network model generation method, pedestrian re-identification method and device
CN111695526B (en) * 2020-06-15 2023-10-13 北京爱笔科技有限公司 Network model generation method, pedestrian re-recognition method and device
CN111723728A (en) * 2020-06-18 2020-09-29 中国科学院自动化研究所 Pedestrian searching method, system and device based on bidirectional interactive network
CN111914107B (en) * 2020-07-29 2022-06-14 厦门大学 Instance retrieval method based on multi-channel attention area expansion
CN111914107A (en) * 2020-07-29 2020-11-10 厦门大学 Instance retrieval method based on multi-channel attention area expansion
CN112016591A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Training method of image recognition model and image recognition method
CN112241682A (en) * 2020-09-14 2021-01-19 同济大学 End-to-end pedestrian searching method based on blocking and multi-layer information fusion
CN112464730A (en) * 2020-11-03 2021-03-09 南京理工大学 Pedestrian re-identification method based on domain-independent foreground feature learning
CN112597956B (en) * 2020-12-30 2023-06-02 华侨大学 Multi-person gesture estimation method based on human body anchor point set and perception enhancement network
CN112597956A (en) * 2020-12-30 2021-04-02 华侨大学 Multi-person attitude estimation method based on human body anchor point set and perception enhancement network
CN113076861B (en) * 2021-03-30 2022-02-25 南京大学环境规划设计研究院集团股份公司 Bird fine-granularity identification method based on second-order features
CN113076861A (en) * 2021-03-30 2021-07-06 南京大学环境规划设计研究院集团股份公司 Bird fine-granularity identification method based on second-order features
CN113936301A (en) * 2021-07-02 2022-01-14 西北工业大学 Target re-identification method based on central point prediction loss function
CN113936301B (en) * 2021-07-02 2024-03-12 西北工业大学 Target re-identification method based on center point prediction loss function
CN113743251A (en) * 2021-08-17 2021-12-03 华中科技大学 Target searching method and device based on weak supervision scene
CN113743251B (en) * 2021-08-17 2024-02-13 华中科技大学 Target searching method and device based on weak supervision scene
CN113627383A (en) * 2021-08-25 2021-11-09 中国矿业大学 Pedestrian loitering re-identification method for panoramic intelligent security
CN115731588A (en) * 2021-08-27 2023-03-03 腾讯科技(深圳)有限公司 Model processing method and device
CN113920470A (en) * 2021-10-12 2022-01-11 中国电子科技集团公司第二十八研究所 Pedestrian retrieval method based on self-attention mechanism
CN114049609A (en) * 2021-11-24 2022-02-15 大连理工大学 Multilevel aggregation pedestrian re-identification method based on neural architecture search
CN117456560A (en) * 2023-12-22 2024-01-26 华侨大学 Pedestrian re-identification method based on foreground perception dynamic part learning
CN117456560B (en) * 2023-12-22 2024-03-29 华侨大学 Pedestrian re-identification method based on foreground perception dynamic part learning

Also Published As

Publication number Publication date
CN109948425B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN109948425A (en) Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching
CN107330396B (en) Pedestrian re-identification method based on multi-attribute and multi-strategy fusion learning
Qiao et al. LGPMA: complicated table structure recognition with local and global pyramid mask alignment
CN106407352B (en) Traffic image search method based on deep learning
CN110084195B (en) Remote sensing image target detection method based on convolutional neural network
CN105808732B (en) A kind of integrated Target attribute recognition and precise search method based on depth measure study
CN110334705A (en) A kind of Language Identification of the scene text image of the global and local information of combination
CN107967451A (en) A kind of method for carrying out crowd's counting to static image using multiple dimensioned multitask convolutional neural networks
CN108171184A (en) Method for distinguishing is known based on Siamese networks again for pedestrian
CN108830188A (en) Vehicle checking method based on deep learning
CN109559320A (en) Realize that vision SLAM semanteme builds the method and system of figure function based on empty convolution deep neural network
CN109800629A (en) A kind of Remote Sensing Target detection method based on convolutional neural networks
CN109711281A (en) A kind of pedestrian based on deep learning identifies again identifies fusion method with feature
CN106504233A (en) Image electric power widget recognition methodss and system are patrolled and examined based on the unmanned plane of Faster R CNN
CN106529499A (en) Fourier descriptor and gait energy image fusion feature-based gait identification method
CN107832835A (en) The light weight method and device of a kind of convolutional neural networks
CN108921107A (en) Pedestrian's recognition methods again based on sequence loss and Siamese network
CN113221625B (en) Method for re-identifying pedestrians by utilizing local features of deep learning
CN109165540A (en) A kind of pedestrian's searching method and device based on priori candidate frame selection strategy
CN110188209A (en) Cross-module state Hash model building method, searching method and device based on level label
CN108447080A (en) Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks
He et al. Exemplar-driven top-down saliency detection via deep association
CN107316042A (en) A kind of pictorial image search method and device
CN111709311A (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN111985367A (en) Pedestrian re-recognition feature extraction method based on multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant