CN109948425A - Pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching - Google Patents
- Publication number
- CN109948425A (application CN201910061943.8A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- feature
- frame
- network
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching, belonging to the technical field of computer vision. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation. Structure-aware anchors are designed for pedestrians as a special object class to improve the performance of the detection framework. The detected pedestrian boxes are pooled to the same size and fed into the pedestrian re-identification network for training, which saves, optimizes, and updates the labeled pedestrian features. In the model testing stage, the trained non-local convolutional neural network performs pedestrian detection on the input scene image; after the pedestrian boxes are detected, their features are matched against the target pedestrian image by feature similarity, ranked, and retrieved. The invention can perform pedestrian detection and re-identification simultaneously on large-scale real-world scene images and plays an important role in security fields such as urban surveillance.
Description
Technical field
The invention belongs to the technical field of computer vision, and further relates to a pedestrian search method based on structure-aware self-attention and online instance aggregation matching, within the fields of object detection and object retrieval.
Background art
The document "Joint detection and identification feature learning for person search, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017: 3376-3385" discloses a new pedestrian search framework that integrates pedestrian detection and pedestrian re-identification. Current pedestrian re-identification benchmarks and methods mainly match cropped pedestrian images, but real-world scenes are not so ideal: pedestrian search must first locate pedestrians with a pedestrian detection method and then use a re-identification method to find the specific person.
The document proposes a new deep learning framework for pedestrian search that integrates pedestrian detection and pedestrian re-identification into a single convolutional neural network, and proposes training the network with the online instance matching loss function because it adapts well to large-scale identity datasets. However, the method inevitably produces false positives, missed detections, and misaligned bounding boxes when detecting pedestrians, all of which affect the pedestrian search results. In addition, the limitations of convolutional neural networks prevent the model from learning globally distributed information, so it cannot locate regions where pedestrians are dense. When the dataset is small overall, the image information such as poses and behaviors is not rich enough, and samples of pedestrians with the same label are few, a single online instance matching loss function cannot make the model learn strongly discriminative features.
Summary of the invention
The invention proposes a pedestrian search method based on structure-aware self-attention and online instance aggregation matching. Its aims are to reduce the false positives, missed detections, and misaligned bounding boxes produced by the detection part of pedestrian search; to let the pedestrian search model incorporate global information, attend more to regions where pedestrians are dense, and locate pedestrians accurately; to learn robust feature representations and keep the network from overfitting easily; and to promote the development of pedestrian search in practical applications. Structure-aware anchors improve both the accuracy and the efficiency of pedestrian detection. A non-local connection operation is introduced into the pedestrian search network at a small additional computational cost; it relates the output features to distant pixels in the same image, which helps the deep network fuse non-local information better and concentrates the weights of the output features on regions where pedestrians are dense, further improving the accuracy of the model. In addition, the method combines the online instance matching loss and the center loss into an online instance aggregation matching loss function, which better distinguishes images of the same class from images of different classes, so that the pedestrian search network learns diverse and discriminative features. This effectively mitigates the impact of a dataset that contains few images per class and lacks diversity.
The technical solution adopted by the invention to solve this technical problem is as follows:
A pedestrian search method based on structure-aware self-attention and online instance aggregation matching, comprising the following steps:
1. Construct the pedestrian search model
(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pre-trained on the ImageNet dataset are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part;
(b) A pedestrian detection framework containing structure-aware anchors is set on top of the feature map and is responsible for detecting pedestrian boxes; the features of the detected pedestrian boxes enter the pooling layer;
(c) A re-identification network is set after the pooling layer. During training it is responsible for saving, optimizing, and updating the labeled pedestrian features; during model testing it is responsible for retrieving the target pedestrian;
2. Construct the training dataset and set the training hyperparameters
When constructing the training set, the order of the images in the training dataset is shuffled and training data groups are generated; each group of data contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrian labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized using batch gradient descent;
The learning rate and momentum of the center loss function are set separately, and the four loss functions are summed with certain weights;
The four loss functions are the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function, where the smooth L1 loss function and the cross-entropy loss function are each used twice, and the multiple tasks are optimized simultaneously;
3. Train the pedestrian search model
(d) Scene image feature extraction: an entire image is input, and the scene image feature f_1 is obtained through the head of the deep convolutional network and the non-local layer shown in formula (1) below, so that the feature fuses global information and the model can attend to regions of the image where pedestrians are dense:
where i denotes the output position, j enumerates all possible positions, f denotes the similarity function, g denotes the input scaling function, and C(x) is the normalization factor;
The obtained scene image feature vector f_1 is passed through the Faster R-CNN pedestrian detector to obtain the pedestrian candidate box feature f_2, and structure-aware anchors are proposed for pedestrians as a special object class. The anchor formula is shown in (3),
and the improved anchor strategy is shown in (4):
where A denotes the anchors, S denotes the sizes, R denotes the ratios, and the product symbol denotes multiplication over all combinations of sizes and ratios;
(e) Candidate pedestrian box detection: after the candidate pedestrian box feature f_2 is extracted, the smooth L1 loss function of Faster R-CNN is used to regress the position and size of the pedestrian candidate boxes accurately, and the cross-entropy loss function supervises the classification of the pedestrian candidate boxes. The pedestrian boxes obtained from the feature vector are pooled to the same size and fed into the tail of the deep convolutional network, then into the re-identification network, which extracts the L2-normalized feature of each pedestrian box;
(f) Pedestrian feature matching: the L2-normalized pedestrian box features are extracted, and online instance matching is used to save, optimize, and update the features with labeled identities and those with unlabeled identities. In forward propagation, a lookup table is set up and the cosine similarity between the mini-batch samples and all labeled identities is computed;
In backpropagation, if the class label of the target pedestrian is t, column t of the lookup table is updated using the following formula, so that the lookup table stores features of the same target pedestrian under various poses and angles,
V_t ← γV_t + (1 − γ)x, (5)
where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the lookup table, and γ is the update weight, which takes a value in the interval (0, 1); this method uses γ = 0.5;
The features of pedestrian boxes with unlabeled identities that appear in scene images serve as negative samples and are also valuable for learning feature representations. These unlabeled-identity features are saved in a circular queue, denoted U ∈ R^{D×Q}, a D × Q matrix, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, which is set according to the actual scene. The cosine similarity U^T x between U and the mini-batch sample x is computed at the same time. After each iteration, the new feature vectors are pushed into the queue and the outdated feature vectors are removed, forming a circular process;
The center loss function shown in formula (6) is introduced to constrain the features with labeled identities. It optimizes model training by reducing the intra-class loss; the center loss function trains only labeled pedestrian features and makes the model minimize the intra-class feature variation of the same pedestrian,
where X_i ∈ R^d denotes the feature of pedestrian box i, which belongs to the class with identity label y_i; the center term denotes the central feature of the class with identity label y_i; and m denotes the number of pedestrian classes;
Meanwhile, the smooth L1 loss function of Faster R-CNN is used again to regress the position and size of the pedestrian candidate boxes accurately, and the cross-entropy loss function supervises the classification of the pedestrian candidate boxes, yielding the final pedestrian search model;
4. Test the pedestrian search model.
In the scene image feature extraction of step 3, the similarity function f is the embedded Gaussian function.
In the scene image feature extraction of step 3, the number of anchors is changed from the original 9 to 72, and the size and ratio of each anchor change accordingly. Specifically, in the intervals where the sizes and ratios of the annotated pedestrian boxes in the dataset are concentrated, the anchors use densely spaced size and ratio values; in the intervals where sizes and ratios are not concentrated, the anchors use size and ratio values spaced further apart.
In the candidate pedestrian box detection of step 3, the extracted L2-normalized feature of each pedestrian box has 256 dimensions.
Step 4 is specifically as follows: for each gallery image, the features of all pedestrian candidate boxes are computed by forward propagation through the network; for the query image, the pedestrian candidate boxes are replaced with the single given bounding box, and forward propagation is then used to obtain its feature vector; finally, the pairwise cosine similarity between the query image feature and the candidate pedestrian box features of the gallery is computed, the candidates are ranked by cosine similarity, and the retrieved target pedestrian image is output.
A pedestrian search device based on structure-aware self-attention and online instance aggregation matching, comprising:
A pedestrian search model construction module, responsible for constructing the pedestrian search network based on the attention mechanism and instance aggregation. A non-local layer from the attention mechanism is added after the convolutional neural network of the first part to fuse global image information and thereby attend to regions where pedestrians are dense; the pedestrian detection part constructs structure-aware anchors; and the re-identification part sets the online instance aggregation function to supervise the pedestrian box features;
A network training module, which uses the constructed training dataset and the batch gradient descent algorithm to train the parameters of the constructed pedestrian search network. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation; structure-aware anchors are designed for pedestrians as a special object class to improve the performance of the detection framework; the detected pedestrian boxes are pooled to the same size and fed into the pedestrian re-identification network; the center loss function and the lookup table of online instance matching are used to train, save, optimize, and update the labeled pedestrian features; the circular queue of online instance matching is used to remove and update unlabeled pedestrian features and some background information; finally, the trained pedestrian search network is taken as the pedestrian search network used for testing;
A pedestrian search model test module, used to construct test samples, feed them into the trained pedestrian search network, and perform pedestrian detection on the input test scene images. After the pedestrian box positions are detected and their features obtained, the target pedestrian image is input and its feature obtained; this feature is matched against the pedestrian box features by feature similarity and ranked to retrieve the identity and determine its position in the scene image.
The beneficial effects of the invention are as follows:
First, the non-local module, a relatively cutting-edge self-attention technique, is introduced. It effectively incorporates global information, solves the problem that non-local features are inflexible, and lets the model focus more on regions of the scene image where crowds gather.
Second, in the pedestrian detection stage, the proposed structure-aware anchors improve the performance of the detection framework: even before the classification and regression of the anchors are completed, the anchors are already very close to the ground-truth pedestrian boxes, which makes the model converge faster and improves the efficiency of pedestrian detection.
Third, the online instance aggregation matching loss function solves the problem that, when samples of pedestrians with the same label are few, a single online instance matching loss function cannot learn strongly discriminative instance features. The features learned by the model are therefore more robust and can cope with the more challenging datasets found in real scenes.
Brief description of the drawings
Fig. 1 is a flow chart of the training strategy for pedestrian box features in the pedestrian re-identification part of the pedestrian search framework of the invention.
Fig. 2 is a network diagram of the pedestrian search network based on structure-aware self-attention and online instance aggregation matching of the invention.
Detailed description of the embodiments
To make the above objects, features, and advantages of the invention clearer and easier to understand, the invention is further described below through specific embodiments and the accompanying drawings.
Fig. 1 shows the flow chart of the training strategy for pedestrian box features in the pedestrian re-identification part of the pedestrian search framework of the invention, which includes the following steps:
1. Construct the pedestrian search model
(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pre-trained on the ImageNet dataset are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part.
(b) A pedestrian detection framework containing structure-aware anchors is set on top of the feature map and is responsible for detecting pedestrian boxes; the features of the detected pedestrian boxes enter the pooling layer.
(c) A re-identification network is set after the pooling layer. During training it is responsible for saving, optimizing, and updating the labeled pedestrian features; during model testing it is responsible for retrieving the target pedestrian.
The structural parameters of the pedestrian search network described in step 1 are as follows:
For the first layer, the input layer, the number of feature maps is set to 3, i.e. the three color channels of the image;
For the second layer, a convolutional layer, the number of feature maps is set to 64;
For the third layer, the first residual block of 9 layers, the number of feature maps is set to 256;
For the fourth layer, the second residual block of 12 layers, the number of feature maps is set to 512;
For the fifth layer, the third residual block of 9 layers, the number of feature maps is set to 1024;
For the sixth layer, the non-local layer, the number of feature maps remains 1024;
For the seventh layer, a convolutional layer, the number of feature maps is set to 512. After the seventh convolutional layer, one convolutional layer is set whose number of feature maps is the number of anchors × 2, to distinguish foreground from background for the candidate boxes;
Another convolutional layer is set whose number of feature maps is the number of anchors × 4, to regress the position and size of the candidate boxes.
For the eighth layer, a convolutional layer, the fourth residual block of 9 layers, the number of feature maps is set to 1024;
For the ninth layer, a convolutional layer, the fifth residual block of 9 layers, the number of feature maps is set to 2048;
For the tenth layer, three fully connected layers are set, with 2, 8, and 256 outputs respectively, corresponding to distinguishing foreground from background for the candidate boxes, regressing the position and size of the candidate boxes, and extracting the pedestrian features of the candidate boxes.
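The two convolutional layers that follow the seventh layer size their outputs from the number of anchors. A minimal sketch of that arithmetic, assuming one foreground/background score pair and four box offsets per anchor as the text describes (the function name `rpn_head_channels` is hypothetical):

```python
from typing import Tuple


def rpn_head_channels(num_anchors: int) -> Tuple[int, int]:
    """Return (classification, regression) feature-map counts for the two
    convolutional layers placed after the seventh layer."""
    cls_channels = num_anchors * 2  # foreground and background score per anchor
    reg_channels = num_anchors * 4  # position and size offsets per anchor
    return cls_channels, reg_channels


# With the 72 structure-aware anchors described in the text:
cls_out, reg_out = rpn_head_channels(72)
print(cls_out, reg_out)
```

With the original 9 anchors the same function gives 18 and 36 feature maps, so moving to 72 anchors only widens these two output layers.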
2. Construct the training dataset and set the training hyperparameters
When constructing the training set, the order of the images in the training dataset is shuffled and training data groups are generated. Each group of data contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrian labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized using batch gradient descent; the learning rate and momentum of the center loss function are set separately; and the six loss terms above are summed with certain weights (that is, the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function, where the smooth L1 and cross-entropy loss functions are each used twice, as shown in Fig. 1), so that the multiple tasks are optimized simultaneously.
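The weighted summation of the six loss terms can be sketched as follows. The term names and all weight values are illustrative assumptions; the text only says that "certain weights" are used, and the split of the two smooth L1 and two cross-entropy terms into a proposal stage and a detection stage follows the usual Faster R-CNN layout rather than anything stated here.

```python
def total_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of the individual loss terms for joint optimization."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical loss values for one training step.
example_losses = {
    "proposal_cls": 1.0,  # cross-entropy, first use
    "proposal_reg": 1.0,  # smooth L1, first use
    "box_cls": 1.0,       # cross-entropy, second use
    "box_reg": 1.0,       # smooth L1, second use
    "oim": 1.0,           # online instance matching
    "center": 1.0,        # center loss
}
# Hypothetical weights; the center loss is down-weighted here only as an example.
example_weights = {name: 1.0 for name in example_losses}
example_weights["center"] = 0.5

print(total_loss(example_losses, example_weights))
```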
3. Train the pedestrian search model
(a) Scene image feature extraction. An entire image is input, and the feature f_1 is obtained through the head of the deep convolutional network and the non-local layer. Formula (1) for the non-local layer is as follows:
where i denotes the output position, j enumerates all possible positions, f denotes the similarity function, g denotes the input scaling function, and C(x) is the normalization factor. The similarity function f in the invention is the embedded Gaussian.
Formula (2) is as follows:
where θ(x_i) = W_θ x_i and η(x_j) = W_η x_j are two embedding functions.
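The non-local layer can be illustrated with a small NumPy sketch. It assumes the standard non-local form, y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j) with f(x_i, x_j) = exp(θ(x_i)^T η(x_j)) and C(x) = Σ_j f(x_i, x_j), which matches the symbols defined above; the linear form of g, the weight shapes, and the function name are assumptions, and the residual connection and output projection of a full non-local block are omitted.

```python
import numpy as np


def non_local_embedded_gaussian(x, w_theta, w_eta, w_g):
    """x: (N, C) features at N spatial positions, one row per position.
    w_theta, w_eta, w_g: (C, C2) embedding matrices (row-vector convention,
    so x @ w_theta plays the role of W_theta x_i)."""
    theta = x @ w_theta                  # theta(x_i)
    eta = x @ w_eta                      # eta(x_j)
    g = x @ w_g                          # g(x_j), assumed linear
    logits = theta @ eta.T               # (N, N): theta(x_i)^T eta(x_j)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    f = np.exp(logits)                   # embedded Gaussian similarity
    attention = f / f.sum(axis=1, keepdims=True)         # divide by C(x)
    return attention @ g, attention      # y_i and the attention weights


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))              # 6 positions, 4 channels
w_theta, w_eta, w_g = (rng.normal(size=(4, 3)) for _ in range(3))
y, attention = non_local_embedded_gaussian(x, w_theta, w_eta, w_g)
print(y.shape, attention.sum(axis=1))
```

Each output position is a weighted average over all positions of the image, which is how the layer brings distant pixels into the feature at position i.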
After the feature vector is obtained, it is passed through the Faster R-CNN pedestrian detector, and structure-aware anchors are proposed for pedestrians as a special object class. Formulas (3) and (4) are as follows:
where A denotes the anchors, S denotes the sizes, R denotes the ratios, and the product symbol denotes multiplication over all combinations of sizes and ratios. The number of anchors therefore changes from the original 9 to 72, and the size and ratio of each anchor change accordingly. Specifically, in the intervals where the sizes and ratios of the annotated pedestrian boxes in the dataset are concentrated, the anchors use densely spaced size and ratio values; in the intervals where sizes and ratios are not concentrated, the anchors use size and ratio values spaced further apart.
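Building anchors from every combination of a size set and a ratio set can be sketched as below. The concrete size and ratio values are hypothetical (the text gives only the total of 72), chosen so that tall, pedestrian-like ratios are sampled more densely; defining the ratio as height over width and the size as the square root of the area is likewise an assumption.

```python
import itertools
import math


def make_anchors(sizes, ratios):
    """Return one (width, height) anchor per (size, ratio) combination.
    ratio = height / width, and width * height = size ** 2."""
    anchors = []
    for size, ratio in itertools.product(sizes, ratios):
        width = size / math.sqrt(ratio)
        height = size * math.sqrt(ratio)
        anchors.append((width, height))
    return anchors


# Hypothetical values: 9 sizes x 8 ratios = 72 anchors.
SIZES = [32, 48, 64, 80, 96, 128, 192, 256, 512]
RATIOS = [1.0, 1.5, 2.0, 2.25, 2.5, 2.75, 3.0, 4.0]  # denser between 2 and 3

anchors = make_anchors(SIZES, RATIOS)
print(len(anchors))
```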
(b) Candidate pedestrian box detection. After the candidate pedestrian boxes are extracted from the feature vector, the cross-entropy loss function and the smooth L1 loss function supervise the classification and the position and size of the candidate boxes, respectively. The pedestrian boxes obtained from the feature vector are pooled to the same size of 7×7 and fed into the tail of the deep convolutional network and then into the re-identification network; the extracted L2-normalized feature of each pedestrian box has 256 dimensions.
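Pooling boxes of different sizes to a fixed 7×7 grid and L2-normalizing the resulting 256-dimensional feature can be sketched in NumPy. This is a simplified illustration, assuming integer box coordinates on the feature map and plain max pooling per bin; the exact pooling operator used in the patent is not specified beyond the output size.

```python
import numpy as np


def roi_max_pool(feature_map, box, output_size=7):
    """feature_map: (C, H, W). box: (x1, y1, x2, y2) integer feature-map
    coordinates with x2 and y2 exclusive. Returns (C, output_size, output_size)."""
    x1, y1, x2, y2 = box
    region = feature_map[:, y1:y2, x1:x2]
    channels, height, width = region.shape
    y_edges = np.linspace(0, height, output_size + 1)
    x_edges = np.linspace(0, width, output_size + 1)
    pooled = np.empty((channels, output_size, output_size), dtype=feature_map.dtype)
    for row in range(output_size):
        r0 = int(np.floor(y_edges[row]))
        r1 = max(int(np.ceil(y_edges[row + 1])), r0 + 1)  # at least one row per bin
        for col in range(output_size):
            c0 = int(np.floor(x_edges[col]))
            c1 = max(int(np.ceil(x_edges[col + 1])), c0 + 1)
            pooled[:, row, col] = region[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled


def l2_normalize(features, eps=1e-12):
    """Scale each feature vector (last axis) to unit L2 norm."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return features / np.maximum(norms, eps)


feature_map = np.arange(2 * 20 * 30, dtype=float).reshape(2, 20, 30)
pooled = roi_max_pool(feature_map, (3, 2, 24, 16))   # a 21 x 14 box
embedding = l2_normalize(np.random.default_rng(0).normal(size=256))
print(pooled.shape, np.linalg.norm(embedding))
```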
(c) Pedestrian feature matching. The L2-normalized pedestrian box features (256 dimensions) are extracted, and online instance matching is used to save, optimize, and update the features with labeled identities and those with unlabeled identities. In forward propagation, a lookup table is set up and the cosine similarity between the mini-batch samples and all labeled identities is computed. In backpropagation, if the class label of the target pedestrian is t, column t of the lookup table is updated using formula (5):
V_t ← γV_t + (1 − γ)x, (5)
where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the lookup table, and γ is the update weight, which takes a value in the interval (0, 1); this method uses γ = 0.5.
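The lookup table and its update rule (5) can be sketched as follows. The function names are hypothetical, and the optional re-normalization of the updated column to unit length is an assumption added so that later dot products remain cosine similarities; the text itself states only the moving-average update.

```python
import numpy as np


def lut_similarities(lut, x):
    """lut: (D, L) matrix with one unit-norm column per labeled identity.
    x: (D,) unit-norm feature. Returns the L cosine similarities V^T x."""
    return lut.T @ x


def update_lut(lut, x, t, gamma=0.5, renormalize=True):
    """Apply V_t <- gamma * V_t + (1 - gamma) * x to column t, in place."""
    updated = gamma * lut[:, t] + (1.0 - gamma) * x
    if renormalize:                      # assumption, not stated in the text
        norm = np.linalg.norm(updated)
        if norm > 0:
            updated = updated / norm
    lut[:, t] = updated
    return lut


lut = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])             # D = 3, two labeled identities
x = np.array([0.0, 1.0, 0.0])            # feature of a box labeled t = 0
print(lut_similarities(lut, x))
update_lut(lut, x, t=0, gamma=0.5, renormalize=False)
print(lut[:, 0])
```

Because each column is a running average of the features seen for one identity, it accumulates appearances of that pedestrian under different poses and viewing angles.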
The features of pedestrian boxes with unlabeled identities that appear in scene images serve as negative samples and are also valuable for learning feature representations. These unlabeled-identity features are saved in a circular queue, denoted U ∈ R^{D×Q}, a D × Q matrix, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, which is set according to the actual scene. The cosine similarity U^T x between U and the mini-batch sample x is computed at the same time. After each iteration, the new feature vectors are pushed into the queue and the outdated feature vectors are removed, forming a circular process.
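A fixed-size circular queue for unlabeled-identity features can be sketched as below. The class name and the choice to initialize empty slots with zeros are assumptions; the text specifies only the D × Q shape, the U^T x similarity, and that new features displace the oldest ones.

```python
import numpy as np


class CircularQueue:
    """Stores the Q most recent unlabeled features as columns of U (D x Q)."""

    def __init__(self, dim, size):
        self.features = np.zeros((dim, size))  # U; empty slots are zero vectors
        self.next_slot = 0

    def push(self, x):
        """Write x into the oldest slot, overwriting the outdated feature."""
        self.features[:, self.next_slot] = x
        self.next_slot = (self.next_slot + 1) % self.features.shape[1]

    def similarities(self, x):
        """Cosine similarities U^T x, assuming unit-norm features."""
        return self.features.T @ x


queue = CircularQueue(dim=2, size=3)
for feature in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]):
    queue.push(np.array(feature))         # the fourth push overwrites slot 0
print(queue.similarities(np.array([1.0, 0.0])))
```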
During training, the input is the whole image, and the labeled pedestrians that appear in each image are random, sparse, and unbalanced, so it is difficult to organize equal numbers of positive and negative sample pairs, and verification or contrastive loss terms cannot be introduced directly into the Faster R-CNN framework. The center loss function is therefore introduced to impose a certain constraint. It optimizes model training by reducing the intra-class loss, and it trains only labeled pedestrian features.
The formula is shown in (6):
where X_i ∈ R^d denotes the feature of pedestrian box i, which belongs to the class with identity label y_i; the center term denotes the central feature of the class with identity label y_i; and m denotes the number of pedestrian classes.
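The center loss can be illustrated with a short NumPy sketch. It assumes the commonly used form, half the sum of squared distances between each labeled feature and the center of its class (written c_{y_i} in the code comments); the convention that a negative label marks an unlabeled box, which is skipped, is hypothetical.

```python
import numpy as np


def center_loss(features, labels, centers):
    """0.5 * sum_i ||x_i - c_{y_i}||^2 over labeled pedestrian boxes only.
    features: (N, d); labels: (N,) class index, negative = unlabeled;
    centers: (num_classes, d), one center c_y per identity class."""
    labels = np.asarray(labels)
    labeled = labels >= 0                       # unlabeled boxes are not trained
    diffs = features[labeled] - centers[labels[labeled]]
    return 0.5 * float(np.sum(diffs ** 2))


features = np.array([[1.0, 0.0],
                     [0.0, 2.0],
                     [5.0, 5.0]])               # third box has no identity label
labels = [0, 1, -1]
centers = np.zeros((2, 2))
print(center_loss(features, labels, centers))
```

Minimizing this term pulls features of the same pedestrian toward a shared center, which is the intra-class constraint that the online instance matching loss alone does not impose.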
Meanwhile, the smooth L1 loss function and the cross-entropy loss function of Faster R-CNN are used again to further refine the position and size and the classification of the candidate boxes.
4. Test the pedestrian search model
For each gallery image, we compute the features of all pedestrian candidate boxes by forward propagation through the network. For the query image, we replace the pedestrian candidate boxes with the single given bounding box and then use forward propagation to obtain its feature vector. Finally, we compute the pairwise cosine similarity between the query image feature and the candidate pedestrian box features of the gallery. The candidates are ranked by cosine similarity, and the retrieved target pedestrian image is output.
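The test-time retrieval step reduces to ranking gallery boxes by cosine similarity with the query feature. A minimal sketch, with a hypothetical function name and toy two-dimensional features standing in for the 256-dimensional ones:

```python
import numpy as np


def rank_gallery(query_feature, gallery_features):
    """Return gallery indices sorted from most to least similar to the query,
    together with the corresponding cosine similarities."""
    q = query_feature / np.linalg.norm(query_feature)
    g = gallery_features / np.linalg.norm(gallery_features, axis=1, keepdims=True)
    similarities = g @ q
    order = np.argsort(-similarities, kind="stable")
    return order, similarities[order]


query = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0],    # orthogonal to the query
                    [2.0, 0.0],    # same direction as the query
                    [1.0, 1.0]])   # 45 degrees from the query
order, scores = rank_gallery(query, gallery)
print(order, scores)
```

The top-ranked index identifies the gallery box, and therefore the scene image and position, returned as the retrieved target pedestrian.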
The invention further discloses a pedestrian search device based on structure-aware self-attention and online instance aggregation matching, comprising:
A pedestrian search model construction module, responsible for constructing the pedestrian search network based on the attention mechanism and instance aggregation. A non-local layer from the attention mechanism is added after the convolutional neural network of the first part to fuse global image information and thereby attend to regions where pedestrians are dense; the pedestrian detection part constructs structure-aware anchors; and the re-identification part sets the online instance aggregation function to supervise the pedestrian box features;
A network training module, which uses the constructed training dataset and the batch gradient descent algorithm to train the parameters of the constructed pedestrian search network. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation; structure-aware anchors are designed for pedestrians as a special object class to improve the performance of the detection framework; the detected pedestrian boxes are pooled to the same size and fed into the pedestrian re-identification network; the center loss function and the lookup table of online instance matching are used to train, save, optimize, and update the labeled pedestrian features; the circular queue of online instance matching is used to remove and update unlabeled pedestrian features and some background information; finally, the trained pedestrian search network is taken as the pedestrian search network used for testing;
A pedestrian search model test module, used to construct test samples, feed them into the trained pedestrian search network, and perform pedestrian detection on the input test scene images. After the pedestrian box positions are detected and their features obtained, the target pedestrian image is input and its feature obtained; this feature is matched against the pedestrian box features by feature similarity and ranked to retrieve the identity and determine its position in the scene image.
The above embodiments merely illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention. The scope of protection of the invention shall be as set out in the claims.
Claims (6)
1. A pedestrian search method based on structure-aware self-attention and online instance aggregation matching, characterized in that it comprises the following steps:
1. Construct the pedestrian search model
(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pre-trained on the ImageNet dataset are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part;
(b) A pedestrian detection framework containing structure-aware anchors is set on top of the feature map and is responsible for detecting pedestrian boxes; the features of the detected pedestrian boxes enter the pooling layer;
(c) A re-identification network is set after the pooling layer. During training it is responsible for saving, optimizing, and updating the labeled pedestrian features; during model testing it is responsible for retrieving the target pedestrian;
2. Construct the training dataset and set the training hyperparameters
When constructing the training set, the order of the images in the training dataset is shuffled and training data groups are generated; each group of data contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrian labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized using batch gradient descent;
The learning rate and momentum of the center loss function are set separately, and the four loss functions are summed with certain weights;
The four loss functions are the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function, where the smooth L1 loss function and the cross-entropy loss function are each used twice, and the multiple tasks are optimized simultaneously;
Step 3: train the pedestrian search model
(d) scene image feature extraction: a whole image is input and passed through the head of the deep convolutional network and the non-local layer shown in formula (1) below
to obtain the scene image feature f1, so that the feature fuses global information and the model attends to the regions of the image where pedestrians
are dense:
$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$, (1)
where i is the index of the output position, j enumerates all possible positions, f is the similarity function, g is the input scaling
function, and C(x) is the normalization factor;
the obtained scene image feature vector f1 is passed through the fast region-based convolutional neural network for pedestrian detection to obtain the pedestrian candidate
box feature f2; structure-aware anchors are proposed for the specific object class of pedestrians, the anchors are defined as in formula (3),
and the improved anchor strategy is as in formula (4):
$A = S \otimes R$, (3)
where A denotes the anchors, S the sizes, R the ratios, and $\otimes$ traversal (pairwise) multiplication;
(e) candidate pedestrian box detection: after the candidate pedestrian box feature f2 is extracted, the smooth absolute loss function in the fast region-based
convolutional neural network is used to regress the position and size of the pedestrian candidate boxes accurately, and the cross-entropy loss function is used to supervise
the classification of the pedestrian candidate boxes; the pedestrian boxes obtained from the feature vector are pooled to the same size and fed into the tail of the deep
convolutional network, then enter the re-identification network, and the L2-normalized feature of each pedestrian box is extracted;
(f) pedestrian feature matching: after the L2-normalized pedestrian box features are extracted, online instance matching is used to save, optimize and update
the features of labeled identities and of unlabeled identities; in forward propagation a lookup table is set up, and the cosine similarity
between the samples in the mini-batch and all labeled identities is computed;
in backpropagation, if the class label of the target pedestrian is t, the t-th column of the lookup table is updated using the following formula,
so that the lookup table can store the features of the same target pedestrian under multiple poses and viewing angles,
V_t ← γV_t + (1 − γ)x, (5)
where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the lookup table, and γ is the update weight, which may
take a value in the interval (0, 1); this method takes γ = 0.5;
the features of pedestrian boxes without a labeled identity that appear in the scene images are also valuable as negative samples for learning the feature
representation; these unlabeled-identity features are stored in a circular queue, denoted U ∈ R^{D×Q}, a D × Q
matrix, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, which is set according to the actual scene;
the cosine similarity U^T x between U and the sample x in the mini-batch is also computed; after each iteration, the new feature vectors are pushed
into the queue and the outdated feature vectors are removed, forming a cyclic process;
the center loss function shown in formula (6) is introduced to constrain the features of labeled identities; model training is optimized by reducing the
intra-class loss; the center loss function is trained only on the labeled pedestrian features, so that the model minimizes the feature variation
within the same pedestrian,
$L_C = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2$, (6)
where x_i ∈ R^d is the feature of pedestrian box i, which belongs to the class of person identity label y_i, c_{y_i} is the center feature
of the class of person identity label y_i, and m is the number of pedestrian classes;
Meanwhile pedestrian candidate is accurately returned using the smooth absolute loss function in the convolutional neural networks of fast area again
The positions and dimensions of frame simultaneously exercise supervision to the classification of pedestrian candidate frame with cross entropy loss function, obtain final pedestrian's search
Model;
4, pedestrian's search model is tested.
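As a rough illustration of how the loss terms named in step 2 of claim 1 might be combined, the following Python sketch computes a smooth absolute loss, a center loss and a weighted sum. It reads the "smooth absolute loss" as the smooth L1 loss, which is an assumption; the function names, the `beta` threshold and the example weights are hypothetical and are not taken from the patent.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over coordinates (assumed form of the 'smooth absolute loss')."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total


def center_loss(features, labels, centers):
    """Half the summed squared distance between each labeled feature and its class center."""
    loss = 0.0
    for feat, label in zip(features, labels):
        loss += sum((x - c) ** 2 for x, c in zip(feat, centers[label]))
    return 0.5 * loss


def weighted_total(terms, weights):
    """Weighted sum of named loss terms; the weights are hypothetical hyperparameters."""
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical values for a single training step.
terms = {
    "box_regression": smooth_l1([0.5, 2.0], [0.0, 0.0]),
    "center": center_loss([[1.0, 0.0]], [0], {0: [0.0, 0.0]}),
}
total = weighted_total(terms, {"box_regression": 1.0, "center": 0.01})
```

The cross-entropy and online instance matching terms would be added to `terms` in the same way, each with its own weight.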
2. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized
in that, in the scene image feature extraction of step 3, the similarity function f is the embedded Gaussian function.
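The non-local operation with an embedded Gaussian similarity can be sketched in plain Python as follows. The embedded Gaussian is commonly written as f(x_i, x_j) = exp(θ(x_i)ᵀ φ(x_j)) with C(x) = Σ_j f(x_i, x_j). As a simplifying assumption, θ, φ and g are identity maps here, whereas a real layer would presumably use learned projections; the function name `non_local` is hypothetical.

```python
import math


def non_local(features):
    """Apply y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j) to a list of position vectors.

    Assumptions: f is the embedded Gaussian exp(x_i . x_j) with identity
    embeddings, g is the identity, and C(x) is the sum of f over all j.
    """
    outputs = []
    for x_i in features:
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Subtracting the maximum leaves the normalized weights unchanged
        # and avoids overflow in exp().
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)  # C(x)
        y_i = [
            sum(w * x_j[d] for w, x_j in zip(weights, features)) / norm
            for d in range(len(x_i))
        ]
        outputs.append(y_i)
    return outputs


# Two positions with orthogonal features: each output mixes both positions,
# weighted towards the more similar one.
mixed = non_local([[1.0, 0.0], [0.0, 1.0]])
```

Because every output position is a weighted average over all positions, each feature carries global context, which is the property the claim relies on.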
3. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized
in that, in the scene image feature extraction of step 3, the number of anchors is changed from the original 9 to 72, and the sizes and ratios of the anchors
are changed accordingly; specifically, in the intervals where the sizes and ratios of the annotated pedestrian boxes in the dataset are concentrated, the anchors use
densely spaced size and ratio values; in the intervals where the sizes and ratios are not concentrated, the anchors use size and ratio
values that are spaced further apart.
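A minimal sketch of the anchor construction A = S ⊗ R, pairing every size with every ratio. The claim gives only the counts (9 and 72), so the concrete size and ratio values below are hypothetical, as is the convention that a ratio means height divided by width.

```python
import math
from itertools import product


def make_anchors(sizes, ratios):
    """Pair every size with every ratio; ratio is taken as height / width (an assumption)."""
    anchors = []
    for size, ratio in product(sizes, ratios):
        width = size / math.sqrt(ratio)
        height = size * math.sqrt(ratio)
        anchors.append((width, height))  # area stays size ** 2
    return anchors


# A common 3 x 3 choice giving 9 anchors (assumed baseline).
baseline = make_anchors([128, 256, 512], [0.5, 1.0, 2.0])

# Hypothetical structure-aware grid, 8 sizes x 9 ratios = 72 anchors:
# values are packed densely where pedestrian boxes are assumed to concentrate
# (mid sizes, tall ratios) and sparsely elsewhere.
sizes = [32, 64, 96, 128, 160, 192, 256, 512]
ratios = [0.5, 1.0, 1.5, 2.0, 2.25, 2.5, 2.75, 3.0, 4.0]
structure_aware = make_anchors(sizes, ratios)
```

In practice the dense intervals would be chosen from the size and ratio statistics of the annotated pedestrian boxes in the training data.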
4. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized
in that, in the candidate pedestrian box detection of step 3, the L2-normalized feature extracted for each pedestrian box has 256 dimensions.
5. The pedestrian search method based on structure-aware self-attention and online instance aggregation matching according to claim 1, characterized
in that step 4 is specifically:
for each gallery image, the features of all pedestrian candidate boxes are computed by a forward pass through the network; for the query image,
the pedestrian candidate boxes are replaced by the single given bounding box, and a forward pass is then computed to obtain its feature vector; finally,
the pairwise cosine similarities between the query image feature and the gallery candidate pedestrian box features are computed, the similarity level is evaluated
by ranking on cosine similarity, and the retrieved target pedestrian images are output.
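The retrieval step of claim 5 can be sketched as a ranking of gallery boxes by cosine similarity to the query feature. The data layout (tuples of image id, box and feature), the function names and the toy 2-dimensional features are hypothetical; the sketch also assumes that no feature is the zero vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query_feature, gallery):
    """Return (similarity, image_id, box) tuples sorted from most to least similar."""
    scored = [
        (cosine_similarity(query_feature, feature), image_id, box)
        for image_id, box, feature in gallery
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy gallery: (image id, box as (x, y, w, h), feature of the detected box).
gallery = [
    ("img_001", (10, 20, 50, 120), [0.0, 1.0]),
    ("img_002", (30, 40, 45, 110), [1.0, 0.1]),
    ("img_002", (200, 35, 48, 115), [-1.0, 0.0]),
]
ranking = rank_gallery([1.0, 0.0], gallery)
```

If the features are already L2-normalized, as in claim 4, the cosine similarity reduces to a plain dot product.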
6. A pedestrian search device based on structure-aware self-attention and online instance aggregation matching, characterized by comprising:
a pedestrian search model construction module, responsible for building a pedestrian search network based on an attention mechanism and instance aggregation;
after the first part of the convolutional neural network, a non-local layer from the attention mechanism is added, which fuses the global information of the image so as to attend
to the regions where pedestrians are dense;
the pedestrian detection part constructs structure-aware anchors;
the re-identification part sets an online instance aggregation function to supervise the pedestrian box features;
a network training module, which uses the constructed training dataset to train the parameters of the built pedestrian search network
with a batch gradient descent algorithm; in the training stage, features are first extracted from the whole input scene image
by the combination of a convolutional neural network and a non-local layer to obtain its feature representation; structure-aware anchors are designed for the specific object class
of pedestrians to improve the performance of the detection framework; the detected pedestrian boxes are pooled to the same size and fed into the pedestrian re-identification network,
which is trained using the center loss function and the lookup table of online instance matching to save, optimize and update the labeled pedestrian features,
while the unlabeled pedestrian features and some background information are removed and updated using the circular queue of online instance
matching; finally, the trained pedestrian search network is taken out as the pedestrian search network for testing;
a pedestrian search model testing module, used to construct test samples and feed them into the trained pedestrian search
network, which performs pedestrian detection on the input test scene image, detects the pedestrian box positions and obtains their features; the target
pedestrian image is then input and its feature obtained, feature similarity matching against the pedestrian box features is performed to rank and retrieve the identity, and
its position in the scene image is determined.
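The lookup table and circular queue used by the training module can be sketched as a small memory class. The update of the labeled entry follows formula (5) of claim 1, V_t ← γV_t + (1 − γ)x with γ = 0.5; the class name, method names and toy dimensions are hypothetical, and the sketch applies formula (5) literally without re-normalizing the stored entry.

```python
from collections import deque


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))


class InstanceMemory:
    """Lookup table for labeled identities plus a circular queue for unlabeled ones."""

    def __init__(self, num_ids, dim, queue_size, gamma=0.5):
        self.gamma = gamma
        self.table = [[0.0] * dim for _ in range(num_ids)]  # one entry V_t per identity
        self.queue = deque(maxlen=queue_size)  # oldest features drop out automatically

    def similarities(self, x):
        """Similarity of feature x to every labeled entry and every queued feature."""
        labeled = [dot(v, x) for v in self.table]
        unlabeled = [dot(u, x) for u in self.queue]
        return labeled, unlabeled

    def update(self, x, t=None):
        """Store x: blend into entry t if labeled, otherwise push into the queue."""
        if t is None:
            self.queue.append(list(x))
        else:
            g = self.gamma
            self.table[t] = [g * v + (1 - g) * c for v, c in zip(self.table[t], x)]


memory = InstanceMemory(num_ids=2, dim=2, queue_size=2)
memory.update([0.6, 0.8], t=1)  # labeled feature of identity 1
memory.update([1.0, 0.0])       # unlabeled features
memory.update([0.0, 1.0])
memory.update([0.6, 0.8])       # queue is full, so [1.0, 0.0] is discarded
```

Using `deque(maxlen=...)` is one simple way to obtain the push-new, drop-outdated behavior that the claims describe for the circular queue.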
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910061943.8A CN109948425B (en) | 2019-01-22 | 2019-01-22 | Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109948425A (en) | 2019-06-28 |
CN109948425B (en) | 2023-06-09 |
Family
ID=67007387
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910061943.8A Active CN109948425B (en) | 2019-01-22 | 2019-01-22 | Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109948425B (en) |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348014A (en) * | 2019-07-10 | 2019-10-18 | 电子科技大学 | A kind of semantic similarity calculation method based on deep learning |
CN110555420A (en) * | 2019-09-09 | 2019-12-10 | 电子科技大学 | fusion model network and method based on pedestrian regional feature extraction and re-identification |
CN110569738A (en) * | 2019-08-15 | 2019-12-13 | 杨春立 | natural scene text detection method, equipment and medium based on dense connection network |
CN110647816A (en) * | 2019-08-26 | 2020-01-03 | 合肥工业大学 | Target detection method for real-time monitoring of goods shelf medicines |
CN110659721A (en) * | 2019-08-02 | 2020-01-07 | 浙江省北大信息技术高等研究院 | Method and system for constructing target detection network |
CN110765880A (en) * | 2019-09-24 | 2020-02-07 | 中国矿业大学 | Light-weight video pedestrian heavy identification method |
CN111027397A (en) * | 2019-11-14 | 2020-04-17 | 上海交通大学 | Method, system, medium and device for detecting comprehensive characteristic target in intelligent monitoring network |
CN111241944A (en) * | 2019-12-31 | 2020-06-05 | 浙江大学 | Scene recognition and loopback detection method based on background target detection and background feature similarity matching |
CN111401286A (en) * | 2020-03-24 | 2020-07-10 | 武汉大学 | Pedestrian retrieval method based on component weight generation network |
CN111539257A (en) * | 2020-03-31 | 2020-08-14 | 苏州科达科技股份有限公司 | Personnel re-identification method, device and storage medium |
CN111582225A (en) * | 2020-05-19 | 2020-08-25 | 长沙理工大学 | Remote sensing image scene classification method and device |
CN111695526A (en) * | 2020-06-15 | 2020-09-22 | 北京爱笔科技有限公司 | Network model generation method, pedestrian re-identification method and device |
CN111695470A (en) * | 2020-06-02 | 2020-09-22 | 中山大学 | Visible light-near infrared pedestrian re-identification method based on depth feature orthogonal decomposition |
CN111709311A (en) * | 2020-05-27 | 2020-09-25 | 西安理工大学 | Pedestrian re-identification method based on multi-scale convolution feature fusion |
CN111723728A (en) * | 2020-06-18 | 2020-09-29 | 中国科学院自动化研究所 | Pedestrian searching method, system and device based on bidirectional interactive network |
CN111723719A (en) * | 2020-06-12 | 2020-09-29 | 中国科学院自动化研究所 | Video target detection method, system and device based on category external memory |
CN111814845A (en) * | 2020-03-26 | 2020-10-23 | 同济大学 | Pedestrian re-identification method based on multi-branch flow fusion model |
CN111914107A (en) * | 2020-07-29 | 2020-11-10 | 厦门大学 | Instance retrieval method based on multi-channel attention area expansion |
CN112016591A (en) * | 2020-08-04 | 2020-12-01 | 杰创智能科技股份有限公司 | Training method of image recognition model and image recognition method |
CN112241682A (en) * | 2020-09-14 | 2021-01-19 | 同济大学 | End-to-end pedestrian searching method based on blocking and multi-layer information fusion |
CN112464730A (en) * | 2020-11-03 | 2021-03-09 | 南京理工大学 | Pedestrian re-identification method based on domain-independent foreground feature learning |
CN112597956A (en) * | 2020-12-30 | 2021-04-02 | 华侨大学 | Multi-person attitude estimation method based on human body anchor point set and perception enhancement network |
CN113076861A (en) * | 2021-03-30 | 2021-07-06 | 南京大学环境规划设计研究院集团股份公司 | Bird fine-granularity identification method based on second-order features |
CN113095106A (en) * | 2019-12-23 | 2021-07-09 | 华为数字技术(苏州)有限公司 | Human body posture estimation method and device |
CN113627383A (en) * | 2021-08-25 | 2021-11-09 | 中国矿业大学 | Pedestrian loitering re-identification method for panoramic intelligent security |
CN113743251A (en) * | 2021-08-17 | 2021-12-03 | 华中科技大学 | Target searching method and device based on weak supervision scene |
CN113920470A (en) * | 2021-10-12 | 2022-01-11 | 中国电子科技集团公司第二十八研究所 | Pedestrian retrieval method based on self-attention mechanism |
CN113936301A (en) * | 2021-07-02 | 2022-01-14 | 西北工业大学 | Target re-identification method based on central point prediction loss function |
CN114049609A (en) * | 2021-11-24 | 2022-02-15 | 大连理工大学 | Multilevel aggregation pedestrian re-identification method based on neural architecture search |
US20220058396A1 (en) * | 2019-11-19 | 2022-02-24 | Tencent Technology (Shenzhen) Company Limited | Video Classification Model Construction Method and Apparatus, Video Classification Method and Apparatus, Device, and Medium |
CN115731588A (en) * | 2021-08-27 | 2023-03-03 | 腾讯科技(深圳)有限公司 | Model processing method and device |
CN117456560A (en) * | 2023-12-22 | 2024-01-26 | 华侨大学 | Pedestrian re-identification method based on foreground perception dynamic part learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197584A (en) * | 2018-01-12 | 2018-06-22 | 武汉大学 | A kind of recognition methods again of the pedestrian based on triple deep neural network |
CN109165540A (en) * | 2018-06-13 | 2019-01-08 | 深圳市感动智能科技有限公司 | A kind of pedestrian's searching method and device based on priori candidate frame selection strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||