CN109948425A

CN109948425A - Pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching

Info

Publication number: CN109948425A
Application number: CN201910061943.8A
Authority: CN
Inventors: 姚睿; 高存远; 赵佳琦; 周勇; 夏士雄; 王重秋
Original assignee: China University of Mining and Technology CUMT
Current assignee: China University of Mining and Technology CUMT
Priority date: 2019-01-22
Filing date: 2019-01-22
Publication date: 2019-06-28
Anticipated expiration: 2039-01-22
Also published as: CN109948425B

Abstract

The invention discloses a pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching, belonging to the technical field of computer vision. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation. Structure-aware anchors are designed for pedestrians as a special class of object to improve the performance of the detection framework. The detected pedestrian boxes are pooled to the same size and fed into the pedestrian re-identification network for training, which saves, optimizes, and updates the labeled pedestrian features. In the model testing stage, the trained non-local convolutional neural network performs pedestrian detection on the input scene image; after pedestrian boxes are detected, their features are matched against the target pedestrian image by similarity, ranked, and retrieved. The invention can perform pedestrian detection and re-identification simultaneously on large-scale real-scene images and plays an important role in security fields such as urban surveillance.

Description

A pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching

Technical field

The invention belongs to the technical field of computer vision, and further relates to a pedestrian search method based on structure-aware self-attention and online instance aggregation matching in the fields of object detection and object retrieval.

Background technique

The document "Joint detection and identification feature learning for person search, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017: 3376-3385." discloses a new pedestrian search framework that integrates pedestrian detection and pedestrian re-identification. Current re-identification benchmarks and methods mainly match pedestrian images that have already been cropped, but real scenes are not so ideal: in pedestrian search, pedestrians must first be located with a pedestrian detection method, and the specific person is then retrieved with a re-identification method.

That document proposes a deep learning framework for pedestrian search that integrates pedestrian detection and re-identification into a single convolutional neural network, and trains the network with an online instance matching loss function because it adapts well to data sets with a large number of identities. When detecting pedestrians, however, the method inevitably produces false positives, missed detections, and misaligned bounding boxes, all of which degrade the search result. The limitations of convolutional neural networks also prevent the model from learning globally distributed information, so it cannot localize regions where pedestrians are dense. Finally, when the data set is small overall, the image information (such as human pose and behavior) is not rich enough, and few samples share the same label, a single online instance matching loss function cannot make the model learn strongly discriminative features.

Summary of the invention

The invention aims to reduce the false positives, missed detections, and misaligned bounding boxes produced by the detection part of pedestrian search; to make the pedestrian search model incorporate global information, pay more attention to regions where pedestrians are dense, and localize pedestrians accurately; to learn robust feature representations; to keep the network from overfitting easily; and to promote the use of pedestrian search in practical applications. To this end, the invention proposes a pedestrian search method based on structure-aware self-attention and online instance aggregation matching. Structure-aware anchors raise the precision of pedestrian detection while improving efficiency. A non-local connection operation is introduced into the pedestrian search network at a small extra computational cost; it connects the output feature to distant pixels of the same image, which helps the deep network fuse non-local information, concentrates the weight of the output feature on regions where pedestrians are dense, and further improves model accuracy. In addition, the method combines online instance matching with the center loss function to form an online instance aggregation matching cost function, which better separates images of the same class from images of different classes, so that the pedestrian search network learns diverse and discriminative features. This effectively alleviates the effect of data sets that have few images per class and lack diversity.

The technical solution adopted by the present invention to solve the technical problems is:

A pedestrian search method based on structure-aware self-attention and online instance aggregation matching,

comprising the following steps:

1. Construct the pedestrian search model

(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pre-trained on the ImageNet data set are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part;

(b) A pedestrian detection framework containing structure-aware anchors is set on top of the feature map and is responsible for detecting pedestrian boxes; the detected pedestrian box features enter the pooling layer;

(c) A re-identification network is set after the pooling layer. During training it is responsible for saving, optimizing, and updating the labeled pedestrian features; during model testing it is responsible for retrieving the target pedestrian;

2. Construct the training data set and set the training hyperparameters

When the training set is constructed, the order of the images in the training data set is shuffled and training data groups are generated. Each group contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrian labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized with batch gradient descent;

The learning rate and momentum of the center loss function are set separately, and the four loss functions are summed with certain weights;

The four loss functions are the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function. The smooth L1 loss function and the cross-entropy loss function are each used twice, and the multiple tasks are optimized simultaneously;

3. Train the pedestrian search model

(d) Scene image feature extraction: an entire image is input and passed through the head of the deep convolutional network and the non-local layer of formula (1) to obtain the scene image feature f₁. The feature thus fuses global information, so that the model can attend to regions of the image where pedestrians are dense:

y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j), (1)

where x is the input feature and y is the output feature, i is the index of the output position, j traverses all possible positions, f is the similarity function, g is the input scaling function, and C(x) is the normalization factor;

The scene image feature vector f₁ is passed through the pedestrian detection Faster-RCNN (the faster region-based convolutional neural network) to obtain the pedestrian candidate box feature f₂. Structure-aware anchors are proposed for pedestrians as a special class of object; the anchors are given by formula (3),

[formula (3) not reproduced]

and the improved anchor strategy is given by formula (4):

[formula (4) not reproduced]

where A denotes the anchors, S denotes the sizes, R denotes the ratios, and the operator combining S and R denotes traversal multiplication, that is, every size is paired with every ratio;

(e) Candidate pedestrian box detection: after the candidate pedestrian box feature f₂ is extracted, the smooth L1 loss function of Faster-RCNN is used to regress the position and size of each pedestrian candidate box accurately, and the cross-entropy loss function supervises the classification of the candidate boxes. The pedestrian boxes obtained from the feature vector are pooled to the same size and fed into the tail of the deep convolutional network and then into the re-identification network, which extracts an L2-normalized feature for each pedestrian box;

(f) Pedestrian feature matching: the L2-normalized pedestrian box features are extracted, and online instance matching is used to save, optimize, and update the features with labeled identities and the features without labeled identities. In forward propagation, a lookup table is set up and the cosine similarity between each sample in the mini-batch and all labeled identities is computed;

In back-propagation, if the class label of the target pedestrian is t, column t of the lookup table is updated with the following formula, so that the lookup table can hold features of the same target pedestrian under multiple poses and viewing angles,

V_t ← γV_t + (1 - γ)x, (5)

where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the updated lookup table, and γ is the update weight, which takes a value in the interval (0, 1); this method may take γ = 0.5;

Pedestrian box features without a labeled identity that appear in the scene image serve as negative samples and are also valuable for learning the feature representation. These unlabeled features are saved in a circular queue, represented by the D × Q matrix U ∈ R^{D×Q}, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, which is set according to the actual scene. The cosine similarity U^T x between U and a mini-batch sample x is computed at the same time. After each iteration, the new feature vectors are pushed into the queue and the out-of-date feature vectors are removed, which forms a circular process;

The center loss function of formula (6) is introduced to constrain the features with labeled identities and to optimize model training by reducing the intra-class loss. The center loss function trains only the labeled pedestrian features, so that the model minimizes the feature variation within the same pedestrian,

[formula (6) not reproduced]

where X_i ∈ R^d is the feature of pedestrian box i, which belongs to the class of identity label y_i; the class center is the central feature of the class of identity label y_i; and m is the number of pedestrian classes;

Meanwhile pedestrian is accurately returned using the smooth absolute loss function in the convolutional neural networks of fast area again The positions and dimensions of candidate frame, and exercised supervision with cross entropy loss function to the classification of pedestrian candidate frame, obtain final row People's search model；

4, pedestrian's search model is tested.

In the scene image feature extraction of step 3, the embedded Gaussian function is selected as the similarity function f.

In the scene image feature extraction of step 3, the number of anchors is increased from the original 9 to 72, and the size and ratio of each anchor change accordingly. Specifically, in the intervals where the sizes and ratios of the pedestrian boxes annotated in the data set are concentrated, the anchors use densely spaced size and ratio values; in the intervals where sizes and ratios are not concentrated, the anchors use size and ratio values with larger spacing.

In the candidate pedestrian box detection of step 3, each extracted L2-normalized pedestrian box feature has 256 dimensions.

Step 4 is specifically as follows: for each gallery image, the features of all pedestrian candidate boxes are computed by forward propagation through the network. For the query image, the pedestrian candidate boxes are replaced with the single given bounding box, and forward propagation then yields its feature vector. Finally, the pairwise cosine similarity between the query image feature and the gallery candidate pedestrian box features is computed, the candidates are ranked by cosine similarity, and the retrieved target pedestrian image is output.

A pedestrian search device based on structure-aware self-attention and online instance aggregation matching, comprising:

A pedestrian search model construction module, responsible for constructing a pedestrian search network based on the attention mechanism and instance aggregation. After the first part of the convolutional neural network, a non-local layer from the attention mechanism is added, which fuses the global information of the image so as to attend to regions where pedestrians are dense; the pedestrian detection part constructs the structure-aware anchors; the re-identification part sets up the online instance aggregation function to supervise the pedestrian box features;

A network training module, which uses the constructed training data set and the batch gradient descent algorithm to train the parameters of the constructed pedestrian search network. In the training stage, a convolutional neural network combined with a non-local layer first extracts features from the entire input scene image to obtain its feature representation; structure-aware anchors are designed for pedestrians as a special class of object to improve the performance of the detection framework; the detected pedestrian boxes are pooled to the same size and fed into the pedestrian re-identification network; the center loss function and the lookup table of online instance matching are used in training to save, optimize, and update the labeled pedestrian features; the circular queue of online instance matching is used to remove and update the unlabeled pedestrian features and some background information; finally, the trained pedestrian search network is taken out as the pedestrian search network for testing;

A pedestrian search model test module, which constructs test samples and feeds them into the trained pedestrian search network. The network performs pedestrian detection on the input test scene image, detects the pedestrian box positions, and obtains their features. The target pedestrian image is then input and its feature obtained; this feature is matched against the pedestrian box features by similarity, and the results are ranked to retrieve the identity and to determine its position in the scene image.

The beneficial effects of the present invention are:

First, the non-local module, a recent technique from the self-attention mechanism, is introduced. It effectively incorporates global information, addresses the inflexibility of non-local features, and lets the model focus increasingly on the regions of the scene image where crowds gather.

Second, in the pedestrian detection stage, the proposed structure-aware anchors improve the performance of the detection framework: even before the classification and regression of the anchors are completed, the anchors are already very close to the ground-truth pedestrian boxes, so the model converges faster and pedestrian detection becomes more efficient.

Third, the online instance aggregation matching cost function solves the problem that, when few samples share the same label, a single online instance matching loss function cannot learn strongly discriminative individual features. The features learned by the model are therefore more robust and can cope with the more challenging data sets of real scenes.

Brief description of the drawings

Fig. 1 is a flow chart of the strategy for training pedestrian box features in the re-identification part of the pedestrian search framework of the present invention.

Fig. 2 is a network diagram of the pedestrian search network based on structure-aware self-attention and online instance aggregation matching of the present invention.

Specific embodiment

To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is further described below through specific embodiments and the drawings.

Fig. 1 shows the flow chart of the strategy for training pedestrian box features in the re-identification part of the pedestrian search framework of the present invention, which includes the following steps:

1. Construct the pedestrian search model

(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pre-trained on the ImageNet data set are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part.

(b) A pedestrian detection framework containing structure-aware anchors is set on top of the feature map and is responsible for detecting pedestrian boxes; the detected pedestrian box features enter the pooling layer.

(c) A re-identification network is set after the pooling layer. During training it is responsible for saving, optimizing, and updating the labeled pedestrian features; during model testing it is responsible for retrieving the target pedestrian.

The structural parameters of the pedestrian search network described in step 1 are as follows:

For the first layer, the input layer, the number of feature maps is set to 3, that is, the three color channels of the image;

For the second layer, a convolutional layer, the number of feature maps is set to 64;

For the third layer, the first residual block with 9 layers, the number of feature maps is set to 256;

For the fourth layer, the second residual block with 12 layers, the number of feature maps is set to 512;

For the fifth layer, the third residual block with 9 layers, the number of feature maps is set to 1024;

For the sixth layer, the non-local layer, the number of feature maps is kept unchanged at 1024;

For the seventh layer, a convolutional layer, the number of feature maps is set to 512. After the seventh layer, one convolutional layer is set whose number of feature maps is the number of anchors × 2, to distinguish candidate box foreground from background;

Another convolutional layer is set whose number of feature maps is the number of anchors × 4, to regress the candidate box position and size.

For the eighth layer, the fourth residual block with 9 layers, the number of feature maps is set to 1024;

For the ninth layer, the fifth residual block with 9 layers, the number of feature maps is set to 2048;

For the tenth layer, three fully connected layers are set with 2, 8, and 256 outputs respectively, corresponding to distinguishing candidate box foreground from background, regressing the candidate box position and size, and extracting the pedestrian feature of the candidate box.
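As a small worked example of the two detection branches after the seventh layer, the sketch below computes their feature map counts from the number of anchors. The function name `rpn_head_channels` and the dictionary keys are hypothetical and chosen only for illustration; the value 72 is the anchor count stated for step 3.

```python
def rpn_head_channels(num_anchors):
    """Feature map counts of the two convolutional branches after the seventh layer."""
    return {
        # Two scores (foreground, background) per anchor.
        "foreground_background": num_anchors * 2,
        # Four values (position and size) per anchor.
        "box_regression": num_anchors * 4,
    }


# With the original 9 anchors and with the 72 structure-aware anchors.
print(rpn_head_channels(9))
print(rpn_head_channels(72))
```

With 72 anchors the two branches therefore need 144 and 288 feature maps, compared with 18 and 36 for the original 9 anchors.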

2. Construct the training data set and set the training hyperparameters

When the training set is constructed, the order of the images in the training data set is shuffled and training data groups are generated. Each group contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrian labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized with batch gradient descent, while the learning rate and momentum of the center loss function are set separately. The six loss terms are summed with certain weights (that is, the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function, where the smooth L1 loss function and the cross-entropy loss function are each used twice, as shown in Fig. 1), and the multiple tasks are optimized simultaneously.
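The weighted summation of the six loss terms can be sketched as follows. This is only an illustration under stated assumptions: the term names, the loss values, and the weights are all hypothetical (the text says only "certain weights"), and attributing the two uses of the smooth L1 and cross-entropy losses to a proposal stage and a detection head is an assumption based on the usual Faster-RCNN layout.

```python
def total_loss(terms, weights):
    """Weighted sum of named loss terms; every term needs exactly one weight."""
    if terms.keys() != weights.keys():
        raise ValueError("loss terms and weights must have the same names")
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical loss values for one mini-batch (illustrative numbers only).
terms = {
    "proposal_cross_entropy": 0.30,
    "proposal_smooth_l1": 0.10,
    "head_cross_entropy": 0.20,
    "head_smooth_l1": 0.05,
    "online_instance_matching": 2.00,
    "center": 4.00,
}

# Hypothetical weights: 1.0 everywhere except a small weight on the center loss.
weights = {name: 1.0 for name in terms}
weights["center"] = 0.01

print(total_loss(terms, weights))
```

Keeping the weights in a dictionary keyed by term name makes it easy to tune the center loss weight independently, which matches the separate learning rate and momentum the text gives to that term.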

3. Train the pedestrian search model

(a) Scene image feature extraction. An entire image is input, and the feature f₁ is obtained through the head of the deep convolutional network and the non-local layer. The non-local layer is given by formula (1):

y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j), (1)

where x is the input feature and y is the output feature, i is the index of the output position, j traverses all possible positions, f is the similarity function, g is the input scaling function, and C(x) is the normalization factor. The embedded Gaussian is selected as the similarity function f in the present invention.

Formula (2) is as follows:

f(x_i, x_j) = e^{\theta(x_i)^T \eta(x_j)}, (2)

where θ(x_i) = W_θ x_i and η(x_j) = W_η x_j are two embedding functions.
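A minimal pure-Python sketch of a non-local operation with an embedded Gaussian similarity is given below. It assumes the similarity f(x_i, x_j) = exp(θ(x_i)^T η(x_j)) with θ(x_i) = W_θ x_i and η(x_j) = W_η x_j, a linear map g(x_j) = W_g x_j, and the normalization C(x) = Σ_j f(x_i, x_j); the choice of C(x), the matrix W_g, and all function names are assumptions made for illustration. The sketch works on a flat list of position vectors and leaves out any output projection or residual connection.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def non_local(xs, w_theta, w_eta, w_g):
    """Return one output vector per position: y_i = sum_j f(x_i, x_j) g(x_j) / C(x)."""
    theta = [matvec(w_theta, x) for x in xs]
    eta = [matvec(w_eta, x) for x in xs]
    g = [matvec(w_g, x) for x in xs]
    outputs = []
    for i in range(len(xs)):
        scores = [dot(theta[i], eta[j]) for j in range(len(xs))]
        top = max(scores)  # shifting by the maximum leaves f / C unchanged
        f = [math.exp(s - top) for s in scores]
        c = sum(f)  # assumed normalization factor C(x)
        outputs.append(
            [sum(f[j] * g[j][k] for j in range(len(xs))) / c for k in range(len(g[0]))]
        )
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0]]  # two positions with 2-dimensional features
print(non_local(features, identity, identity, identity))
```

Because every output position sums over all input positions, each y_i can draw on distant parts of the image, which is the property the text relies on to attend to regions where pedestrians are dense.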

After the feature vector is obtained, it is passed through the pedestrian detection Faster-RCNN. Structure-aware anchors are proposed for pedestrians as a special class of object; formulas (3) and (4) are as follows:

[formulas (3) and (4) not reproduced]

where A denotes the anchors, S denotes the sizes, R denotes the ratios, and the operator combining S and R denotes traversal multiplication, that is, every size is paired with every ratio. The number of anchors is therefore increased from the original 9 to 72, and the size and ratio of each anchor change accordingly. Specifically, in the intervals where the sizes and ratios of the pedestrian boxes annotated in the data set are concentrated, the anchors use densely spaced size and ratio values; in the intervals where sizes and ratios are not concentrated, the anchors use size and ratio values with larger spacing.
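The traversal multiplication of sizes and ratios can be sketched as a Cartesian product. The sketch below is an assumption-laden illustration: the 3 × 3 baseline uses the commonly cited Faster R-CNN defaults, while the 8 sizes and 9 ratios of the denser set are hypothetical values chosen only so that the product has 72 entries; the source does not list the actual values. The ratio is taken as height divided by width.

```python
import math
from itertools import product


def make_anchors(sizes, ratios):
    """Pair every size with every ratio and return (width, height) tuples.

    Each anchor keeps area size**2, with ratio = height / width.
    """
    anchors = []
    for size, ratio in product(sizes, ratios):
        width = size / math.sqrt(ratio)
        height = size * math.sqrt(ratio)
        anchors.append((width, height))
    return anchors


# Baseline: 3 sizes x 3 ratios = 9 anchors (common Faster R-CNN defaults).
baseline = make_anchors([128, 256, 512], [0.5, 1.0, 2.0])

# Hypothetical denser set: values are packed more tightly where pedestrian
# boxes are assumed to concentrate (tall boxes of moderate size).
sizes = [32, 64, 96, 128, 160, 192, 256, 512]
ratios = [0.5, 1.0, 1.5, 2.0, 2.25, 2.5, 2.75, 3.0, 4.0]
dense = make_anchors(sizes, ratios)

print(len(baseline), len(dense))
```

In practice the size and ratio values would be read off the distribution of annotated pedestrian boxes in the data set, as the paragraph above describes.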

(b) Candidate pedestrian box detection. After the candidate pedestrian boxes are extracted from the feature vector, the cross-entropy loss function supervises the classification of the candidate boxes and the smooth L1 loss function supervises their position and size. The pedestrian boxes obtained from the feature vector are pooled to the same size of 7x7, fed into the tail of the deep convolutional network, and then into the re-identification network; each extracted L2-normalized pedestrian box feature has 256 dimensions.
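The two supervision signals used for the candidate boxes can be sketched as follows. This assumes the usual definitions: the smooth L1 term is 0.5·d² when |d| < 1 and |d| - 0.5 otherwise, and the cross-entropy is the negative log-softmax probability of the true class. The function names and the sample numbers are illustrative only.

```python
import math


def smooth_l1(predicted, target):
    """Sum of smooth L1 terms over the box coordinates."""
    total = 0.0
    for p, t in zip(predicted, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


# Hypothetical box offsets (x, y, width, height) and foreground/background logits.
print(smooth_l1([0.2, -0.1, 1.5, 0.0], [0.0, 0.0, 0.0, 0.0]))
print(cross_entropy([0.3, 2.1], 1))
```

The smooth L1 term is quadratic for small errors and linear for large ones, so a badly misplaced candidate box does not dominate the regression gradient.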

(c) Pedestrian feature matching. The L2-normalized pedestrian box features (256 dimensions) are extracted, and online instance matching is used to save, optimize, and update the features with labeled identities and the features without labeled identities. In forward propagation, a lookup table is set up and the cosine similarity between each sample in the mini-batch and all labeled identities is computed. In back-propagation, if the class label of the target pedestrian is t, column t of the lookup table is updated with formula (5):

V_t ← γV_t + (1 - γ)x, (5)

where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the updated lookup table, and γ is the update weight, which takes a value in the interval (0, 1); this method may take γ = 0.5;
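The lookup table and the update V_t ← γV_t + (1 - γ)x can be sketched as below. For readability the table is stored as one row per identity rather than one column, and 2-dimensional features stand in for the 256-dimensional ones; the function names are hypothetical. Only the update rule and γ = 0.5 come from the text.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def similarities(table, x):
    """Forward pass: similarity between sample x and every labeled identity."""
    return [cosine(entry, x) for entry in table]


def update_entry(table, t, x, gamma=0.5):
    """Backward pass: V_t <- gamma * V_t + (1 - gamma) * x for identity t."""
    table[t] = [gamma * v + (1.0 - gamma) * c for v, c in zip(table[t], x)]


table = [[1.0, 0.0], [0.0, 1.0]]   # two labeled identities
sample = [0.0, 1.0]                # feature of a pedestrian labeled as identity 0
update_entry(table, 0, sample)
print(table[0], similarities(table, sample))
```

Because each update blends the stored entry with the new sample, the entry for an identity gradually accumulates information from different poses and viewing angles of the same pedestrian.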

Pedestrian box features without a labeled identity that appear in the scene image serve as negative samples and are also valuable for learning the feature representation. These unlabeled features are saved in a circular queue, represented by the D × Q matrix U ∈ R^{D×Q}, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, which is set according to the actual scene. The cosine similarity U^T x between U and a mini-batch sample x is computed at the same time. After each iteration, the new feature vectors are pushed into the queue and the out-of-date feature vectors are removed, which forms a circular process;
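A minimal sketch of the circular queue for unlabeled features follows. The class name `UnlabeledQueue` and its methods are hypothetical; the sketch uses `collections.deque` with a fixed `maxlen` so that pushing a new feature automatically discards the oldest one. It assumes the stored features and the sample are already L2-normalized, in which case the dot product equals the cosine similarity U^T x.

```python
from collections import deque


class UnlabeledQueue:
    """Fixed-size queue of unlabeled pedestrian box features."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest item when a new one is appended.
        self.features = deque(maxlen=size)

    def push(self, batch_features):
        """Add the unlabeled features of one iteration."""
        for feature in batch_features:
            self.features.append(list(feature))

    def similarities(self, x):
        """Dot product of x with every stored feature (one value per queue slot)."""
        return [sum(u * v for u, v in zip(feature, x)) for feature in self.features]


queue = UnlabeledQueue(size=2)           # Q = 2 for illustration
queue.push([[1.0, 0.0], [0.0, 1.0]])
queue.push([[0.6, 0.8]])                 # evicts the oldest feature [1.0, 0.0]
print(list(queue.features), queue.similarities([1.0, 0.0]))
```

The queue size Q trades memory for the number of negative samples available in each iteration, which is why the text leaves it to be set according to the actual scene.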

During training the input is the whole image, and the labeled pedestrians that appear in each image are random, sparse, and unbalanced. It is therefore difficult to organize equal numbers of positive and negative sample pairs, and a verification or contrastive loss term cannot be introduced directly into the Faster-RCNN framework. The center loss function is introduced instead to impose a certain constraint: it optimizes model training by reducing the intra-class loss, and it trains only the labeled pedestrian features.

The formula is given as (6):

[formula (6) not reproduced]

where X_i ∈ R^d is the feature of pedestrian box i, which belongs to the class of identity label y_i; the class center is the central feature of the class of identity label y_i; and m is the number of pedestrian classes.
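As an illustration of the intra-class constraint, the sketch below computes a center loss in its commonly used form, half the sum of squared distances between each labeled feature and the center of its identity class. This form is an assumption, since formula (6) is not reproduced in the text; how the centers themselves are updated is also not described and is left out. Unlabeled boxes are marked with `None` and skipped, matching the statement that the center loss trains only labeled pedestrian features.

```python
def center_loss(features, labels, centers):
    """Half the sum of squared distances from labeled features to their class centers.

    features: list of feature vectors, one per pedestrian box
    labels:   identity label per box, or None for an unlabeled box
    centers:  dict mapping identity label -> center feature vector
    """
    total = 0.0
    for x, y in zip(features, labels):
        if y is None:
            continue  # unlabeled boxes do not contribute
        total += sum((a - b) ** 2 for a, b in zip(x, centers[y]))
    return 0.5 * total


# Hypothetical 2-dimensional features for three boxes, one of them unlabeled.
features = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
labels = [0, 1, None]
centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}
print(center_loss(features, labels, centers))
```

Minimizing this quantity pulls the features of the same pedestrian toward a common center, which complements online instance matching when an identity has only a few samples.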

Meanwhile again using in Faster-RCNN smooth absolute loss function and cross entropy loss function to candidate The classification of frame and position dimension size carry out further fine.

4. Test the pedestrian search model

For each gallery image, we compute the features of all pedestrian candidate boxes by forward propagation through the network. For the query image, we replace the pedestrian candidate boxes with the single given bounding box and then obtain its feature vector by forward propagation. Finally, we compute the pairwise cosine similarity between the query image feature and the gallery candidate pedestrian box features, rank the candidates by cosine similarity, and output the retrieved target pedestrian image.
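The test-time retrieval can be sketched as a cosine-similarity ranking. The sketch assumes the features have already been extracted by the network; the function names, image identifiers, boxes, and 2-dimensional features are hypothetical stand-ins for the real 256-dimensional features.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def rank_gallery(query_feature, gallery):
    """Rank gallery candidates by similarity to the query, best match first.

    gallery: list of (image_id, box, feature) for every detected candidate box
    returns: list of (similarity, image_id, box)
    """
    scored = [
        (cosine(query_feature, feature), image_id, box)
        for image_id, box, feature in gallery
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


gallery = [
    ("scene_a", (0, 0, 10, 20), [1.0, 0.0]),
    ("scene_b", (5, 5, 15, 30), [0.0, 1.0]),
    ("scene_c", (1, 1, 4, 9), [1.0, 1.0]),
]
for similarity, image_id, box in rank_gallery([0.0, 2.0], gallery):
    print(image_id, box, round(similarity, 3))
```

Returning the box together with the image identifier gives both the retrieved identity match and its position in the scene image.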

The present invention further discloses a pedestrian search device based on structure-aware self-attention and online instance aggregation matching, comprising:

The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the present invention or replace it with equivalents without departing from the spirit and scope of the present invention. The protection scope of the present invention shall be defined by the claims.

Claims

1. A pedestrian search method based on structure-aware self-attention and online instance aggregation matching, characterized by

comprising the following steps:

1. Construct the pedestrian search model

(a) An existing deep convolutional network is divided into two parts, a head and a tail. The deep convolutional network uses a transfer learning strategy: network parameters pre-trained on the ImageNet data set are imported as the initial training parameters of the deep network. A non-local layer is added at the end of the head of the convolutional neural network, and the resulting feature map is shared by the pedestrian detection part and the re-identification part;

(b) A pedestrian detection framework containing structure-aware anchors is set on top of the feature map and is responsible for detecting pedestrian boxes; the detected pedestrian box features enter the pooling layer;

(c) A re-identification network is set after the pooling layer. During training it is responsible for saving, optimizing, and updating the labeled pedestrian features; during model testing it is responsible for retrieving the target pedestrian;

2. Construct the training data set and set the training hyperparameters

When the training set is constructed, the order of the images in the training data set is shuffled and training data groups are generated. Each group contains a panoramic image, the positions of the pedestrian boxes in the image, and the pedestrian labels. The smooth L1 loss function, the cross-entropy loss function, and the online instance matching loss function are optimized with batch gradient descent;

The four loss functions are the smooth L1 loss function, the cross-entropy loss function, the online instance matching loss function, and the center loss function. The smooth L1 loss function and the cross-entropy loss function are each used twice, and the multiple tasks are optimized simultaneously;

3. Train the pedestrian search model

(d) Scene image feature extraction: an entire image is input and passed through the head of the deep convolutional network and the non-local layer of formula (1) to obtain the scene image feature f₁. The feature thus fuses global information, so that the model can attend to regions of the image where pedestrians are dense:

y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j), (1)

where x is the input feature and y is the output feature, i is the index of the output position, j traverses all possible positions, f is the similarity function, g is the input scaling function, and C(x) is the normalization factor;

The scene image feature vector f₁ is passed through the pedestrian detection Faster-RCNN (the faster region-based convolutional neural network) to obtain the pedestrian candidate box feature f₂. Structure-aware anchors are proposed for pedestrians as a special class of object; the anchors are given by formula (3),

[formula (3) not reproduced]

and the improved anchor strategy is given by formula (4):

[formula (4) not reproduced]

(e) Candidate pedestrian box detection: after the candidate pedestrian box feature f₂ is extracted, the smooth L1 loss function of Faster-RCNN is used to regress the position and size of each pedestrian candidate box accurately, and the cross-entropy loss function supervises the classification of the candidate boxes. The pedestrian boxes obtained from the feature vector are pooled to the same size and fed into the tail of the deep convolutional network and then into the re-identification network, which extracts an L2-normalized feature for each pedestrian box;

(f) Pedestrian feature matching: the L2-normalized pedestrian box features are extracted, and online instance matching is used to save, optimize, and update the features with labeled identities and the features without labeled identities. In forward propagation, a lookup table is set up and the cosine similarity between each sample in the mini-batch and all labeled identities is computed;

In back-propagation, if the class label of the target pedestrian is t, column t of the lookup table is updated with the following formula, so that the lookup table can hold features of the same target pedestrian under multiple poses and viewing angles,

V_t ← γV_t + (1 - γ)x, (5)

where x is the feature of the target pedestrian, V_t is the feature of the target pedestrian in the updated lookup table, and γ is the update weight, which takes a value in the interval (0, 1); this method may take γ = 0.5;

Pedestrian box features without a labeled identity that appear in the scene picture are also valuable as negative samples for learning the feature representation. These unlabeled-identity features are saved in a circular queue, denoted U ∈ R^{D×Q}, a D × Q matrix, where D is the dimension of the L2-normalized pedestrian box feature and Q is the size of the circular queue, which is set according to the actual scene; at the same time, the cosine similarity U^T x between U and a mini-batch sample x is computed; after each iteration, the new feature vectors are pushed into the queue and the outdated feature vectors are removed, forming a circular process;
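The circular queue for unlabeled features can be sketched with a bounded double-ended queue. This is a minimal illustration, assuming unit-norm features; the names `make_queue`, `push_features` and `queue_similarities` are hypothetical, and the queue size used in any call is an arbitrary example value.

```python
from collections import deque

def make_queue(size):
    """Circular queue holding at most `size` unlabeled features."""
    return deque(maxlen=size)

def push_features(queue, features):
    """Push new features; the oldest ones are dropped once the queue is full."""
    for f in features:
        queue.append(list(f))

def queue_similarities(queue, x):
    """Entries of U^T x: similarity of x to every stored unlabeled feature.

    Assumes unit-norm features, so the inner product is the cosine similarity.
    """
    return [sum(a * b for a, b in zip(u, x)) for u in queue]
```

Using `deque(maxlen=Q)` gives the push-and-discard behavior described above without explicit index bookkeeping.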

The center loss function shown in formula (6) is introduced to constrain the features with labeled identities, and model training is optimized by reducing the intra-class loss; the center loss function trains only the labeled pedestrian features, so that the model minimizes the feature variation within the same pedestrian,

where X_i ∈ R^d is the feature of pedestrian box i, which belongs to the class with person identity label y_i; the center term in (6) is the central feature of the class with identity label y_i; and m denotes the number of pedestrian classes;
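For orientation, the center loss is commonly written in the following form. This is a sketch of the usual formulation and not a reproduction of formula (6): the symbol c_{y_i} for the class center is an assumed notation, and in the usual formulation the sum runs over the m samples being trained on, whereas the text above describes m as the number of pedestrian classes, so the exact summation range is an assumption.

```latex
% Assumed notation: c_{y_i} is the central feature of class y_i.
L_c = \frac{1}{2} \sum_{i=1}^{m} \left\lVert X_i - c_{y_i} \right\rVert_2^2
```

Minimizing this quantity pulls each labeled feature toward the center of its own identity, which is the intra-class constraint described above.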

Meanwhile, the smooth absolute (smooth L1) loss function of the fast region-based convolutional neural network is again used to accurately regress the positions and sizes of the pedestrian candidate boxes, and the cross-entropy loss function supervises the classification of the pedestrian candidate boxes, yielding the final pedestrian search model;

4, the pedestrian search model is tested.

2. The pedestrian search method with structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that, in the scene picture feature extraction of step 3, the similarity function f is the embedded Gaussian function.
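A non-local operation with an embedded Gaussian similarity can be sketched as follows. This is a minimal illustration, assuming that the embeddings applied to x_i and x_j and the function g are plain linear maps given as weight matrices; the names `w_theta`, `w_phi`, `w_g` and `non_local` are hypothetical. With f(x_i, x_j) = exp(θ(x_i)ᵀφ(x_j)) and C(x) equal to the sum of f over j, the normalized weights form a softmax over positions.

```python
import math

def matvec(w, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def non_local(xs, w_theta, w_phi, w_g):
    """Return y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j) for every position i."""
    thetas = [matvec(w_theta, x) for x in xs]
    phis = [matvec(w_phi, x) for x in xs]
    gs = [matvec(w_g, x) for x in xs]
    out = []
    for th in thetas:
        scores = [sum(a * b for a, b in zip(th, ph)) for ph in phis]
        peak = max(scores)  # subtract the maximum before exp for stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)  # plays the role of C(x) after the shift
        out.append([sum(w * g[k] for w, g in zip(weights, gs)) / total
                    for k in range(len(gs[0]))])
    return out
```

Each output position is a weighted average over all positions of the feature map, which is how the layer fuses global image information.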

3. The pedestrian search method with structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that, in the scene picture feature extraction of step 3, the number of anchors is changed from the original 9 to 72, and the sizes and aspect ratios of the anchors are changed accordingly; specifically, in the intervals where the sizes and aspect ratios of the pedestrian boxes annotated in the data set are concentrated, the anchors are set with relatively dense size and ratio values; in the intervals where the sizes and ratios are not concentrated, the spacing between the size and ratio values of the anchors is larger.
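The idea of sampling anchor sizes and aspect ratios more densely where pedestrian boxes concentrate can be sketched as follows. This is an illustration only and does not reproduce formulas (3) and (4): the size and ratio values are hypothetical example numbers chosen so that 8 sizes × 9 ratios give 72 anchors, the split into 8 and 9 is itself an assumption, and the name `make_anchors` is hypothetical.

```python
import math

# Hypothetical values: dense near the assumed concentration of pedestrian
# boxes, sparse elsewhere. They are not taken from the text.
SIZES = [32, 64, 96, 112, 128, 144, 192, 256]              # 8 sizes, in pixels
RATIOS = [1.0, 1.5, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5, 4.5]   # 9 height/width ratios

def make_anchors(sizes=SIZES, ratios=RATIOS):
    """Return (width, height) pairs with area size**2 and height/width = ratio."""
    anchors = []
    for s in sizes:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((w, h))
    return anchors
```

In practice the dense intervals would be chosen from the distribution of annotated pedestrian boxes in the training data set.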

4. The pedestrian search method with structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that, in the candidate pedestrian box detection of step 3, the L2-normalized feature extracted for each pedestrian box is 256-dimensional.

5. The pedestrian search method with structure-aware self-attention and online instance aggregation matching according to claim 1, characterized in that step 4 is specifically:

For each gallery image, the features of all pedestrian candidate boxes are obtained by forward propagation through the network; for the query image, the pedestrian candidate boxes are replaced by the single given bounding box, and forward propagation then yields its feature vector; finally, the pairwise cosine similarity between the query image feature and the gallery candidate pedestrian box features is computed, the similarity level is evaluated by ranking on the cosine similarity, and the retrieved target pedestrian image is output.
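The ranking step of the test stage can be sketched as follows. This is a minimal illustration, assuming that the features have already been extracted by the network; the tuple layout `(image_id, box, feature)` and the names `cosine` and `rank_gallery` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def rank_gallery(query_feature, gallery):
    """Sort gallery boxes by decreasing cosine similarity to the query.

    gallery: list of (image_id, box, feature) tuples, one per detected box.
    Returns a list of (similarity, image_id, box), best match first.
    """
    scored = [(cosine(query_feature, feat), image_id, box)
              for image_id, box, feat in gallery]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

The first entries of the returned list give the scene images and box positions that best match the query pedestrian.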

6. A pedestrian search device with structure-aware self-attention and online instance aggregation matching, characterized by comprising:

a pedestrian search model construction module, responsible for constructing a pedestrian search network based on the attention mechanism and instance aggregation; after the first part of the convolutional neural network, a non-local layer from the attention mechanism is added to fuse the global information of the image so as to focus on the regions where pedestrians are dense;

the pedestrian detection part constructs structure-aware anchors;

the re-identification part sets an online instance aggregation function to supervise the pedestrian box features;

a network training module, which uses the constructed training data set and a batch gradient descent algorithm to train the parameters of the constructed pedestrian search network; in the training stage, features are first extracted from the entire input scene image by the combination of a convolutional neural network and a non-local layer to obtain its feature representation; structure-aware anchors designed for the pedestrian as a special object improve the performance of the detection framework; the detected pedestrian boxes are pooled to the same size and fed into the pedestrian re-identification network, which is trained with the center loss function and the lookup table of online instance matching to save, optimize and update the labeled pedestrian features, while the circular queue of online instance matching discards and updates the unlabeled pedestrian features and some background information; finally, the trained pedestrian search network is taken out as the pedestrian search network used in testing;

a pedestrian search model test module, for constructing test samples and feeding them into the trained pedestrian search network, which performs pedestrian detection on the input test scene image, detects the pedestrian box positions and obtains their features; the target pedestrian image is then input to obtain its feature, which is matched against the pedestrian box features by feature similarity and ranked to retrieve the identity and determine its position in the scene image.