CN110232348A - Pedestrian attribute recognition method, apparatus and computer device - Google Patents

Pedestrian attribute recognition method, apparatus and computer device

Info

Publication number
CN110232348A
CN110232348A (Application No. CN201910495047.2A)
Authority
CN
China
Prior art keywords
feature fragment
updated
fragment sequence
attribute
pedestrian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910495047.2A
Other languages
Chinese (zh)
Inventor
Liu Hao (刘皓)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910495047.2A priority Critical patent/CN110232348A/en
Publication of CN110232348A publication Critical patent/CN110232348A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to a pedestrian attribute recognition method, comprising: down-sampling a pedestrian image to generate a feature fragment sequence; inputting the feature fragment sequence into a first self-attention model to extract relation weights between feature fragments, and updating the feature fragment sequence according to those relation weights to obtain a first-updated feature fragment sequence; inputting the first-updated feature fragment sequence into a fragment-attribute attention model to extract relation weights between feature fragments and attributes, and updating the first-updated feature fragment sequence according to those relation weights to obtain a second-updated feature fragment sequence; inputting the second-updated feature fragment sequence into a second self-attention model to extract relation weights between attributes, and updating the second-updated feature fragment sequence according to those relation weights to obtain a third-updated feature fragment sequence; and performing pedestrian attribute recognition according to the third-updated feature fragment sequence to obtain a pedestrian attribute recognition result.

Description

Pedestrian attribute recognition method, apparatus and computer device
Technical field
This application relates to the field of computer technology, and in particular to a pedestrian attribute recognition method, apparatus and computer device.
Background art
In video surveillance and other fields that require recognizing pedestrian attributes, it is often necessary to identify attributes of a pedestrian such as gender, age, clothing, whether a hat is worn, whether glasses are worn, and hair style. Pedestrian recognition results play a key role in, for example, helping public security departments track down criminal suspects or investigate pedestrians in a specific area, so the accuracy of the recognition result is of great importance. In practice, however, surveillance cameras capture video at low resolution, cameras at different positions capture pedestrians at different angles, illumination conditions vary, and pedestrians may be occluded. These factors strongly interfere with the attributes a pedestrian exhibits on camera, so the pedestrian attributes identified by conventional pedestrian attribute recognition methods have low accuracy.
Summary of the invention
In view of the low accuracy of the pedestrian attributes identified by conventional pedestrian attribute recognition methods, it is necessary to provide a pedestrian attribute recognition method, apparatus and computer device that address this technical problem.
A pedestrian attribute recognition method, comprising:
down-sampling a pedestrian image to generate a feature fragment sequence;
inputting the feature fragment sequence into a first self-attention model to extract relation weights between feature fragments, and updating the feature fragment sequence according to the relation weights to obtain a first-updated feature fragment sequence;
inputting the first-updated feature fragment sequence into a fragment-attribute attention model to extract relation weights between feature fragments and attributes, and updating the first-updated feature fragment sequence according to the relation weights to obtain a second-updated feature fragment sequence;
inputting the second-updated feature fragment sequence into a second self-attention model to extract relation weights between attributes, and updating the second-updated feature fragment sequence according to the relation weights to obtain a third-updated feature fragment sequence;
performing pedestrian attribute recognition according to the third-updated feature fragment sequence to obtain a pedestrian attribute recognition result.
A pedestrian attribute recognition method, comprising:
down-sampling a pedestrian image to generate a feature fragment sequence;
encoding the feature fragment sequence with a long short-term memory model to obtain a feature fragment sequence encoding result;
inputting the feature fragment sequence encoding result into a first self-attention model to extract relation weights between feature fragments, and updating the feature fragment sequence according to the relation weights to obtain a first-updated feature fragment sequence;
inputting the first-updated feature fragment sequence into a fragment-attribute attention model to extract relation weights between feature fragments and attributes, and updating the first-updated feature fragment sequence according to the relation weights to obtain a second-updated feature fragment sequence;
decoding the second-updated feature fragment sequence with a long short-term memory model to obtain a feature fragment sequence decoding result;
inputting the feature fragment sequence decoding result into a second self-attention model to extract relation weights between attributes, and updating the second-updated feature fragment sequence according to the relation weights to obtain a third-updated feature fragment sequence;
performing pedestrian attribute recognition according to the third-updated feature fragment sequence to obtain a pedestrian attribute recognition result.
A pedestrian attribute recognition apparatus, the apparatus comprising:
a feature fragment sequence generation module, configured to down-sample a pedestrian image to generate a feature fragment sequence;
an inter-fragment relation weight extraction module, configured to input the feature fragment sequence into a first self-attention model to extract relation weights between feature fragments, and to update the feature fragment sequence according to the relation weights to obtain a first-updated feature fragment sequence;
a fragment-attribute relation weight extraction module, configured to input the first-updated feature fragment sequence into a fragment-attribute attention model to extract relation weights between feature fragments and attributes, and to update the first-updated feature fragment sequence according to the relation weights to obtain a second-updated feature fragment sequence;
an inter-attribute relation weight extraction module, configured to input the second-updated feature fragment sequence into a second self-attention model to extract relation weights between attributes, and to update the second-updated feature fragment sequence according to the relation weights to obtain a third-updated feature fragment sequence;
a pedestrian attribute recognition module, configured to perform pedestrian attribute recognition according to the third-updated feature fragment sequence to obtain a pedestrian attribute recognition result.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method described above.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method described above.
According to the above pedestrian attribute recognition method, apparatus and computer device, a pedestrian image is down-sampled to generate a feature fragment sequence. The feature fragment sequence is input into a first self-attention model to extract relation weights between feature fragments, and the feature fragment sequence is updated according to those relation weights to obtain a first-updated feature fragment sequence. The first-updated feature fragment sequence is input into a fragment-attribute attention model to extract relation weights between feature fragments and attributes, and the first-updated feature fragment sequence is updated according to those relation weights to obtain a second-updated feature fragment sequence. The second-updated feature fragment sequence is input into a second self-attention model to extract relation weights between attributes, and the second-updated feature fragment sequence is updated according to those relation weights to obtain a third-updated feature fragment sequence. Pedestrian attribute recognition is then performed according to the third-updated feature fragment sequence to obtain a pedestrian attribute recognition result.
The method takes the pedestrian image as input and automatically generates the feature fragment sequence, then successively extracts three levels of relation weights from it: relation weights between feature fragments, relation weights between feature fragments and attributes, and relation weights between attributes, updating the feature fragment sequence after each extraction. The resulting feature fragment sequence therefore incorporates all three levels of relation weights, so the pedestrian attribute features obtained from it are more accurate, which in turn improves the accuracy of the final pedestrian attribute recognition result.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the pedestrian attribute recognition method in one embodiment;
Fig. 2 is a flow diagram of the pedestrian attribute recognition method in one embodiment;
Fig. 2A is a schematic diagram of the relation weights between feature fragments and attributes in a pedestrian image;
Fig. 3 is a diagram of the network architecture of the pedestrian attribute recognition method in Fig. 2;
Fig. 4 is a flow diagram of the method of down-sampling a pedestrian image to generate a feature fragment sequence in Fig. 2;
Fig. 5 is a flow diagram of the method of down-sampling a pedestrian image to generate a feature fragment sequence in Fig. 4;
Fig. 6 is a flow diagram of the method in Fig. 2 of inputting the feature fragment sequence into the first self-attention model to obtain the first-updated feature fragment sequence;
Fig. 6A is a schematic diagram of the relation weights between attributes in a pedestrian image;
Fig. 7 is a diagram of the network architecture of the self-attention model;
Fig. 8 is a flow diagram of the pedestrian attribute recognition method in another embodiment;
Fig. 9 is a diagram of the network architecture of the pedestrian attribute recognition method in Fig. 8;
Fig. 10 is a structural block diagram of the pedestrian attribute recognition apparatus in one embodiment;
Fig. 11 is a structural block diagram of the feature fragment sequence generation module in Fig. 10;
Fig. 12 is a structural block diagram of the inter-fragment relation weight extraction module in Fig. 10;
Fig. 13 is a structural block diagram of the pedestrian attribute recognition apparatus in another embodiment;
Fig. 14 is a structural block diagram of the computer device in one embodiment.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of this application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and are not intended to limit it.
Fig. 1 is a diagram of the application environment of the pedestrian attribute recognition method in one embodiment. Referring to Fig. 1, the pedestrian attribute recognition method is applied to a pedestrian attribute recognition system. The pedestrian attribute recognition system includes a terminal 110 and a server 120, which are connected through a network. There may be one or more terminals 110; a terminal 110 may specifically be a surveillance camera, a terminal console, or a mobile terminal equipped with a camera, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a laptop computer, and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers. The pedestrian attribute recognition method includes: down-sampling a pedestrian image to generate a feature fragment sequence; inputting the feature fragment sequence into a first self-attention model to extract relation weights between feature fragments, and updating the feature fragment sequence according to the relation weights to obtain a first-updated feature fragment sequence; inputting the first-updated feature fragment sequence into a fragment-attribute attention model to extract relation weights between feature fragments and attributes, and updating the first-updated feature fragment sequence according to the relation weights to obtain a second-updated feature fragment sequence; inputting the second-updated feature fragment sequence into a second self-attention model to extract relation weights between attributes, and updating the second-updated feature fragment sequence according to the relation weights to obtain a third-updated feature fragment sequence; and performing pedestrian attribute recognition according to the third-updated feature fragment sequence to obtain a pedestrian attribute recognition result.
As shown in Fig. 2, in one embodiment, a pedestrian attribute recognition method is provided. This embodiment is mainly described by applying the method to the server 120 in Fig. 1. Referring to Fig. 2, the pedestrian attribute recognition method specifically includes the following steps S210 to S290. The network architecture on which the method is based is shown in Fig. 3: the pedestrian image is input into a CNN (Convolutional Neural Network) for feature extraction, and the extracted feature map is then processed sequentially by a self-attention network model, a fragment-attribute attention network model, and another self-attention network model, after which the pedestrian attributes are predicted.
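For reference only, the following is a minimal PyTorch-style sketch of how the three attention stages described above could be composed; the module names, tensor shapes, attribute count, and the use of PyTorch's built-in attention are illustrative assumptions and do not reproduce the exact network of Fig. 3.

    import torch
    import torch.nn as nn

    class PedestrianAttributePipeline(nn.Module):
        # Sketch of the three attention stages of Fig. 3 (shapes and module
        # choices are assumptions, not the patented network itself).
        def __init__(self, backbone, d_model=512, num_attributes=30):
            super().__init__()
            self.backbone = backbone  # any CNN returning (B, num_fragments, d_model)
            self.self_attn1 = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
            self.attr_embed = nn.Parameter(torch.randn(num_attributes, d_model))
            self.frag_attr_attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
            self.self_attn2 = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
            self.classifier = nn.Linear(d_model, num_attributes)

        def forward(self, image):
            s = self.backbone(image)                       # feature fragment sequence S
            s1, _ = self.self_attn1(s, s, s)               # fragment-fragment relation weights
            attrs = self.attr_embed.unsqueeze(0).expand(s.size(0), -1, -1)
            s2, _ = self.frag_attr_attn(s1, attrs, attrs)  # fragment-attribute relation weights
            s3, _ = self.self_attn2(s2, s2, s2)            # attribute-attribute relation weights
            logits = self.classifier(s3.mean(dim=1))       # pool and predict each attribute
            return torch.sigmoid(logits)                   # binary score per attribute

Any backbone that maps an image to a (batch, fragments, channels) tensor can be plugged in, for example the two-stage CNN sketched later in this description.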
S210: down-sample a pedestrian image to generate a feature fragment sequence.
A terminal equipped with a camera captures video or images of a scene containing pedestrians, producing the video or images of that scene. For example, a surveillance camera installed in a shop in a shopping mall can capture video or images within its field of view of the shop. The surveillance camera transmits the captured video or images to a back-end server for further processing. If the server receives video, it extracts single-frame images from the captured video and then extracts pedestrian images from those frames; if the server directly receives images, it extracts pedestrian images from the single-frame images.
A pedestrian image here refers to a region of an image that contains a single pedestrian. In general, more than one pedestrian may appear simultaneously in an image captured by a surveillance camera, so multiple regions each containing a single pedestrian can be extracted from the image. The server down-samples each pedestrian image extracted from the single-frame image to generate a feature fragment sequence.
The pedestrian image is input into a convolutional neural network (CNN) for down-sampling, which directly yields the feature fragment sequence of the pedestrian image. A convolutional neural network is a feedforward neural network that contains convolution operations and has a deep structure, and it is one of the representative algorithms of deep learning. The feature fragment sequence is a sequence of feature fragments; for example, down-sampling a pedestrian image yields a feature fragment sequence of length n, T = {T1, T2, ..., Tn}. Down-sampling is the process of reducing an image: for an image of size M × N, down-sampling it by a factor of s yields an image of resolution (M/s) × (N/s), where s should be a common divisor of M and N. For an image in matrix form, this is equivalent to turning each s × s window of the original image into a single pixel whose value is the mean of all pixels in the window.
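A minimal sketch of the s × s mean-window down-sampling described above (a plain average-pooling operation; the PyTorch call and the chosen factor are illustrative):

    import torch
    import torch.nn.functional as F

    def downsample_by_s(image, s):
        # image: (B, C, M, N), with s a common divisor of M and N.
        # Each s x s window is replaced by the mean of its pixels,
        # giving an output of resolution (M/s) x (N/s).
        return F.avg_pool2d(image, kernel_size=s, stride=s)

    x = torch.rand(1, 3, 224, 112)   # e.g. a 224 x 112 pedestrian image
    y = downsample_by_s(x, 2)        # -> (1, 3, 112, 56)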
When extracting a feature fragment sequence, conventional methods first manually crop the pedestrian image into image fragments and then extract features from each image fragment to obtain the feature fragment sequence. The selection of the cropping regions and the final feature extraction are two independent processes, so the selected cropping regions are not necessarily the regions best suited to feature extraction and final attribute recognition.
S230: input the feature fragment sequence into the first self-attention model to extract relation weights between feature fragments, and update the feature fragment sequence according to the relation weights to obtain a first-updated feature fragment sequence.
It should be noted that the attention mechanism is inspired by the human visual attention mechanism. When perceiving things, human vision generally does not scan a scene exhaustively from beginning to end; instead, it attends to a specific part of the scene as needed, and when the thing one wants to observe repeatedly appears in a certain part of similar scenes, one learns to direct attention straight to that part when a similar scene appears again. The self-attention mechanism is a special case of the attention mechanism: it computes dependencies directly while ignoring the distance between elements, can learn the internal structure of a sequence, is relatively simple to implement, and can be computed in parallel. A self-attention model is a model that uses the self-attention mechanism; it operates on sequences and is a one-dimensional attention model.
The server inputs the feature fragment sequence into the first self-attention model to extract relation weights between feature fragments. The extracted relation weights are then applied to each feature fragment, i.e., the relation weights are multiplied with the feature fragment sequence to update the feature fragment sequence, giving the first-updated feature fragment sequence. Of course, the relation weights and the feature fragment sequence may also be combined by other operations to update the feature fragment sequence.
The first-updated feature fragment sequence contains not only the elements of the feature fragment sequence but also the relation weights between the elements, where a relation weight is a similarity. Specifically, the output of the self-attention model for each element (feature fragment) of the feature fragment sequence contains the similarity, i.e. attention, of that element to every other element. Therefore, inputting the feature fragment sequence into the first self-attention model extracts the relation weight between each element (feature fragment) and every other element.
S250: input the first-updated feature fragment sequence into the fragment-attribute attention model to extract relation weights between feature fragments and attributes, and update the first-updated feature fragment sequence according to the relation weights to obtain a second-updated feature fragment sequence.
The fragment-attribute attention model is a typical attention model, similar in structure to the self-attention model. The first-updated feature fragment sequence is input into the fragment-attribute attention model to extract relation weights between feature fragments and attributes, where a relation weight is a similarity. Fig. 2A shows the relation weights between feature fragments and attributes: the thicker a line in the figure, the larger the relation weight between the two, i.e. the more related they are. Specifically, after the server processes the first-updated feature fragment sequence with the fragment-attribute attention model, the output contains the similarity, i.e. attention, i.e. relation weight, between each feature fragment and each attribute. The relation weights between feature fragments and attributes are multiplied with the feature fragment sequence to update it, giving the second-updated feature fragment sequence. Of course, the relation weights and the feature fragment sequence may also be combined by other operations to update the feature fragment sequence.
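The description does not give the exact equations of the fragment-attribute attention model, so the following is only a sketch of one plausible reading, in which learnable attribute embeddings act as keys and values while the updated fragments act as queries; all names, shapes, and the element-wise update are assumptions.

    import torch
    import torch.nn as nn

    class FragmentAttributeAttention(nn.Module):
        # Cross-attention between feature fragments and attribute embeddings:
        # weights = softmax(fragments @ attributes^T / sqrt(d)), then the
        # attended attribute context is used to re-weight (update) the fragments.
        def __init__(self, d_model, num_attributes):
            super().__init__()
            self.attr_embed = nn.Parameter(torch.randn(num_attributes, d_model))

        def forward(self, fragments):                 # fragments: (B, n, d_model)
            d = fragments.size(-1)
            attrs = self.attr_embed                   # (A, d_model)
            scores = fragments @ attrs.t() / d ** 0.5       # (B, n, A)
            weights = torch.softmax(scores, dim=-1)         # fragment-attribute relation weights
            context = weights @ attrs                       # (B, n, d_model)
            return fragments * context                # element-wise product as the "update"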
S270: input the second-updated feature fragment sequence into the second self-attention model to extract relation weights between attributes, and update the second-updated feature fragment sequence according to the relation weights to obtain a third-updated feature fragment sequence.
The server inputs the second-updated feature fragment sequence into the second self-attention model; the output contains the similarity, i.e. attention, of each attribute to every other attribute, so the relation weights between attributes are extracted. The extracted relation weights between attributes are then applied to each feature fragment, i.e., the second-updated feature fragment sequence is updated according to the relation weights to obtain the third-updated feature fragment sequence. Specifically, the relation weights between attributes can be multiplied with the feature fragment sequence to update it, giving the third-updated feature fragment sequence. Of course, the relation weights and the feature fragment sequence may also be combined by other operations to update the feature fragment sequence.
The third-updated feature fragment sequence contains not only the elements (feature fragments) of the feature fragment sequence, but also the relation weights between feature fragments, the relation weights between feature fragments and attributes, and the relation weights between attributes.
S290: perform pedestrian attribute recognition according to the third-updated feature fragment sequence to obtain a pedestrian attribute recognition result.
After the relation weights between feature fragments, between feature fragments and attributes, and between attributes have been obtained and the feature fragment sequence has been updated accordingly, the third-updated feature fragment sequence is obtained. The server performs pedestrian attribute recognition directly on the third-updated feature fragment sequence; specifically, the third-updated feature fragment sequence is classified with a classifier to obtain the pedestrian attribute recognition result.
As noted above, conventional methods first manually crop the pedestrian image into image fragments and then extract features from each image fragment to obtain the feature fragment sequence; because the selection of the cropping regions and the final feature extraction are two independent processes, the selected cropping regions are not necessarily the regions best suited to feature extraction and final attribute recognition.
In the embodiment of the present application, the pedestrian image is input into a convolutional neural network for down-sampling, which directly yields the feature fragment sequence of the pedestrian image. There is no need to crop the pedestrian image into image fragments and then extract features from each fragment to obtain the feature fragment sequence. This avoids the randomness introduced when the cropping is done manually by experience or at random, and avoids the problem that, because region selection and feature extraction are two independent processes, the selected cropping regions are not necessarily the regions best suited to feature extraction and final attribute recognition.
Furthermore, three levels of relation weights are extracted from the feature fragment sequence, namely relation weights between feature fragments, relation weights between feature fragments and attributes, and relation weights between attributes, and the extracted relation weights are successively applied to the feature fragment sequence. The feature fragment sequence is therefore corrected three times by the three levels of relation weights, so the final feature fragment sequence carries more information and becomes more accurate, which makes the final pedestrian attribute recognition result more accurate and reduces the interference caused by low-resolution surveillance video, different shooting angles of cameras at different positions, varying illumination conditions and occlusion of pedestrians.
In one embodiment, as shown in Fig. 4, step 210, down-sampling the pedestrian image to generate a feature fragment sequence, includes:
Step 211: obtain a pedestrian image from an image to be processed.
A terminal equipped with a camera captures video or images of a scene containing pedestrians; the image to be processed is a single frame of that video or one of those images. The server extracts single-frame images from the video or images received from the terminal with the camera, and then obtains pedestrian images from the single-frame images. A pedestrian image here refers to a region of an image that contains a single pedestrian; in general, more than one pedestrian may appear simultaneously in an image captured by the camera, so multiple regions each containing a single pedestrian can be extracted from the image.
Step 213: input the pedestrian image into a first convolutional neural network to extract a feature map.
As the depth of an ordinary neural network increases, problems such as exploding and vanishing gradients make it increasingly difficult to train. Deep residual networks were proposed to make training deep networks easier. The principle of a deep residual network is to add the output of an earlier layer back in after the linear block of a certain layer and before its nonlinear block (an operation called a skip connection), so that the nonlinear layer of that layer becomes a^(l+1) = g(z^l + a^(l-n)); such a group of n blocks is called a residual block.
The first convolutional neural network may be a standard 50-layer deep residual network (ResNet50); of course, the first convolutional neural network may also be another deep residual network, such as ResNet34 or ResNet101. Specifically, as shown in Fig. 5, the pedestrian image is input into the first part of ResNet50 (Conv1-Conv4_2) for down-sampling to obtain a feature map. For example, a 224 × 112-pixel pedestrian image is first processed by the first part of ResNet50 (Conv1-Conv4_2) to output a 14 × 7-pixel feature map F1. Here, F1 is the feature map.
Step 215: input the feature map into a spatial attention model to extract a spatial attention map.
After the pedestrian image has been down-sampled by the deep residual network to obtain the feature map, the feature map F1 is input into the spatial attention model to generate a spatial attention map M of the same pixel size. The spatial attention model is a two-dimensional attention model that can assign high weights to regions of the image that deserve attention; for example, the weight values in the spatial attention map M lie between 0 and 1, and regions worth attending to can be given high weights in M. In general, after the feature map F1 passes through the spatial attention model, the foreground region of F1, i.e. the person region, is given high weight, so the foreground (person) region of the pedestrian image is emphasized and the interference of background factors is effectively reduced. In Fig. 5, the white areas of the feature map output after the spatial attention model indicate the regions the network attends to.
Step 217: perform a mask operation on the feature map and the spatial attention map to generate a background-filtered feature map.
After the feature map F1 and the spatial attention map M of the same pixel size have been obtained by the above processing, a mask operation is performed on F1 and M to generate a background-filtered feature map. The specific calculation is to mask F1 with M to obtain the background-filtered feature map Fmasked = F1 ⊙ M, where the mask operation here is essentially an element-wise product. The background-filtered feature map Fmasked effectively reduces the interference of background factors.
Step 219: input the background-filtered feature map into a second convolutional neural network to generate the feature fragment sequence.
The second convolutional neural network may be a standard 50-layer deep residual network (ResNet50); of course, it may also be another deep residual network, such as ResNet34 or ResNet101. Because the size of the feature map F1 is 14 × 7 pixels, the background-filtered feature map is input into the second part of ResNet50 (Conv4_3-Conv5_3) to obtain a feature fragment sequence S of length 14, S ∈ R^(14×1×512). Each feature fragment S_n, n = 1, 2, ..., 14, corresponds to a horizontal strip of the original image from top to bottom. At this point the feature fragment sequence has been obtained from the image to be processed, and this feature fragment sequence places its focus on the foreground person region.
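As an illustration only, the two-stage backbone with the spatial attention mask could be sketched as follows; the attention sub-network, the split points, and the pooling into a vertical fragment sequence are assumptions based on the shapes stated above, not the exact layers of Fig. 5.

    import torch
    import torch.nn as nn

    class FragmentBackbone(nn.Module):
        # Sketch: CNN part 1 -> spatial attention map M -> mask F1 * M ->
        # CNN part 2 -> pool over width -> vertical feature fragment sequence S.
        # cnn_part1 / cnn_part2 stand in for the two halves of a deep residual
        # network (e.g. ResNet50 Conv1-Conv4_2 and Conv4_3-Conv5_3).
        def __init__(self, cnn_part1, cnn_part2, mid_channels):
            super().__init__()
            self.cnn_part1 = cnn_part1
            self.cnn_part2 = cnn_part2
            self.spatial_attn = nn.Sequential(       # 2-D attention map with values in [0, 1]
                nn.Conv2d(mid_channels, 1, kernel_size=1), nn.Sigmoid())

        def forward(self, image):                    # image: (B, 3, 224, 112)
            f1 = self.cnn_part1(image)               # feature map F1, e.g. (B, C, 14, 7)
            m = self.spatial_attn(f1)                # spatial attention map M
            f_masked = f1 * m                        # element-wise mask, F1 (.) M
            f2 = self.cnn_part2(f_masked)            # (B, C2, H, W)
            s = f2.mean(dim=3)                       # average over width -> horizontal strips
            return s.permute(0, 2, 1)                # fragment sequence (B, H, C2)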
In the embodiment of the present application, the pedestrian image is input into the first convolutional neural network for down-sampling to extract a feature map, the feature map is then input into the spatial attention model to extract a spatial attention map, and the feature map and the spatial attention map are masked to generate a background-filtered feature map. The background-filtered feature map obtained after the spatial attention model and the mask operation emphasizes the foreground (person) region of the pedestrian image, so the extracted feature map adaptively attends to the relevant person region and the interference of background factors is reduced. This in turn improves the robustness and effectiveness of generating the feature fragment sequence by feeding the background-filtered feature map into the second convolutional neural network, and reduces the interference of background factors.
In one embodiment, the first convolutional neural network and the second convolutional neural network are the deep residual network ResNet101 or the deep residual network ResNet50.
In the embodiment of the present application, as the depth (number of layers) increases, ResNet101 keeps improving performance relative to ResNet50, because the residual structure resolves the degradation problem of convolutional neural networks.
In one embodiment, as shown in Fig. 6, step 230, inputting the feature fragment sequence into the first self-attention model to extract relation weights between feature fragments and updating the feature fragment sequence according to the relation weights to obtain the first-updated feature fragment sequence, includes:
Step 232: obtain any subsequence of the feature fragment sequence.
Step 234: use the subsequence as the three input sequence vectors of the first self-attention model, and compute the similarity of two of the three input sequence vectors.
Step 236: scale and normalize the similarity to obtain the attention weights of the subsequence.
Step 238: multiply the attention weights of each subsequence of the feature fragment sequence with the feature fragment sequence to obtain the first-updated feature fragment sequence.
After the feature fragment sequence S has been obtained as described above, any subsequence of S is taken. The specific structure of the self-attention model is shown in Fig. 7, where K, Q and V represent the three input sequence vectors of the self-attention model. The subsequence is input into the self-attention model as its three input sequence vectors. For example, for a feature fragment sequence S = {S1, S2, ..., Sn} of length n, let Q = K = V = S and input K, Q and V into the self-attention model as its three input sequence vectors. First, Q and K are used to compute the inner-product similarity Q × K^T, which is then scaled by the factor sqrt(d_k), where sqrt denotes the square root and d_k is the dimension of the vector K. The result is normalized by the SoftMax function to obtain the sequence attention weights, and the sequence attention weights are finally multiplied with the sequence V to obtain the final output Z. The process can be expressed as Z = SoftMax(Q × K^T / sqrt(d_k)) × V. For a feature fragment sequence S = {S1, S2, ..., Sn} of length n with Q = K = V = S, the output Z_i of the self-attention module for each sequence element S_i contains the similarity, i.e. attention, of S_i to every other sequence element; because the elements used to compute attention all come from the sequence itself, this is called self-attention.
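A minimal sketch of the scaled dot-product self-attention described above, written in plain PyTorch; the use of the whole sequence as Q = K = V follows the text, while treating the result as an update by element-wise multiplication with the input is one possible reading of the "product calculation" and is an assumption.

    import torch
    import torch.nn as nn

    class SelfAttention(nn.Module):
        # Z = SoftMax(Q K^T / sqrt(d_k)) V with Q = K = V = S, then the
        # attention output is multiplied back onto S as the "update".
        def forward(self, s):                              # s: (B, n, d_k)
            q = k = v = s
            d_k = s.size(-1)
            scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (B, n, n) similarities
            weights = torch.softmax(scores, dim=-1)        # relation weights between fragments
            z = weights @ v                                # attention output Z
            return s * z                                   # updated feature fragment sequence

    s = torch.rand(2, 14, 512)         # e.g. 14 fragments of dimension 512
    s_updated = SelfAttention()(s)     # first-updated feature fragment sequence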
Then the extracted relation weights between feature fragments are applied to each feature fragment, i.e., the relation weights are multiplied with the feature fragment sequence to update the feature fragment sequence, giving the first-updated feature fragment sequence. Of course, the relation weights and the feature fragment sequence may also be combined by other operations to update the feature fragment sequence.
In the embodiment of the present application, any subsequence of the feature fragment sequence is input into the self-attention model to extract the relation weights between that subsequence and every other subsequence. Each subsequence of the feature fragment sequence is input into the self-attention model in turn to extract its relation weights with every other subsequence, so the relation weights between all subsequences of the feature fragment sequence are obtained. The extracted relation weights between feature fragments are applied to each feature fragment: the relation weights are multiplied with the feature fragment sequence to update it, giving the first-updated feature fragment sequence.
In one embodiment, step 270, inputting the second-updated feature fragment sequence into the second self-attention model to extract relation weights between attributes and updating the second-updated feature fragment sequence according to the relation weights to obtain the third-updated feature fragment sequence, includes:
obtaining any subsequence of the second-updated feature fragment sequence;
using the subsequence as the three input sequence vectors of the second self-attention model, and computing the similarity of two of the three input sequence vectors;
scaling and normalizing the similarity to obtain the attention weights of the subsequence;
multiplying the attention weights of each subsequence of the second-updated feature fragment sequence with the second-updated feature fragment sequence to obtain the third-updated feature fragment sequence.
Specifically, after the first-updated feature fragment sequence has been obtained by processing with the first self-attention model, the first-updated feature fragment sequence is input into the fragment-attribute attention model to obtain the second-updated feature fragment sequence. Then any subsequence of the second-updated feature fragment sequence is input into the second self-attention model, and the same calculation as in the first attention model is carried out (not repeated here) to obtain the attention weights of each subsequence of the second-updated feature fragment sequence. Fig. 6A shows the relation weights between attributes: the thicker a line in the figure, the larger the relation weight between the two, i.e. the more related they are. Finally, the attention weights of each subsequence of the second-updated feature fragment sequence are multiplied with the second-updated feature fragment sequence to obtain the third-updated feature fragment sequence.
In the embodiment of the present application, any subsequence of the second-updated feature fragment sequence is input into the second self-attention model to extract the relation weights between that subsequence and every other subsequence. Each subsequence of the second-updated feature fragment sequence is input into the second self-attention model in turn, so the relation weights between all subsequences of the second-updated feature fragment sequence are obtained. The extracted relation weights are applied to each feature fragment, i.e., the relation weights are multiplied with the second-updated feature fragment sequence to update it, giving the third-updated feature fragment sequence. The third-updated feature fragment sequence thus contains the relation weights between feature fragments, between feature fragments and attributes, and between attributes. Performing pedestrian attribute recognition with a feature fragment sequence that carries these three levels of relation weights clearly gives the sequence more comprehensive information, which ultimately improves the accuracy of the pedestrian attribute recognition result.
In one embodiment, computing the similarity of two of the three input sequence vectors includes:
taking the inner product of two of the three input sequence vectors to obtain the similarity of those two input sequence vectors; or computing the similarity of two of the three input sequence vectors by way of cosine similarity.
In the embodiment of the present application, the specific structure of the self-attention model is shown in Fig. 7, where K, Q and V represent the three input sequence vectors of the self-attention model. Any subsequence is input into the first self-attention model as its three input sequence vectors. For example, for a feature fragment sequence S = {S1, S2, ..., Sn} of length n, let Q = K = V = S and input K, Q and V into the self-attention model as its three input sequence vectors. First, Q and K are used to compute the inner-product similarity Q × K^T; the similarity of Q and K may also be computed in other ways, for example by cosine similarity. Cosine similarity evaluates the similarity of two vectors by computing the cosine of the angle between them: the vectors are plotted by their coordinates in a vector space, most commonly a two-dimensional space. Of course, the similarity of the two vectors Q and K may also be expressed by a kernel function; choosing an appropriate kernel function can map Q and K into a higher-dimensional space and express their similarity implicitly, and the advantage of kernel functions is that they can better represent samples and can represent feature spaces that cannot be explicitly defined by a mathematical function. Other methods of computing the similarity of two vectors may also be used, and no limitation is imposed here.
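For illustration, a cosine-similarity variant of the score computation (replacing the raw inner product with the cosine of the angle between normalized rows of Q and K) might look like the sketch below; this is only one way to realize the alternative named in the text.

    import torch
    import torch.nn.functional as F

    def cosine_attention_scores(q, k):
        # q, k: (B, n, d). Normalizing rows to unit length makes the
        # inner product equal to the cosine of the angle between vectors.
        q_hat = F.normalize(q, dim=-1)
        k_hat = F.normalize(k, dim=-1)
        return q_hat @ k_hat.transpose(-2, -1)   # values in [-1, 1]

    q = k = torch.rand(1, 14, 512)
    weights = torch.softmax(cosine_attention_scores(q, k), dim=-1)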
In one embodiment, as shown in Fig. 8, another pedestrian attribute recognition method is provided, including the following steps S802 to S814. The network architecture on which this pedestrian attribute recognition method is based is shown in Fig. 9: the pedestrian image is input into CNN1 for feature extraction, the extracted feature map is input into the spatial attention network model and then into CNN2, and the result is then processed sequentially by LSTM1, a self-attention network model, a fragment-attribute attention network model, LSTM2, and another self-attention network model, after which the pedestrian attributes are predicted. Of course, CNN1 and CNN2 here may also be replaced with ResNet101 or other deep residual networks.
S802: down-sample the pedestrian image to generate a feature fragment sequence.
Specifically, first a pedestrian image is obtained from the image to be processed, and the pedestrian image is input into the first convolutional neural network for down-sampling to extract a feature map; the pedestrian image is input into the first part of ResNet50 (Conv1-Conv4_2) for down-sampling to obtain the feature map. The image to be processed is a single frame of the video or images of the scene captured by the camera; a pedestrian image refers to a region of an image containing a single pedestrian, and since more than one pedestrian may appear simultaneously in an image captured by the camera, multiple regions each containing a single pedestrian can be extracted from the image.
Next, the feature map is input into the spatial attention model to extract a spatial attention map. The spatial attention model is a two-dimensional attention model that can assign high weights to regions of the image that deserve attention. After the feature map F1 passes through the spatial attention model, the foreground region of F1, i.e. the person region, is given high weight, so the foreground (person) region of the pedestrian image is emphasized and the interference of background factors is effectively reduced.
Then, a mask operation is performed on the feature map F1 and the spatial attention map M to obtain the background-filtered feature map Fmasked = F1 ⊙ M, where the mask operation here is essentially an element-wise product.
Finally, the background-filtered feature map is input into the second convolutional neural network to generate the feature fragment sequence; specifically, the background-filtered feature map is input into the second part of ResNet50 (Conv4_3-Conv5_3) to obtain the feature fragment sequence S.
S804: encode the feature fragment sequence with a long short-term memory model to obtain a feature fragment sequence encoding result.
The feature fragment sequence S is input into an LSTM network model (long short-term memory model) for encoding to obtain a feature fragment sequence encoding result, which is a sequence of the same length as the feature fragment sequence. The encoding result produced by the LSTM1 network extracts the ordinal (sequential) features of the feature fragment sequence. An LSTM network is a kind of recurrent neural network (RNN), a special neural network that calls itself along a time series or character sequence. The ordinal features extracted by the LSTM reinforce the original features by integrating (aggregating) the fragment sequence, so the learned features are more expressive and the final attribute classification becomes easier.
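A minimal sketch of encoding the fragment sequence with an LSTM so that each position also carries ordering information; the hidden size and the single-layer, unidirectional configuration are assumptions.

    import torch
    import torch.nn as nn

    class FragmentLSTMEncoder(nn.Module):
        # LSTM over the fragment sequence; the per-step hidden states form
        # the "feature fragment sequence encoding result" of the same length.
        def __init__(self, d_in=512, d_hidden=512):
            super().__init__()
            self.lstm = nn.LSTM(d_in, d_hidden, batch_first=True)

        def forward(self, s):                 # s: (B, n, d_in)
            encoded, _ = self.lstm(s)         # (B, n, d_hidden), one output per fragment
            return encoded

    s = torch.rand(2, 14, 512)
    t = FragmentLSTMEncoder()(s)              # encoding result T, same length as S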
S806: input the feature fragment sequence encoding result into the first self-attention model to extract relation weights between feature fragments, and update the feature fragment sequence encoding result according to the relation weights to obtain a first-updated feature fragment sequence.
After the feature fragment sequence encoding result of the feature fragment sequence S has been obtained as described above, any subsequence of the encoding result is taken. The specific structure of the self-attention model is shown in Fig. 7, where K, Q and V represent the three input sequence vectors of the self-attention model. Any subsequence is input into the first self-attention model as its three input sequence vectors. For example, for a feature fragment sequence encoding result T of length n, let Q = K = V = T and input K, Q and V into the self-attention model as its three input sequence vectors. First, Q and K are used to compute the inner-product similarity Q × K^T, which is then scaled by the factor sqrt(d_k), where sqrt denotes the square root and d_k is the dimension of the vector K. The result is normalized by the SoftMax function to obtain the sequence attention weights, which are finally multiplied with the sequence V to obtain the final output Z; the process can be expressed as Z = SoftMax(Q × K^T / sqrt(d_k)) × V. For a feature fragment sequence encoding result T = {T1, T2, ..., Tn} of length n with Q = K = V = T, the output Z_i of the self-attention module for each sequence element T_i contains the similarity, i.e. attention, of T_i to every other sequence element; because the elements used to compute attention all come from the sequence itself, this is called self-attention.
Then the extracted relation weights between feature fragments are applied to each element of the feature fragment sequence encoding result, i.e., the relation weights are multiplied with the encoding result to update it, giving the first-updated feature fragment sequence. Of course, the relation weights and the feature fragment sequence encoding result may also be combined by other operations to update the encoding result.
S808: input the first-updated feature fragment sequence into the fragment-attribute attention model to extract relation weights between feature fragments and attributes, and update the first-updated feature fragment sequence according to the relation weights to obtain a second-updated feature fragment sequence.
The fragment-attribute attention model is a typical attention model, similar in structure to the self-attention model. The first-updated feature fragment sequence is input into the fragment-attribute attention model to extract relation weights between feature fragments and attributes, where a relation weight is a similarity. Specifically, after the first-updated feature fragment sequence is processed by the fragment-attribute attention model, the output contains the similarity, i.e. attention, i.e. relation weight, between feature fragments and attributes. The relation weights between feature fragments and attributes are multiplied with the feature fragment sequence to update it, giving the second-updated feature fragment sequence. Of course, the relation weights and the feature fragment sequence may also be combined by other operations to update the feature fragment sequence.
S810: decode the second-updated feature fragment sequence with a long short-term memory model to obtain a feature fragment sequence decoding result.
The second-updated feature fragment sequence is input into the long short-term memory model LSTM2 for decoding to obtain the feature fragment sequence decoding result, which is also a sequence of the same length as the feature fragment sequence. The decoding result produced by the LSTM2 network extracts the ordinal features of the feature fragment sequence.
S812: input the feature fragment sequence decoding result into the second self-attention model to extract relation weights between attributes, and update the feature fragment sequence decoding result according to the relation weights to obtain the third-updated feature fragment sequence.
Specifically, any subsequence of the feature fragment sequence decoding result is input into the second self-attention model and the same calculation as in the first attention model is carried out (not repeated here) to obtain the attention weights of each subsequence of the decoding result. Finally, the attention weights of each subsequence of the decoding result are multiplied with the feature fragment sequence decoding result to obtain the third-updated feature fragment sequence.
S814: perform pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain the pedestrian attribute recognition result.
The third-time updated characteristic fragment sequence is passed through a classifier for binary classification prediction to obtain the attribute prediction result, and the attribute prediction result is taken as the pedestrian attribute recognition result.
In the embodiment of the present application, the pedestrian image is input into CNN1 for feature extraction, the extracted feature map is input into the spatial attention network model and then into CNN2, and the result is input in turn into LSTM1, the first self-attention network model, the segment-attribute attention network model, LSTM2, and the second self-attention network model for processing, before the pedestrian attributes are finally predicted. Among these, the spatial attention network model assigns high weights to the foreground region, i.e. the portrait region, in the feature map, thereby highlighting the foreground (portrait) region in the pedestrian image and effectively reducing interference from background factors. Encoding and decoding the characteristic fragment sequence through the LSTM networks extracts the ordinal characteristics of the characteristic fragment sequence. In addition, the first self-attention network model extracts the relationship weights between characteristic fragments, the segment-attribute attention network model extracts the relationship weights between characteristic fragments and attributes, and the second self-attention network model extracts the relationship weights between attributes.
In this way, after the ordinal characteristics of the characteristic fragment sequence and the above three layers of relationship weights have been extracted, the resulting third-time updated characteristic fragment sequence is more accurate, and the predicted pedestrian attributes are therefore more accurate as well. This in turn substantially improves the efficiency with which public security departments track down criminal suspects from surveillance video, or improves the accuracy of searching for pedestrians in a specific region.
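For orientation only, the sketch below wires the stages named above into a single forward pass. It reuses the illustrative FragmentSelfAttention, FragmentAttributeAttention and SeqCodec classes sketched earlier, and assumes a ResNet-50 backbone for CNN1, a 1x1 convolution for the spatial attention model, and a per-attribute sigmoid head; none of these choices is fixed by the patent.

import torch
import torch.nn as nn
import torchvision.models as tvm

class PedestrianAttributeNet(nn.Module):
    """Illustrative pipeline: CNN1 -> spatial attention -> CNN2 -> LSTM1 ->
    self-attention 1 -> segment-attribute attention -> LSTM2 ->
    self-attention 2 -> per-attribute binary prediction."""
    def __init__(self, dim=256, num_attributes=10):
        super().__init__()
        backbone = tvm.resnet50(weights=None)
        self.cnn1 = nn.Sequential(*list(backbone.children())[:6])           # early ResNet stages as CNN1 (assumed)
        self.spatial_attn = nn.Conv2d(512, 1, kernel_size=1)                # spatial attention map
        self.cnn2 = nn.Sequential(nn.Conv2d(512, dim, 3, 2, 1), nn.ReLU())  # CNN2 produces fragments
        self.codec = SeqCodec(dim, dim)
        self.self_attn1 = FragmentSelfAttention(dim)
        self.frag_attr_attn = FragmentAttributeAttention(dim, num_attributes)
        self.self_attn2 = FragmentSelfAttention(dim)                        # same computation as the first
        self.head = nn.Linear(dim, num_attributes)                          # binary prediction per attribute

    def forward(self, image):                          # image: (batch, 3, H, W)
        fmap = self.cnn1(image)
        mask = torch.sigmoid(self.spatial_attn(fmap))  # highlight the foreground (portrait) region
        frag = self.cnn2(fmap * mask)                  # background interference reduced
        frag_seq = frag.flatten(2).transpose(1, 2)     # (batch, num_fragments, dim)
        enc = self.codec.encode(frag_seq)              # LSTM1
        upd1 = self.self_attn1(enc)                    # fragment-fragment weights
        upd2 = self.frag_attr_attn(upd1)               # fragment-attribute weights
        dec = self.codec.decode(upd2)                  # LSTM2
        upd3 = self.self_attn2(dec)                    # attribute-level weights
        return torch.sigmoid(self.head(upd3.mean(dim=1)))  # pooled prediction per attribute

A call such as PedestrianAttributeNet()(torch.randn(1, 3, 256, 128)) would return one probability per assumed attribute.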
In one embodiment, performing pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain the pedestrian attribute recognition result comprises:
passing the third-time updated characteristic fragment sequence through a classifier for binary classification prediction to obtain the attribute prediction result, and taking the attribute prediction result as the pedestrian attribute recognition result.
In the embodiment of the present application, the classifier may be an SVM classifier or, of course, another type of classifier. An SVM (Support Vector Machine) is a generalized linear classifier that performs binary classification on data through supervised learning. Classifiers commonly used for binary classification prediction on a data set also include decision trees, SGD (Stochastic Gradient Descent), random forests, and gradient boosting. Specifically, the third-time updated characteristic fragment sequence is passed through the classifier for binary classification prediction to obtain the attribute prediction result, for example, predicting pedestrian attributes such as "male/female" or "with backpack/without backpack". Performing attribute prediction on the pedestrian image in Fig. 3 or Fig. 9 yields a pedestrian attribute recognition result such as: male, no label, about 40 years old, plaid jacket, cloth trousers, and other such attributes.
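As a minimal sketch of per-attribute binary prediction (one of several classifier choices the text mentions; an SVM or gradient-boosting classifier could be substituted), the code below assumes a sigmoid output per attribute over the pooled third-time updated sequence; the attribute list and pooling step are illustrative assumptions.

import torch
import torch.nn as nn

# Hypothetical attribute list for illustration only.
ATTRIBUTES = ["male", "backpack", "plaid_jacket"]

class AttributeClassifier(nn.Module):
    """Binary (yes/no) prediction for each pedestrian attribute."""
    def __init__(self, dim, num_attributes=len(ATTRIBUTES)):
        super().__init__()
        self.fc = nn.Linear(dim, num_attributes)

    def forward(self, third_update_seq):       # (batch, num_fragments, dim)
        pooled = third_update_seq.mean(dim=1)  # pool over fragments (assumed)
        return torch.sigmoid(self.fc(pooled))  # probability per attribute

probs = AttributeClassifier(dim=256)(torch.randn(2, 8, 256))
predictions = probs > 0.5                      # e.g. male / not male, backpack / no backpack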
Fig. 8 is a flow diagram of the pedestrian attribute recognition method in one embodiment. It should be understood that although the steps in the flow chart of Fig. 8 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 8 may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential, as they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 10, a pedestrian attribute recognition apparatus 1000 is provided, comprising:
a characteristic fragment sequence generation module 1010, configured to down-sample a pedestrian image to generate a characteristic fragment sequence;
a characteristic-fragment relationship weight extraction module 1030, configured to input the characteristic fragment sequence into the first self-attention model to extract the relationship weights between characteristic fragments, and to update the characteristic fragment sequence according to the relationship weights to obtain the first-time updated characteristic fragment sequence;
a characteristic-fragment-and-attribute relationship weight extraction module 1050, configured to input the first-time updated characteristic fragment sequence into the characteristic fragment and attribute attention model to extract the relationship weights between characteristic fragments and attributes, and to update the first-time updated characteristic fragment sequence according to the relationship weights to obtain the second-time updated characteristic fragment sequence;
an inter-attribute relationship weight extraction module 1070, configured to input the second-time updated characteristic fragment sequence into the second self-attention model to extract the relationship weights between attributes, and to update the second-time updated characteristic fragment sequence according to the relationship weights to obtain the third-time updated characteristic fragment sequence;
a pedestrian attribute recognition module 1090, configured to perform pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain the pedestrian attribute recognition result.
In one embodiment, as shown in Fig. 11, the characteristic fragment sequence generation module 1010 comprises:
a pedestrian image acquisition unit 1011, configured to obtain the pedestrian image from an image to be processed;
a feature map extraction unit 1013, configured to input the pedestrian image into the first convolutional neural network to extract a feature map;
a spatial attention map extraction unit 1015, configured to input the feature map into the spatial attention model to extract a spatial attention map;
a background-filtered feature map extraction unit 1017, configured to perform a mask operation on the feature map and the spatial attention map to generate a background-filtered feature map;
a characteristic fragment sequence generation unit 1019, configured to input the background-filtered feature map into the second convolutional neural network to generate the characteristic fragment sequence; an illustrative sketch of these units follows this list.
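Purely as an illustration of the units listed above, the following sketch traces concrete tensor shapes through feature extraction, spatial-attention masking, and fragment sequence generation; the ResNet-50 backbone, the 1x1-convolution attention model, and the input size are assumptions, not details given in the patent.

import torch
import torch.nn as nn
import torchvision.models as tvm

backbone = tvm.resnet50(weights=None)
cnn1 = nn.Sequential(*list(backbone.children())[:6])                  # first convolutional neural network (assumed)
spatial_attention = nn.Sequential(nn.Conv2d(512, 1, 1), nn.Sigmoid()) # spatial attention model (assumed)
cnn2 = nn.Sequential(nn.Conv2d(512, 256, 3, stride=2, padding=1), nn.ReLU())

pedestrian_image = torch.randn(1, 3, 256, 128)            # assumed input size
feature_map = cnn1(pedestrian_image)                       # (1, 512, 32, 16)
attention_map = spatial_attention(feature_map)             # (1, 1, 32, 16), high weight on the portrait region
background_filtered = feature_map * attention_map          # mask operation filters out the background
fragments = cnn2(background_filtered)                      # (1, 256, 16, 8)
fragment_sequence = fragments.flatten(2).transpose(1, 2)   # (1, 128, 256): 128 fragments of 256 dimensions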
In one embodiment, as shown in Fig. 12, the characteristic-fragment relationship weight extraction module 1030 comprises:
a subsequence acquisition unit 1032, configured to obtain any subsequence in the characteristic fragment sequence;
a similarity calculation unit 1034, configured to take the subsequence as the three input sequence vectors of the first self-attention model and calculate the similarity of two of the three input sequence vectors;
an attention weight calculation unit 1036, configured to scale and normalize the similarity to obtain the attention weight of the subsequence;
a characteristic fragment sequence update unit 1038, configured to multiply the attention weight of each subsequence in the characteristic fragment sequence with the characteristic fragment sequence to obtain the first-time updated characteristic fragment sequence.
In one embodiment, the inter-attribute relationship weight extraction module 1070 is further configured to obtain any subsequence in the second-time updated characteristic fragment sequence; take the subsequence as the three input sequence vectors of the second self-attention model and calculate the similarity of two of the three input sequence vectors; scale and normalize the similarity to obtain the attention weight of the subsequence; and multiply the attention weight of each subsequence in the second-time updated characteristic fragment sequence with the second-time updated characteristic fragment sequence to obtain the third-time updated characteristic fragment sequence.
In one embodiment, the similarity calculation unit 1034 is further configured to perform an inner product operation on two of the three input sequence vectors to obtain the similarity of those two input sequence vectors, or to calculate the similarity of two of the three input sequence vectors by way of cosine similarity.
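A brief sketch of the two similarity options described here, applied to illustrative query/key vectors and followed by the scaling and normalization steps; the sqrt-of-dimension scaling constant and the vector sizes are assumptions rather than values from the patent.

import math
import torch
import torch.nn.functional as F

def attention_weight(query, key, use_cosine=False):
    """Similarity between two of the three input sequence vectors, then
    scaling/normalization into attention weights."""
    if use_cosine:
        sim = F.cosine_similarity(query.unsqueeze(1), key.unsqueeze(0), dim=-1)  # cosine similarity
    else:
        sim = (query @ key.t()) / math.sqrt(query.size(-1))                      # scaled inner product
    return torch.softmax(sim, dim=-1)                                            # normalization

q = torch.randn(8, 256)   # 8 subsequences, 256-dimensional (assumed)
k = torch.randn(8, 256)
weights_dot = attention_weight(q, k)                   # inner-product similarity
weights_cos = attention_weight(q, k, use_cosine=True)  # cosine-similarity variant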
In one embodiment, as shown in Fig. 13, a pedestrian attribute recognition apparatus 1300 is further provided, comprising:
a characteristic fragment sequence generation module 1302, configured to down-sample a pedestrian image to generate a characteristic fragment sequence;
an encoding module 1304, configured to input the characteristic fragment sequence into a long short-term memory model for encoding to obtain a characteristic fragment sequence encoding result;
a characteristic-fragment relationship weight extraction module 1306, configured to input the characteristic fragment sequence encoding result into the first self-attention model to extract the relationship weights between characteristic fragments, and to update the encoded characteristic fragment sequence according to the relationship weights to obtain the first-time updated characteristic fragment sequence;
a characteristic-fragment-and-attribute relationship weight extraction module 1308, configured to input the first-time updated characteristic fragment sequence into the characteristic fragment and attribute attention model to extract the relationship weights between characteristic fragments and attributes, and to update the first-time updated characteristic fragment sequence according to the relationship weights to obtain the second-time updated characteristic fragment sequence;
a decoding module 1310, configured to input the second-time updated characteristic fragment sequence into a long short-term memory model for decoding to obtain a characteristic fragment sequence decoding result;
an inter-attribute relationship weight extraction module 1312, configured to input the characteristic fragment sequence decoding result into the second self-attention model to extract the relationship weights between attributes, and to update the second-time updated characteristic fragment sequence according to the relationship weights to obtain the third-time updated characteristic fragment sequence;
a pedestrian attribute recognition module 1314, configured to perform pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain the pedestrian attribute recognition result.
In one embodiment, the pedestrian attribute recognition module is further configured to pass the third-time updated characteristic fragment sequence through a classifier for binary classification prediction to obtain the attribute prediction result, and to take the attribute prediction result as the pedestrian attribute recognition result.
Fig. 14 shows an internal structure diagram of the computer device in one embodiment. The computer device may specifically be the server 120 in Fig. 1. As shown in Fig. 14, the computer device includes a processor and a memory connected through a system bus. The processor provides computing and control capabilities to support the operation of the entire device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the pedestrian attribute recognition method provided in the embodiments of the present application. The internal memory provides a cached running environment for the operating system and the computer program in the non-volatile storage medium. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
Those skilled in the art will understand that the structure shown in Fig. 14 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
In one embodiment, the pedestrian attribute recognition apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in Fig. 14. The memory of the computer device may store the program modules that constitute the pedestrian attribute recognition apparatus, for example, the characteristic fragment sequence generation module 1010, the characteristic-fragment relationship weight extraction module 1030, the characteristic-fragment-and-attribute relationship weight extraction module 1050, the inter-attribute relationship weight extraction module 1070, and the pedestrian attribute recognition module 1090 shown in Fig. 10. The computer program constituted by these program modules causes the processor to execute the steps in the pedestrian attribute recognition method of each embodiment of the present application described in this specification.
For example, the computer device shown in Fig. 14 may execute step S210 through the characteristic fragment sequence generation module 1010 in the pedestrian attribute recognition apparatus shown in Fig. 10, execute step S230 through the characteristic-fragment relationship weight extraction module 1030, execute step S250 through the characteristic-fragment-and-attribute relationship weight extraction module 1050, execute step S270 through the inter-attribute relationship weight extraction module 1070, and execute step S290 through the pedestrian attribute recognition module 1090.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above pedestrian attribute recognition method. The steps of the pedestrian attribute recognition method here may be the steps in the pedestrian attribute recognition method of each of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above pedestrian attribute recognition method. The steps of the pedestrian attribute recognition method here may be the steps in the pedestrian attribute recognition method of each of the above embodiments.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be completed by instructing relevant hardware through a computer program, and the program may be stored in a non-volatile computer-readable storage medium; when executed, the program may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered as within the scope described in this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of the present application. It should be noted that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application patent shall be subject to the appended claims.

Claims (10)

1. A pedestrian attribute recognition method, comprising:
down-sampling a pedestrian image to generate a characteristic fragment sequence;
inputting the characteristic fragment sequence into a first self-attention model to extract relationship weights between characteristic fragments, and updating the characteristic fragment sequence according to the relationship weights to obtain a first-time updated characteristic fragment sequence;
inputting the first-time updated characteristic fragment sequence into a characteristic fragment and attribute attention model to extract relationship weights between characteristic fragments and attributes, and updating the first-time updated characteristic fragment sequence according to the relationship weights to obtain a second-time updated characteristic fragment sequence;
inputting the second-time updated characteristic fragment sequence into a second self-attention model to extract relationship weights between attributes, and updating the second-time updated characteristic fragment sequence according to the relationship weights to obtain a third-time updated characteristic fragment sequence; and
performing pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain a pedestrian attribute recognition result.
2. The method according to claim 1, wherein the down-sampling a pedestrian image to generate a characteristic fragment sequence comprises:
obtaining the pedestrian image from an image to be processed;
inputting the pedestrian image into a first convolutional neural network to extract a feature map;
inputting the feature map into a spatial attention model to extract a spatial attention map;
performing a mask operation on the feature map and the spatial attention map to generate a background-filtered feature map; and
inputting the background-filtered feature map into a second convolutional neural network to generate the characteristic fragment sequence.
3. The method according to claim 2, wherein the first convolutional neural network and the second convolutional neural network are the deep residual network ResNet101 or the deep residual network ResNet50.
4. The method according to claim 1, wherein the inputting the characteristic fragment sequence into a first self-attention model to extract relationship weights between characteristic fragments, and updating the characteristic fragment sequence according to the relationship weights to obtain a first-time updated characteristic fragment sequence comprises:
obtaining any subsequence in the characteristic fragment sequence;
taking the subsequence as the three input sequence vectors of the first self-attention model, and calculating the similarity of two of the three input sequence vectors;
scaling and normalizing the similarity to obtain an attention weight of the subsequence; and
multiplying the attention weight of each subsequence in the characteristic fragment sequence with the characteristic fragment sequence to obtain the first-time updated characteristic fragment sequence.
5. The method according to claim 1, wherein the inputting the second-time updated characteristic fragment sequence into a second self-attention model to extract relationship weights between attributes, and updating the second-time updated characteristic fragment sequence according to the relationship weights to obtain a third-time updated characteristic fragment sequence comprises:
obtaining any subsequence in the second-time updated characteristic fragment sequence;
taking the subsequence as the three input sequence vectors of the second self-attention model, and calculating the similarity of two of the three input sequence vectors;
scaling and normalizing the similarity to obtain an attention weight of the subsequence; and
multiplying the attention weight of each subsequence in the second-time updated characteristic fragment sequence with the second-time updated characteristic fragment sequence to obtain the third-time updated characteristic fragment sequence.
6. The method according to claim 4 or 5, wherein the calculating the similarity of two of the three input sequence vectors comprises:
performing an inner product operation on two of the three input sequence vectors to obtain the similarity of those two input sequence vectors; or calculating the similarity of two of the three input sequence vectors by way of cosine similarity.
7. A pedestrian attribute recognition method, comprising:
down-sampling a pedestrian image to generate a characteristic fragment sequence;
inputting the characteristic fragment sequence into a long short-term memory model for encoding to obtain a characteristic fragment sequence encoding result;
inputting the characteristic fragment sequence encoding result into a first self-attention model to extract relationship weights between characteristic fragments, and updating the characteristic fragment sequence encoding result according to the relationship weights to obtain a first-time updated characteristic fragment sequence;
inputting the first-time updated characteristic fragment sequence into a characteristic fragment and attribute attention model to extract relationship weights between characteristic fragments and attributes, and updating the first-time updated characteristic fragment sequence according to the relationship weights to obtain a second-time updated characteristic fragment sequence;
inputting the second-time updated characteristic fragment sequence into a long short-term memory model for decoding to obtain a characteristic fragment sequence decoding result;
inputting the characteristic fragment sequence decoding result into a second self-attention model to extract relationship weights between attributes, and updating the characteristic fragment sequence decoding result according to the relationship weights to obtain a third-time updated characteristic fragment sequence; and
performing pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain a pedestrian attribute recognition result.
8. The method according to claim 1 or 7, wherein the performing pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain a pedestrian attribute recognition result comprises:
passing the third-time updated characteristic fragment sequence through a classifier for binary classification prediction to obtain an attribute prediction result, and taking the attribute prediction result as the pedestrian attribute recognition result.
9. A pedestrian attribute recognition apparatus, the apparatus comprising:
a characteristic fragment sequence generation module, configured to down-sample a pedestrian image to generate a characteristic fragment sequence;
a characteristic-fragment relationship weight extraction module, configured to input the characteristic fragment sequence into a first self-attention model to extract relationship weights between characteristic fragments, and to update the characteristic fragment sequence according to the relationship weights to obtain a first-time updated characteristic fragment sequence;
a characteristic-fragment-and-attribute relationship weight extraction module, configured to input the first-time updated characteristic fragment sequence into a characteristic fragment and attribute attention model to extract relationship weights between characteristic fragments and attributes, and to update the first-time updated characteristic fragment sequence according to the relationship weights to obtain a second-time updated characteristic fragment sequence;
an inter-attribute relationship weight extraction module, configured to input the second-time updated characteristic fragment sequence into a second self-attention model to extract relationship weights between attributes, and to update the second-time updated characteristic fragment sequence according to the relationship weights to obtain a third-time updated characteristic fragment sequence; and
a pedestrian attribute recognition module, configured to perform pedestrian attribute recognition according to the third-time updated characteristic fragment sequence to obtain a pedestrian attribute recognition result.
10. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
CN201910495047.2A 2019-06-10 2019-06-10 Pedestrian's attribute recognition approach, device and computer equipment Pending CN110232348A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910495047.2A CN110232348A (en) 2019-06-10 2019-06-10 Pedestrian's attribute recognition approach, device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910495047.2A CN110232348A (en) 2019-06-10 2019-06-10 Pedestrian's attribute recognition approach, device and computer equipment

Publications (1)

Publication Number Publication Date
CN110232348A true CN110232348A (en) 2019-09-13

Family

ID=67859358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910495047.2A Pending CN110232348A (en) 2019-06-10 2019-06-10 Pedestrian's attribute recognition approach, device and computer equipment

Country Status (1)

Country Link
CN (1) CN110232348A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222167A (en) * 2020-02-06 2021-08-06 浙江大学 Image processing method and device
CN111046858A (en) * 2020-03-18 2020-04-21 成都大熊猫繁育研究基地 Image-based animal species fine classification method, system and medium
CN111046858B (en) * 2020-03-18 2020-09-08 成都大熊猫繁育研究基地 Image-based animal species fine classification method, system and medium
CN112800801A (en) * 2021-02-03 2021-05-14 珠海格力电器股份有限公司 Method and device for recognizing pattern in image, computer equipment and storage medium
CN113378657A (en) * 2021-05-24 2021-09-10 汇纳科技股份有限公司 Pedestrian group relation identification method, device and system
CN113378657B (en) * 2021-05-24 2024-03-01 汇纳科技股份有限公司 Pedestrian group relation identification method, device and system
CN113869426A (en) * 2021-09-29 2021-12-31 北京搜狗科技发展有限公司 Formula identification method and device
WO2023097464A1 (en) * 2021-11-30 2023-06-08 京东方科技集团股份有限公司 Image recognition method and system, and training method and an electronic device

Similar Documents

Publication Publication Date Title
CN110232348A (en) Pedestrian's attribute recognition approach, device and computer equipment
Ponti et al. Everything you wanted to know about deep learning for computer vision but were afraid to ask
US9830709B2 (en) Video analysis with convolutional attention recurrent neural networks
CN111260055B (en) Model training method based on three-dimensional image recognition, storage medium and device
US20230022387A1 (en) Method and apparatus for image segmentation model training and for image segmentation
US11983903B2 (en) Processing images using self-attention based neural networks
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN113095346A (en) Data labeling method and data labeling device
CN111797881B (en) Image classification method and device
CN112418195B (en) Face key point detection method and device, electronic equipment and storage medium
CN109033107A (en) Image search method and device, computer equipment and storage medium
CN114998601B (en) On-line update target tracking method and system based on Transformer
WO2023142550A1 (en) Abnormal event detection method and apparatus, computer device, storage medium, computer program, and computer program product
CN118284905A (en) Neural semantic field for generalizable semantic segmentation of 3D scenes
CN114861859A (en) Training method of neural network model, data processing method and device
CN111177460B (en) Method and device for extracting key frame
CN112115860A (en) Face key point positioning method and device, computer equipment and storage medium
Wang et al. Swin-GAN: generative adversarial network based on shifted windows transformer architecture for image generation
CN111126049A (en) Object relation prediction method and device, terminal equipment and readable storage medium
CN114677611B (en) Data identification method, storage medium and device
CN115292439A (en) Data processing method and related equipment
CN115731263A (en) Optical flow calculation method, system, device and medium fusing shift window attention
Wei et al. Robust subspace segmentation via sparse relation representation
Wang et al. Image Deblurring Using Fusion Transformer-Based Generative Adversarial Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination