CN107085716A - Cross-view gait recognition method based on multi-task generative adversarial network - Google Patents

Cross-view gait recognition method based on multi-task generative adversarial network

Info

Publication number
CN107085716A
Authority
CN
China
Prior art keywords
gait
view
template feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710373017.5A
Other languages
Chinese (zh)
Other versions
CN107085716B (en)
Inventor
何逸炜
张军平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201710373017.5A priority Critical patent/CN107085716B/en
Publication of CN107085716A publication Critical patent/CN107085716A/en
Application granted granted Critical
Publication of CN107085716B publication Critical patent/CN107085716B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention belongs to the fields of computer vision and machine learning, and specifically discloses a cross-view gait recognition method based on a multi-task generative adversarial network. The invention mainly addresses the drop in generalization performance of gait recognition under large view changes. From an original pedestrian video frame sequence, each frame is first preprocessed and a gait template feature is extracted; a neural network then encodes the template into a latent gait representation, and a view transformation is performed in the latent space; a multi-task generative adversarial network then reconstructs the gait template features at other views; finally, recognition is performed with the latent gait representation. Compared with classification-based methods or other reconstruction-based methods, the invention is more interpretable and improves recognition performance.

Description

Cross-view gait recognition method based on multi-task generative adversarial network
Technical field
The invention belongs to the technical fields of computer vision and machine learning, and in particular relates to a video-based cross-view gait recognition method.
Background technology
The video-based cross-view gait recognition problem is one of the research topics of computer vision and machine learning. Given gait video frame sequences captured at different views, the task is to judge, using computer vision or machine learning algorithms, whether the subjects of the gait frame sequences are the same person. There has been much previous work in this field, and the main methods fall into three broad classes: reconstruction-based methods, subspace-based methods, and deep-learning-based methods. Some references for these three classes of methods follow:
[1]W.Kusakunniran,Q.Wu,J.Zhang,and H.Li,“Support vector regression for multi-view gait recognition based on local motion feature selection,”in Conference on Computer Vision and Pattern Recognition,pp.974–981,2010.
[2]M.Hu,Y.Wang,Z.Zhang,J.J.Little,and D.Huang,“View-invariant discriminative projection for multi-view gait-based human identification,” IEEE Transactions on Information Forensics and Security,vol.8,no.12,pp.2034– 2045,2013.
[3]W.Kusakunniran,Q.Wu,J.Zhang,H.Li,and L.Wang,“Recognizing gaits across views through correlated motion co-clustering,”IEEE Transactions on Image Processing,vol.23,no.2,pp.696–709,2014.
[4]S.Yu,H.Chen,Q.Wang,L.Shen,and Y.Huang,“Invariant feature extraction for gait recognition using only one uniform model,”Neurocomputing, vol.239,pp.81–93,2017.
[5]Z.Wu,Y.Huang,L.Wang,X.Wang,and T.Tan,“A comprehensive study on cross-view gait based human identification with deep CNNs,”IEEE Transactions on Pattern Analysis and Machine Intelligence,vol.39,no.2,pp.209–226,2017.
[6]Y.Feng,Y.Li,and J.Luo,“Learning effective gait features using LSTM,”in International Conference on Pattern Recognition,pp.320–325,2016.
[7]H.Iwama,M.Okumura,Y.Makihara,and Y.Yagi,“The OU-ISIR gait database comprising the large population dataset and performance evaluation of gait recognition,”IEEE Transactions on Information Forensics and Security,vol.7, no.5,pp.1511–1521,2012.
[8]Y.Makihara,R.Sagawa,Y.Mukaigawa,T.Echigo,and Y.Yagi,“Gait recognition using a view transformation model in the frequency domain,”in European Conference on Computer Vision,pp.151–163,Springer,2006.
[9]W.Kusakunniran,Q.Wu,H.Li,and J.Zhang,“Multiple views gait recognition using view transformation model based on optimized gait energy image,”in International Conference on Computer Vision,pp.1058–1064,2009.
[10]X.Xing,K.Wang,T.Yan,and Z.Lv,“Complete canonical correlation analysis with application to multi-view gait recognition,”Pattern Recognition,vol.50,pp.107–117,2016.
The first class, reconstruction-based methods, mainly reconstructs gait templates, e.g. the VTM (View Transformation Model) in [1,3,8,9]. These methods predict the value of each pixel of the template feature at the target view by training multiple models, one per pixel, which incurs a large computational cost and low recognition accuracy. To reduce the computational cost, the autoencoder-based method [4] reconstructs all pixels of the template feature simultaneously and extracts a view-invariant feature representation.
Subspace-based methods [2,10] project the gait features of different views into a common subspace and compute similarity in that subspace. However, subspace methods can typically model only linear correlations between features and ignore nonlinear correlations, so their recognition rates are relatively low.
Recently, models based on deep neural networks have been widely applied in computer vision. [5,6] apply deep networks to gait recognition and automatically extract view-invariant feature representations. Although they achieve large gains in recognition accuracy over previous methods, deep models lack interpretability.
Summary of the invention
The object of the invention is to provide a video-based cross-view gait recognition method with a high recognition rate and low computational cost.
The relevant background of the invention is introduced first.
1. Period Energy Image (PEI)
(a) To build an effective gait template feature, the original pedestrian video sequence must first be preprocessed. For each frame of the video sequence, foreground-background separation is performed to extract a gait silhouette; the silhouette is then centered and aligned to the image center, yielding a normalized gait silhouette;
(b) Based on the normalized silhouette sequence, the normalized period position of each frame is first obtained using a period-detection technique from gait recognition. A gait cycle is defined as the temporal segment between two adjacent local minima of the normalized period positions. The silhouette sequence is then divided into one or more gait cycles, and frames that cannot form a complete gait cycle are discarded. From the centered silhouette sequence B_t and the corresponding normalized period positions r_t, a Period Energy Image (PEI) template feature with n_c channels is constructed. The value of the k-th channel of the PEI at coordinate (x, y) is computed as:

PEI_k(x, y) = (1/n) * sum_{r_t in T(k)} B_t(x, y)

where:

T(k) = [(k-1)/(n_c+1), (k-1)/(n_c+1) + m]

m is the temporal window size covered by each channel, and T(k) determines the temporal range covered by each channel. In this way, the spatial information of the silhouette sequence is divided over n_c temporal segments according to the period position of each frame, so that the temporal and spatial information of the gait are encoded simultaneously.
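The PEI construction above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the patent's reference implementation: the array layout of the silhouettes and the interpretation of the 1/n factor (here, the number of frames whose period position falls in each channel's window) are assumptions.

```python
import numpy as np

def build_pei(silhouettes, periods, n_c, m):
    """Build a Period Energy Image from centered silhouettes.

    silhouettes: (T, H, W) array of centered gait silhouettes B_t
    periods:     (T,) normalized period positions r_t in [0, 1]
    n_c:         number of PEI channels
    m:           temporal window size covered by each channel
    Returns an (H, W, n_c) PEI template.
    """
    _, H, W = silhouettes.shape
    pei = np.zeros((H, W, n_c))
    for k in range(1, n_c + 1):
        lo = (k - 1) / (n_c + 1)           # left end of T(k)
        hi = lo + m                        # right end of T(k)
        mask = (periods >= lo) & (periods <= hi)
        n = mask.sum()
        if n > 0:                          # average the silhouettes whose r_t falls in T(k)
            pei[:, :, k - 1] = silhouettes[mask].mean(axis=0)
    return pei
```

Because consecutive windows T(k) overlap whenever m > 1/(n_c+1), neighboring PEI channels share frames, which smooths the transition between temporal segments.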
2. Multi-task generative adversarial network
The multi-task generative adversarial network for gait recognition of the invention is shown in Fig. 1 and mainly consists of four parts: an encoder, a view-transformation layer, a generator, and a discriminator. Wherein:
(a) Encoder E: to obtain the latent representation of a specific view, a convolutional neural network is used as the encoder of the model. The encoder input x_u is the PEI template proposed by the invention at view u; the input size is 64 × 64 × n_c. Each channel of the PEI template is fed into the encoder independently; after temporal pooling and a fully connected layer, an effective latent feature representation z_u is extracted as the encoder output. Average pooling is used for the temporal pooling;
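The per-channel encoding and temporal average pooling described above can be sketched as follows. This is an illustrative NumPy sketch, not the patent's network: `conv_features` is a hypothetical stand-in for the convolutional layers, and the fully connected layer is reduced to a single weight matrix.

```python
import numpy as np

def encode(pei, conv_features, w_fc):
    """Sketch of the encoder: per-channel features, temporal average
    pooling across channels, then a fully connected layer.

    pei:           (H, W, n_c) PEI template
    conv_features: callable mapping one (H, W) channel to a 1-D feature
                   vector (stand-in for the convolutional layers)
    w_fc:          (d_feat, d_latent) fully connected weights
    Returns the latent representation z_u of shape (d_latent,).
    """
    # each channel is fed through the feature extractor independently
    per_channel = np.stack([conv_features(pei[:, :, k])
                            for k in range(pei.shape[2])])
    pooled = per_channel.mean(axis=0)      # temporal (cross-channel) average pooling
    return pooled @ w_fc                   # latent representation z_u
```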
(b) View-transformation layer V: it is assumed that, as the view changes, the gait data lie on a high-dimensional manifold, on which a sample can move in a specific direction to change view while preserving identity information. Given the latent representation z_u, the view transformation can be described as:

z_v = z_u + sum_{i=u}^{v-1} h_i

where h_i denotes the transformation vector from view i to view i+1, and z_v is the latent representation at view v;
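The view transformation above amounts to adding a sequence of learned offset vectors to the latent code, one per view step, which can be sketched in NumPy. The array layout of the transformation vectors `h` and the zero-based view indexing are assumptions for illustration.

```python
import numpy as np

def view_transform(z_u, h, u, v):
    """Move a latent code from view u to view v on the gait manifold.

    z_u: (d,) latent representation at view u
    h:   (n_v, d) transformation vectors; h[i] moves view i -> i+1
    Assumes u <= v, with views indexed 0 .. n_v - 1.
    """
    # z_v = z_u + sum_{i=u}^{v-1} h_i
    return z_u + h[u:v].sum(axis=0)
```

Because the offsets compose additively, transforming u -> w and then w -> v gives the same result as transforming u -> v directly.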
(c) Generator G: taking the latent representation z_v as input, it can generate the gait template at view v. The generator consists of five deconvolution (transposed-convolution) layers and uses ReLU as the activation function. Because reconstructing the full PEI template directly is difficult, one randomly selected channel of the PEI template is reconstructed, which ensures that the latent representation preserves the information of all channels. The generator input is defined as [z_v, c], where c denotes the one-hot encoding of the channel;
(d) Discriminator D: the discriminator has the same structure as the encoder, except that it omits the temporal pooling operation and the dimension of its output layer differs. The output dimension of the discriminator is defined as n_v + n_c + n_d, where n_v is the number of views, n_c the number of PEI channels, and n_d the number of distinct identities in the training set. The outputs of the discriminator correspond to n_v + n_c + n_d sub-discriminators, which share all parameters except the last layer. Each sub-discriminator is responsible for a different discrimination task: judging whether a generated sample belongs to a specific distribution. For example, in the model, the first n_v + n_c sub-discriminators push the generator to match the distribution of a specific view and channel, and the following n_d sub-discriminators push the generator to produce samples from the distribution of a specific identity.
3. Loss functions
The invention trains the multi-task generative adversarial network with two loss functions.
(a) Pixel-wise loss: to strengthen the ability of the latent representation to preserve identity information, the pixel-wise loss between the generated template and the real template is minimized first. From the template feature x_v and the pseudo template feature x̂_v, the pixel-wise loss is:

L_p = E ||x_v - x̂_v||_1
(b) Multi-task adversarial loss: from the template feature x_v and the pseudo template feature x̂_v, the multi-task adversarial loss is:

L_a = E[s^T log D(x_v)] + E[s^T log(1 - D(x̂_v))]

where E denotes the expectation over the corresponding sample set, ||·||_1 the L1 norm, and s the one-hot encoding of identity, channel, and view information. D(·) denotes the discriminator output, whose dimension equals that of s; the nonzero entries of s determine the distribution that the pseudo template feature should belong to;
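Under the assumption that the discriminator outputs per-task probabilities in (0, 1), the two losses can be sketched as follows. The epsilon guard is a numerical convenience that is not part of the patent's formulas, and the batch layout is an assumption for illustration.

```python
import numpy as np

def pixel_loss(x_v, x_hat_v):
    """L_p: expected L1 distance between real and generated templates.

    x_v, x_hat_v: (B, ...) batches of real / pseudo template features.
    """
    per_sample = np.abs(x_v - x_hat_v).sum(axis=tuple(range(1, x_v.ndim)))
    return per_sample.mean()

def adversarial_loss(d_real, d_fake, s):
    """L_a = E[s^T log D(x_v)] + E[s^T log(1 - D(x_hat_v))].

    d_real, d_fake: (B, n_v + n_c + n_d) discriminator outputs in (0, 1)
    s:              (B, n_v + n_c + n_d) one-hot view/channel/identity codes;
                    the nonzero entries select the relevant sub-discriminators
    """
    eps = 1e-8   # numerical guard against log(0)
    real_term = (s * np.log(d_real + eps)).sum(axis=1)
    fake_term = (s * np.log(1.0 - d_fake + eps)).sum(axis=1)
    return (real_term + fake_term).mean()
```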
The final loss function is defined as:

L = L_p + αL_a (6)

where the hyper-parameter α balances the pixel-wise loss against the multi-task adversarial loss. With the final loss function defined, the parameters of the encoder, view-transformation layer, generator, and discriminator are updated alternately by back-propagation.
The specific steps of the proposed cross-view gait recognition method based on a multi-task generative adversarial network are:
(1) Input pedestrian video frame sequences at different views and construct the gait template features:

X = {x_1, x_2, x_3, ..., x_{n_v}}

where the vector x_i is the gait template feature at view i and n_v is the number of views;
(2) For any two distinct views u, v, encode the corresponding gait template feature x_u into the latent space with a convolutional neural network, obtaining the latent representation z_u;
(3) Apply the view transformation to the latent representation z_u, converting it to view v and obtaining the latent representation z_v;
(4) Take the latent representation z_v and the one-hot channel encoding as input to the generator network of the multi-task generative adversarial network, which outputs the pseudo template feature x̂_v at view v;
(5) From the template feature x_v and the pseudo template feature x̂_v, compute the pixel-wise loss L_p;
(6) Using the discriminator network of the multi-task generative adversarial network, compute the multi-task adversarial loss L_a from the template feature x_v and the pseudo template feature x̂_v;
(7) Weight the pixel-wise loss and the multi-task adversarial loss, and train the multi-task generative adversarial network with the total loss L = L_p + αL_a, where the hyper-parameter α balances the two losses.
In the invention, the construction steps of the gait template are:
(1) Perform foreground-background separation on each frame of the original video frame sequence and extract the gait silhouette; translate and scale the silhouette to the image center, obtaining the centered silhouette sequence {B_1, B_2, B_3, ..., B_n};
(2) For each frame of the centered silhouette sequence, compute the normalized period position r_t, which denotes the normalized period position of frame t of the centered silhouette sequence;
(3) From the centered silhouette sequence B_t and the corresponding normalized period positions r_t, construct the Period Energy Image (PEI) template feature with n_c channels; the value of the k-th channel of the PEI at coordinate (x, y) is computed as:

PEI_k(x, y) = (1/n) * sum_{r_t in T(k)} B_t(x, y)

where:

T(k) = [(k-1)/(n_c+1), (k-1)/(n_c+1) + m]

m denotes the temporal window size covered by each channel, and T(k) determines the temporal range covered by each channel.
In the invention, the view transformation step is:
Given the latent representation z_u at view u, the transformation to view v is computed as:

z_v = z_u + sum_{i=u}^{v-1} h_i

where h_i denotes the transformation vector from view i to view i+1.
In the invention, the losses L_p and L_a are computed as:
(1) From the template feature x_v and the pseudo template feature x̂_v, compute the pixel-wise loss:

L_p = E ||x_v - x̂_v||_1

(2) From the template feature x_v and the pseudo template feature x̂_v, compute the multi-task adversarial loss:

L_a = E[s^T log D(x_v)] + E[s^T log(1 - D(x̂_v))]

where E denotes the expectation over the corresponding sample set, ||·||_1 the L1 norm, s the one-hot encoding of identity, channel, and view information, and D(·) the discriminator output.
The method of the invention exploits the nonlinear modeling capacity of convolutional neural networks to extract view-specific latent representations. Performing the view transformation in the latent space reduces the computational cost, and the distribution-modeling capacity of the generative adversarial network yields more expressive features, greatly improving recognition performance.
Brief description of the drawings
Fig. 1: detailed model flowchart of the invention.
Fig. 2: sample images from the OU-ISIR, CASIA-B, and USF datasets.
Fig. 3: average accuracy under different walking conditions on the CASIA-B dataset.
Specific embodiments
Having described the specific steps and model of the invention, its experimental results on several gait datasets are shown below.
The experiments use three datasets: OU-ISIR, CASIA-B, and USF. Fig. 2 shows some samples from these three datasets.
The OU-ISIR dataset contains 4007 different subjects, of whom 2135 are male and 1872 female, with ages ranging from 1 to 94. The data cover 4 views: 55°, 65°, 75°, and 85°. PEI templates are extracted with the proposed method using 3 channels; the PEIs are interpolated to 64 × 64 pixels and fed into the multi-task generative adversarial network for training.
The CASIA-B dataset contains 124 different subjects and 11 views. Each subject has, at each view, 6 normal-walking gait sequences, 2 bag-carrying sequences, and 2 coat-wearing sequences. CASIA-B covers a wider range of views than OU-ISIR but has fewer subjects. We set the number of PEI channels to 10 and test gait recognition accuracy separately under the different walking conditions.
USF is another commonly used gait dataset, with 122 different subjects and 5 different forms of gait sequence per subject, close to real-world scenes. In the experiments we test only with the gait sequences at different views, with the number of PEI channels set to 10.
The experiments use Rank-1 recognition accuracy as the performance metric. Recognition is performed with a nearest-neighbor classifier in the latent space after the view-transformation layer.
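The Rank-1 evaluation with a nearest-neighbor classifier in latent space can be sketched as follows. The Euclidean distance metric and the gallery/probe array layout are assumptions for illustration; the document does not fix the distance function.

```python
import numpy as np

def rank1_accuracy(gallery_z, gallery_ids, probe_z, probe_ids):
    """Rank-1 accuracy of a nearest-neighbor classifier in latent space.

    gallery_z:   (G, d) latent codes after the view-transformation layer
    gallery_ids: (G,) identity labels of the gallery
    probe_z:     (P, d) latent codes of the probe set
    probe_ids:   (P,) identity labels of the probes
    """
    # Euclidean distance between every probe and every gallery sample
    d = np.linalg.norm(probe_z[:, None, :] - gallery_z[None, :, :], axis=2)
    nearest = gallery_ids[d.argmin(axis=1)]   # identity of the closest gallery code
    return (nearest == probe_ids).mean()
```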
Experimental example 1: recognition performance of the multi-task generative adversarial network
This experiment reports the cross-view recognition accuracy of different models. As comparison methods we select an autoencoder, canonical correlation analysis, linear discriminant analysis, a convolutional neural network, and a local tensor discriminant model. Table 1 compares the method of the invention with the other methods on the three datasets. The invention shows a large improvement over the other methods.
Experimental example 2: influence of different loss functions on model performance
Table 2 shows the performance of the model on the CASIA-B dataset with different loss functions. Combining the multi-task adversarial loss with the pixel-wise loss improves recognition performance, while using either loss function alone degrades it.
Experimental example 3: influence of different walking conditions on model performance
Fig. 3 shows cross-view recognition accuracy on CASIA-B under the three different walking conditions: normal walking, bag-carrying walking, and coat-wearing walking. Accuracy is highest for normal walking, and coat-wearing sequences degrade model performance more markedly than bag-carrying sequences.
Experimental example 4: influence of different templates and PEI channel numbers on recognition accuracy
Table 3 shows the influence of different templates and PEI channel numbers on recognition accuracy. We compare with the Gait Energy Image (GEI) and the Chrono-Gait Image (CGI), and analyze the average recognition accuracy with PEI channel numbers of 3, 5, and 10. The proposed PEI template achieves higher recognition accuracy than GEI and CGI, and accuracy improves further as the number of PEI channels increases.
Table 1: recognition accuracy (%) of different methods
Table 2: model recognition accuracy (%) with different loss functions

                                                  54°     90°     126°
Pixel-wise loss + multi-task adversarial loss     82.4    73.1    81.3
Pixel-wise loss only                              81.5    71.7    83.5
Multi-task adversarial loss only                  74.6    68.6    75.4
Table 3: influence of different templates and PEI channel numbers on recognition accuracy

Claims (5)

1. A cross-view gait recognition method based on a multi-task generative adversarial network, characterized by the following specific steps:
(1) inputting pedestrian video frame sequences at different views and constructing the gait template features:

X = {x_1, x_2, x_3, ..., x_{n_v}}

where the vector x_i is the gait template feature at view i and n_v is the number of views;
(2) for any two distinct views u, v, encoding the corresponding gait template feature x_u into the latent space with a convolutional neural network, and obtaining the latent representation z_u;
(3) applying the view transformation to the latent representation z_u, converting it to view v and obtaining the latent representation z_v;
(4) taking the latent representation z_v and the one-hot channel encoding as input to the generator network of the multi-task generative adversarial network, which outputs the pseudo template feature x̂_v at view v;
(5) computing the pixel-wise loss L_p from the template feature x_v and the pseudo template feature x̂_v;
(6) computing the multi-task adversarial loss L_a from the template feature x_v and the pseudo template feature x̂_v, using the discriminator network of the multi-task generative adversarial network;
(7) weighting the pixel-wise loss and the multi-task adversarial loss, and training the multi-task generative adversarial network with the total loss L = L_p + αL_a.
2. The gait recognition method according to claim 1, characterized in that the construction steps of the gait template are:
(1) performing foreground-background separation on each frame of the original video frame sequence, extracting the gait silhouette, and translating and scaling the silhouette to the image center, obtaining the centered silhouette sequence {B_1, B_2, B_3, ..., B_n};
(2) computing, for each frame of the centered silhouette sequence, the normalized period position r_t, which denotes the normalized period position of frame t of the centered silhouette sequence;
(3) constructing, from the centered silhouette sequence B_t and the corresponding normalized period positions r_t, the Period Energy Image (PEI) template feature with n_c channels; the value of the k-th channel of the period energy image at coordinate (x, y) is computed as:

PEI_k(x, y) = (1/n) * sum_{r_t in T(k)} B_t(x, y)

where:

T(k) = [(k-1)/(n_c+1), (k-1)/(n_c+1) + m]

m denotes the temporal window size covered by each channel, and T(k) determines the temporal range covered by each channel.
3. The gait recognition method according to claim 1, characterized in that the view transformation step is:
given the latent representation z_u at view u, the latent representation z_v at view v is computed as:

z_v = z_u + sum_{i=u}^{v-1} h_i

where h_i denotes the transformation vector from view i to view i+1.
4. The gait recognition method according to claim 1, characterized in that the losses L_p and L_a are computed as:
(1) from the template feature x_v and the pseudo template feature x̂_v, computing the pixel-wise loss:

L_p = E ||x_v - x̂_v||_1

(2) from the template feature x_v and the pseudo template feature x̂_v, computing the multi-task adversarial loss:

L_a = E[s^T log D(x_v)] + E[s^T log(1 - D(x̂_v))]

where E denotes the expectation over the corresponding sample set, ||·||_1 the L1 norm, s the one-hot encoding of identity, channel, and view information, and D(·) the discriminator output.
5. The gait recognition method according to claim 1, characterized in that the multi-task generative adversarial network mainly consists of four parts: an encoder, a view-transformation layer, a generator, and a discriminator; wherein:
(a) encoder: a convolutional neural network is used as the encoder of the model; the encoder input x_u is the PEI template at view u; the input size is 64 × 64 × n_c; each channel of the PEI template is fed into the encoder independently, and after temporal pooling and a fully connected layer, the effective latent feature representation z_u is extracted as the encoder output;
(b) view-transformation layer: it is assumed that, as the view changes, the gait data lie on a high-dimensional manifold, on which a sample can move in a specific direction to change view while preserving identity information; given the latent representation z_u, the view transformation can be described as:

z_v = z_u + sum_{i=u}^{v-1} h_i (3)

where h_i denotes the transformation vector from view i to view i+1, and z_v is the latent representation at view v;
(c) generator: taking the latent representation z_v as input, it generates the gait template at view v; the generator consists of five deconvolution layers and uses ReLU as the activation function; one randomly selected channel of the PEI template is reconstructed, which ensures that the latent representation preserves the information of all channels; the generator input is defined as [z_v, c], where c denotes the one-hot encoding of the channel;
(d) discriminator: the discriminator has the same structure as the encoder, except that it omits the temporal pooling operation and the dimension of its output layer differs; the output dimension of the discriminator is defined as n_v + n_c + n_d, where n_v is the number of views, n_c the number of PEI channels, and n_d the number of distinct identities in the training set; the discriminator outputs correspond to n_v + n_c + n_d sub-discriminators, which share all parameters except the last layer; and each sub-discriminator is responsible for a different discrimination task: judging whether a generated sample belongs to a specific distribution.
CN201710373017.5A 2017-05-24 2017-05-24 Cross-view gait recognition method based on multi-task generative adversarial network Active CN107085716B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710373017.5A CN107085716B (en) 2017-05-24 2017-05-24 Cross-view gait recognition method based on multi-task generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710373017.5A CN107085716B (en) 2017-05-24 2017-05-24 Cross-view gait recognition method based on multi-task generative adversarial network

Publications (2)

Publication Number Publication Date
CN107085716A true CN107085716A (en) 2017-08-22
CN107085716B CN107085716B (en) 2021-06-04

Family

ID=59607349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710373017.5A Active CN107085716B (en) Cross-view gait recognition method based on multi-task generative adversarial network

Country Status (1)

Country Link
CN (1) CN107085716B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426645A (en) * 2011-08-30 2012-04-25 Beihang University Multi-view and multi-state gait recognition method
CN104299012A (en) * 2014-10-28 2015-01-21 Institute of Automation, Chinese Academy of Sciences Gait recognition method based on deep learning
CN105574510A (en) * 2015-12-18 2016-05-11 Beijing University of Posts and Telecommunications Gait identification method and device
CN106096532A (en) * 2016-06-03 2016-11-09 Shandong University Cross-view gait recognition method based on tensor simultaneous discriminant analysis


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZIFENG WU ET AL.: "A Comprehensive Study on Cross-View Gait Based Human Identification with Deep CNNs", IEEE Transactions on Pattern Analysis and Machine Intelligence *
WANG QING: "Design and Research of Multi-View Gait Recognition Algorithms Based on Deep Learning", China Master's Theses Full-Text Database, Information Science and Technology *

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563428A (en) * 2017-08-25 2018-01-09 Xidian University Polarimetric SAR image classification method based on generative adversarial network
CN107609587A (en) * 2017-09-11 2018-01-19 Zhejiang University of Technology Multi-class multi-view data generation method based on deep convolutional generative adversarial network
CN107609587B (en) * 2017-09-11 2020-08-18 Zhejiang University of Technology Multi-class multi-view data generation method based on deep convolutional generative adversarial network
CN111149104A (en) * 2017-09-21 2020-05-12 Nokia Technologies Oy Apparatus, method and computer program product for biometric identification
CN111149104B (en) * 2017-09-21 2023-10-27 Nokia Technologies Oy Apparatus, method and computer readable storage medium for biometric identification
CN109697389A (en) * 2017-10-23 2019-04-30 Beijing Jingdong Shangke Information Technology Co., Ltd. Identity recognition method and device
CN109697389B (en) * 2017-10-23 2021-10-01 Beijing Jingdong Shangke Information Technology Co., Ltd. Identity recognition method and device
CN108009568A (en) * 2017-11-14 2018-05-08 South China University of Technology Pedestrian detection method based on WGAN model
CN107993210A (en) * 2017-11-30 2018-05-04 Beijing Xiaomi Mobile Software Co., Ltd. Image inpainting method, device and computer-readable storage medium
CN108256627A (en) * 2017-12-29 2018-07-06 Institute of Automation, Chinese Academy of Sciences Audio-visual information mutual generation device based on cycle generative adversarial network and its training system
CN108171701A (en) * 2018-01-15 2018-06-15 Fudan University Saliency detection method based on U-net and adversarial learning
CN108171701B (en) * 2018-01-15 2021-06-22 Fudan University Saliency detection method based on U-net and adversarial learning
CN108491380A (en) * 2018-03-12 2018-09-04 AISpeech Co., Ltd. (Suzhou) Adversarial multi-task training method for speech understanding
CN108596026B (en) * 2018-03-16 2020-06-30 Institute of Automation, Chinese Academy of Sciences Cross-view gait recognition device and training method based on two-stream generative adversarial network
CN108596026A (en) * 2018-03-16 2018-09-28 Institute of Automation, Chinese Academy of Sciences Cross-view gait recognition device and training method based on two-stream generative adversarial network
CN108629823A (en) * 2018-04-10 2018-10-09 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for generating multi-view images
CN108629823B (en) * 2018-04-10 2022-09-06 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for generating multi-view images
WO2019196626A1 (en) * 2018-04-12 2019-10-17 Tencent Technology (Shenzhen) Co., Ltd. Media processing method and related apparatus
US11335127B2 (en) 2018-04-12 2022-05-17 Tencent Technology (Shenzhen) Company Ltd Media processing method, related apparatus, and storage medium
US11403876B2 (en) * 2018-06-11 2022-08-02 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, facial recognition method and apparatus, and computer device
WO2019237846A1 (en) * 2018-06-11 2019-12-19 Tencent Technology (Shenzhen) Co., Ltd. Image processing method and apparatus, face recognition method and apparatus, and computer device
CN108846355A (en) * 2018-06-11 2018-11-20 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, face recognition method, apparatus and computer device
CN109543546B (en) * 2018-10-26 2022-12-20 Fudan University Gait age estimation method based on deep sequence distribution regression
CN109583298A (en) * 2018-10-26 2019-04-05 Fudan University Set-based cross-view gait recognition method
CN109583298B (en) * 2018-10-26 2023-05-02 Fudan University Set-based cross-view gait recognition method
CN109543546A (en) * 2018-10-26 2019-03-29 Fudan University Gait age estimation method based on deep sequence distribution regression
CN111144165B (en) * 2018-11-02 2024-04-12 Watrix Technology (Ningbo) Co., Ltd. Gait information identification method, system and storage medium
CN111144171A (en) * 2018-11-02 2020-05-12 Watrix Technology (Beijing) Co., Ltd. Abnormal crowd information identification method, system and storage medium
CN111144167A (en) * 2018-11-02 2020-05-12 Watrix Technology (Beijing) Co., Ltd. Gait information identification optimization method, system and storage medium
CN111144165A (en) * 2018-11-02 2020-05-12 Watrix Technology (Beijing) Co., Ltd. Gait information identification method, system and storage medium
CN111144170A (en) * 2018-11-02 2020-05-12 Watrix Technology (Beijing) Co., Ltd. Gait information registration method, system and storage medium
CN109726654A (en) * 2018-12-19 2019-05-07 Hohai University Gait recognition method based on generative adversarial network
JP2022514935A (en) * 2019-01-04 2022-02-16 Sony Corporation of America Multiple prediction networks
JP7379494B2 2019-01-04 2023-11-14 Sony Corporation of America Multiple prediction networks
CN110119780A (en) * 2019-05-10 2019-08-13 Northwestern Polytechnical University Hyperspectral image super-resolution reconstruction method based on generative adversarial network
CN110381268B (en) * 2019-06-25 2021-10-01 CloudMinds Robotics Co., Ltd. Method, device, storage medium and electronic equipment for generating video
CN110381268A (en) * 2019-06-25 2019-10-25 Shenzhen Qianhai CloudMinds Cloud Intelligence Technology Co., Ltd. Method, device, storage medium and electronic equipment for generating video
CN110516525B (en) * 2019-07-01 2021-10-08 Hangzhou Dianzi University SAR image target recognition method based on GAN and SVM
CN110516525A (en) * 2019-07-01 2019-11-29 Hangzhou Dianzi University SAR image target recognition method based on GAN and SVM
CN110659586A (en) * 2019-08-31 2020-01-07 University of Electronic Science and Technology of China Cross-view gait recognition method based on identity-preserving cycle generative adversarial network
CN110659586B (en) * 2019-08-31 2022-03-15 University of Electronic Science and Technology of China Gait recognition method based on identity-preserving cycle generative adversarial network
CN112488984A (en) * 2019-09-11 2021-03-12 CITIC Dicastal Co., Ltd. Method and device for acquiring defect picture generation network and defect picture generation method
CN111639580A (en) * 2020-05-25 2020-09-08 Zhejiang Gongshang University Gait recognition method combining feature separation model and view transformation model
CN112001254A (en) * 2020-07-23 2020-11-27 Zhejiang Dahua Technology Co., Ltd. Pedestrian identification method and related device
CN112733704A (en) * 2021-01-07 2021-04-30 Zhejiang University Image processing method, electronic device, and computer-readable storage medium
CN112975968B (en) * 2021-02-26 2022-06-28 Tongji University Robotic arm imitation learning method based on third-person variable-subject demonstration videos
CN112975968A (en) * 2021-02-26 2021-06-18 Tongji University Robotic arm imitation learning method based on third-person variable-subject demonstration videos
CN113112572A (en) * 2021-04-13 2021-07-13 Fudan University Hand-drawn-sketch-guided image editing method based on latent space search
CN113111797A (en) * 2021-04-19 2021-07-13 Hangzhou Dianzi University Cross-view gait recognition method combining autoencoder and view transformation model
CN113111797B (en) * 2021-04-19 2024-02-13 Hangzhou Dianzi University Cross-view gait recognition method combining autoencoder and view transformation model
CN113657463A (en) * 2021-07-28 2021-11-16 Zhejiang Dahua Technology Co., Ltd. Gait recognition model training method, gait recognition method and related device
CN114120076B (en) * 2022-01-24 2022-04-29 Wuhan University Cross-view video gait recognition method based on gait motion estimation
CN114120076A (en) * 2022-01-24 2022-03-01 Wuhan University Cross-view video gait recognition method based on gait motion estimation


Similar Documents

Publication Publication Date Title
CN107085716A (en) Cross-view gait recognition method based on multi-task generative adversarial network
CN109344736B (en) Static image crowd counting method based on joint learning
CN108537743B (en) Face image enhancement method based on generative adversarial network
Liu et al. Attribute-aware face aging with wavelet-based generative adversarial networks
CN105469034B (en) Face recognition method based on weighted discriminative sparsity-constrained non-negative matrix factorization
CN104268593B (en) Face recognition method based on multiple sparse representations under small sample size
CN104616316B (en) Human activity recognition method based on threshold matrix and feature-fusion visual words
CN105825183B (en) Facial expression recognition method based on partially occluded images
CN102938070B (en) Behavior recognition method based on action subspace and weighted behavior recognition model
Zhu et al. Deep learning multi-view representation for face recognition
CN111090764B (en) Image classification method and device based on multi-task learning and graph convolutional neural network
CN107145836A (en) Hyperspectral image classification method based on stacked boundary-discriminative autoencoders
CN104915658B (en) Emotion component analysis method and system based on emotion distribution learning
CN106897669A (en) Pedestrian re-identification method based on consistent iterative multi-view transfer learning
CN109389045A (en) Micro-expression recognition method and device based on hybrid spatiotemporal convolution model
CN104298974A (en) Human behavior recognition method based on depth video sequences
CN105868711B (en) Human behavior recognition method based on sparse low-rank representation
CN104408731B (en) SAR image segmentation method based on region-graph and statistical-similarity coding
CN102592150B (en) Gait recognition method using bidirectional two-dimensional principal component analysis based on fuzzy decision theory
CN107590427A (en) Surveillance video anomaly detection method based on spatio-temporal interest point noise reduction
CN108460400A (en) Hyperspectral image classification method combining multiple feature information
CN102184384A (en) Face recognition method based on multi-scale local phase quantization features
CN102867171B (en) Facial expression recognition method based on label propagation and neighborhood-preserving embedding
CN108985161A (en) Low-rank sparse representation image feature learning method based on Laplacian regularization
Lim et al. Pose transforming network: Learning to disentangle human posture in variational auto-encoded latent space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant