CN109635728A - Heterogeneous pedestrian re-identification method based on asymmetric metric learning - Google Patents
Heterogeneous pedestrian re-identification method based on asymmetric metric learning
- Publication number
- CN109635728A (application number CN201811515924.XA)
- Authority
- CN
- China
- Prior art keywords
- feature
- different modalities
- pedestrian
- heterogeneous
- depth feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/192—Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
- G06V30/194—References adjustable by an adaptive method, e.g. learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a heterogeneous pedestrian re-identification method based on asymmetric metric learning. The method performs asymmetric metric learning on deep features of different modalities. The steps are: two sparse auto-encoders with non-shared parameters project the deep features of the two modalities into a common space, while a global constraint and a local constraint are introduced to constrain the distances between cross-modality deep features, reducing intra-class distances and enlarging inter-class distances between features of different modalities; the constraint results of the global and local constraints are back-propagated into the network being trained as supervision signals to update its parameters. By narrowing the modality gap between different modalities, the present invention makes the network ignore modality information as much as possible and focus on identity information, thereby improving the expressive power of pedestrian features and the accuracy of pedestrian matching.
Description
Technical field
The present invention relates to the field of computer vision, and more particularly to a heterogeneous pedestrian re-identification method based on asymmetric metric learning.
Background art
With the rapid development of modern society, urban population density keeps rising and public safety attracts increasing attention. To prevent security incidents and respond to them in time, large numbers of surveillance cameras have been installed in public places. Faced with complex surveillance networks and massive volumes of monitoring data, automatically analyzing and interpreting the information provided by multi-camera surveillance systems plays a positive role in crime prevention and in maintaining good public order. Pedestrian re-identification has therefore become a hot research topic in computer vision.

Pedestrian re-identification (person re-identification) is a key component of video surveillance research. Its goal is, given a target pedestrian who appears in one camera view of a surveillance network, to quickly and accurately pick that pedestrian out of the large number of pedestrians captured by the other camera views of the network. Applying re-identification technology can greatly reduce the manual effort involved in video surveillance and enables fast and accurate analysis of pedestrians and their behavior in surveillance video. Most current mainstream re-identification methods match pedestrians by extracting appearance and color (RGB) features and can be regarded as RGB-RGB single-modality pedestrian matching. However, these methods rest on a strong assumption: that the clothing of the same pedestrian remains essentially unchanged across different cameras, which restricts them to short-term re-identification. When a pedestrian's clothing changes significantly, or when color features become unusable, the performance of these methods drops sharply, because the color features then act largely as interference to the model, and different pedestrians wearing clothes of the same color are easily misjudged as the same person. In recent years, to overcome the failure of color features under such extreme conditions, data of other modalities, such as infrared (IR) data, have been introduced to compensate for the shortcomings of RGB data; this setting can be regarded as RGB-IR cross-modality pedestrian matching (heterogeneous pedestrian matching). The greatest challenge of heterogeneous pedestrian re-identification lies in reducing the modality gap between the different modalities. A deep zero-padding method has been proposed to reduce this gap, but its recognition results are inaccurate and still fail to meet application requirements.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a heterogeneous pedestrian re-identification method based on asymmetric metric learning. The method overcomes the failure of color features under extreme conditions and the low accuracy of existing heterogeneous pedestrian re-identification: by narrowing the modality gap between different modalities, it makes the network ignore modality information as much as possible and focus on identity information, thereby improving the expressive power of pedestrian features and the accuracy of pedestrian matching.
The purpose of the present invention is achieved by the following technical solution. A heterogeneous pedestrian re-identification method based on asymmetric metric learning comprises the steps of:

during model training, inputting pedestrian images of two modalities and extracting their deep features separately;

performing asymmetric metric learning on the deep features of the different modalities, namely: projecting the deep features of the two modalities into a common space with two sparse auto-encoders whose parameters are not shared, while introducing a global constraint and a local constraint on the distances between cross-modality deep features, so as to reduce intra-class distances and enlarge inter-class distances between features of different modalities; and back-propagating the constraint results of the global and local constraints into the network being trained as supervision signals for updating its parameters;

computing the losses of the global features and the local features, and optimizing the training model so as to minimize the sum of the global loss, the local loss, and the global and local constraints of the asymmetric metric, all computed on the deep features.
Through the above steps, given pedestrian re-identification training data of any two modalities, a heterogeneous pedestrian re-identification model can be trained that matches heterogeneous pedestrians with high accuracy and high speed.
Preferably, the deep features of images of different modalities are extracted as follows:

first, a ResNet50 classification model pre-trained on the ImageNet dataset is used as the backbone network, which is split into three branches after the trunk;

then, from top to bottom, each branch extracts the high-level features of the classification model and partitions them evenly in the horizontal direction;

next, each branch passes through pooling and dimensionality-reduction operations to obtain several global features and local features of fixed size;

finally, the above global and local features are concatenated in order to obtain the deep feature of the input image, that is, the complete feature representation of the pedestrian.
To reduce the modality gap between heterogeneous pedestrian data, the present invention performs asymmetric metric learning on the deep features of the different modalities as follows:

first, the extracted deep features are divided into two groups F^B = {f_i^B} and F^R = {f_i^R}, where B and R denote the RGB modality and the IR modality respectively, and f_i^m denotes the i-th deep feature vector of modality m;

then, the two feature groups F^B and F^R are passed through two sparse auto-encoders (SAE) whose parameters are not shared. Each sparse auto-encoder consists of two fully connected layers acting as an encoder E and a decoder D: the encoder E projects the features of its modality into the common space, and the decoder D maps the encoded features back to a space of the same size as the input feature space;

next, a reconstruction loss l_r is constructed to keep the output of each SAE as close as possible to its input:

l_r = ||f^B - D^B(E^B(f^B))||_2 + ||f^R - D^R(E^R(f^R))||_2,

where f^B, f^R, E^B, E^R, D^B and D^R denote the features, encoders and decoders of modality B and modality R;

finally, a global constraint is introduced in the common space to constrain the gap between the feature distributions of the different modalities, a local constraint is introduced to reduce intra-class distances and enlarge inter-class distances between cross-modality features, and the above constraint results are back-propagated into the training model as supervision signals to update its parameters.
Further, the global constraint constrains the gap between the feature distributions of the different modalities and is denoted l_global = W(E^B(f^B), E^R(f^R))^2, where the projected features of each modality are modeled as a Gaussian distribution and W is the 2-Wasserstein distance, which for any two given Gaussian distributions X = N(m_X, C_X) and Y = N(m_Y, C_Y), with m and C denoting the mean and covariance of the X and Y distributions respectively, has the closed form W(X, Y)^2 = ||m_X - m_Y||_2^2 + Tr(C_X + C_Y - 2(C_X^{1/2} C_Y C_X^{1/2})^{1/2}).
Further, the local constraint reduces intra-class distances and enlarges inter-class distances between cross-modality features and is denoted l_local = (max_{p∈A(f)} d(f, p) - min_{n∈B(f)} d(f, n) + α), where A(f) and B(f) denote the sets of features that share the same identity as feature f and that have a different identity from f, respectively, d(·) denotes the Euclidean distance between two features, and α is a hyper-parameter controlling the margin between positive and negative pairs.
Further, in order to make the common space more effective, a sparsity loss l_sparse is constructed to constrain the output of the hidden layer: l_sparse = ||E^B(f^B)||_1 + ||E^R(f^R)||_1.
Further, each sparse auto-encoder consists of two fully connected layers with ReLU activation functions.
Preferably, for the extracted deep features, the Triplet Loss function is used to compute the loss of the global features and the Softmax function is used to compute the loss of the local features, and the training model is optimized so as to minimize the sum of the global loss, the local loss, and the reconstruction loss, sparsity loss, global constraint and local constraint of the asymmetric metric, thereby improving the feature expressiveness of the model.
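For readability, the training objective described above can be stated compactly as the unweighted sum of the six loss terms, using the notation of the preceding paragraphs:

L_total = L_Triplet(global features) + L_Softmax(local features) + l_r + l_sparse + l_global + l_local.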
Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. Aiming at the large modality gap and the difficulty of pedestrian matching in existing heterogeneous pedestrian re-identification tasks, the present invention performs asymmetric metric learning on the deep features of different modalities to reduce the distance between features of the same pedestrian under different modalities. The approach can be applied to an arbitrary feature-extraction network to provide supervision for training, supports end-to-end training, effectively improves the quality of the extracted pedestrian features, and accelerates network convergence.

2. Different loss functions are used to jointly train different kinds of features in a targeted way. Compared with training with a single loss function, the present invention purposely makes the network ignore modality information as much as possible and focus on pedestrian identity information, thereby obtaining a more complete pedestrian feature representation and achieving accuracy far better than existing methods.

3. Pedestrian features are extracted by combining global and local information. Compared with a single type of feature, the present invention obtains a more complete pedestrian feature representation and therefore improves accuracy.
Detailed description of the invention
Fig. 1 is an overall functional framework diagram of the method of this embodiment.
Specific embodiment
The accompanying drawing is provided for illustration only and shall not be construed as limiting this patent; for those skilled in the art, the omission of some well-known structures and their descriptions from the drawing is understandable. The technical solution of the present invention is further described below with reference to the drawing and the embodiment.

As shown in Fig. 1, the heterogeneous pedestrian re-identification method based on asymmetric metric learning of this embodiment mainly comprises three steps: feature extraction, asymmetric metric learning, and classification. Each step is described in detail below.
1. Feature extraction

Pedestrian features are extracted as follows:

first, a ResNet50 classification model pre-trained on the ImageNet dataset is used as the backbone network, which is split into three branches after the trunk;

then, from top to bottom, each branch extracts the high-level features of the classification model and partitions them horizontally into 1, 2 and 3 blocks respectively;

next, through pooling and dimensionality-reduction operations, the branches yield 3 global features and 5 local features, each of fixed size 256 dimensions;

finally, the eight 256-dimensional features are concatenated in order into one 2048-dimensional feature, which is the deep feature of the input image, that is, the complete feature representation of the pedestrian. This feature is used in the subsequent asymmetric metric learning and classification steps.

Pedestrian features are thus extracted by combining global and local information; compared with a single type of feature, this yields a more complete pedestrian feature representation and therefore improves accuracy.
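As an illustration only, the following PyTorch sketch shows a three-branch extractor of the kind described above. The split point after the backbone's layer3, the use of max pooling, the 1x1-convolution reduction layers, the concatenation order (3 global features followed by 5 local features) and the input resolution in the usage comment are assumptions made for this sketch, not details given in the text.

```python
import copy
import torch
import torch.nn as nn
from torchvision.models import resnet50


class ThreeBranchExtractor(nn.Module):
    """Sketch of the described extractor: a shared ResNet50 trunk followed by three
    branches, producing 3 global and 5 local 256-d features (2048-d in total)."""

    def __init__(self, feat_dim=256):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")   # ImageNet pre-trained (torchvision >= 0.13)
        # Shared trunk up to layer3 (assumed split point).
        self.trunk = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3,
        )
        # Three branches, each an independent copy of layer4.
        self.branches = nn.ModuleList([copy.deepcopy(backbone.layer4) for _ in range(3)])
        self.part_counts = [1, 2, 3]                   # horizontal blocks per branch
        # 1x1-conv reduction from 2048 channels to 256 for each of the 8 pooled features.
        self.reducers = nn.ModuleList([
            nn.Sequential(nn.Conv2d(2048, feat_dim, 1, bias=False),
                          nn.BatchNorm2d(feat_dim), nn.ReLU(inplace=True))
            for _ in range(8)
        ])

    def forward(self, x):
        x = self.trunk(x)
        global_feats, local_feats, r = [], [], 0
        for branch, parts in zip(self.branches, self.part_counts):
            fmap = branch(x)                                        # (N, 2048, H, W)
            g = nn.functional.adaptive_max_pool2d(fmap, 1)          # branch-level global feature
            global_feats.append(self.reducers[r](g).flatten(1)); r += 1
            if parts > 1:                                           # horizontal stripes -> local features
                stripes = nn.functional.adaptive_max_pool2d(fmap, (parts, 1))
                for p in range(parts):
                    s = stripes[:, :, p:p + 1, :]
                    local_feats.append(self.reducers[r](s).flatten(1)); r += 1
        # Concatenate the 3 global features followed by the 5 local features: (N, 8 * 256) = (N, 2048).
        return torch.cat(global_feats + local_feats, dim=1)


# Usage: images of both modalities go through the same extractor.
# extractor = ThreeBranchExtractor()
# feats = extractor(torch.randn(4, 3, 384, 128))                    # feats.shape == (4, 2048)
```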
2. Asymmetric metric learning

In this step the features of the different modalities are projected and reconstructed through different projection matrices in order to reduce the modality gap between heterogeneous pedestrian data, as follows:

first, the extracted features F^B = {f_i^B} and F^R = {f_i^R} (B and R denoting the RGB modality and the IR modality respectively) are separated into the two groups F^B and F^R by a modality selector;

then, the two feature groups F^B and F^R are passed through two sparse auto-encoders (SAE) whose parameters are not shared. Each sparse auto-encoder consists of two fully connected layers with ReLU activation functions, acting as an encoder E and a decoder D: the encoder E projects the features of its modality into the common space, and the decoder D maps the encoded features back to a space of the same size as the input feature space. A reconstruction loss l_r is then constructed to keep the output of each SAE as close as possible to its input: l_r = ||f^B - D^B(E^B(f^B))||_2 + ||f^R - D^R(E^R(f^R))||_2, where f^B, f^R, E^B, E^R, D^B and D^R denote the features, encoders and decoders of modality B and modality R.

Meanwhile, in order to make the common space more effective, a sparsity loss l_sparse is constructed to constrain the output of the hidden layer: l_sparse = ||E^B(f^B)||_1 + ||E^R(f^R)||_1.
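As an illustration of the two sparse auto-encoders and their losses, here is a minimal PyTorch sketch assuming 2048-dimensional input features; the dimension of the common space (512 here) and the averaging of the losses over the batch are assumptions, not values given in the text.

```python
import torch
import torch.nn as nn


class SparseAutoEncoder(nn.Module):
    """One SAE: two fully connected layers with ReLU, the first acting as encoder E
    (projection into the common space), the second as decoder D (mapping back)."""

    def __init__(self, in_dim=2048, shared_dim=512):   # shared_dim is an assumed value
        super().__init__()
        self.E = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.ReLU(inplace=True))
        self.D = nn.Sequential(nn.Linear(shared_dim, in_dim), nn.ReLU(inplace=True))

    def forward(self, f):
        z = self.E(f)                 # projection into the common space
        return z, self.D(z)           # encoded feature and its reconstruction


# Two SAEs with non-shared parameters, one per modality (B: RGB, R: IR).
sae_B, sae_R = SparseAutoEncoder(), SparseAutoEncoder()


def asymmetric_losses(f_B, f_R):
    """Reconstruction loss l_r and sparsity loss l_sparse for a batch of RGB features
    f_B and IR features f_R, each of shape (N, 2048); both averaged over the batch."""
    z_B, rec_B = sae_B(f_B)
    z_R, rec_R = sae_R(f_R)
    l_r = (f_B - rec_B).norm(p=2, dim=1).mean() + (f_R - rec_R).norm(p=2, dim=1).mean()
    l_sparse = z_B.abs().sum(dim=1).mean() + z_R.abs().sum(dim=1).mean()
    return z_B, z_R, l_r, l_sparse
```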
Finally, a global constraint l_global and a local constraint l_local are introduced in the common space to constrain the distances between features of the different modalities, and the results are back-propagated as supervision signals to the other modules of the network to update the parameters of the feature-extraction step, so that feature extraction ignores modality information as far as possible and focuses on pedestrian identity information, thereby improving the image feature representation.

The global constraint constrains the gap between the feature distributions of the different modalities and is denoted l_global = W(E^B(f^B), E^R(f^R))^2, where the projected features of each modality are modeled as a Gaussian distribution and W is the 2-Wasserstein distance, which for any two given Gaussian distributions X = N(m_X, C_X) and Y = N(m_Y, C_Y), with m and C denoting the mean and covariance of the X and Y distributions respectively, has the closed form W(X, Y)^2 = ||m_X - m_Y||_2^2 + Tr(C_X + C_Y - 2(C_X^{1/2} C_Y C_X^{1/2})^{1/2}). The local constraint reduces intra-class distances and enlarges inter-class distances between cross-modality features and is denoted l_local = (max_{p∈A(f)} d(f, p) - min_{n∈B(f)} d(f, n) + α), where A(f) and B(f) denote the sets of features that share the same identity as feature f and that have a different identity from f, respectively, d(·) denotes the Euclidean distance between two features, and α is a hyper-parameter controlling the margin between positive and negative pairs.
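For illustration, the two constraints can be sketched in PyTorch as below. Approximating each modality's projected features by a Gaussian with diagonal covariance (which gives the Wasserstein term a simple closed form), the batch-hard mining of the local constraint, the hinge via relu and the margin value are all assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F


def global_constraint(z_B, z_R, eps=1e-6):
    """l_global: squared 2-Wasserstein distance between the projected feature
    distributions, each approximated here by a Gaussian with diagonal covariance."""
    m_B, m_R = z_B.mean(dim=0), z_R.mean(dim=0)
    v_B = z_B.var(dim=0, unbiased=False) + eps
    v_R = z_R.var(dim=0, unbiased=False) + eps
    return ((m_B - m_R) ** 2).sum() + ((v_B.sqrt() - v_R.sqrt()) ** 2).sum()


def local_constraint(z_B, z_R, labels_B, labels_R, alpha=0.3):
    """l_local: for each feature f in the cross-modality batch, push the farthest
    positive (same identity, set A(f)) closer than the nearest negative (set B(f))
    by a margin alpha; the hinge and batch-hard form are illustrative choices."""
    z = torch.cat([z_B, z_R], dim=0)
    y = torch.cat([labels_B, labels_R], dim=0)
    dist = torch.cdist(z, z, p=2)                          # pairwise Euclidean distances
    same = y.unsqueeze(0) == y.unsqueeze(1)
    eye = torch.eye(len(y), dtype=torch.bool, device=z.device)
    hardest_pos = (dist * (same & ~eye).float()).max(dim=1).values
    hardest_neg = (dist + 1e6 * same.float()).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + alpha).mean()
```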
In the present invention, two sparse auto-encoders with non-shared parameters project the features of the different modalities into the common space, and the global constraint l_global and the local constraint l_local are introduced to constrain the distances between cross-modality features, so that intra-class distances between features of different modalities decrease and inter-class distances increase. This effectively makes the feature-extraction process ignore modality information as far as possible and focus on identity information, thereby improving the image feature representation.
3. Classification

In this step, different losses are used to jointly and distinctly train the different kinds of input features so as to constrain the pedestrian features effectively: for the deep features produced by the feature-extraction step, the Triplet Loss function is used to compute the loss of the 3 global features and the Softmax function is used to compute the loss of the 5 local features; the model is then optimized by jointly minimizing the sum of the global loss, the local loss, and the reconstruction loss, sparsity loss, global constraint and local constraint of the asymmetric metric module, improving the feature expressiveness of the model.
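Putting the pieces together, the following sketch wires the modules of the previous sketches into one joint optimization step: a triplet-style loss on the 3 global features, a Softmax (cross-entropy) loss on the 5 local features, plus the reconstruction, sparsity, global and local terms. The classifier heads, the example number of identities, the slicing of the 2048-dimensional feature into its 8 chunks, the reuse of the batch-hard helper as the triplet loss, and the unweighted sum are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

# Assumes the components from the previous sketches: extractor (ThreeBranchExtractor),
# asymmetric_losses, global_constraint and local_constraint.
NUM_IDS = 395                                # number of training identities (example value)
classifiers = nn.ModuleList([nn.Linear(256, NUM_IDS) for _ in range(5)])  # one head per local feature
ce = nn.CrossEntropyLoss()


def training_step(img_B, img_R, labels_B, labels_R, optimizer):
    """One joint step; the optimizer is assumed to cover the extractor, both SAEs
    and the classifier heads, and the total loss is the plain sum of all terms."""
    f_B, f_R = extractor(img_B), extractor(img_R)                  # (N, 2048) deep features
    # Asymmetric metric in the common space.
    z_B, z_R, l_r, l_sparse = asymmetric_losses(f_B, f_R)
    l_global = global_constraint(z_B, z_R)
    l_local = local_constraint(z_B, z_R, labels_B, labels_R)
    # Split the 2048-d feature back into its 8 x 256-d chunks (3 global, then 5 local).
    f = torch.cat([f_B, f_R]); y = torch.cat([labels_B, labels_R]); n = len(f_B)
    chunks = f.split(256, dim=1)
    global_parts, local_parts = chunks[:3], chunks[3:]
    # Triplet-style loss on the global features (reusing the batch-hard helper above).
    l_tri = sum(local_constraint(g[:n], g[n:], labels_B, labels_R) for g in global_parts)
    # Softmax (cross-entropy) loss on the local features.
    l_cls = sum(ce(classifiers[i](p), y) for i, p in enumerate(local_parts))
    loss = l_tri + l_cls + l_r + l_sparse + l_global + l_local     # minimized jointly
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```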
Different loss functions are used to jointly train different kinds of features in a targeted way. Compared with training with a single loss function, the present invention purposely makes the network ignore modality information as much as possible and focus on pedestrian identity information, thereby obtaining a more complete pedestrian feature representation and achieving accuracy far better than existing methods.
Experimental results show that on SYSU-MM01, currently the largest cross-modality pedestrian re-identification dataset, the present invention raises Rank-1 and mAP from 24.43% and 26.92% to 66.26% and 66.7% respectively, a substantial performance improvement over other methods.
The techniques described in the present invention can be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented in one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), programmable logic devices (PLD), field-programmable gate arrays (FPGA), processors, controllers, microcontrollers, electronic devices, other electronic units designed to perform the functions described in the invention, or a combination thereof.

For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, processes, etc.) that perform the functions described herein. The firmware and/or software code may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiment may be carried out by hardware controlled by program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The aforementioned storage medium includes ROM, RAM, magnetic disks, optical disks and other media capable of storing program code.
Obviously, the above embodiment of the present invention is merely an example given to clearly illustrate the present invention and is not a limitation on the embodiments of the present invention. For those of ordinary skill in the art, other variations or changes in different forms may also be made on the basis of the above description; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (8)
1. A heterogeneous pedestrian re-identification method based on asymmetric metric learning, characterized by comprising the steps of:
during model training, inputting pedestrian images of two modalities and extracting their deep features separately;
performing asymmetric metric learning on the deep features of the different modalities, namely: projecting the deep features of the two modalities into a common space with two sparse auto-encoders whose parameters are not shared, while introducing a global constraint and a local constraint on the distances between cross-modality deep features, so as to reduce intra-class distances and enlarge inter-class distances between features of different modalities; and back-propagating the constraint results of the global and local constraints into the network being trained as supervision signals for updating its parameters;
computing the losses of the global features and the local features, and optimizing the training model so as to minimize the sum of the global loss, the local loss, and the global and local constraints of the asymmetric metric, all computed on the deep features.
2. The heterogeneous pedestrian re-identification method based on asymmetric metric learning according to claim 1, characterized in that the deep features of images of different modalities are extracted as follows:
first, a ResNet50 classification model pre-trained on the ImageNet dataset is used as the backbone network, which is split into three branches after the trunk;
then, from top to bottom, each branch extracts the high-level features of the classification model and partitions them evenly in the horizontal direction;
next, each branch passes through pooling and dimensionality-reduction operations to obtain several global features and local features of fixed size;
finally, the above global and local features are concatenated in order to obtain the deep feature of the input image, that is, the complete feature representation of the pedestrian.
3. The heterogeneous pedestrian re-identification method based on asymmetric metric learning according to claim 1, characterized in that asymmetric metric learning is performed on the deep features of the different modalities as follows:
first, the extracted deep features are divided into two groups F^B = {f_i^B} and F^R = {f_i^R}, where B and R denote the RGB modality and the IR modality respectively, and f_i^m denotes the i-th deep feature vector of modality m;
then, the two feature groups F^B and F^R are passed through two sparse auto-encoders SAE whose parameters are not shared, each sparse auto-encoder consisting of two fully connected layers acting as an encoder E and a decoder D, the encoder E projecting the features of its modality into the common space and the decoder D mapping the encoded features back to a space of the same size as the input feature space;
next, a reconstruction loss l_r is constructed to keep the output of each SAE as close as possible to its input:
l_r = ||f^B - D^B(E^B(f^B))||_2 + ||f^R - D^R(E^R(f^R))||_2,
where f^B, f^R, E^B, E^R, D^B and D^R denote the features, encoders and decoders of modality B and modality R;
finally, a global constraint is introduced in the common space to constrain the gap between the feature distributions of the different modalities, a local constraint is introduced to reduce intra-class distances and enlarge inter-class distances between cross-modality features, and the above constraint results are back-propagated into the training model as supervision signals to update its parameters.
4. The heterogeneous pedestrian re-identification method based on asymmetric metric learning according to claim 3, characterized in that the global constraint constrains the gap between the feature distributions of the different modalities and is denoted l_global = W(E^B(f^B), E^R(f^R))^2, where the projected features of each modality are modeled as a Gaussian distribution and W is the 2-Wasserstein distance, which for any two given Gaussian distributions X = N(m_X, C_X) and Y = N(m_Y, C_Y), with m and C denoting the mean and covariance of the X and Y distributions respectively, has the closed form W(X, Y)^2 = ||m_X - m_Y||_2^2 + Tr(C_X + C_Y - 2(C_X^{1/2} C_Y C_X^{1/2})^{1/2}).
5. The heterogeneous pedestrian re-identification method based on asymmetric metric learning according to claim 3, characterized in that the local constraint reduces intra-class distances and enlarges inter-class distances between cross-modality features and is denoted l_local = (max_{p∈A(f)} d(f, p) - min_{n∈B(f)} d(f, n) + α), where A(f) and B(f) denote the sets of features that share the same identity as feature f and that have a different identity from f, respectively, d(·) denotes the Euclidean distance between two features, and α is a hyper-parameter controlling the margin between positive and negative pairs.
6. The heterogeneous pedestrian re-identification method based on asymmetric metric learning according to claim 3, characterized in that a sparsity loss l_sparse is constructed to constrain the output of the hidden layer: l_sparse = ||E^B(f^B)||_1 + ||E^R(f^R)||_1.
7. The heterogeneous pedestrian re-identification method based on asymmetric metric learning according to claim 3, characterized in that each sparse auto-encoder consists of two fully connected layers with ReLU activation functions.
8. The heterogeneous pedestrian re-identification method based on asymmetric metric learning according to claim 1, characterized in that, for the extracted deep features, the Triplet Loss function is used to compute the loss of the global features and the Softmax function is used to compute the loss of the local features, and the training model is optimized so as to minimize the sum of the global loss, the local loss, and the reconstruction loss, sparsity loss, global constraint and local constraint of the asymmetric metric.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811515924.XA CN109635728B (en) | 2018-12-12 | 2018-12-12 | Heterogeneous pedestrian re-identification method based on asymmetric metric learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811515924.XA CN109635728B (en) | 2018-12-12 | 2018-12-12 | Heterogeneous pedestrian re-identification method based on asymmetric metric learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109635728A true CN109635728A (en) | 2019-04-16 |
CN109635728B CN109635728B (en) | 2020-10-13 |
Family
ID=66073049
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811515924.XA Active CN109635728B (en) | 2018-12-12 | 2018-12-12 | Heterogeneous pedestrian re-identification method based on asymmetric metric learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635728B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079939A (en) * | 2019-11-28 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
CN112001438A (en) * | 2020-08-19 | 2020-11-27 | 四川大学 | Multi-mode data clustering method for automatically selecting clustering number |
CN112016401A (en) * | 2020-08-04 | 2020-12-01 | 杰创智能科技股份有限公司 | Cross-modal-based pedestrian re-identification method and device |
CN112016523A (en) * | 2020-09-25 | 2020-12-01 | 北京百度网讯科技有限公司 | Cross-modal face recognition method, device, equipment and storage medium |
CN112200093A (en) * | 2020-10-13 | 2021-01-08 | 北京邮电大学 | Pedestrian re-identification method based on uncertainty estimation |
CN112241682A (en) * | 2020-09-14 | 2021-01-19 | 同济大学 | End-to-end pedestrian searching method based on blocking and multi-layer information fusion |
CN113128441A (en) * | 2021-04-28 | 2021-07-16 | 安徽大学 | System and method for identifying vehicle weight by embedding structure of attribute and state guidance |
CN114550208A (en) * | 2022-02-10 | 2022-05-27 | 南通大学 | Cross-modal pedestrian re-identification method based on global level and local level combined constraint |
WO2022147977A1 (en) * | 2021-01-05 | 2022-07-14 | 山东交通学院 | Vehicle re-identification method and system based on depth feature and sparse metric projection |
-
2018
- 2018-12-12 CN CN201811515924.XA patent/CN109635728B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778921A (en) * | 2017-02-15 | 2017-05-31 | 张烜 | Personnel based on deep learning encoding model recognition methods again |
CN108345860A (en) * | 2018-02-24 | 2018-07-31 | 江苏测联空间大数据应用研究中心有限公司 | Personnel based on deep learning and learning distance metric recognition methods again |
CN108960127A (en) * | 2018-06-29 | 2018-12-07 | 厦门大学 | Pedestrian's recognition methods again is blocked based on the study of adaptive depth measure |
Non-Patent Citations (2)
Title |
---|
MANG YE et al.: "Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking", PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE *
SATEESH PEDAGADI et al.: "Local Fisher Discriminant Analysis for Pedestrian Re-identification", 26TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079939A (en) * | 2019-11-28 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
CN111079939B (en) * | 2019-11-28 | 2021-04-20 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
CN112016401A (en) * | 2020-08-04 | 2020-12-01 | 杰创智能科技股份有限公司 | Cross-modal-based pedestrian re-identification method and device |
CN112016401B (en) * | 2020-08-04 | 2024-05-17 | 杰创智能科技股份有限公司 | Cross-mode pedestrian re-identification method and device |
CN112001438A (en) * | 2020-08-19 | 2020-11-27 | 四川大学 | Multi-mode data clustering method for automatically selecting clustering number |
CN112001438B (en) * | 2020-08-19 | 2023-01-10 | 四川大学 | Multi-mode data clustering method for automatically selecting clustering number |
CN112241682B (en) * | 2020-09-14 | 2022-05-10 | 同济大学 | End-to-end pedestrian searching method based on blocking and multi-layer information fusion |
CN112241682A (en) * | 2020-09-14 | 2021-01-19 | 同济大学 | End-to-end pedestrian searching method based on blocking and multi-layer information fusion |
CN112016523A (en) * | 2020-09-25 | 2020-12-01 | 北京百度网讯科技有限公司 | Cross-modal face recognition method, device, equipment and storage medium |
CN112016523B (en) * | 2020-09-25 | 2023-08-29 | 北京百度网讯科技有限公司 | Cross-modal face recognition method, device, equipment and storage medium |
CN112200093A (en) * | 2020-10-13 | 2021-01-08 | 北京邮电大学 | Pedestrian re-identification method based on uncertainty estimation |
WO2022147977A1 (en) * | 2021-01-05 | 2022-07-14 | 山东交通学院 | Vehicle re-identification method and system based on depth feature and sparse metric projection |
CN113128441B (en) * | 2021-04-28 | 2022-10-14 | 安徽大学 | System and method for identifying vehicle weight by embedding structure of attribute and state guidance |
CN113128441A (en) * | 2021-04-28 | 2021-07-16 | 安徽大学 | System and method for identifying vehicle weight by embedding structure of attribute and state guidance |
CN114550208A (en) * | 2022-02-10 | 2022-05-27 | 南通大学 | Cross-modal pedestrian re-identification method based on global level and local level combined constraint |
Also Published As
Publication number | Publication date |
---|---|
CN109635728B (en) | 2020-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635728A (en) | Heterogeneous pedestrian re-identification method based on asymmetric metric learning | |
CN110781838B (en) | Multi-mode track prediction method for pedestrians in complex scene | |
CN109472298B (en) | Deep bidirectional feature pyramid enhanced network for small-scale target detection | |
Zhu et al. | RGB-D local implicit function for depth completion of transparent objects | |
CN110909690B (en) | Method for detecting occluded face image based on region generation | |
Huang et al. | A depth extraction method based on motion and geometry for 2D to 3D conversion | |
CN113887349A (en) | Road area image identification method based on image and point cloud fusion network | |
CN110378837A (en) | Object detection method, device and storage medium based on fish-eye camera | |
CN116343330A (en) | Abnormal behavior identification method for infrared-visible light image fusion | |
CN110956158A (en) | Pedestrian shielding re-identification method based on teacher and student learning frame | |
CN110321801B (en) | Clothing changing pedestrian re-identification method and system based on self-coding network | |
CN112686276A (en) | Flame detection method based on improved RetinaNet network | |
CN112801019B (en) | Method and system for eliminating re-identification deviation of unsupervised vehicle based on synthetic data | |
CN112766040B (en) | Method, device, apparatus and readable storage medium for detecting residual bait | |
CN116342953A (en) | Dual-mode target detection model and method based on residual shrinkage attention network | |
CN110022422A (en) | A kind of sequence of frames of video generation method based on intensive connection network | |
Tang et al. | Smoking behavior detection based on improved YOLOv5s algorithm | |
Ma et al. | Object-level proposals | |
Lee et al. | Pedestrian detection based on deep fusion network using feature correlation | |
Cong et al. | Enhancing nerf akin to enhancing llms: Generalizable nerf transformer with mixture-of-view-experts | |
CN117036770A (en) | Detection model training and target detection method and system based on cascade attention | |
CN115082966A (en) | Pedestrian re-recognition model training method, pedestrian re-recognition method, device and equipment | |
CN113902958A (en) | Anchor point self-adaption based infrastructure field personnel detection method | |
CN111627047B (en) | Underwater fish dynamic visual sequence moving target detection method | |
Teng et al. | Unimodal face classification with multimodal training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
OL01 | Intention to license declared | ||