CN110533004A

CN110533004A - A complex-scene face recognition system based on deep learning

Info

Publication number: CN110533004A
Application number: CN201910845089.4A
Authority: CN
Inventors: 黄玲; 郝宇
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2019-09-07
Filing date: 2019-09-07
Publication date: 2019-12-03

Abstract

The invention discloses a complex-scene face recognition system based on deep learning and relates to the technical field of face recognition. Its recognition method is as follows: Step 1, face detection with an improved Faster R-CNN network; Step 2, super-resolution reconstruction with an improved SRGAN network; Step 3, face recognition with an improved CapsNet network. By improving on the SRGAN network, the invention performs super-resolution reconstruction on the captured pictures, which benefits face recognition. Resolution is improved, operation is simple, time is saved, and faces in complex scenes can be detected accurately.

Description

A complex-scene face recognition system based on deep learning

Technical field

The invention belongs to the technical field of face recognition, and in particular relates to a complex-scene face recognition system based on deep learning.

Background technique

In recent years the international situation has been severe, with violent terrorist incidents and group conflicts occurring from time to time, and security has drawn close attention from all sectors of society. Because the international counter-terrorism situation is complex and severe, governments, enterprises, institutions, cities, and communities attach great importance to security. In particular, the "Xueliang (Sharp Eyes) Project" and the "Skynet Project" focus on building public-safety prevention and control with "full coverage and no blind spots" on the basis of video surveillance systems. As a result, video surveillance applications can now be seen everywhere in daily life, giving people more safety guarantees.

As more and more monitoring devices are installed, many corresponding problems arise. One of the most important is face recognition. After a long period of development, face recognition techniques have emerged one after another, and research has moved from the simple, uniform backgrounds studied at first to today's varied complex environments, with differences in illumination, color, background, noise, pose, and expression. At present, many mature face recognition techniques target faces in a constrained state, meaning that the environment of the face is fairly ideal and free of excessive interference; under these conditions such methods achieve very high recognition accuracy. In real complex environments such as supermarkets, airports, and stations, however, crowds are very dense, and faces are affected by illumination, occlusion, expression, pose, and other factors, so most face images are blurred or incomplete, which makes conventional recognition methods harder to apply. In addition, existing monitoring devices often shoot from a distance and have low resolution themselves, which further increases the difficulty of face recognition.

Images collected by surveillance in complex environments are therefore often occluded, and low-resolution images carry little information, which naturally lowers recognition accuracy. In view of these practical problems, face recognition in complex scenes is a research field with great potential value and industry demand. If face recognition accuracy in complex environments could match that in constrained environments, it would greatly benefit surveillance deployment, suspect tracking, intelligent access control, case reconstruction, financial transactions, and other settings that require face recognition, and would allow face recognition technology to deliver its due social value.

Analysis of the state of research at home and abroad:

1.1 State of research on face detection: In its early days, face detection could only locate faces in pictures without background. As the technology progressed, some face detection techniques became able to accurately detect faces at multiple angles in natural scenes, and related techniques can judge whether two faces belong to the same person or estimate a person's age, gender, and expression from the face. In recent years, with the rise of deep learning, newer face detection algorithms based on deep learning have appeared.

There are many deep-learning-based object detection models, such as the well-known You Only Look Once (YOLO) and the Single Shot Detector (SSD). Region-based detection models start with Region-CNN (R-CNN). R-CNN can be understood as enlarging the exhaustive search range over features and then finding the valuable features within it. The rough steps are as follows: for an input image, n candidate windows are found by selective search; the n sub-images are uniformly scaled to m*m and passed through convolution so that a CNN extracts their features; and a support vector machine (SVM) classifies the resulting feature vectors. R-CNN trains one SVM per class, scores each region according to the output features, and finally decides whether to keep or reject the region. This massively exhaustive approach obviously brings a huge amount of computation.

Another deep-learning-based object detection method uses spatial pyramid pooling (Spatial Pyramid Pooling, SPP-Net). The main feature of SPP is that it no longer depends on the size of the input image. Instead, according to the number of output categories required at the end, the algorithm generates several pooling layers with different ranges and applies them to the input in parallel, so that the final number of output features matches that number; classification is then performed on this output. A network built this way is called SPP-Net. The network computes the feature map of the whole image only once and then pools features within sub-windows to keep a fixed-length output. This is 30 to 170 times faster than R-CNN, which first divides the image into windows and then convolves each window, and it also has better accuracy. In recent years SPP-Net has been widely applied. For example, Wang proposed a sparse pyramid pooling strategy that addresses the information-entropy pooling problem by replacing the maximum-probability pooling layer with a spatial pyramid pooling layer; verification on the KTH data set showed the method to be effective. Mukuta proposed treating pooling as an orthogonal projection in function space, in which the functional form of the local descriptors is orthogonally projected onto a low-order polynomial space. This method addresses the fact that spatial pyramid matching uses the statistics of local features in image sub-regions as global features; the robustness of the method was evaluated, and the results show that it is effective.
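As an illustration of the fixed-length property described above, the following is a minimal NumPy sketch of spatial pyramid pooling. It is not the implementation used in SPP-Net or in this invention; the function name spatial_pyramid_pool and the pyramid levels (1, 2, 4) are assumptions chosen for the example.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each level n.

    Returns a vector of length C * sum(n * n for n in levels), which does
    not depend on H or W (assumes H and W are at least max(levels)).
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges cover the whole map even when H or W is not divisible by n.
        row_edges = np.linspace(0, height, n + 1).astype(int)
        col_edges = np.linspace(0, width, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[:, row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))  # one value per channel
    return np.concatenate(pooled)
```

With these levels, feature maps of size 13x17 and 32x24 both yield a vector of length C * (1 + 4 + 16), which is why the layers that follow can accept images of any size.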

R-CNN and SPP-Net, both based on convolutional neural networks, achieve strong detection accuracy, but they still have shortcomings, so face detection here is carried out with the deep-learning-based Faster R-CNN. Fast R-CNN improves on SPP-Net. It is grafted onto VGG16, replaces the SPP layer with an RoI pooling layer, and does not use an SVM classifier; instead it updates parameters by jointly training a softmax classifier and bounding-box regressors, which makes the whole network trainable end to end. The RoI pooling layer can be understood as a simplification of SPP: SPP contains pooling layers at several scales, whereas the RoI pooling layer contains only one. Each region of interest (RoI) is split into sub-blocks on a grid of a single fixed size, and max pooling over each block yields that block's maximum value. Fast R-CNN keeps the VGG16 layers before the fifth pooling layer, attaches its own RoI pooling layer after them, and then performs softmax classification through fully connected layers to form the complete network. Later, a region proposal network (RPN) was commonly added in front to screen candidate boxes for the picture, so the whole network takes the form "RPN + Fast R-CNN", which is what is usually called Faster R-CNN. Faster R-CNN is now very widely applied. For example, Liu proposed using Faster R-CNN for object detection and replaced its convolutional backbone with the VGG16 model; extensive results showed that Faster R-CNN does achieve better results for object detection. Jiang proposed a Faster R-CNN for face detection that achieved leading results compared with other face detection methods on the large-scale face data sets WIDER and FDDB.
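The single-scale RoI pooling described above can be sketched as follows. This is a minimal NumPy illustration, not the Fast R-CNN implementation; the function name roi_max_pool, the end-exclusive RoI format, and the 2x2 output grid are assumptions made for the example.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one RoI of a (C, H, W) feature map to (C, output_size, output_size).

    roi = (row0, col0, row1, col1), end-exclusive, in feature-map coordinates.
    The RoI is assumed to span at least output_size cells in each direction.
    """
    row0, col0, row1, col1 = roi
    region = feature_map[:, row0:row1, col0:col1]
    channels, height, width = region.shape
    row_edges = np.linspace(0, height, output_size + 1).astype(int)
    col_edges = np.linspace(0, width, output_size + 1).astype(int)
    out = np.empty((channels, output_size, output_size), dtype=feature_map.dtype)
    for i in range(output_size):
        for j in range(output_size):
            block = region[:, row_edges[i]:row_edges[i + 1],
                           col_edges[j]:col_edges[j + 1]]
            out[:, i, j] = block.max(axis=(1, 2))  # maximum of each block
    return out
```

Because every RoI is reduced to the same grid, candidate boxes of different sizes all produce inputs of the same shape for the fully connected layers.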

1.2 State of research on image super-resolution reconstruction:

Super-resolution (SR) refers to reconstructing the corresponding high-resolution image from an observed low-resolution image, and it is widely used in settings such as medical imaging. SR can be divided into two classes: reconstructing a high-resolution image from multiple low-resolution images, and reconstructing it from a single low-resolution image, known as single image super-resolution (SISR).

In recent years, the success of deep learning in many fields has made deep-learning-based super-resolution reconstruction a research hotspot. In 2014, Dong et al. were the first to apply convolutional neural networks to image super-resolution, proposing the Super-Resolution using Convolutional Neural Network (SRCNN) algorithm. It learns the mapping from low resolution to high resolution with a three-layer convolutional neural network, and the reconstructed high-resolution images are much better than those of conventional methods, but the three-layer structure is too shallow to capture deep image features. Dong et al. later proposed the fast super-resolution convolutional neural network (Fast Super-Resolution using Convolutional Neural Network, FSRCNN), which improves on SRCNN by deepening the three-layer structure to eight layers and by up-sampling the low-resolution image with deconvolution instead of bicubic interpolation. It outperforms SRCNN, but the eight-layer structure is still shallow and its reconstruction quality is limited. Kim proposed a super-resolution algorithm based on the Deeply-Recursive Convolutional Network for image super-resolution (DRCN). Compared with the small local receptive field of SRCNN, DRCN uses more neighboring pixels by enlarging the local receptive field, and it uses a recursive network to avoid an excessive number of parameters, achieving good results. Ledig et al. applied the generative adversarial network (Generative Adversarial Network, GAN) from deep learning to image super-resolution reconstruction and proposed the Super-Resolution using a Generative Adversarial Network (SRGAN) algorithm. The algorithm feeds low-resolution picture samples to a generator network, which is trained to produce high-resolution pictures, and then uses a discriminator network to judge whether each high-resolution picture it receives is an original real picture or a generated one. When the discriminator can no longer tell real pictures from generated ones, the generator is producing high-quality high-resolution pictures. Experimental results show that pictures generated by SRGAN are visually more realistic than those of other deep learning methods. After face images are reconstructed, monitoring personnel can see more fine details more clearly, which provides sharper images with better visual quality for face recognition in complex scenes.

1.3 State of research on face recognition:

Relatively mature face recognition techniques currently fall into two classes: those based on deep learning, which appeared after deep learning emerged, and traditional face recognition techniques. Among the better-known deep learning methods is the DeepID algorithm, proposed by Y. Sun et al. in 2014, which reaches 99% recognition accuracy on the LFW data set. Traditional face recognition algorithms include those provided by OpenCV, namely Eigenfaces, Fisherfaces, and Local Binary Patterns Histograms (LBPH); although these methods can perform face recognition, they are relatively complicated and computationally inefficient. In October 2017, Professor Hinton first proposed the capsule neural network (Capsule Network, CapsNet), a novel deep learning architecture developed on the basis of convolutional neural networks (CNN). Subsequently, Chen embedded the routing process into the optimization process together with all other parameters of the neural network, overcoming the drawback of finding the optimal routing manually. Mohammad proposed a spectral capsule network, which measures the agreement of the votes of lower-layer capsules in a one-dimensional linear subspace and demonstrated the stability and convergence speed of capsule networks. However, these methods consider only spatial structure information, which limits the performance of capsule networks. Recently, CapsNet has been introduced into more fields. Jaiswal proposed CapsuleGAN, a framework that uses a capsule network instead of a standard convolutional neural network as the discriminator. Parnian proposed a CapsNet for staging. James described a new capsule network variant that can learn tasks. Arya proposed Fast CapsNet, a CapsNet for lung cancer screening that is accelerated by applying a consistent dynamic routing mechanism.

Summary of the invention

To solve the problems described in the background art, the purpose of the invention is to provide a complex-scene face recognition system based on deep learning.

The recognition method of the complex-scene face recognition system based on deep learning according to the invention is as follows:

Step 1: Face detection with an improved Faster R-CNN network:

Faster R-CNN is further improved in two respects. The first improvement concerns the final loss function: a center loss is added. To keep the ratio of positive to negative samples at 1:1 after the center loss is added, the ratio of positive to negative samples in each mini-batch is limited to 1:1, and each time the N samples with the largest loss values are selected from the positive and negative samples and added to the next round of training. The second improvement is that pictures are scaled to different scales during the training stage.
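The specific embodiment states that the center loss is added on top of the softmax loss. One common way to combine the two is sketched below in NumPy. The weight lam, the use of a batch mean in the center term, and all function names are assumptions for illustration; the patent does not give the formula, and the update of the class centers during training is not shown.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    diffs = features - centers[labels]
    return float(0.5 * np.mean(np.sum(diffs ** 2, axis=1)))

def joint_loss(logits, features, labels, centers, lam=0.01):
    """Softmax loss plus a weighted center loss (lam is a hypothetical weight)."""
    return (softmax_cross_entropy(logits, labels)
            + lam * center_loss(features, labels, centers))
```

The softmax term separates classes, while the center term pulls features of the same class toward a shared center; lam trades one off against the other.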

Step 2: Super-resolution reconstruction with an improved SRGAN network:

Super-resolution reconstruction of the input picture is performed with a variant of SRGAN. The number of feature maps is increased beyond that of the original SRGAN discriminator, and shortcut connections are added to the original network. Since overly deep networks are difficult to train, the number of residual blocks in the generator is doubled. The loss functions of the generator network and the adversarial (discriminator) network are kept separate. The discriminator loss has two parts: one is the cross-entropy error between the discriminator's output for real high-resolution training pictures and their true labels; the other is the cross-entropy of the discriminator's output for high-resolution pictures that the generator network produces from low-resolution inputs. The overall discriminator loss is the sum of these two parts. The generator loss is then calculated, and the two loss functions are combined to complete the loss function design.

Step 3: Face recognition with an improved CapsNet network:

The basic unit of a capsule network is the capsule. The network architecture has six layers: the first is a convolutional layer, the second is the primary capsule layer, the third is the digit capsule layer, and the fourth to sixth are three fully connected layers. These six layers can be divided into an encoder and a decoder: the first three layers form the encoder and the last three form the decoder. To make the capsule network better suited to recognizing faces, a multi-scale capsule neural network is proposed on the basis of the original CapsNet. It uses a six-layer structure: the first layer is a standard convolutional layer, the second is a multi-scale capsule coding unit, the third is the digit capsule layer, and the fourth to sixth are fully connected layers.

Compared with the prior art, the beneficial effects of the invention are:

1. The improved SRGAN network performs super-resolution reconstruction on the captured pictures, which benefits face recognition.

2. Resolution is improved, operation is simple, time is saved, and faces in complex scenes can be detected accurately.

Specific embodiment

This embodiment adopts the following technical scheme:

1. Face detection with an improved Faster R-CNN network:

Faster R-CNN is further improved. Because how well an improved network performs depends largely on the loss function that is finally defined, the first improvement, aimed at higher face detection accuracy, starts from the final loss function: a center loss is added on top of the softmax loss. To keep the ratio of positive to negative samples at 1:1 after the center loss is added, the ratio in each mini-batch is limited to 1:1, and each time the N samples with the largest loss values are selected from the positive and negative samples and added to the next round of training. Because the detection environment in practice may be very complex, with faces overlapping or appearing very small, the second improvement, which makes faces in complex scenes easier to detect, is to scale the pictures to different scales during the training stage.
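The 1:1 hard-sample selection described above can be sketched as follows. The sketch assumes that "N samples" means N positives and N negatives, which is one reading that keeps the ratio at 1:1; the text does not spell this out, and the function name select_hard_examples is hypothetical.

```python
import numpy as np

def select_hard_examples(losses, labels, n_per_class):
    """Return indices of the highest-loss positives and negatives, balanced 1:1.

    losses: per-sample loss values, shape (M,)
    labels: 1 for face (positive), 0 for background (negative), shape (M,)
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    # Take the same count from both sides so the ratio stays exactly 1:1.
    k = min(n_per_class, len(pos_idx), len(neg_idx))
    hardest_pos = pos_idx[np.argsort(losses[pos_idx])[::-1][:k]]
    hardest_neg = neg_idx[np.argsort(losses[neg_idx])[::-1][:k]]
    return np.concatenate([hardest_pos, hardest_neg])
```

The returned indices would be carried into the next mini-batch, so that training concentrates on the samples the detector currently handles worst.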

2. Super-resolution reconstruction with an improved SRGAN network:

Super-resolution reconstruction of the input picture is performed with a variant of SRGAN. The structure, width, and depth of a neural network and its loss function are all important factors affecting network performance. To reduce the checkerboard artifacts of the original SRGAN network, an up-sampling layer can be used to weaken them as far as possible. In terms of network width, because the video memory of an ordinary graphics card is limited, the number of feature maps is increased on the basis of the original SRGAN discriminator; however, too many feature maps can make the original network unstable, so shortcut connections are added to the original network to make it more stable and easier to train. Since overly deep networks are difficult to train, the number of residual blocks in the generator can be doubled. To make the loss of the improved SRGAN network smaller, the loss function is improved as follows. The loss functions of the generator network and the adversarial (discriminator) network are kept separate. The discriminator loss has two parts: one is the cross-entropy error between the discriminator's output for real high-resolution training pictures and their true labels; the other is the cross-entropy of the discriminator's output for high-resolution pictures that the generator network produces from low-resolution inputs. The overall discriminator loss is the sum of these two parts. The generator loss is then calculated, and the two loss functions are combined to complete the loss function design.
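The two-part discriminator loss described above can be sketched with NumPy as follows. This is an illustration under assumptions: the discriminator is taken to output probabilities in (0, 1), real pictures are labeled 1 and generated pictures 0, and the function names are hypothetical. The generator loss is not specified in detail in the text and is not shown.

```python
import numpy as np

def binary_cross_entropy(probs, target, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and one target label."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(target * np.log(probs)
                           + (1.0 - target) * np.log(1.0 - probs))))

def discriminator_loss(d_real, d_fake):
    """Sum of the real-picture term and the generated-picture term.

    d_real: discriminator outputs for real high-resolution pictures
    d_fake: discriminator outputs for pictures produced by the generator
    """
    real_term = binary_cross_entropy(d_real, 1.0)  # real pictures should score 1
    fake_term = binary_cross_entropy(d_fake, 0.0)  # generated pictures should score 0
    return real_term + fake_term
```

When the discriminator outputs 0.5 for every picture, meaning it cannot tell real from generated, the loss equals 2 ln 2; a discriminator that separates the two sets well drives the loss toward zero.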

3. Face recognition with an improved CapsNet network:

The basic unit of a capsule network is the capsule. The network architecture has six layers: the first is a convolutional layer, the second is the primary capsule (Primary Caps) layer, the third is the digit capsule (Digit Caps) layer, and the fourth to sixth are three fully connected layers. Because of its similarity to an autoencoder, these six layers can be divided into an encoder and a decoder: the first three layers form the encoder and the last three form the decoder. To make the capsule network better suited to recognizing faces, a multi-scale capsule neural network is proposed on the basis of the original CapsNet. The multi-scale capsule neural network retains the major advantage of capsule networks: the network is very shallow, yet its recognition accuracy is very high. It uses a six-layer structure: the first layer is a standard convolutional layer, the second is a multi-scale capsule coding unit, the third is the digit capsule layer, and the fourth to sixth are fully connected layers.
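The text does not describe the nonlinearity used inside the capsule layers. In the originally published CapsNet, capsule outputs pass through a "squash" function that preserves a vector's direction and maps its length into the range [0, 1). The sketch below shows that function in NumPy, on the assumption that the multi-scale variant keeps it; the eps term is an added safeguard against division by zero and is not part of the published formula.

```python
import numpy as np

def squash(vectors, axis=-1, eps=1e-9):
    """Shrink capsule vectors so their length lies in [0, 1), keeping direction.

    v = (|s|^2 / (1 + |s|^2)) * s / |s|
    """
    squared_norm = np.sum(np.square(vectors), axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)
    return scale * vectors / np.sqrt(squared_norm + eps)
```

Short vectors are pushed toward zero length and long vectors toward unit length, so the length of a capsule's output can be read as the probability that the entity it represents is present.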

Experimental environment and platform:

TensorFlow is used as the deep learning framework, and the simulation uses high-level APIs such as tf.contrib.learn, TensorFlow-Slim, TensorLayer, and TFLearn. In the experimental environment, TensorFlow runs on the Windows operating system. Training time and computer configuration strongly affect the final training parameters of the neural network, so a computer with an Intel Core i9-9900K CPU, an NVIDIA RTX 2080 Ti GPU, and 32 GB of memory was purchased at the author's own expense for the computation.

It will be obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments and that it can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments are therefore to be regarded in every respect as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention.

In addition, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way merely for clarity. Those skilled in the art should treat the specification as a whole; the technical solutions of the embodiments may also be suitably combined to form other embodiments that those skilled in the art can understand.

Claims

1. A complex-scene face recognition system based on deep learning, characterized in that its recognition method is as follows:

Step 2: Super-resolution reconstruction with an improved SRGAN network:

Step 3: Face recognition with an improved CapsNet network: