CN104346607A

CN104346607A - Face recognition method based on convolutional neural network

Info

Publication number: CN104346607A
Application number: CN201410620574.9A
Authority: CN
Inventors: 胡静 (Hu Jing)
Original assignee: Shanghai Dianji University
Current assignee: Shanghai Dianji University
Priority date: 2014-11-06
Filing date: 2014-11-06
Publication date: 2015-02-11
Anticipated expiration: 2034-11-06
Also published as: CN104346607B

Abstract

The invention provides a face recognition method based on a convolutional neural network, comprising: performing the necessary preprocessing on a face image to obtain an ideal face image; feeding the ideal face image into the input layer U_0 of the convolutional neural network, whose output enters U_G, whose output in turn serves as the input of U_S1; the S neurons of U_S1, trained with supervision, extract edge components of different orientations in the input image as the first feature extraction and pass their output to U_C1; the output of U_C1 serves as the input of U_S2, which performs the second feature extraction and feeds U_C2; the output of U_C2 serves as the input of U_S3, which performs the third feature extraction and feeds U_C3; the output of U_C3 serves as the input of U_S4, which obtains the weights, thresholds and number of neuron cell planes of each layer through supervised competitive learning and feeds U_C4; U_C4, as the output layer of the network, outputs the final pattern recognition result, determined by the maximum output of U_S4. The invention can improve the face recognition rate in complex scenes.

Description

Face recognition method based on convolutional neural networks

Technical field

The present invention relates to a face recognition method based on convolutional neural networks.

Background art

Face recognition technology uses computer analysis of face images to extract effective feature information and identify a person. It first determines whether a face is present in an image; if so, it further determines the position and size of each face. Based on this information, it then extracts the latent pattern features of each face and compares them with the faces in a known face database, thereby identifying the class of each face. Determining whether a face is present in an image is face detection; comparing the extracted features with the known face database is face recognition.
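The matching step described above (comparing extracted features with a known face database) can be illustrated with a minimal nearest-neighbour sketch. It assumes each face has already been reduced to a fixed-length feature vector; the function name `identify`, the dictionary-based gallery and the optional rejection distance are hypothetical illustrations, not part of the described method.

```python
import math

def identify(probe, gallery, reject_distance=None):
    """Return the identity in `gallery` whose feature vector is closest to `probe`.

    gallery         -- dict mapping identity -> feature vector (hypothetical format)
    reject_distance -- optional; if the best match is farther than this,
                       return None ("not in the known face database")
    """
    best_id, best_dist = None, math.inf
    for identity, features in gallery.items():
        dist = math.dist(probe, features)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    if reject_distance is not None and best_dist > reject_distance:
        return None
    return best_id

# Toy gallery with two-dimensional "features" (made-up values).
gallery = {"person_a": [0.0, 0.0], "person_b": [10.0, 10.0]}
print(identify([1.0, 1.0], gallery))  # person_a
```

Euclidean distance is only one possible similarity measure; the text does not say which comparison the database lookup uses.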

In recent years researchers have achieved a great deal in face detection and face recognition, and both detection and recognition performance have improved considerably. Many face detection algorithms have been proposed; they fall roughly into three classes: (1) methods based on skin-color features, (2) methods based on knowledge models, and (3) methods based on statistical theory. Among these, the artificial neural network (ANN) approach trains a network structure so that the statistical properties of the pattern are embedded in the network's structure and parameters. For a pattern as complex and hard to describe explicitly as the face, ANN-based methods have unique advantages. Rowley used a two-layer ANN to detect faces in multiple poses: the first layer estimates the face pose in an input image window, and the second layer consists of three face detectors for frontal, half-profile and profile faces respectively. The pose of an input image is first estimated by the pose detector; after the corresponding preprocessing, the image is passed to the three face detectors of the second layer, which finally determine the position and pose of the face.

Face recognition methods fall broadly into the following classes: (1) methods based on geometric features; (2) methods based on elastic model matching; (3) neural network methods; (4) methods based on linear and nonlinear subspaces. Many existing algorithms recognize faces well in images of simple scenes, but in video surveillance the images are affected by illumination, orientation, noise, and differences between faces and expressions; under such conditions even today's high-performance face recognition algorithms still fall short of ideal recognition results.

The difficulties of face recognition lie mainly in the following aspects:

(1) The imaging angle of the camera, i.e., pose, strongly affects most face recognition algorithms, especially those based on simple geometric features. Because of pose, two face images of the same person may be less similar than two images of different people.

(2) Changes in illumination alter the gray-level information of a face image, which strongly affects recognizers based on gray-level features.

(3) Changes in expression also degrade recognition performance.

(4) Face images may also be affected by age, occlusion and face scale, all of which degrade the performance of face recognition algorithms to varying degrees.

The convolutional neural network is a recently developed, efficient recognition method that has attracted wide attention. In the 1960s, while studying neurons in the cat cortex responsible for local sensitivity and orientation selection, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks; convolutional neural networks (CNN) were later proposed on this basis. CNNs have since become a research hotspot in many scientific fields, especially pattern classification, where they are widely applied because the network avoids complex image preprocessing and can take the original image directly as input. The neocognitron proposed by K. Fukushima in 1980 was the first implementation of a convolutional neural network. Many researchers subsequently improved this network; a representative result is the "improved cognitron" proposed by Alexander and Taylor, which combines the advantages of various improved methods and avoids time-consuming error back-propagation.

In general, the basic structure of a CNN has two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer and extracts the feature of that local region; once a local feature is extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in a plane share the same weights. The feature-mapping structure uses the sigmoid function, whose influence-function kernel is small, as the activation function of the convolutional network, which gives the feature maps shift invariance. In addition, because the neurons on one mapping plane share weights, the number of free network parameters is reduced. Each convolutional layer in a CNN is followed by a computational layer that performs local averaging and secondary extraction; this distinctive two-stage feature extraction structure reduces the feature resolution.
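A minimal pure-Python sketch of one feature-extraction/feature-mapping pair as described above: a single kernel shared by every output neuron (local receptive field plus weight sharing), a sigmoid activation, and a local-averaging layer that lowers resolution. The function names, the "valid" border handling, the example kernel and the 2×2 averaging window are illustrative assumptions.

```python
import math

def conv2d_shared(image, kernel, bias=0.0):
    """'Valid' 2-D cross-correlation: each output neuron sees one local
    receptive field, and every neuron in the map shares the same kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[bias + sum(kernel[i][j] * image[y + i][x + j]
                        for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

def sigmoid_plane(plane):
    """Apply the logistic sigmoid element-wise."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in plane]

def average_pool(plane, size=2):
    """Local averaging over non-overlapping size x size blocks (lowers resolution)."""
    return [[sum(plane[y + i][x + j] for i in range(size) for j in range(size)) / size ** 2
             for x in range(0, len(plane[0]) - size + 1, size)]
            for y in range(0, len(plane) - size + 1, size)]

def feature_map(image, kernel, bias=0.0):
    """One convolution -> sigmoid -> local-averaging stage."""
    return average_pool(sigmoid_plane(conv2d_shared(image, kernel, bias)))

# A 6x6 image with a vertical edge; a 3x3 kernel gives a 4x4 map, averaged to 2x2.
image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
edge_kernel = [[1.0, 0.0, -1.0] for _ in range(3)]  # hypothetical vertical-edge kernel
print(feature_map(image, edge_kernel))
```

Because the same nine weights are reused at every position, the map responds identically to the edge wherever it appears; only the averaging step discards exact position.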

CNNs are mainly used to recognize two-dimensional patterns with invariance to shifting, scaling and other forms of distortion. Because the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when a CNN is used; features are learned implicitly from the training data. Moreover, since neurons on the same feature-mapping plane have identical weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which all neurons are interconnected. With its special structure of locally shared weights, the CNN has unique advantages in speech recognition and image processing: its layout is closer to real biological neural networks, weight sharing reduces network complexity, and in particular, because images as multidimensional input vectors can be fed directly into the network, the complexity of data reconstruction during feature extraction and classification is avoided.
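To make the weight-sharing argument concrete, the following sketch counts connection weights (biases ignored) for one layer under three wiring schemes. The sizes (a 32×32 single-channel input, six feature maps, 5×5 kernels) are hypothetical and chosen only for illustration.

```python
def fully_connected_weights(in_h, in_w, out_h, out_w, n_maps):
    """Every output neuron is connected to every input pixel."""
    return (in_h * in_w) * (out_h * out_w * n_maps)

def locally_connected_weights(out_h, out_w, n_maps, k):
    """Each output neuron sees only a k x k patch, but has its own weights."""
    return out_h * out_w * n_maps * k * k

def shared_weights(n_maps, k):
    """Local k x k receptive fields plus one shared kernel per feature map."""
    return n_maps * k * k

# Hypothetical layer: 32x32 input, six 5x5 kernels -> six 28x28 feature maps.
in_size, k, n_maps = 32, 5, 6
out_size = in_size - k + 1
print(fully_connected_weights(in_size, in_size, out_size, out_size, n_maps))  # 4816896
print(locally_connected_weights(out_size, out_size, n_maps, k))               # 117600
print(shared_weights(n_maps, k))                                              # 150
```

Local connectivity alone cuts the count by a factor of about 40 here, and sharing the kernel within each map cuts it by a further factor of 784 (one per output position).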

The principle of face recognition with convolutional neural networks is explained below:

As shown in Fig. 1, a complete automatic face recognition system for complex scenes mainly comprises the following modules: scene image acquisition and preprocessing, face detection and localization, face feature extraction, and face recognition.

In Fig. 1, the scene image acquisition and preprocessing module processes the dynamically acquired images to overcome noise and improve recognition; its tasks mainly include image enhancement to filter noise, correction of uneven illumination, and contrast enhancement, so that complex scene images become reasonably distinguishable. The face detection and localization module automatically finds the positions of the faces to be recognized in the dynamically acquired images; common methods include face localization algorithms based on skin-color models, on statistical models, and on feature models. Face feature extraction is performed after face localization; common methods include feature extraction based on Euclidean distance, on the KL transform, on SVD, and on ICA. The last module, face recognition, recognizes each face image; common methods fall into two broad classes, still-image recognition and dynamic-image recognition. Among these methods, the artificial neural network (ANN) approach trains a network structure so that the statistical properties of the pattern are embedded in the network's structure and parameters; for a pattern as complex and hard to describe explicitly as the face, ANN-based methods have unique advantages. The present invention uses such a neural network recognition method.
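The preprocessing module is described only in general terms. As one common way to enhance contrast, here is a sketch of global histogram equalization for an 8-bit grayscale image stored as a list of rows; the choice of this particular technique and the function name are assumptions, not something the text specifies.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of an integer grayscale image (list of rows)."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [0] * levels, 0
    for level, count in enumerate(hist):
        running += count
        cdf[level] = running
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    # Map each level through the normalized CDF, rounding half up.
    lut = [int((c - cdf_min) * scale + 0.5) if c > 0 else 0 for c in cdf]
    return [[lut[p] for p in row] for row in image]

img = [[50, 50], [100, 200]]
print(equalize_histogram(img))  # [[0, 0], [128, 255]]
```

Equalization spreads the occupied gray levels over the full 0 to 255 range, which raises global contrast but does not by itself correct spatially uneven illumination.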

Depending on whether feature extraction is performed, neural network face recognition systems fall into two classes: systems with a feature extraction part and systems without one. The former effectively combine traditional methods with neural network techniques, making full use of human experience to obtain pattern features and of the classification ability of the neural network to recognize faces; the extracted features must reflect the characteristics of the whole face to achieve a high recognition rate. The latter omit feature extraction and feed the whole face image directly into the neural network. Although this increases the complexity of the network structure to some extent, it considerably improves the network's robustness to interference and its recognition rate compared with the former. The CNN adopted in the present invention belongs to the second class.

Summary of the invention

The object of the present invention is to provide a face recognition method based on convolutional neural networks that improves the face recognition rate and robustness to interference.

To solve the above problem, the invention provides a face recognition method based on convolutional neural networks, comprising:

Step 1: perform the necessary early-stage preprocessing on the face image to obtain an ideal face image;

Step 2: feed the ideal face image, as the input of the convolutional neural network, into the input layer U_0; the output of U_0 enters the difference-extraction layer U_G, and the output of U_G serves as the input of U_S1, the first layer of feature extraction layer S;

Step 3: the S neurons of the first layer U_S1, trained with supervision, extract edge components of different orientations in the input image as the first feature extraction and pass them to the input of U_C1, the first layer of feature mapping layer C. Feature mapping layer C is a neural layer composed of complex neurons; its input connections are fixed and cannot be modified, each feature map is a plane, and all neurons in a plane share the same weights;

Step 4: the output of U_C1 serves as the input of U_S2, the second layer of feature extraction layer S; U_S2 performs the second feature extraction, and its output serves as the input of U_C2, the second layer of feature mapping layer C;

Step 5: the output of U_C2 serves as the input of U_S3, the third layer of feature extraction layer S; U_S3 performs the third feature extraction, and its output serves as the input of U_C3, the third layer of feature mapping layer C;

Step 6: the output of U_C3 serves as the input of U_S4, the fourth layer of feature extraction layer S; U_S4 obtains the weights, thresholds and number of neuron cell planes of each layer through supervised competitive learning, and its output serves as the input of U_C4, the fourth layer of feature mapping layer C;

Step 7: U_C4, as the output layer of the network, outputs the final pattern recognition result of the network, determined by the maximum output of U_S4.
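Steps 2 to 7 define a fixed chain of layers followed by a maximum-response decision. The sketch below captures only that data flow: each stage is a placeholder function (a hypothetical interface mapping a list of planes to a list of planes), and the mapping from U_S4 planes to classes is an assumption, since the text does not specify it.

```python
# Layer order as listed in the text: U0 -> UG -> US1 -> UC1 -> ... -> US4 -> UC4.
LAYER_ORDER = ["U0", "UG", "US1", "UC1", "US2", "UC2", "US3", "UC3", "US4", "UC4"]

def run_network(input_planes, stages):
    """Feed the input through each stage in LAYER_ORDER, keeping every layer's output.

    `stages` maps a layer name to a function planes -> planes (hypothetical interface)."""
    outputs = {"U0": input_planes}
    current = input_planes
    for name in LAYER_ORDER[1:]:
        current = stages[name](current)
        outputs[name] = current
    return outputs

def recognized_class(us4_planes, plane_labels=None):
    """Step 7: pick the U_S4 plane with the largest response.

    ASSUMPTION: plane i stands for class i unless `plane_labels` gives its label."""
    peaks = [max(max(row) for row in plane) for plane in us4_planes]
    best = max(range(len(peaks)), key=peaks.__getitem__)
    return best if plane_labels is None else plane_labels[best]

# Usage with identity stages, only to show the data flow (three 1x1 planes).
identity_stages = {name: (lambda planes: planes) for name in LAYER_ORDER[1:]}
outputs = run_network([[[0.1]], [[0.9]], [[0.3]]], identity_stages)
print(recognized_class(outputs["US4"]))  # 1
```

In a real network each entry of `identity_stages` would be replaced by the corresponding U_G, S-layer or C-layer computation.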

Further, in the above method, the early-stage preprocessing includes localization and segmentation.

Further, in the above method, the output of the difference-extraction layer U_G in step 2 is:

u_G(n, k) = \max\left[(-1)^{k} \sum_{|v| < A_G} a_G(v)\, u_0(n + v),\; 0\right], \quad k = 1, 2,

where u_0 denotes the output of the preceding layer, n denotes an input neuron, v ranges over the specified neighbourhood, and the sum over v is taken over all neurons in that neighbourhood; a_G(v) is the connection strength; A_G is the radius of v. The difference-extraction layer U_G has two neuron planes: k = 2 is the plane that enhances the central neuron, and k = 1 is the plane that suppresses it. All input connections of each neuron in U_G must satisfy the constraint

\sum_{|v| < A_G} a_G(v) = 0.
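The difference-extraction layer can be sketched directly from the formula and constraint above. The 3×3 centre-surround weights (+1 at the centre, -1/8 around it) are a hypothetical choice that satisfies the zero-sum constraint; with a Euclidean |v| they correspond to A_G = 2. Treating pixels outside the image as 0 is also an assumption. Because the weights sum to zero, a uniform region of intensity c gives c·Σa_G(v) = 0 in both planes, which is why the constraint makes the layer respond only to local contrast.

```python
def difference_extraction(u0, kernel):
    """Compute u_G(n, k) for k = 1, 2 on a 2-D image u0 (list of rows).

    `kernel` maps offsets v = (dy, dx) with |v| < A_G to weights a_G(v).
    Pixels outside the image are treated as 0 (assumption).
    """
    assert abs(sum(kernel.values())) < 1e-9, "constraint: the a_G(v) must sum to 0"
    h, w = len(u0), len(u0[0])

    def pixel(y, x):
        return u0[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    planes = {}
    for k in (1, 2):
        sign = (-1) ** k  # k = 1: -1 (centre suppressed); k = 2: +1 (centre enhanced)
        planes[k] = [[max(sign * sum(a * pixel(y + dy, x + dx)
                                     for (dy, dx), a in kernel.items()), 0.0)
                      for x in range(w)]
                     for y in range(h)]
    return planes

# Hypothetical 3x3 centre-surround weights: +1 at the centre, -1/8 around it (sum = 0).
CENTER_SURROUND = {(dy, dx): (1.0 if (dy, dx) == (0, 0) else -0.125)
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)}

spot = [[0.0] * 5 for _ in range(5)]
spot[2][2] = 1.0
g = difference_extraction(spot, CENTER_SURROUND)
print(g[2][2][2], g[1][2][1])  # 1.0 0.125
```

A bright spot drives the centre-enhancing plane (k = 2) at its own position and the centre-suppressing plane (k = 1) in the surrounding ring.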

Further, in the above method, in steps 3 to 6, the response function of the S neurons in each layer of feature extraction layer S is:

u_{Sl}(n, k) = \frac{\theta_l}{1 - \theta_l} \max\left[ \frac{1 + \sum_{\kappa=1}^{K_{C,l-1}} \sum_{|v| < A_{Sl}} a_{Sl}(v, \kappa, k)\, u_{C,l-1}(n + v, \kappa)}{1 + \theta_l\, v_l(n)} - 1,\; 0 \right]

where a_Sl(v, κ, k) (≥ 0) is the connection function from neuron u_{C,l-1}(n+v, κ) of the preceding feature mapping layer C to the S neurons of this layer, and all neurons in the same neuron plane have identical input connections; θ_l is the threshold of the S neurons in layer l; A_Sl is the radius of v. When l = 1, u_{C,l-1}(n, κ) is u_G(n, k), and K_{C,l-1} = 2.
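The S-neuron response can be sketched for a single position n and plane k. The text does not define the inhibitory input v_l(n); the helper `v_cell` below assumes a root-mean-square form sqrt(Σ c·u²), modelled on the inhibitory V-cell of Fukushima's neocognitron, with hypothetical weights c. Inputs are passed as flat lists covering all κ and all |v| < A_Sl. A larger θ_l enlarges the inhibitory denominator, so more excitation is needed before the neuron responds.

```python
import math

def v_cell(inputs, c_weights):
    """ASSUMPTION: v_l(n) is not defined in the text; this uses a
    root-mean-square inhibitory input sqrt(sum c * u^2)."""
    return math.sqrt(sum(c * u * u for c, u in zip(c_weights, inputs)))

def s_cell(inputs, a_weights, theta, v):
    """u_Sl(n, k) for one position n and one plane k.

    inputs    -- flattened u_{C,l-1}(n + v, kappa) over all kappa and |v| < A_Sl
    a_weights -- matching a_Sl(v, kappa, k) values (all >= 0)
    theta     -- threshold theta_l, 0 < theta < 1
    v         -- inhibitory input v_l(n)
    """
    assert 0.0 < theta < 1.0 and all(a >= 0 for a in a_weights)
    excitation = 1.0 + sum(a * u for a, u in zip(a_weights, inputs))
    inhibition = 1.0 + theta * v
    return theta / (1.0 - theta) * max(excitation / inhibition - 1.0, 0.0)

x = [1.0, 0.0]
print(s_cell(x, [1.0, 0.0], theta=0.5, v=0.5))   # about 0.6
print(s_cell(x, [1.0, 0.0], theta=0.5, v=10.0))  # 0.0: inhibition wins
```

The max(·, 0) makes the S neuron silent whenever the normalized excitation does not exceed 1.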

Further, in the above method, in steps 3 to 6, the response function of the C neurons in the three feature mapping layers U_C1, U_C2 and U_C3 (i.e., all except U_C4) is:

u_{Cl}(n, k) = \frac{\max\left[\sum_{|v| < A_{Cl}} a_{Cl}(v)\, u_{Sl}(n + v, k),\; 0\right]}{1 + \max\left[\sum_{|v| < A_{Cl}} a_{Cl}(v)\, u_{Sl}(n + v, k),\; 0\right]}

where a_Cl(v) is the input connection of the C layer and A_Cl is the radius of v.
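A sketch of one C-neuron plane following the formula above: a weighted sum over the neighbourhood |v| < A_Cl followed by the saturating function max(x, 0)/(1 + max(x, 0)), which keeps outputs in [0, 1). The uniform 3×3 weights and the treatment of out-of-range positions as 0 are hypothetical choices. The usage shows the blurring behind the shift tolerance claimed for layer C: an S response at the centre and one shifted by a pixel give the same C output at the centre.

```python
def c_cell_plane(u_s, a_c):
    """u_Cl(n, k) for one plane k.

    u_s -- 2-D list holding u_Sl(., k)
    a_c -- dict mapping offsets v = (dy, dx) with |v| < A_Cl to a_Cl(v)
    Out-of-range positions contribute nothing (assumption)."""
    h, w = len(u_s), len(u_s[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = sum(a * u_s[y + dy][x + dx]
                        for (dy, dx), a in a_c.items()
                        if 0 <= y + dy < h and 0 <= x + dx < w)
            pos = max(total, 0.0)
            row.append(pos / (1.0 + pos))  # saturating output in [0, 1)
        out.append(row)
    return out

# Hypothetical blurring weights: 1.0 over a 3x3 neighbourhood.
A_C = {(dy, dx): 1.0 for dy in (-1, 0, 1) for dx in (-1, 0, 1)}

centered = [[0.0] * 5 for _ in range(5)]
centered[2][2] = 1.0
shifted = [[0.0] * 5 for _ in range(5)]
shifted[2][3] = 1.0
print(c_cell_plane(centered, A_C)[2][2], c_cell_plane(shifted, A_C)[2][2])  # 0.5 0.5
```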

Compared with the prior art, the present invention has the following advantages:

(1) The layer-wise feature extraction structure gives the network a high tolerance to distortion when recognizing input samples;

(2) By avoiding an explicit feature extraction process, the convolutional neural network implicitly obtains from the training samples the features that contribute most to constructing the training-sample space, giving a higher recognition rate and stronger robustness to interference than traditional networks;

(3) Combining different kinds of neurons and learning rules further improves the recognition capability of the network;

(4) Learning from face images obtained under ideal preprocessing conditions optimizes the weight parameters of each layer of the network and substantially improves the face recognition rate in complex scenes. Experimental results show that the method clearly outperforms traditional recognition methods such as structural methods and template matching.

Brief description of the drawings

Fig. 1 is a block diagram of an existing automatic face recognition method;

Fig. 2 shows the neural network structure of one embodiment of the invention.

Embodiment

To make the above objects, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.

As shown in Fig. 2, the invention provides a face recognition method based on convolutional neural networks, comprising:

Step S1: perform the necessary early-stage preprocessing on the face image to obtain an ideal face image; specifically, the early-stage preprocessing includes localization, segmentation and similar preprocessing;

Step S2: feed the ideal face image, as the input of the convolutional neural network, into the input layer U_0; the output of U_0 enters the difference-extraction layer U_G, and the output of U_G serves as the input of U_S1, the first layer of feature extraction layer S. Specifically, in Fig. 2, feature extraction layer S is a neural layer composed of simple neurons that performs feature extraction; its input connections are variable and are continually corrected during learning. The output of the difference-extraction layer U_G is given by formula (1):

u_G(n, k) = \max\left[(-1)^{k} \sum_{|v| < A_G} a_G(v)\, u_0(n + v),\; 0\right], \quad k = 1, 2 \qquad (1)

where u_0 denotes the output of the preceding layer, n denotes an input neuron, v ranges over the specified neighbourhood, and the sum over v is taken over all neurons in that neighbourhood; a_G(v) is the connection strength; A_G is the radius of v. The difference-extraction layer U_G has two neuron planes: k = 2 enhances the central neuron and k = 1 suppresses it. The input connections of each neuron in U_G must also satisfy the constraint \sum_{|v| < A_G} a_G(v) = 0 for the layer to perform difference extraction;

Step S3: the S neurons of the first layer U_S1, trained with supervision, extract edge components of different orientations in the input image as the first feature extraction and pass them to the input of U_C1, the first layer of feature mapping layer C. Feature mapping layer C is a neural layer composed of complex neurons; its input connections are fixed and cannot be modified, which lets it tolerate approximate changes in the position at which a receptive field is excited; each feature map is a plane, and all neurons in a plane share the same weights. Specifically, the response function of the S neurons in each layer of feature extraction layer S is given by formula (2):

u_{Sl}(n, k) = \frac{\theta_l}{1 - \theta_l} \max\left[ \frac{1 + \sum_{\kappa=1}^{K_{C,l-1}} \sum_{|v| < A_{Sl}} a_{Sl}(v, \kappa, k)\, u_{C,l-1}(n + v, \kappa)}{1 + \theta_l\, v_l(n)} - 1,\; 0 \right] \qquad (2)

where a_Sl(v, κ, k) (≥ 0) is the connection function from neuron u_{C,l-1}(n+v, κ) of the preceding feature mapping layer C to the S neurons of this layer, and all neurons in the same neuron plane have identical input connections; θ_l is the threshold of the S neurons in layer l; A_Sl is the radius of v. When l = 1, u_{C,l-1}(n, κ) is u_G(n, k), and K_{C,l-1} = 2;

Step S4: the output of U_C1 serves as the input of U_S2, the second layer of feature extraction layer S; U_S2 performs the second feature extraction, and its output serves as the input of U_C2, the second layer of feature mapping layer C;

Step S5: the output of U_C2 serves as the input of U_S3, the third layer of feature extraction layer S; U_S3 performs the third feature extraction, and its output serves as the input of U_C3, the third layer of feature mapping layer C;

Step S6: the output of U_C3 serves as the input of U_S4, the fourth layer of feature extraction layer S; U_S4 obtains the weights, thresholds and number of neuron cell planes of each layer through supervised competitive learning, and its output serves as the input of U_C4, the fourth layer of feature mapping layer C;

Step S7: U_C4, as the output layer and recognition layer of the network, outputs the final pattern recognition result, determined by the maximum output of U_S4. Specifically, the last feature mapping layer C of the network is the recognition layer, which gives the pattern recognition result. After learning, the network can recognize input patterns automatically without being affected by distortion, scaling or shifting of the input image.

Preferably, except for layer U_C4, the response function of the C neurons in the other three feature mapping layers U_C1, U_C2 and U_C3 is given by formula (3):

u_{Cl}(n, k) = \frac{\max\left[\sum_{|v| < A_{Cl}} a_{Cl}(v)\, u_{Sl}(n + v, k),\; 0\right]}{1 + \max\left[\sum_{|v| < A_{Cl}} a_{Cl}(v)\, u_{Sl}(n + v, k),\; 0\right]} \qquad (3)

where a_Cl(v) is the input connection of the C layer and A_Cl is the radius of v.

As can be seen from Fig. 2, the network consists of the input layer U_0, the difference-extraction layer U_G, four S layers and four C layers, connected as follows: U_0 → U_G → U_S1 → U_C1 → U_S2 → U_C2 → U_S3 → U_C3 → U_S4 → U_C4. The difference-extraction layer U_G corresponds to the center-surround cells of the retina and consists of two parts, a center-enhancing neuron plane and a center-suppressing neuron plane; its output serves as the input of U_S1, the first layer of feature extraction layer S. The S neurons in U_S1 are trained with supervision to extract edge components of different orientations in the input image, and their output serves as the input of U_C1. The neurons of U_S2 and U_S3, the second and third layers of feature extraction layer S, are self-organizing neurons trained by unsupervised competitive learning. U_S4 is trained by supervised competitive learning until it correctly recognizes all samples. U_C4 is the output and recognition layer of the network and presents its final pattern recognition result.
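The text names unsupervised competitive learning for U_S2 and U_S3 and supervised competitive learning for U_S4 but gives no update rule. The sketch below uses a generic winner-take-all rule (move the winning weight vector toward the input) as a stand-in; this is an assumption, not the neocognitron's own reinforcement rule. The supervised variant adds a new plane whenever the winning plane has the wrong label, one plausible way for the number of cell planes to be learned; the list-of-(weights, label) structure and all function names are hypothetical.

```python
def dot(w, x):
    """Response of a plane with weights w to input x (plain inner product)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def unsupervised_step(prototypes, x, lr=0.1):
    """U_S2 / U_S3 style (generic winner-take-all): only the best-responding
    prototype learns, by moving toward the input. Updates in place."""
    winner = max(range(len(prototypes)), key=lambda i: dot(prototypes[i], x))
    w = prototypes[winner]
    for i, xi in enumerate(x):
        w[i] += lr * (xi - w[i])
    return winner

def supervised_step(planes, x, label, lr=0.1):
    """U_S4 style (simplified): planes is a list of (weights, class_label) pairs.
    Reinforce the winner if its label is correct; otherwise add a new plane
    seeded with x, so the number of planes grows with the training data."""
    if planes:
        winner = max(range(len(planes)), key=lambda i: dot(planes[i][0], x))
        weights, winner_label = planes[winner]
        if winner_label == label:
            for i, xi in enumerate(x):
                weights[i] += lr * (xi - weights[i])
            return winner
    planes.append((list(x), label))
    return len(planes) - 1

# Usage: two toy prototypes, and a set of labelled planes that grows as needed.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
unsupervised_step(prototypes, [0.5, 0.25], lr=0.5)  # prototype 0 wins and moves
planes = []
for sample, label in [([1.0, 0.0], "A"), ([0.0, 1.0], "B"), ([1.0, 0.1], "B")]:
    supervised_step(planes, sample, label)
print(len(planes))  # 3
```

The third sample is closest to the "A" plane but labelled "B", so a third plane is created rather than corrupting the "A" plane.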

By learning from face image samples, the present invention optimizes the weight parameters of each layer of the neural network and thereby greatly improves the recognition rate for faces in complex scenes; the repeated feature extraction structure gives the network high robustness to interference. Here, a complex scene mainly means one in which the face image is affected by illumination, expression and pose. The method has good fault tolerance, parallel processing capability and self-learning capability; it can handle problems in which the environmental information is complex, the background knowledge is unclear and the inference rules are ill-defined; it tolerates samples with large defects and distortions; and it runs fast, adapts well and has high resolution. It can be applied to fields such as pattern recognition, anomaly detection and image processing.

The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Because the system disclosed in an embodiment corresponds to the disclosed method, its description is relatively brief; for the relevant parts, refer to the description of the method.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To illustrate this interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be regarded as going beyond the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to cover them as well.

Claims

1. A face recognition method based on convolutional neural networks, characterized by comprising:

2. The face recognition method based on convolutional neural networks according to claim 1, characterized in that the early-stage preprocessing includes localization and segmentation.

3. The face recognition method based on convolutional neural networks according to claim 1, characterized in that the output of the difference-extraction layer U_G in step 2 is:

u_G(n, k) = \max\left[(-1)^{k} \sum_{|v| < A_G} a_G(v)\, u_0(n + v),\; 0\right], \quad k = 1, 2,

\sum_{|v| < A_G} a_G(v) = 0.

4. The face recognition method based on convolutional neural networks according to claim 1, characterized in that, in steps 3 to 6, the response function of the S neurons in each layer of feature extraction layer S is:

u_{Sl}(n, k) = \frac{\theta_l}{1 - \theta_l} \max\left[ \frac{1 + \sum_{\kappa=1}^{K_{C,l-1}} \sum_{|v| < A_{Sl}} a_{Sl}(v, \kappa, k)\, u_{C,l-1}(n + v, \kappa)}{1 + \theta_l\, v_l(n)} - 1,\; 0 \right]

5. The face recognition method based on convolutional neural networks according to claim 1, characterized in that, in steps 3 to 6, the response function of the C neurons in the three feature mapping layers U_C1, U_C2 and U_C3 (all except U_C4) is:

u_{Cl}(n, k) = \frac{\max\left[\sum_{|v| < A_{Cl}} a_{Cl}(v)\, u_{Sl}(n + v, k),\; 0\right]}{1 + \max\left[\sum_{|v| < A_{Cl}} a_{Cl}(v)\, u_{Sl}(n + v, k),\; 0\right]}

where a_Cl(v) is the input connection of the C layer.