CN108090403A

CN108090403A - Dynamic face recognition method and system based on a 3D convolutional neural network

Info

Publication number: CN108090403A
Application number: CN201611041473.1A
Authority: CN
Inventors: 巫立峰; 赵文忠
Original assignee: Yinchen Intelligent Identfiying Science & Technology Co Ltd Shanghai
Current assignee: Yinchen Intelligent Identfiying Science & Technology Co Ltd Shanghai
Priority date: 2016-11-22
Filing date: 2016-11-22
Publication date: 2018-05-29

Abstract

The present invention provides a dynamic face recognition method and system based on a 3D convolutional neural network. The method comprises: extracting image frames from a video stream and tracking a face target to obtain the face sequence corresponding to that target; preprocessing the face sequence to obtain a face sequence that meets a preset standard; inputting the preprocessed face sequence into a 3D convolutional neural network for training and updating the weights of each layer of the network to obtain a trained 3D convolutional neural network; inputting the preprocessed face sequence into the trained 3D convolutional neural network to extract the face features of the face sequence; and comparing the face features with the feature templates of a target library and returning the face recognition information in the target library that matches the current face features. The method and system extract face sequences from video and feed them to a 3D convolutional neural network so that the network learns face features from video, which improves the accuracy of face recognition in video.

Description

Dynamic face recognition method and system based on a 3D convolutional neural network

Technical field

The present invention relates to a face recognition method and system, and more particularly to a dynamic face recognition method and system based on a 3D convolutional neural network.

Background

Video surveillance equipment is now widely deployed throughout the public places of cities. Identifying people in surveillance video in order to find and track targets has become an effective way to improve work efficiency. However, because surveillance devices may be installed in a wide variety of scenes and are affected by factors such as lighting, angle, and device resolution, it is difficult to capture high-quality face images for identity verification.

Since a target face usually appears in multiple frames of a video, making full use of the multiple face images of the same person in the video is a feasible way to improve recognition accuracy. Existing face recognition techniques mainly fall into the following two categories:

First, methods that use traditional hand-crafted features, such as SIFT, HOG, and Gabor features.

These methods have obvious drawbacks. First, hand-crafted features are usually designed for a specific classification task, and features that work well in one task are not necessarily effective in others. Second, even for a specific task it is difficult to design features that describe the object well, especially when external factors vary.

Second, methods that use deep learning to learn features from a large amount of training data, for example with convolutional neural networks. Learning features with a convolutional neural network solves the problem that features are difficult to design. However, when deep learning is used for dynamic face recognition, the usual procedure is to extract faces from the video frame by frame, select high-quality face images, rectify and align the selected images, and then compare them one by one with the photos in the target library. This approach in effect applies static face recognition to a dynamic scene, and comparing faces one by one discards the correlation between the face images within a face sequence. In practice, monitored subjects are not required to cooperate in pose or expression and are usually in motion, so the captured faces often exhibit off-angle poses, motion blur, and similar defects. In addition, since surveillance devices may be installed in all kinds of environments, lighting conditions are often poor. Under these conditions it is difficult to select suitable face photos from a face sequence for comparison, and it is also difficult to fuse the individual comparison results into a final result.

Summary of the invention

In view of the above shortcomings of the prior art, the object of the present invention is to provide a dynamic face recognition method and system based on a 3D convolutional neural network, in which face sequences extracted from video are input into a 3D convolutional neural network so that the network learns the face features in the video, thereby improving the accuracy of face recognition in video.

To achieve the above and other related objects, the present invention provides a dynamic face recognition method based on a 3D convolutional neural network, comprising the following steps: extracting image frames from a video stream and tracking a face target to obtain the face sequence corresponding to the face target; preprocessing the face sequence to obtain a face sequence that meets a preset standard; inputting the preprocessed face sequence into a 3D convolutional neural network for training and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network; inputting the preprocessed face sequence into the trained 3D convolutional neural network to extract the face features of the face sequence; and comparing the face features with the feature templates of a target library and returning the face recognition information in the target library that matches the current face features.

In one embodiment of the invention, the image frames extracted from the video stream are key frames of the video stream.

In one embodiment of the invention, the preprocessing includes one or a combination of face sequence screening, image equalization, image normalization, face rectification, and image scaling; the preset standard includes one or a combination of size, face angle, image brightness, and sharpness.

In one embodiment of the invention, inputting the preprocessed face sequence into the 3D convolutional neural network for training and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network comprises the following steps:

obtaining a number of preprocessed face sequence images (n images) in frame order;

taking m images as one group: when m ≤ n, images $F_k$ to $F_{k+m-1}$ are selected in frame order as one group, giving n-m+1 groups in total, where k = 0, 1, 2, ..., n-m; when m > n, all n images are selected as one group;

reading the pixels of every image in each group and, for each group, stacking the images in frame order into a three-dimensional pixel matrix of size w × h × m, where w is the image width in pixels and h is the image height in pixels; when m > n, the n images are stacked repeatedly in order until the m slices of the matrix are filled;

computing the pixel mean at each coordinate position over all the w × h × m three-dimensional pixel matrices obtained;

subtracting the pixel mean from the pixel at each coordinate position of each w × h × m three-dimensional pixel matrix, inputting the result into the 3D convolutional neural network, and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network.

In one embodiment of the invention, updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network comprises the following steps:

a) randomly selecting one mean-subtracted w × h × m three-dimensional pixel matrix and extracting m grayscale maps, the gradient maps in the x and y directions, and the optical flow maps in the x and y directions at a time interval of 1, to generate a data block $D_0$ of size w × h × c0, where c0 = m × 3 + (m-1) × 2;

b) using n1 convolution kernels of size k1 × k1 × m1 to perform 3D convolution over m1 adjacent channels of data block $D_0$, generating n1 groups of data blocks $D_{1,1}, D_{1,2}, \ldots, D_{1,n1}$ of size w1 × h1 × c1, where w1 and h1 are the width and height of the feature maps after convolution, c1 = (m-m1+1) × 3 + (m-m1) × 2, and n1, k1, and m1 are user-defined parameters;

c) using a k2 × k2 kernel to perform 2D pooling on each of $D_{1,1}, D_{1,2}, \ldots, D_{1,n1}$, downsampling every channel, to generate n1 groups of data blocks $D_{2,1}, D_{2,2}, \ldots, D_{2,n1}$ of size w2 × h2 × c1, where w2 and h2 are the width and height of the feature maps after pooling and k2 is a user-defined parameter;

d) using n3 convolution kernels of size k3 × k3 × m3 to perform 3D convolution over m3 adjacent channels of each of $D_{2,1}, D_{2,2}, \ldots, D_{2,n1}$, generating n1 × n3 groups of data blocks $D_{3,1}, D_{3,2}, \ldots, D_{3,n1 \times n3}$ of size w3 × h3 × c3, where w3 and h3 are the width and height of the feature maps after convolution, c3 = (m-m1-m3+2) × 3 + (m-m1-m3+1) × 2, and n3, k3, and m3 are user-defined parameters;

e) using a k4 × k4 kernel to perform 2D pooling on each of $D_{3,1}, D_{3,2}, \ldots, D_{3,n1 \times n3}$, downsampling every channel, to generate n1 × n3 groups of data blocks $D_{4,1}, D_{4,2}, \ldots, D_{4,n1 \times n3}$ of size w4 × h4 × c3, where w4 and h4 are the width and height of the feature maps after pooling;

f) using a w4 × h4 convolution kernel to convolve every channel of $D_{4,1}, D_{4,2}, \ldots, D_{4,n1 \times n3}$, generating a vector V6 of length n1 × n3 × c3;

g) inputting vector V6 into a fully connected layer containing n7 hidden units and outputting a vector V7 of length n7, where n7 is a user-defined parameter;

h) computing the loss of the current 3D convolutional neural network from the output of the fully connected layer using the softmax loss function, and back-propagating the gradient of the loss;

i) updating the weights of the layers in steps a) to g);

j) iterating steps a) to i) until the 3D convolutional neural network converges, to obtain a trained 3D convolutional neural network.

The present invention also provides a dynamic face recognition system based on a 3D convolutional neural network, comprising a face tracking module, a face sequence preprocessing module, a 3D convolutional neural network training module, a face feature extraction module, and a face comparison module;

the face tracking module is used to extract image frames from a video stream and track a face target to obtain the face sequence corresponding to the face target;

the face sequence preprocessing module is used to preprocess the face sequence to obtain a face sequence that meets a preset standard;

the 3D convolutional neural network training module is used to input the preprocessed face sequence into a 3D convolutional neural network for training and to update the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network;

the face feature extraction module is used to input the preprocessed face sequence into the trained 3D convolutional neural network and extract the face features of the face sequence;

the face comparison module is used to compare the face features with the feature templates of a target library and return the face recognition information in the target library that matches the current face features.

In one embodiment of the invention, in the face tracking module, the image frames extracted from the video stream are key frames of the video stream.

In one embodiment of the invention, in the face sequence preprocessing module, the preprocessing includes one or a combination of face sequence screening, image equalization, image normalization, face rectification, and image scaling; the preset standard includes one or a combination of size, face angle, image brightness, and sharpness.

In one embodiment of the invention, the 3D convolutional neural network training module performs the following operations:

inputting the preprocessed face sequence into the 3D convolutional neural network for training and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network, which comprises the following steps:

i) updating the weights of the layers in steps a) to g);

As described above, the dynamic face recognition method and system based on a 3D convolutional neural network of the present invention have the following beneficial effects:

(1) by learning from the spatio-temporal sequence of a face, they improve the accuracy of face recognition in video;

(2) they adapt to changes in motion, pose, lighting, and angle, improving the robustness of dynamic face recognition;

(3) they simplify the result-fusion procedure of dynamic face comparison by producing only a single comparison result.

Description of the drawings

Fig. 1 is a flow chart of the dynamic face recognition method based on a 3D convolutional neural network of the present invention;

Fig. 2 is a schematic diagram of the 3D convolutional neural network of the present invention unrolled in time;

Fig. 3 is a structural diagram of the dynamic face recognition system based on a 3D convolutional neural network of the present invention.

Description of reference numerals

1 face tracking module

2 face sequence preprocessing module

3 3D convolutional neural network training module

4 face feature extraction module

5 face comparison module

Detailed description of the embodiments

The embodiments of the present invention are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways from different viewpoints and for different applications without departing from the spirit of the present invention.

It should be noted that the drawings provided in this embodiment illustrate the basic concept of the invention only schematically. They show only the components related to the invention and are not drawn according to the number, shape, and size of the components in an actual implementation; in an actual implementation the form, quantity, and proportion of each component may be changed at will, and the component layout may also be more complex.

In 1958, Hubel and Wiesel, studying cells in the visual cortex of the cat, discovered a type of neuron called the "orientation selective cell". When the eye detects the edge of an object in front of it and that edge points in a certain direction, this type of neuron becomes active; from this they proposed the concept of the receptive field. The discovery prompted further thinking about the nervous system. Deeper study of the human visual system showed that its information processing is hierarchical: edge features are extracted in the low-level V1 area, shapes or parts of the target in the V2 area, and the whole target and its behaviour at still higher levels.

In 1984, the Japanese scholar Fukushima proposed the neocognitron, an artificial neural network based on the receptive field concept. The neocognitron decomposes a visual pattern into many sub-patterns, which are then processed by hierarchically connected feature planes. It attempts to model the visual system so that recognition can still be completed even when the object is shifted or slightly deformed.

Convolutional neural networks (CNNs) were subsequently developed on the basis of the neocognitron. A convolutional neural network is a feedforward neural network whose artificial neurons respond to the surrounding units within a local receptive field. It consists of one or more convolutional layers and fully connected layers on top (corresponding to a classical neural network), together with the associated weights and pooling layers. This structure allows a convolutional neural network to exploit the two-dimensional structure of its input data, and the network applies shared weights at different positions of the input matrix. Likewise, because features are extracted layer by layer, convolutional neural networks trained on large-scale data have achieved outstanding results in fields such as image recognition and natural language recognition.

A 3D convolutional neural network (3D-CNN) is a convolutional neural network whose convolution kernels are three-dimensional. When a convolution is applied to a two-dimensional image, the kernel is also a two-dimensional matrix. A 3D convolutional neural network differs in that its input is a stack of two-dimensional images, that is, three-dimensional data rather than a single two-dimensional image, and its convolution kernel is likewise three-dimensional. To compute the convolution, the corresponding elements of the two three-dimensional matrices are multiplied and the products are summed.
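The multiply-and-sum operation described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the patented implementation: the function name `conv3d_valid`, the (height, width, frames) array layout, the stride of 1, and the absence of padding are all assumptions. As in most CNN frameworks, the kernel is applied without flipping (strictly speaking a cross-correlation).

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a 3D volume (stride 1, no padding).

    volume: array of shape (h, w, m), i.e. m stacked frames (assumed layout)
    kernel: array of shape (kh, kw, km)
    """
    h, w, m = volume.shape
    kh, kw, km = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1, m - km + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for t in range(out.shape[2]):
                # Multiply corresponding elements of the two 3D blocks, then sum.
                patch = volume[i:i + kh, j:j + kw, t:t + km]
                out[i, j, t] = np.sum(patch * kernel)
    return out

volume = np.ones((6, 5, 4))   # toy stack of 4 frames, each 6 x 5
kernel = np.ones((3, 3, 2))   # toy kernel spanning 2 adjacent frames
result = conv3d_valid(volume, kernel)
print(result.shape)
```

Because the kernel spans several adjacent frames, each output value depends on how the face changes across those frames, which a 2D kernel applied frame by frame cannot capture.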

Therefore, in the dynamic face recognition method and system based on a 3D convolutional neural network of the present invention, the input of the model is a face sequence from video and the convolution kernels are three-dimensional. During training the network can thus learn the correlated information of the same face under the different lighting, angles, and poses that occur within one scene, which effectively improves the accuracy of face recognition.

With reference to Fig. 1, the dynamic face recognition method based on a 3D convolutional neural network of the present invention comprises the following steps:

Step S1: extract image frames from a video stream and track a face target to obtain the face sequence corresponding to the face target.

Preferably, the image frames are key frames of the video stream. The I-frames of an MPEG video can be chosen as key frames. Because I-frames do not use motion compensation, they preserve complete scene image information. For video with a larger GOP (Group of Pictures), both I-frames and P-frames can be used.

The tracked face target may be one face or several faces. The acquired face sequence is continuous in time and consists of the face images in the individual image frames.

Step S2: preprocess the face sequence to obtain a face sequence that meets a preset standard.

Specifically, the preprocessing includes one or a combination of face sequence image screening, image equalization, image normalization, face rectification, image scaling, and the like. When the face sequence images are screened, face images whose quality is poor and cannot meet the requirements are removed according to quality factors such as lighting and angle.

The preset standard includes one or a combination of size, face angle, image brightness, and sharpness.
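The screening step can be illustrated with a small filter over grayscale face crops. The patent does not specify how size, brightness, or sharpness are measured, so everything below is an assumption made for illustration: the function names, the thresholds, and the use of Laplacian variance as a sharpness measure. Face angle is omitted because it would require a pose estimator.

```python
import numpy as np

def sharpness(gray):
    """Variance of a 4-neighbour Laplacian; higher values suggest a sharper image."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def meets_standard(gray, min_size=(40, 60), brightness=(40.0, 220.0),
                   min_sharpness=10.0):
    """Check one face crop against hypothetical size/brightness/sharpness limits.

    min_size is (height, width); all thresholds are illustrative assumptions.
    """
    h, w = gray.shape
    if h < min_size[0] or w < min_size[1]:
        return False
    mean = float(gray.mean())
    if not (brightness[0] <= mean <= brightness[1]):
        return False
    return sharpness(gray) >= min_sharpness

def screen_sequence(faces, **limits):
    """Keep only the face crops that meet the standard, preserving frame order."""
    return [f for f in faces if meets_standard(f, **limits)]
```

In practice the thresholds would be tuned on the deployment's own footage, since acceptable brightness and sharpness depend on the camera and scene.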

Step S3: input the preprocessed face sequence into the 3D convolutional neural network for training and update the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network.

Specifically, step S3 comprises the following steps:

31) Obtain a number of preprocessed face sequence images in frame order.

Specifically, the n images of the preprocessed face sequence are arranged in frame order and labelled $F_i$, i = 0, 1, 2, ..., n-1; the face sequence images of the same person are assigned the same label.

32) Take m images as one group. When m ≤ n, images $F_k$ to $F_{k+m-1}$ are selected in frame order as one group, giving n-m+1 groups in total, where k = 0, 1, 2, ..., n-m; when m > n, all n images are selected as one group.

33) Read the pixels of every image in each group and, for each group, stack the images in frame order into a three-dimensional pixel matrix of size w × h × m, where w is the image width in pixels and h is the image height in pixels. When m > n, the n images are stacked repeatedly in order until the m slices of the matrix are filled.

34) Compute the pixel mean at each coordinate position over all the w × h × m three-dimensional pixel matrices obtained.

35) Subtract the pixel mean from the pixel at each coordinate position of each w × h × m three-dimensional pixel matrix, input the result into the 3D convolutional neural network, and update the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network.

Note that when the 3D convolutional neural network is trained, a batch of three-dimensional pixel matrices is randomly selected for each training step from the n-m+1 groups obtained after subtracting the pixel mean.
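Steps 32) to 35) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, frames are grayscale NumPy arrays, and each matrix is stored as (height, width, m) rather than the w × h × m order used in the text.

```python
import numpy as np

def build_voxel_groups(frames, m):
    """Group n frames into sliding windows of m frames and stack each window.

    frames: list of n arrays of shape (h, w), in frame order
    returns: array of shape (groups, h, w, m)
    """
    n = len(frames)
    if m <= n:
        # Windows F_k .. F_{k+m-1} for k = 0 .. n-m, i.e. n-m+1 groups.
        groups = [frames[k:k + m] for k in range(n - m + 1)]
    else:
        # Too few frames: repeat the n frames in order until m slices are filled.
        groups = [[frames[i % n] for i in range(m)]]
    return np.stack([np.stack(g, axis=-1) for g in groups]).astype(np.float64)

def subtract_mean(voxels):
    """Subtract the per-coordinate mean computed over all groups."""
    mean = voxels.mean(axis=0)
    return voxels - mean, mean

# Toy sequence: 10 frames of 40 rows x 60 columns, frame i filled with value i.
frames = [np.full((40, 60), float(i)) for i in range(10)]
voxels = build_voxel_groups(frames, m=7)
centered, mean = subtract_mean(voxels)
print(voxels.shape)
```

The mean returned by `subtract_mean` would normally be stored and subtracted again from sequences at recognition time, so that training and inference inputs are normalized the same way; the text does not state this explicitly, so treat it as an assumption.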

As shown in Fig. 2, training the 3D convolutional neural network comprises the following steps:

a) Randomly select one mean-subtracted w × h × m three-dimensional pixel matrix and extract m grayscale maps, the gradient maps in the x and y directions, and the optical flow maps in the x and y directions at a time interval of 1, to generate a data block $D_0$ of size w × h × c0, where c0 = m × 3 + (m-1) × 2.

b) Use n1 convolution kernels of size k1 × k1 × m1 to perform 3D convolution over m1 adjacent channels of data block $D_0$, generating n1 groups of data blocks $D_{1,1}, D_{1,2}, \ldots, D_{1,n1}$ of size w1 × h1 × c1, where w1 and h1 are the width and height of the feature maps after convolution, c1 = (m-m1+1) × 3 + (m-m1) × 2, and n1, k1, and m1 are user-defined parameters.

c) Use a k2 × k2 kernel to perform 2D pooling on each of $D_{1,1}, D_{1,2}, \ldots, D_{1,n1}$, downsampling every channel, to generate n1 groups of data blocks $D_{2,1}, D_{2,2}, \ldots, D_{2,n1}$ of size w2 × h2 × c1, where w2 and h2 are the width and height of the feature maps after pooling and k2 is a user-defined parameter.

d) Use n3 convolution kernels of size k3 × k3 × m3 to perform 3D convolution over m3 adjacent channels of each of $D_{2,1}, D_{2,2}, \ldots, D_{2,n1}$, generating n1 × n3 groups of data blocks $D_{3,1}, D_{3,2}, \ldots, D_{3,n1 \times n3}$ of size w3 × h3 × c3, where w3 and h3 are the width and height of the feature maps after convolution, c3 = (m-m1-m3+2) × 3 + (m-m1-m3+1) × 2, and n3, k3, and m3 are user-defined parameters.

e) Use a k4 × k4 kernel to perform 2D pooling on each of $D_{3,1}, D_{3,2}, \ldots, D_{3,n1 \times n3}$, downsampling every channel, to generate n1 × n3 groups of data blocks $D_{4,1}, D_{4,2}, \ldots, D_{4,n1 \times n3}$ of size w4 × h4 × c3, where w4 and h4 are the width and height of the feature maps after pooling.

f) Use a w4 × h4 convolution kernel to convolve every channel of $D_{4,1}, D_{4,2}, \ldots, D_{4,n1 \times n3}$, generating a vector V6 of length n1 × n3 × c3.

g) Input vector V6 into a fully connected layer containing n7 hidden units and output a vector V7 of length n7, where n7 is a user-defined parameter.

h) Compute the loss of the current 3D convolutional neural network from the output of the fully connected layer using the softmax loss function, and back-propagate the gradient of the loss.

i) Update the weights of the layers in steps a) to g).

Note that training the 3D convolutional neural network requires the weight update to be repeated many times, until the model converges.
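Steps g) to i) can be illustrated for the fully connected layer alone. The sketch below is an assumption-laden simplification, not the patent's training procedure: it treats the n7 outputs of V7 directly as class scores (one per identity label), uses plain gradient descent with a hypothetical learning rate `lr`, and updates only the fully connected weights, whereas the method back-propagates through every layer.

```python
import numpy as np

def softmax_loss(scores, label):
    """Softmax cross-entropy loss and its gradient with respect to the scores."""
    shifted = scores - scores.max()            # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    loss = -np.log(probs[label])
    grad = probs.copy()
    grad[label] -= 1.0                         # d(loss)/d(scores) = probs - one_hot
    return loss, grad

def fc_train_step(W, b, v6, label, lr=0.01):
    """One forward pass, loss computation, and weight update for the FC layer."""
    v7 = W @ v6 + b                            # step g): V7 of length n7
    loss, d_v7 = softmax_loss(v7, label)       # step h): loss and its gradient
    W_new = W - lr * np.outer(d_v7, v6)        # step i): gradient-descent update
    b_new = b - lr * d_v7
    return W_new, b_new, loss

# Toy sizes taken from the worked example in the text: V6 has 78 values, n7 = 128.
rng = np.random.default_rng(0)
v6 = rng.standard_normal(78)
W, b = np.zeros((128, 78)), np.zeros(128)
W, b, first_loss = fc_train_step(W, b, v6, label=5)
_, _, second_loss = fc_train_step(W, b, v6, label=5)
print(first_loss > second_loss)
```

Repeating such steps over randomly drawn batches until the loss stops decreasing corresponds to the convergence criterion mentioned above.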

Assuming w = 60, h = 40, and m = 7, the 3D convolutional network is trained as follows:

A 60 × 40 × 7 three-dimensional data block is input into the network; 7 grayscale maps, the gradient maps in the x and y directions, and the optical flow maps in the x and y directions at a time interval of 1 are extracted, generating a 60 × 40 × 33 data block;

Two convolution kernels of size 7 × 7 × 3 are used to perform convolution over 3 adjacent channels, generating 2 groups of 54 × 34 × 23 data blocks;

Each channel is downsampled with a 2 × 2 window, giving 2 groups of 27 × 17 × 23 data blocks;

Three convolution kernels of size 7 × 7 × 3 are used to perform convolution over 3 adjacent channels, generating 6 groups of 21 × 12 × 13 data blocks;

Each channel is downsampled with a 3 × 3 window, giving 6 groups of 7 × 4 × 13 data blocks;

Each channel is convolved with a 7 × 4 2D convolution kernel, giving a vector of length 78;

The vector of length 78 is input into a fully connected layer containing 128 hidden units, which outputs a vector of length 128;

The loss for the 60 × 40 × 7 three-dimensional pixel matrix is computed from the output of the fully connected layer using the softmax loss function, and the gradient of the loss is back-propagated;

The weights of the above layers are updated.
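The sizes in this worked example can be checked with a short shape-tracing script built from the channel formulas in steps a) to f). The function names are hypothetical, and one parameter is an assumption that departs from the text: the second convolution is given a 7 × 6 spatial kernel, because a 7 × 7 kernel applied to a 27 × 17 map with no padding would yield 21 × 11 rather than the stated 21 × 12. Convolutions are assumed to be unpadded with stride 1, and pooling windows non-overlapping.

```python
def hardwired_channels(frames):
    """Channels for `frames` frames: gray + x/y gradients per frame,
    plus x/y optical flow between each pair of neighbouring frames."""
    return frames * 3 + (frames - 1) * 2

def trace_shapes(w=60, h=40, m=7, n1=2, k1=(7, 7), m1=3, k2=2,
                 n3=3, k3=(7, 6), m3=3, k4=3):
    """Return [(stage, groups, width, height, channels)] and the length of V6."""
    stages = [("hardwired", 1, w, h, hardwired_channels(m))]
    # First 3D convolution: spatial size and frame count both shrink.
    w, h = w - k1[0] + 1, h - k1[1] + 1
    frames = m - m1 + 1
    stages.append(("conv1", n1, w, h, hardwired_channels(frames)))
    # 2D pooling only reduces the spatial size.
    w, h = w // k2, h // k2
    stages.append(("pool1", n1, w, h, hardwired_channels(frames)))
    # Second 3D convolution, applied to each of the n1 groups with n3 kernels.
    w, h = w - k3[0] + 1, h - k3[1] + 1
    frames = frames - m3 + 1
    stages.append(("conv2", n1 * n3, w, h, hardwired_channels(frames)))
    w, h = w // k4, h // k4
    stages.append(("pool2", n1 * n3, w, h, hardwired_channels(frames)))
    # A w4 x h4 kernel collapses every channel to one value.
    vector_length = n1 * n3 * hardwired_channels(frames)
    return stages, vector_length

stages, vector_length = trace_shapes()
for stage in stages:
    print(stage)
print("V6 length:", vector_length)
```

With these settings the trace reproduces the channel counts 33, 23, and 13 and the final vector length of 78 given in the example.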

Step S4: input the preprocessed face sequence into the trained 3D convolutional neural network and extract the face features of the face sequence.

Step S5: compare the face features with the feature templates of the target library and return the face recognition information in the target library that matches the current face features.

Specifically, the feature template that matches the face features is looked up in the target library, and the face information corresponding to that feature template is returned as the face recognition result.
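The lookup in step S5 can be sketched as a nearest-template search. The patent does not say which similarity measure or acceptance rule is used, so cosine similarity, the `threshold` parameter, and the function name `match_template` are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def match_template(feature, templates, identities, threshold=0.5):
    """Return (identity, score) of the best-matching template, or (None, score)
    when the best cosine similarity falls below the (assumed) threshold.

    feature:    array of shape (d,), the feature of the current face sequence
    templates:  array of shape (k, d), one feature template per library entry
    identities: list of k labels holding the associated face information
    """
    f = feature / np.linalg.norm(feature)
    t = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    scores = t @ f                     # cosine similarity to every template
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None, float(scores[best])
    return identities[best], float(scores[best])

templates = np.eye(3)                  # toy library with three orthogonal templates
identities = ["person_a", "person_b", "person_c"]
identity, score = match_template(np.array([0.1, 0.9, 0.0]), templates, identities)
print(identity)
```

Because the whole face sequence yields a single feature vector, only one comparison per library entry is needed, which matches the stated advantage of producing a single comparison result instead of fusing per-frame results.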

With reference to Fig. 3, the dynamic face recognition system based on a 3D convolutional neural network of the present invention comprises a face tracking module 1, a face sequence preprocessing module 2, a 3D convolutional neural network training module 3, a face feature extraction module 4, and a face comparison module 5, connected in sequence.

The face tracking module 1 is used to extract image frames from a video stream and track a face target to obtain the face sequence corresponding to the face target.

The face sequence preprocessing module 2 is used to preprocess the face sequence to obtain a face sequence that meets a preset standard.

The 3D convolutional neural network training module 3 is used to input the preprocessed face sequence into the 3D convolutional neural network for training and to update the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network.

Specifically, the 3D convolutional neural network training module 3 performs the following operations:

i) updating the weights of the layers in steps a) to g);

updating the weights of the above layers.

The face feature extraction module 4 is used to input the preprocessed face sequence into the trained 3D convolutional neural network and extract the face features of the face sequence.

The face comparison module 5 is used to compare the face features with the feature templates of the target library and return the face recognition information in the target library that matches the current face features.

In summary, by learning from the spatio-temporal sequence of a face, the dynamic face recognition method and system based on a 3D convolutional neural network of the present invention improve the accuracy of face recognition in video; they adapt to changes in motion, pose, lighting, and angle, improving the robustness of dynamic face recognition; and they simplify the result-fusion procedure of dynamic face comparison by producing only a single comparison result. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial utility.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims

1. A dynamic face recognition method based on a 3D convolutional neural network, characterized by comprising the following steps:

extracting image frames from a video stream and tracking a face target to obtain the face sequence corresponding to the face target;

preprocessing the face sequence to obtain a face sequence that meets a preset standard;

inputting the preprocessed face sequence into a 3D convolutional neural network for training and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network;

inputting the preprocessed face sequence into the trained 3D convolutional neural network and extracting the face features of the face sequence;

comparing the face features with the feature templates of a target library and returning the face recognition information in the target library that matches the current face features.

2. The dynamic face recognition method based on a 3D convolutional neural network according to claim 1, characterized in that the image frames extracted from the video stream are key frames of the video stream.

3. The dynamic face recognition method based on a 3D convolutional neural network according to claim 1, characterized in that the preprocessing includes one or a combination of face sequence screening, image equalization, image normalization, face rectification, and image scaling; and the preset standard includes one or a combination of size, face angle, image brightness, and sharpness.

4. The dynamic face recognition method based on a 3D convolutional neural network according to claim 1, characterized in that inputting the preprocessed face sequence into the 3D convolutional neural network for training and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network comprises the following steps:

taking m images as one group: when m ≤ n, images $F_k$ to $F_{k+m-1}$ are selected in frame order as one group, giving n-m+1 groups in total, where k = 0, 1, 2, ..., n-m; when m > n, all n images are selected as one group;

reading the pixels of every image in each group and, for each group, stacking the images in frame order into a three-dimensional pixel matrix of size w × h × m, where w is the image width in pixels and h is the image height in pixels; when m > n, the n images are stacked repeatedly in order until the m slices of the matrix are filled;

subtracting the pixel mean from the pixel at each coordinate position of each w × h × m three-dimensional pixel matrix, inputting the result into the 3D convolutional neural network, and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network.

5. the face dynamic identifying method according to claim 4 based on 3D convolutional neural networks, it is characterised in that：Update The weights of each layer of 3D convolutional neural networks are comprised the following steps with obtaining trained 3D convolutional neural networks：

A) randomly select one subtract pixel average after w × h × m voxel matrixes, extraction m gray-scale map, x and y directions Gradient map and x and y directions time interval be 1 light stream figure, generate w × h × c0 data block D₀, wherein c0=m × 3+ (m- 1)×2；

B) convolution kernel of n1 k1 × k1 × m1 is used, in data block D₀M1 adjacency channel on carry out 3D convolution, generate n1 groups The data block D of w1 × h1 × c1₁₁,D₁₂,...,D_1n1, wherein w1, h1 represent wide and high, the c1=of the characteristic pattern after convolution respectively (m-m1+1) × 3+ (m-m1) × 2, n1, k1, m1 are custom parameter；

C) using the core of k2 × k2 respectively to D₁₁,D₁₂,...,D_1n12D ponds are carried out, down-sampling, generation are carried out to each passage The data block D of n1 groups w2 × h2 × c1₂₁,D₂₂,...,D_2n1, wherein w2, h2 is respectively wide and high, the k2 of the characteristic pattern behind pond For custom parameter；

D) convolution kernel of n3 k3 × k3 × m3 is used, respectively in D₂₁,D₂₂,...,D_2n1M3 adjacency channel on carry out 3D volumes Product, the data block D of generation n1 × n3 groups w3 × h3 × c3₃₁,D₃₂,...,D_3(n1xn3), after wherein w3, h3 represent convolution respectively Wide and high, the c3=(m-m1-m3+2) × 3+ (m-m1-m3+1) × 2 of characteristic pattern, n3, k3, m3 are custom parameter；

E) use a k4 × k4 kernel to perform 2D pooling on D₃₁, D₃₂, ..., D_3(n1×n3) respectively, down-sampling each channel and generating n1 × n3 groups of data blocks D₄₁, D₄₂, ..., D_4(n1×n3) of size w4 × h4 × c3, where w4 and h4 are the width and height of the feature maps after pooling;

F) use a w4 × h4 convolution kernel to perform convolution on each channel of D₄₁, D₄₂, ..., D_4(n1×n3) respectively, generating a vector V6 of length n1 × n3 × c3;

G) input the vector V6 into a fully connected layer containing n7 hidden units, and output a vector V7 of length n7, where n7 is a user-defined parameter;

H) compute the loss of the current 3D convolutional neural network from the output of the fully connected layer using the softmax loss function, and back-propagate the gradient of the loss;

I) update the weights of each layer in steps A) to G);

J) iterate steps A) to I) until the 3D convolutional neural network converges, to obtain the trained 3D convolutional neural network.
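The channel formulas in steps A) to G) can be checked with a small shape-bookkeeping sketch. It only traces data block sizes and does not train anything. The helper names (`channel_count`, `trace_shapes`) are hypothetical, and two points the claim leaves open are assumed here: convolutions are unpadded with stride 1 (so w1 = w - k1 + 1), and pooling is non-overlapping with stride equal to the kernel size (so w2 = w1 // k2).

```python
def channel_count(frames_left):
    """Channels of a block covering `frames_left` frames.

    Three per-frame maps (gray, gradient-x, gradient-y) plus two per-frame-pair
    maps (optical flow x and y), i.e. f * 3 + (f - 1) * 2.
    """
    return frames_left * 3 + (frames_left - 1) * 2

def trace_shapes(w, h, m, n1, k1, m1, k2, n3, k3, m3, k4, n7):
    """Return {name: (groups, width, height, channels)} for each data block, plus V6/V7 lengths."""
    shapes = {}
    # Step A: hard-wired maps, c0 = m*3 + (m-1)*2
    shapes["D0"] = (1, w, h, channel_count(m))
    # Step B: 3D convolution over m1 adjacent channels (assumed unpadded, stride 1)
    w1, h1 = w - k1 + 1, h - k1 + 1
    c1 = channel_count(m - m1 + 1)
    shapes["D1"] = (n1, w1, h1, c1)
    # Step C: 2D pooling per channel (assumed non-overlapping)
    w2, h2 = w1 // k2, h1 // k2
    shapes["D2"] = (n1, w2, h2, c1)
    # Step D: second 3D convolution, applied to each of the n1 groups
    w3, h3 = w2 - k3 + 1, h2 - k3 + 1
    c3 = channel_count(m - m1 - m3 + 2)
    shapes["D3"] = (n1 * n3, w3, h3, c3)
    # Step E: second 2D pooling
    w4, h4 = w3 // k4, h3 // k4
    shapes["D4"] = (n1 * n3, w4, h4, c3)
    # Step F: a w4 x h4 kernel collapses every channel to one value
    shapes["V6"] = n1 * n3 * c3
    # Step G: fully connected layer with n7 hidden units
    shapes["V7"] = n7
    return shapes

# Usage with illustrative (not source-specified) parameter values.
shapes = trace_shapes(w=64, h=64, m=7, n1=2, k1=7, m1=3,
                      k2=2, n3=3, k3=7, m3=3, k4=3, n7=128)
for name, shape in shapes.items():
    print(name, shape)
```

Writing c1 and c3 through the same `channel_count` helper makes the pattern in the claim explicit: each temporal convolution shortens every map type by its kernel depth minus one, so the first leaves m - m1 + 1 frames and the second leaves m - m1 - m3 + 2.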

6. A dynamic face recognition system based on a 3D convolutional neural network, characterized by comprising a face tracking module, a face sequence preprocessing module, a 3D convolutional neural network training module, a face feature extraction module, and a face comparison module;

The face tracking module is configured to extract image frames from a video stream and track a face target, to obtain the face sequence corresponding to the face target;

The face sequence preprocessing module is configured to preprocess the face sequence, to obtain a face sequence that meets a predetermined standard;

The 3D convolutional neural network training module is configured to input the preprocessed face sequence into the 3D convolutional neural network for training and to update the weights of each layer of the 3D convolutional neural network, to obtain a trained 3D convolutional neural network;

The face feature extraction module is configured to input the preprocessed face sequence into the trained 3D convolutional neural network and extract the face features of the face sequence;

The face comparison module is configured to compare the face features with the feature templates of a target library and to return the face recognition information in the target library that matches the current face features.
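The comparison performed by the face comparison module can be illustrated with a short sketch. The claim does not name a similarity measure or a decision rule, so cosine similarity and the acceptance threshold of 0.8 below are assumptions chosen for illustration, and `cosine_similarity` and `match_face` are hypothetical names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_face(feature, templates, threshold=0.8):
    """Compare one face feature against every template in the target library.

    `templates` maps an identity label to its feature template.
    Returns (identity, score) for the best match at or above `threshold`,
    otherwise (None, best_score).
    """
    best_id, best_score = None, -1.0
    for identity, template in templates.items():
        score = cosine_similarity(feature, template)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score

# Usage with toy 3-dimensional features; real features would come from the trained network.
templates = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
hit = match_face([0.9, 0.1, 0.0], templates)
miss = match_face([1.0, 1.0, 0.0], templates)
```

Returning `None` below the threshold lets the caller distinguish "no one in the target library matches" from a weak best guess.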

7. The dynamic face recognition system based on a 3D convolutional neural network according to claim 6, characterized in that, in the face tracking module, the image frames extracted from the video stream are key frames of the video stream.

8. The dynamic face recognition system based on a 3D convolutional neural network according to claim 6, characterized in that, in the face sequence preprocessing module, the preprocessing includes one or a combination of face sequence screening, image equalization, image normalization, face correction, and image scaling; the predetermined standard includes one or a combination of size, face angle, image brightness, and sharpness.

9. The dynamic face recognition system based on a 3D convolutional neural network according to claim 6, characterized in that the 3D convolutional neural network training module performs the following operations:

Inputting the preprocessed face sequence into the 3D convolutional neural network for training and updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network comprises the following steps:

10. The dynamic face recognition system based on a 3D convolutional neural network according to claim 9, characterized in that updating the weights of each layer of the 3D convolutional neural network to obtain a trained 3D convolutional neural network comprises the following steps:

I) update the weights of each layer in steps A) to G);