CN107194380A

CN107194380A - A deep convolutional network and learning method for face recognition in complex scenes

Info

Publication number: CN107194380A
Application number: CN201710531386.2A
Authority: CN
Inventors: 唐良智; 王兵; 魏湘臣
Original assignee: Shanghai Hefu Artificial Intelligence Technology (group) Co Ltd
Current assignee: Shanghai Hefu Artificial Intelligence Technology (group) Co Ltd
Priority date: 2017-07-03
Filing date: 2017-07-03
Publication date: 2017-09-22

Abstract

The present invention proposes a deep convolutional network and a learning method for face recognition in complex scenes. The network consists of four fully convolutional modules and three fully connected modules, and the parameters of the deep convolutional network are optimized with the stochastic gradient descent algorithm. The designed face recognition network is expected to deliver stable recognition performance in real application scenarios under complex illumination and multiple viewing angles and, with the support of high-performance GPUs, to meet the demands of engineering applications at a high computational load.

Description

A deep convolutional network and learning method for face recognition in complex scenes

Technical field

The present invention relates to a deep convolutional network and a learning method for face recognition in complex scenes, and belongs to the field of computer technology.

Background technology

Early face recognition algorithms mainly comprised image preprocessing, extraction of conventional face features such as LBP, and learning and classification of the extracted features with support vector machines. Image preprocessing was particularly critical. It included preprocessing for different illumination conditions, denoising for image noise, and image alignment for faces at different angles. After optimization targeted at specific face recognition databases, such as the classic LFW face recognition database, these methods achieved results close to human recognition performance.

In recent years, however, with the rapid proliferation of capture devices of all kinds, an exponentially growing number of face images has appeared in photographs and videos, especially those taken by mobile phones and surveillance equipment. This has created large demand for face recognition applications, including face-based identity authentication, image retrieval based on face recognition, and recognition of specific faces in surveillance video. Face recognition technology has left behind the period of optimizing algorithms on small-scale databases and entered a period of application that requires accurate recognition of large numbers of face images captured in varied environments. However, as digital images containing faces captured in varied environments rapidly increase, and as photographer skill and shooting conditions vary widely, the accuracy of the early face recognition techniques described above drops sharply in such real environments and falls far short of practical application demands.

Summary of the invention

To solve the above technical problem, one technical solution adopted by the present invention is a deep convolutional network for face recognition in complex scenes, characterized in that it comprises multiple convolution modules and fully connected modules, each convolution module or fully connected module being composed of convolution kernels, activation functions, and pooling functions at different scales.

Further, there are 4 convolution modules and 3 fully connected modules in total. The first 3 convolution modules each consist of convolutional layers, ReLU layers, and a max pooling layer. The max pooling layer performs downsampling, so that the image undergoes deep multi-layer filtering in a multi-level pyramid downsampling structure.

Further, the layer-by-layer computation of the convolutional network from the first layer to the last, together with the inputs and outputs of the hidden layers, is given by the following formulas:

1) X¹ = w¹ ∗ X⁰, where X⁰ ∈ R^{224×224×3}, w¹ ∈ R^{3×3×32}

2) X² = max{0, X¹}, where X¹ ∈ R^{224×224×32}

3) X³ = w³ ∗ X², where X² ∈ R^{224×224×32}, w³ ∈ R^{3×3×32×64}

4) X⁴ = max{0, X³}, where X³ ∈ R^{224×224×64}

5) X⁵ = w⁵ ∗ X⁴, where X⁴ ∈ R^{224×224×64}, w⁵ ∈ R^{3×3×64×128}

6) X⁶ = max{0, X⁵}, where X⁵ ∈ R^{224×224×128}

7) X⁷ = maxpool(X⁶), where X⁶ ∈ R^{224×224×128}

8) X⁸ = w⁸ ∗ X⁷, where X⁷ ∈ R^{112×112×128}, w⁸ ∈ R^{3×3×128×128}

9) X⁹ = max{0, X⁸}, where X⁸ ∈ R^{112×112×128}

10) X¹⁰ = w¹⁰ ∗ X⁹, where X⁹ ∈ R^{112×112×128}, w¹⁰ ∈ R^{3×3×128×256}

11) X¹¹ = max{0, X¹⁰}, where X¹⁰ ∈ R^{112×112×256}

12) X¹² = w¹² ∗ X¹¹, where X¹¹ ∈ R^{112×112×256}, w¹² ∈ R^{3×3×256×256}

13) X¹³ = max{0, X¹²}, where X¹² ∈ R^{112×112×256}

14) X¹⁴ = maxpool(X¹³), where X¹³ ∈ R^{112×112×256}

15) X¹⁵ = w¹⁵ ∗ X¹⁴, where X¹⁴ ∈ R^{56×56×256}, w¹⁵ ∈ R^{3×3×256×256}

16) X¹⁶ = max{0, X¹⁵}, where X¹⁵ ∈ R^{56×56×256}

17) X¹⁷ = w¹⁷ ∗ X¹⁶, where X¹⁶ ∈ R^{56×56×256}, w¹⁷ ∈ R^{3×3×256×512}

18) X¹⁸ = max{0, X¹⁷}, where X¹⁷ ∈ R^{56×56×512}

19) X¹⁹ = w¹⁹ ∗ X¹⁸, where X¹⁸ ∈ R^{56×56×512}, w¹⁹ ∈ R^{3×3×512×512}

20) X²⁰ = max{0, X¹⁹}, where X¹⁹ ∈ R^{56×56×512}

21) X²¹ = maxpool(X²⁰), where X²⁰ ∈ R^{56×56×512}

22) X²² = w²² ∗ X²¹, where X²¹ ∈ R^{28×28×512}, w²² ∈ R^{3×3×512×512}

23) X²³ = max{0, X²²}, where X²² ∈ R^{28×28×512}

24) X²⁴ = w²⁴ ∗ X²³, where X²³ ∈ R^{28×28×512}, w²⁴ ∈ R^{3×3×512×512}

25) X²⁵ = max{0, X²⁴}, where X²⁴ ∈ R^{28×28×512}

26) X²⁶ = w²⁶ ∗ X²⁵, where X²⁵ ∈ R^{28×28×512}, w²⁶ ∈ R^{3×3×512×512}

27) X²⁷ = max{0, X²⁶}, where X²⁶ ∈ R^{28×28×512}

28) X²⁸ = w²⁸ ∗ X²⁷, where X²⁷ ∈ R^{28×28×512}, w²⁸ ∈ R^{3×3×512×512}

29) X²⁹ = max{0, X²⁸}, where X²⁸ ∈ R^{14×14×512}

30) X³⁰ = maxpool(X²⁹), where X²⁹ ∈ R^{7×7×512}

31) X³¹ = w³¹ ∗ X³⁰, where X³⁰ ∈ R^{7×7×512}, w³¹ ∈ R^{7×7×512×4096}

32) X³² = max{0, X³¹}, where X³¹ ∈ R^{1×1×4096}

33) X³³ = w³³ ∗ X³², where X³² ∈ R^{1×1×4096}, w³³ ∈ R^{1×1×4096×4096}

34) X³⁴ = max{0, X³³}, where X³³ ∈ R^{1×1×4096}

35) X³⁵ = w³⁵ ∗ X³⁴, where X³⁴ ∈ R^{1×1×4096}, w³⁵ ∈ R^{1×4096×K}

Here ∗ denotes the convolution of a layer's input with that layer's filter coefficients.

Here, Xⁱ denotes the output of the i-th layer, wⁱ denotes the filter coefficients of the i-th layer, and K denotes the number of face classes to be recognized.
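The two non-convolutional operations in the formulas above, max{0, X} and maxpool, can be illustrated on a single small feature map. The following Python sketch is illustrative only: the function names relu and maxpool_2x2 are hypothetical, and the 2×2 window with stride 2 is an assumption inferred from the halving of the spatial size (for example from 224×224 to 112×112), not something the text states.

```python
def relu(feature_map):
    """Element-wise max{0, x} on a 2-D list of numbers."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def maxpool_2x2(feature_map):
    """2x2 max pooling with stride 2 (window and stride are assumptions)."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, height - 1, 2):
        pooled_row = []
        for c in range(0, width - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


x = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -0.5, -2.0, 1.0],
    [0.0, -1.0, -4.0, -2.0],
    [3.0, 1.5, -0.1, 0.2],
]
activated = relu(x)              # negative entries become 0.0
pooled = maxpool_2x2(activated)  # 4x4 map is reduced to 2x2
print(pooled)
```

Each pooling step keeps the channel count and halves the height and width, which is what produces the pyramid of progressively smaller feature maps described above.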

A learning method for the deep convolutional network for face recognition in complex scenes is also provided, characterized in that the learning method is implemented through the following steps:

Step 1: Randomly sample the face pictures collected in real scenes, initialize the network data with the data generated by random sampling, and manually set the offset (bias) parameters to zero;

Step 2: Perform data augmentation on the existing face training images, uniformly normalize the size of all training images to 224*224, and subtract the mean image before feeding them into the network, to achieve faster convergence.
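The mean-image subtraction in Step 2 can be sketched as follows. This is a minimal Python illustration on tiny single-channel images stored as nested lists. The helper names mean_image and subtract_mean are hypothetical, and the resizing to 224*224 is left out for brevity.

```python
def mean_image(images):
    """Pixel-wise mean over a list of equally sized 2-D images."""
    count = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / count for c in range(width)]
        for r in range(height)
    ]


def subtract_mean(image, mean):
    """Subtract the mean image from one image, pixel by pixel."""
    return [
        [p - m for p, m in zip(row, mean_row)]
        for row, mean_row in zip(image, mean)
    ]


# Two toy 2x2 "training images" standing in for 224*224 face pictures.
train = [
    [[10.0, 20.0], [30.0, 40.0]],
    [[30.0, 40.0], [50.0, 60.0]],
]
mean = mean_image(train)
centered = [subtract_mean(img, mean) for img in train]
print(mean)
print(centered)
```

After this step the training set has zero mean at every pixel position, which is the usual reason given for faster convergence of gradient-based training.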

Specifically, the augmentation operations include the following:

(1) Randomly crop 70%-90% of the face picture to be recognized, then center it and scale it to 224*224, which can effectively improve the recognition rate for partially occluded faces;

(2) For tilted pictures, randomly rotate the original image by 5, 10, or 15 degrees, then train;

(3) Randomly apply contrast stretching or contrast compression to the original face training images, to improve face recognition performance in environments with complex illumination.
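The cropping augmentation in item (1) can be sketched as below. This is a hedged Python illustration rather than the patented procedure. It assumes that 70%-90% refers to the fraction of each side that is kept, uses nearest-neighbour rescaling because no interpolation method is given, and omits the centering step. The names random_crop and resize_nearest are hypothetical.

```python
import random


def random_crop(image, rng, low=0.7, high=0.9):
    """Cut out a random window whose sides are 70%-90% of the original sides."""
    height, width = len(image), len(image[0])
    fraction = rng.uniform(low, high)
    crop_h = max(1, round(height * fraction))
    crop_w = max(1, round(width * fraction))
    top = rng.randint(0, height - crop_h)    # inclusive bounds
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour rescaling (the interpolation method is an assumption)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


rng = random.Random(0)
# A toy 10x10 single-channel picture standing in for a face image.
picture = [[float(r * 10 + c) for c in range(10)] for r in range(10)]
cropped = random_crop(picture, rng)
network_input = resize_nearest(cropped, 224, 224)
print(len(cropped), len(cropped[0]), len(network_input), len(network_input[0]))
```

Because part of the face is discarded on every draw, the network repeatedly sees incomplete faces during training, which is how the crop is meant to help with partially covered faces.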

Further, the learning process optimizes the parameters of the deep convolutional network using the stochastic gradient descent algorithm.
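The stochastic gradient descent update itself is simple: for one training sample at a time, each parameter moves a small step against the gradient of the loss. The text gives no loss function, learning rate, or batch size, so the Python sketch below uses a toy one-parameter least-squares problem purely to show the update rule. The function name sgd_fit and all numeric settings are assumptions.

```python
import random


def sgd_fit(samples, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = w * x by stochastic gradient descent on a squared error loss."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)                  # visit samples in random order
        for x, y in order:
            grad = 2.0 * (w * x - y) * x    # d/dw of (w*x - y)^2
            w -= learning_rate * grad       # gradient descent step
    return w


# Toy data generated from y = 3x, so the optimum is w = 3.
samples = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5), (1.5, 4.5)]
w = sgd_fit(samples)
print(w)
```

In the network described here the same rule would be applied to every filter coefficient, with the gradients obtained by backpropagation.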

In contrast to the prior art, the beneficial effects of the present invention are:

The face recognition deep convolutional network designed by the present invention is expected to deliver stable recognition performance in real application scenarios under complex illumination and multiple viewing angles. With the support of high-performance GPUs, it meets the demands of engineering applications at a high computational load. Specifically, the deep convolutional network designed for the complex-scene face recognition task has strong model representation capability for face recognition targets under complex illumination conditions and multiple viewing angles.

Brief description of the drawings

Fig. 1 is a structural diagram of the deep convolutional network for face recognition in complex scenes according to the present invention.

Fig. 2 is a structural pattern diagram of the face recognition deep convolutional network used by the present invention in complex scenes.

Embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.

A deep convolutional network for face recognition in complex scenes, as shown in Fig. 1, comprises multiple convolution modules and fully connected modules. Each convolution module or fully connected module is composed of convolution kernels, activation functions, and pooling functions at different scales. The first three convolution modules each consist of convolutional layers, ReLU layers, and a max pooling layer. The max pooling layer performs downsampling, so that the image undergoes deep multi-layer filtering in a multi-level pyramid downsampling structure.

In this application, for convenience in describing the network structure, convi_j denotes the j-th convolution kernel function in the i-th convolution module, tanhi_j denotes the j-th tanh activation function in the i-th convolution module, pooli denotes the i-th max pooling function, and fci denotes the i-th fully connected layer. In the deep convolutional network, Xⁱ denotes the output of the i-th layer and wⁱ denotes the filter coefficients of the i-th layer. The data relationships between the layers are shown in Fig. 2. The following formulas give the tensor data of the specific operations from the first layer of the convolutional network to the last, including the inputs and outputs of the intermediate hidden layers:

1) X¹ = w¹ ∗ X⁰, where X⁰ ∈ R^{224×224×3}, w¹ ∈ R^{3×3×32}

2) X² = max{0, X¹}, where X¹ ∈ R^{224×224×32}

3) X³ = w³ ∗ X², where X² ∈ R^{224×224×32}, w³ ∈ R^{3×3×32×64}

4) X⁴ = max{0, X³}, where X³ ∈ R^{224×224×64}

5) X⁵ = w⁵ ∗ X⁴, where X⁴ ∈ R^{224×224×64}, w⁵ ∈ R^{3×3×64×128}

6) X⁶ = max{0, X⁵}, where X⁵ ∈ R^{224×224×128}

7) X⁷ = maxpool(X⁶), where X⁶ ∈ R^{224×224×128}

8) X⁸ = w⁸ ∗ X⁷, where X⁷ ∈ R^{112×112×128}, w⁸ ∈ R^{3×3×128×128}

9) X⁹ = max{0, X⁸}, where X⁸ ∈ R^{112×112×128}

10) X¹⁰ = w¹⁰ ∗ X⁹, where X⁹ ∈ R^{112×112×128}, w¹⁰ ∈ R^{3×3×128×256}

11) X¹¹ = max{0, X¹⁰}, where X¹⁰ ∈ R^{112×112×256}

12) X¹² = w¹² ∗ X¹¹, where X¹¹ ∈ R^{112×112×256}, w¹² ∈ R^{3×3×256×256}

13) X¹³ = max{0, X¹²}, where X¹² ∈ R^{112×112×256}

14) X¹⁴ = maxpool(X¹³), where X¹³ ∈ R^{112×112×256}

15) X¹⁵ = w¹⁵ ∗ X¹⁴, where X¹⁴ ∈ R^{56×56×256}, w¹⁵ ∈ R^{3×3×256×256}

16) X¹⁶ = max{0, X¹⁵}, where X¹⁵ ∈ R^{56×56×256}

17) X¹⁷ = w¹⁷ ∗ X¹⁶, where X¹⁶ ∈ R^{56×56×256}, w¹⁷ ∈ R^{3×3×256×512}

18) X¹⁸ = max{0, X¹⁷}, where X¹⁷ ∈ R^{56×56×512}

19) X¹⁹ = w¹⁹ ∗ X¹⁸, where X¹⁸ ∈ R^{56×56×512}, w¹⁹ ∈ R^{3×3×512×512}

20) X²⁰ = max{0, X¹⁹}, where X¹⁹ ∈ R^{56×56×512}

21) X²¹ = maxpool(X²⁰), where X²⁰ ∈ R^{56×56×512}

22) X²² = w²² ∗ X²¹, where X²¹ ∈ R^{28×28×512}, w²² ∈ R^{3×3×512×512}

23) X²³ = max{0, X²²}, where X²² ∈ R^{28×28×512}

24) X²⁴ = w²⁴ ∗ X²³, where X²³ ∈ R^{28×28×512}, w²⁴ ∈ R^{3×3×512×512}

25) X²⁵ = max{0, X²⁴}, where X²⁴ ∈ R^{28×28×512}

26) X²⁶ = w²⁶ ∗ X²⁵, where X²⁵ ∈ R^{28×28×512}, w²⁶ ∈ R^{3×3×512×512}

27) X²⁷ = max{0, X²⁶}, where X²⁶ ∈ R^{28×28×512}

28) X²⁸ = w²⁸ ∗ X²⁷, where X²⁷ ∈ R^{28×28×512}, w²⁸ ∈ R^{3×3×512×512}

29) X²⁹ = max{0, X²⁸}, where X²⁸ ∈ R^{14×14×512}

30) X³⁰ = maxpool(X²⁹), where X²⁹ ∈ R^{7×7×512}

31) X³¹ = w³¹ ∗ X³⁰, where X³⁰ ∈ R^{7×7×512}, w³¹ ∈ R^{7×7×512×4096}

32) X³² = max{0, X³¹}, where X³¹ ∈ R^{1×1×4096}

33) X³³ = w³³ ∗ X³², where X³² ∈ R^{1×1×4096}, w³³ ∈ R^{1×1×4096×4096}

34) X³⁴ = max{0, X³³}, where X³³ ∈ R^{1×1×4096}

35) X³⁵ = w³⁵ ∗ X³⁴, where X³⁴ ∈ R^{1×1×4096}, w³⁵ ∈ R^{1×4096×K}

Here ∗ denotes the convolution of a layer's input with that layer's filter coefficients.

Here, K denotes the number of face classes to be recognized.
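The tensor sizes listed for the first three convolution modules can be checked with a short shape trace. The Python sketch below assumes 3×3 convolutions that preserve height and width (stride 1 with padding) and 2×2 max pooling with stride 2. Neither is stated explicitly, but both are consistent with the listed sizes. The helper names and the MODULE_CHANNELS table are hypothetical, with the channel counts read off the filter shapes in the formulas.

```python
def conv_same(shape, out_channels):
    """3x3 convolution assumed to keep height and width; only channels change."""
    height, width, _ = shape
    return (height, width, out_channels)


def maxpool(shape):
    """2x2 max pooling with stride 2 (assumed): halves height and width."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


# Output channels of the three convolutions in each of the first three modules.
MODULE_CHANNELS = [
    [32, 64, 128],    # filters w1, w3, w5
    [128, 256, 256],  # filters w8, w10, w12
    [256, 512, 512],  # filters w15, w17, w19
]


def trace(input_shape):
    """Return the tensor shape after the pooling layer of each module."""
    shape = input_shape
    outputs = []
    for channels in MODULE_CHANNELS:
        for out_channels in channels:
            shape = conv_same(shape, out_channels)  # ReLU leaves the shape as is
        shape = maxpool(shape)
        outputs.append(shape)
    return outputs


shapes = trace((224, 224, 3))
print(shapes)
```

The three resulting shapes correspond to X⁷, X¹⁴, and X²¹ in the formulas. The trace stops after the third module because the sizes listed for the later layers do not follow a single halving pattern.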

A learning method for the deep convolutional network for face recognition in complex scenes is also provided, implemented as follows. Step 1: Randomly sample the face pictures collected in real scenes, initialize the network data with the data generated by random sampling, and manually set the offset (bias) parameters to zero;
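The text does not spell out how the sampled data are turned into initial filter coefficients. The Python sketch below therefore substitutes small zero-mean Gaussian draws for that step, which is an assumption and not the procedure of the patent. It keeps only the part that is stated, namely that the offset parameters start at zero. The function name init_layer and the standard deviation 0.01 are hypothetical.

```python
import random


def init_layer(kernel_h, kernel_w, in_channels, out_channels, rng, std=0.01):
    """Random weights (Gaussian, assumed) and all-zero offsets for one layer."""
    count = kernel_h * kernel_w * in_channels * out_channels
    weights = [rng.gauss(0.0, std) for _ in range(count)]
    biases = [0.0] * out_channels   # offsets are set to zero, as in Step 1
    return weights, biases


rng = random.Random(0)
# Layer 3 in the formulas has filter coefficients of size 3 x 3 x 32 x 64.
weights, biases = init_layer(3, 3, 32, 64, rng)
print(len(weights), len(biases))
```

For this layer the sketch allocates 3 × 3 × 32 × 64 = 18432 coefficients and 64 zero offsets.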

The augmentation operations mainly include the following:

(1) In real application scenarios, face pictures are often partially occluded. To recognize such pictures effectively, we randomly crop 70%, 80%, or 90% of the original picture, then center it and scale it to 224*224. This data augmentation can effectively improve the recognition rate for occluded faces.

(2) Face pictures are often tilted. For this case, we randomly rotate the original training picture by 5, 10, or 15 degrees, then train.

(3) To improve face recognition performance in environments with complex illumination, and given that different lighting conditions produce different contrast, we propose randomly applying contrast stretching or contrast compression to the original face training images before training.
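The contrast augmentation in item (3) can be sketched as a linear rescaling of pixel values around the image mean. This is one common way to stretch or compress contrast, and the text does not say which method is intended, so the Python code below is an assumption throughout. The function names, the gain range 0.5 to 1.5, and the clipping to [0, 255] are all hypothetical.

```python
import random


def adjust_contrast(image, gain):
    """Scale deviations from the mean intensity: gain > 1 stretches, gain < 1 compresses."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [
        [min(255.0, max(0.0, mean + gain * (p - mean))) for p in row]
        for row in image
    ]


def random_contrast(image, rng, low=0.5, high=1.5):
    """Apply a randomly chosen stretch or compression (gain range is assumed)."""
    return adjust_contrast(image, rng.uniform(low, high))


image = [[100.0, 120.0], [140.0, 160.0]]   # toy 2x2 grayscale image, mean 130
stretched = adjust_contrast(image, 1.5)
compressed = adjust_contrast(image, 0.5)
augmented = random_contrast(image, random.Random(0))
print(stretched)
print(compressed)
```

The rotation in item (2) would be handled in the same spirit, by drawing one of the three listed angles at random for each training picture.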

The foregoing are only embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims

1. A deep convolutional network for face recognition in complex scenes, characterized in that it comprises multiple convolution modules and fully connected modules connected in sequence, each convolution module and fully connected module being composed of convolution kernels, activation functions, and pooling functions at different scales.

2. The deep convolutional network for face recognition in complex scenes according to claim 1, characterized in that there are 4 convolution modules and 3 fully connected modules in total, wherein the first 3 convolution modules each consist of convolutional layers, ReLU layers, and a max pooling layer, and the max pooling layer performs downsampling, so that the image undergoes deep multi-layer filtering in a multi-level pyramid downsampling structure.

3. The deep convolutional network for face recognition in complex scenes according to claim 1, characterized in that the layer-by-layer computation of the convolutional network from the first layer to the last, together with the inputs and outputs of the hidden layers, is given by the following formulas:

1) X¹ = w¹ ∗ X⁰, where X⁰ ∈ R^{224×224×3}, w¹ ∈ R^{3×3×32}

2) X² = max{0, X¹}, where X¹ ∈ R^{224×224×32}

3) X³ = w³ ∗ X², where X² ∈ R^{224×224×32}, w³ ∈ R^{3×3×32×64}

4) X⁴ = max{0, X³}, where X³ ∈ R^{224×224×64}

5) X⁵ = w⁵ ∗ X⁴, where X⁴ ∈ R^{224×224×64}, w⁵ ∈ R^{3×3×64×128}

6) X⁶ = max{0, X⁵}, where X⁵ ∈ R^{224×224×128}

7) X⁷ = maxpool(X⁶), where X⁶ ∈ R^{224×224×128}

8) X⁸ = w⁸ ∗ X⁷, where X⁷ ∈ R^{112×112×128}, w⁸ ∈ R^{3×3×128×128}

9) X⁹ = max{0, X⁸}, where X⁸ ∈ R^{112×112×128}

10) X¹⁰ = w¹⁰ ∗ X⁹, where X⁹ ∈ R^{112×112×128}, w¹⁰ ∈ R^{3×3×128×256}

11) X¹¹ = max{0, X¹⁰}, where X¹⁰ ∈ R^{112×112×256}

12) X¹² = w¹² ∗ X¹¹, where X¹¹ ∈ R^{112×112×256}, w¹² ∈ R^{3×3×256×256}

13) X¹³ = max{0, X¹²}, where X¹² ∈ R^{112×112×256}

14) X¹⁴ = maxpool(X¹³), where X¹³ ∈ R^{112×112×256}

15) X¹⁵ = w¹⁵ ∗ X¹⁴, where X¹⁴ ∈ R^{56×56×256}, w¹⁵ ∈ R^{3×3×256×256}

16) X¹⁶ = max{0, X¹⁵}, where X¹⁵ ∈ R^{56×56×256}

17) X¹⁷ = w¹⁷ ∗ X¹⁶, where X¹⁶ ∈ R^{56×56×256}, w¹⁷ ∈ R^{3×3×256×512}

18) X¹⁸ = max{0, X¹⁷}, where X¹⁷ ∈ R^{56×56×512}

19) X¹⁹ = w¹⁹ ∗ X¹⁸, where X¹⁸ ∈ R^{56×56×512}, w¹⁹ ∈ R^{3×3×512×512}

20) X²⁰ = max{0, X¹⁹}, where X¹⁹ ∈ R^{56×56×512}

21) X²¹ = maxpool(X²⁰), where X²⁰ ∈ R^{56×56×512}

22) X²² = w²² ∗ X²¹, where X²¹ ∈ R^{28×28×512}, w²² ∈ R^{3×3×512×512}

23) X²³ = max{0, X²²}, where X²² ∈ R^{28×28×512}

24) X²⁴ = w²⁴ ∗ X²³, where X²³ ∈ R^{28×28×512}, w²⁴ ∈ R^{3×3×512×512}

25) X²⁵ = max{0, X²⁴}, where X²⁴ ∈ R^{28×28×512}

26) X²⁶ = w²⁶ ∗ X²⁵, where X²⁵ ∈ R^{28×28×512}, w²⁶ ∈ R^{3×3×512×512}

27) X²⁷ = max{0, X²⁶}, where X²⁶ ∈ R^{28×28×512}

28) X²⁸ = w²⁸ ∗ X²⁷, where X²⁷ ∈ R^{28×28×512}, w²⁸ ∈ R^{3×3×512×512}

29) X²⁹ = max{0, X²⁸}, where X²⁸ ∈ R^{14×14×512}

30) X³⁰ = maxpool(X²⁹), where X²⁹ ∈ R^{7×7×512}

31) X³¹ = w³¹ ∗ X³⁰, where X³⁰ ∈ R^{7×7×512}, w³¹ ∈ R^{7×7×512×4096}

32) X³² = max{0, X³¹}, where X³¹ ∈ R^{1×1×4096}

33) X³³ = w³³ ∗ X³², where X³² ∈ R^{1×1×4096}, w³³ ∈ R^{1×1×4096×4096}

34) X³⁴ = max{0, X³³}, where X³³ ∈ R^{1×1×4096}

35) X³⁵ = w³⁵ ∗ X³⁴, where X³⁴ ∈ R^{1×1×4096}, w³⁵ ∈ R^{1×4096×K}

Here ∗ denotes the convolution of a layer's input with that layer's filter coefficients.

Here, Xⁱ denotes the output of the i-th layer, wⁱ denotes the filter coefficients of the i-th layer, and K denotes the number of face classes to be recognized.

4. A learning method for a deep convolutional network for face recognition in complex scenes, characterized in that the learning method is implemented through the following steps:

Step 1: Randomly sample the face pictures collected in real scenes, initialize the network data with the data generated by random sampling, and manually set the offset (bias) parameters to zero;

Step 2: Perform data augmentation on the existing face training images, uniformly normalize the size of all training images, and subtract the mean image before feeding them into the network, to achieve faster convergence.

5. The learning method for a deep convolutional network for face recognition in complex scenes according to claim 4, characterized in that the augmentation operations include the following:

(1) Randomly crop 70%-90% of the face picture to be recognized, then center it and scale it to 224*224, which can effectively improve the recognition rate for such faces;

(3) Randomly apply contrast stretching or contrast compression to the original face training images, to improve face recognition performance in environments with complex illumination.

6. The learning method for a deep convolutional network for face recognition in complex scenes according to claim 4, characterized in that the learning process optimizes the parameters of the deep convolutional network using the stochastic gradient descent algorithm.