CN110929569A - Face recognition method, device, equipment and storage medium

Info

Publication number
CN110929569A
CN110929569A
Authority
CN
China
Prior art keywords
face
convolution
portrait
point
size
Prior art date
Legal status
Granted
Application number
CN201910990736.0A
Other languages
Chinese (zh)
Other versions
CN110929569B (en)
Inventor
王健宗
贾雪丽
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910990736.0A priority Critical patent/CN110929569B/en
Publication of CN110929569A publication Critical patent/CN110929569A/en
Priority to PCT/CN2020/118412 priority patent/WO2021073418A1/en
Application granted granted Critical
Publication of CN110929569B publication Critical patent/CN110929569B/en
Legal status: Active (granted)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a face recognition method. The method performs two convolution operations on the face portrait data to be recognized according to the input channels and output channels of a pre-stored deep separable convolution model, calculates the similarity of the face portrait patterns obtained after the convolution operations, and judges on the basis of that similarity whether the portraits belong to the same face. The invention also provides a face recognition device, equipment and a computer-readable storage medium. Because the two convolution operations simplify the features of the face portrait data, the running time of the whole face recognition network, the complexity of the recognition features and the spatial complexity of the features are all greatly reduced; the parameter count and amount of calculation the terminal performs during recognition fall, terminal consumption drops, and operating efficiency improves.

Description

Face recognition method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a face recognition method, a face recognition device, face recognition equipment and a storage medium.
Background
With the continuous development of face recognition technology, and especially of deep learning recognition models, face recognition is being applied and deployed in more and more scenes of daily life. As a major component of computer vision, face recognition has become a focus of deep learning research and has achieved relatively good accuracy. That accuracy, however, is basically achieved by stacking convolution kernels: ever deeper convolution layers bring a huge increase in parameter count, and although the accuracy of the network improves to a certain extent, its running time is greatly prolonged.
The mobile phone is a carrier of daily life, and more and more face recognition is performed through it. Current recognition technology, however, is built on large networks that are suited to deployment on a server and place high demands on device computing power, while the data processing capability of current mobile-phone-class terminals is limited; directly applying current face recognition methods on a mobile phone would greatly harm the phone's running speed and the user experience. It is therefore very important to develop a neural network that can run on a mobile phone.
Disclosure of Invention
The invention mainly aims to provide a face recognition method, device, equipment and storage medium, so as to solve the technical problem that existing face recognition consumes so many terminal resources at runtime that it slows the terminal and degrades the user experience.
In order to achieve the above object, the present invention provides a face recognition method, comprising the steps of:
acquiring face portrait data to be recognized;
determining the size of a convolution kernel of the first convolution operation according to the number of input channels and output channels in a pre-stored deep separable convolution model;
inputting the face portrait data into the input channel, and performing a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are represented in the deep separable convolution model in the form of convolution kernels;
performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain face portrait patterns, wherein the second convolution operation performs a cross-channel standard convolution calculation on the face fragments by taking a unit convolution kernel as the convolution basis;
and calculating the similarity between the face portrait pattern and a face portrait prestored in the mobile terminal according to a preset combined Bayesian algorithm, and comparing the similarity with a preset threshold value to identify whether the face portrait pattern and the face portrait in the mobile terminal belong to the same person.
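Taken together, these steps form the pipeline sketched below. This is a structural sketch only: the model methods, the scoring function and the comparison direction (the embodiments treat the score as a deviation, judged same-face when it is not greater than the threshold) are placeholders for the stages detailed in the embodiments, not names fixed by the patent.

```python
def recognize(face_data, model, stored_portrait, joint_bayesian, threshold):
    """Pipeline of the five steps above (all names are illustrative)."""
    k = model.kernel_size()                    # from input/output channel counts
    fragments = model.depthwise(face_data, k)  # first convolution: fragment set
    pattern = model.pointwise(fragments)       # second convolution: 1x1 kernels
    score = joint_bayesian(pattern, stored_portrait)   # preset joint Bayesian
    return score <= threshold                  # deviation within threshold: same
```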
Optionally, the deep separable convolution model includes: M input channels, N output channels and a point-by-point convolution filter;
the step of determining the size of the convolution kernel of the first convolution operation according to the number of the input channels and the output channels in the pre-stored deep separable convolution model comprises:
detecting the pixel dimensions Dx × Dy of the face portrait data, and extracting the stride coefficient set for each step of the first convolution operation;
subtracting the stride coefficient from the pixel dimensions to obtain the size of the convolution kernel;
wherein M and N are each integers greater than 1, Dx and Dy are respectively the length and width of the face portrait data, and Dx and Dy are positive integers greater than 1.
Optionally, the step of inputting the face portrait data to the input channel, and performing a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait includes:
the face portrait data is input into each of the M input channels, and each input channel divides the face portrait data according to the size of the convolution kernel to obtain M pieces of first face image data, wherein each piece of first face image data is a tile set consisting of a plurality of tiles of different sizes;
and screening the tile set according to the dilation rate and shared weights preset for the first convolution operation, to obtain a fragment set comprising a plurality of different face fragments.
Optionally, the step of performing a second convolution operation on the face fragments obtained by the first convolution operation to obtain a face portrait pattern includes:
and sequentially inputting the face fragments output by the M input channels into the point-by-point convolution filter, so that the point-by-point convolution filter performs standard convolution calculation on the face fragments according to the preset N unit convolution kernels of size 1 × 1, obtaining the face portrait pattern of size Df × Dg × N, wherein Df is the length of the face portrait pattern, Dg is the width of the face portrait pattern, and both Df and Dg are positive integers greater than zero.
Optionally, the step of inputting the face portrait data to the input channel, and performing a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait includes:
inputting the face portrait data into the input channel, so that the face portrait data is divided in the input channel according to the size of the convolution kernel and the stride coefficient to obtain M image blocks of different sizes, wherein the image block obtained in each input channel is selected according to the dilation rate and shared weight preset for the first convolution operation;
fusing the M image blocks to obtain a face portrait element map with the size of Di × Dj × M;
and Di is the length of the image block, Dj is the width of the image block, and Di and Dj are positive integers larger than zero.
Optionally, the step of performing a second convolution operation on the face fragments obtained by the first convolution operation to obtain a face portrait pattern includes:
inputting the face portrait element map into the point-by-point convolution filter, so that the point-by-point convolution filter performs standard convolution calculation on the face portrait element map according to the preset N unit convolution kernels of size 1 × 1, obtaining the face portrait pattern of size Do × Dp × N, wherein Do is the length of the face portrait pattern, Dp is the width of the face portrait pattern, and both Do and Dp are positive integers greater than zero.
Optionally, the pre-stored deep separable convolution model is obtained by training in the following manner:
constructing a model training framework of the deep separable convolution model according to a separation convolution algorithm, wherein the model training framework comprises an input channel, an output channel, a deep convolution training unit and a point-by-point convolution training unit, the deep convolution training unit obtains a deep convolution filter through training, and the point-by-point convolution training unit obtains a point-by-point convolution filter through training;
determining a division specification for segmenting the face image based on the number of input channels and output channels of the model training framework, wherein the division specification comprises the length and the width of a small image and the size of a unit convolution kernel of convolution calculation;
acquiring face sample data from a website, and segmenting the face sample data by facial region according to the division specification to obtain a plurality of small images;
sequentially inputting the small images into the deep convolution training unit to extract features, synthesizing the extracted features into feature maps, and outputting M face fragment samples;
inputting the M face fragment samples into the point-by-point convolution training unit, and performing a cross-channel convolution calculation on the face fragment samples by taking the unit convolution kernel as the convolution basis, to obtain face portrait samples;
judging whether the similarity between the face image sample and the corresponding original small image reaches a preset similarity value or not;
if so, modifying the original parameters of the model training framework by taking the division specification as model parameters, to obtain and store the deep separable convolution model;
if not, the division specification is readjusted to process the face sample data until the deep separable convolution model is obtained.
In addition, to achieve the above object, the present invention also provides a face recognition apparatus, including:
the acquisition module is used for acquiring face portrait data to be recognized;
the first convolution module is used for determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the pre-stored deep separable convolution model; and for inputting the face portrait data into the input channel and performing a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are represented in the deep separable convolution model in the form of convolution kernels;
the second convolution module is used for performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain face portrait patterns, wherein the second convolution operation performs a cross-channel standard convolution calculation on the face fragments by taking a unit convolution kernel as the convolution basis;
and the recognition module is used for calculating the similarity between the face portrait pattern and a face portrait prestored in the mobile terminal according to a preset combined Bayesian algorithm, and comparing the similarity with a preset threshold value so as to recognize whether the face portrait pattern and the face portrait in the mobile terminal belong to the same person.
In another embodiment of the present invention, the deep separable convolution model includes: M input channels, N output channels and a point-by-point convolution filter;
the first convolution module comprises a calculation unit, which is used for detecting the pixel dimensions Dx × Dy of the face portrait data and extracting the stride coefficient set for each step of the first convolution operation, and for subtracting the stride coefficient from the pixel dimensions to obtain the size of the convolution kernel, wherein M and N are each integers greater than 1, Dx and Dy are respectively the length and width of the face portrait data, and Dx and Dy are positive integers greater than 1.
In another embodiment of the present invention, the first convolution module further includes a depth convolution filtering unit, configured to input the face portrait data into each of the M input channels, where each input channel divides the face portrait data according to the size of the convolution kernel to obtain M pieces of first face image data, each piece being a tile set composed of a plurality of tiles of different sizes; and to screen the tile set according to the dilation rate and shared weights preset for the first convolution operation, obtaining a fragment set comprising a plurality of different face fragments.
In another embodiment of the present invention, the second convolution module includes a point-by-point convolution filtering unit, configured to sequentially input the face fragments output by the M input channels into the point-by-point convolution filter, where the point-by-point convolution filter performs standard convolution calculation on the face fragments according to preset N unit convolution kernels with a size of 1 × 1, so as to obtain the face image pattern with a size Df × Dg × N, where Df is a length of the face image pattern, Dg is a width of the face image pattern, and Df and Dg are positive integers greater than zero.
In another embodiment of the present invention, the depth convolution filtering unit is further configured to input the face portrait data into the input channel, so that the face portrait data is divided according to the size of the convolution kernel and the stride coefficient in the input channel to obtain M blocks of different sizes, where the block obtained in each input channel is selected according to the dilation rate and shared weight preset for the first convolution operation; and to fuse the M image blocks into a face portrait element map of size Di × Dj × M, where Di is the length of the image blocks, Dj is the width of the image blocks, and Di and Dj are positive integers greater than zero.
In another embodiment of the present invention, the point-by-point convolution filtering unit is further configured to input the face image element map into the point-by-point convolution filter, so that the point-by-point convolution filter performs standard convolution calculation on the face image element map according to preset N unit convolution kernels with a size of 1 × 1, to obtain the face image pattern with a size of Do × Dp × N, where Do is a length of the face image pattern, Dp is a width of the face image pattern, and both Do and Dp are positive integers greater than zero.
In another embodiment of the present invention, the first convolution module is further configured to train the deep separable convolution model by:
constructing a model training framework of the deep separable convolution model according to a separation convolution algorithm, wherein the model training framework comprises an input channel, an output channel, a deep convolution training unit and a point-by-point convolution training unit, the deep convolution training unit obtains a deep convolution filter through training, and the point-by-point convolution training unit obtains a point-by-point convolution filter through training;
determining a division specification for segmenting the face image based on the number of input channels and output channels of the model training framework, wherein the division specification comprises the length and the width of a small image and the size of a unit convolution kernel of convolution calculation;
acquiring face sample data from a website, and segmenting the face sample data by facial region according to the division specification to obtain a plurality of small images;
sequentially inputting the small images into the deep convolution training unit to extract features, synthesizing the extracted features into feature maps, and outputting M face fragment samples;
inputting the M face fragment samples into the point-by-point convolution training unit, and performing a cross-channel convolution calculation on the face fragment samples by taking the unit convolution kernel as the convolution basis, to obtain face portrait samples;
judging whether the similarity between the face image sample and the corresponding original small image reaches a preset similarity value or not;
if so, modifying the original parameters of the model training framework by taking the division specification as model parameters, to obtain and store the deep separable convolution model;
if not, the division specification is readjusted to process the face sample data until the deep separable convolution model is obtained.
In addition, to achieve the above object, the present invention provides a face recognition apparatus, including: a memory, a processor and a face recognition program stored on the memory and executable on the processor, the face recognition program, when executed by the processor, implementing the steps of the face recognition method described above.
Furthermore, to achieve the above object, the present invention also provides a computer-readable storage medium, on which a face recognition program is stored, the face recognition program, when executed by a processor, implementing the steps of the face recognition method according to any one of the above.
In the face recognition method provided by the invention, two convolution operations are performed on the face portrait data to be recognized according to the input channels and output channels of the pre-stored deep separable convolution model, the similarity of the face portrait patterns obtained after the convolution operations is then calculated, and whether the data belongs to the same face is judged on the basis of that similarity. Because the two convolution operations simplify the features of the face portrait data, the running time of the whole face recognition network, the complexity of the recognition features and the spatial complexity of the features are all greatly reduced; the parameters and amount of calculation the terminal performs during recognition fall, terminal consumption drops, and operating efficiency improves.
Drawings
Fig. 1 is a schematic structural diagram of an operating environment of a mobile terminal according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a first embodiment of a face recognition method provided by the present invention;
FIG. 3 is a schematic diagram of the feature maps produced by the first convolution operation according to the present invention;
FIG. 4 is a schematic diagram of the feature maps produced by the point-by-point convolution according to the present invention;
FIG. 5 is a schematic flow chart of recognizing a picture with the face recognition method provided by the present invention;
fig. 6 is a schematic diagram of functional modules of the face recognition apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The present invention provides a face recognition apparatus, which may be a plug-in in a mobile terminal, configured to execute the face recognition method according to the embodiments of the present invention. As shown in FIG. 1, FIG. 1 is a schematic structural diagram of the mobile terminal operating environment according to an embodiment of the present invention.
As shown in fig. 1, the mobile terminal includes: a processor 101, e.g. a CPU, a communication bus 102, a user interface 103, a network interface 104, a memory 105. Wherein the communication bus 102 is used for enabling connection communication between these components. The user interface 103 may comprise a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the network interface 104 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface). The memory 105 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). The memory 105 may alternatively be a memory system separate from the processor 101 described above.
Those skilled in the art will appreciate that the hardware configuration of the mobile terminal shown in FIG. 1 does not limit the face recognition apparatus or device; it may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in FIG. 1, the memory 105, as a kind of computer-readable storage medium, may contain an operating system, a network communication module, a user interface module, and a face recognition program. The operating system is a program that manages and controls the face recognition device and the invocation of software resources in the memory, and supports the running of the face recognition program and of other software and/or programs.
In the hardware configuration of the mobile terminal shown in FIG. 1, the network interface 104 is mainly used for accessing a network; the user interface 103 is mainly used for receiving the face portrait data to be recognized, together with any requirements and other parameters for recognizing a face; and the processor 101 may be used to call the face recognition program stored in the memory 105 and perform the following operations of the embodiments of the face recognition method.
Based on the above hardware structure of the mobile terminal, the invention provides a face recognition method mainly applied to small terminal devices, such as mobile phones, iPads, cameras and similar devices. Referring to FIG. 2, FIG. 2 is a flowchart of the face recognition method provided by an embodiment of the invention. In this embodiment, the face recognition method specifically includes the following steps:
step S210, acquiring face portrait data to be recognized;
In this step, the face portrait data is acquired by shooting with the camera of a smartphone: after the picture is taken, the face image is extracted from it by a face-image extraction technique, and the extracted face image serves as the face portrait data to be recognized. In practice, the extraction can be achieved by separating the foreground image from the background image of the photo.
In the present embodiment, recognition processing that performs convolution calculation on face portrait data is generally carried out with convolution kernels; that is, the face portrait data must be converted into convolution kernels and feature maps extracted before the convolution calculation can proceed.
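The patent does not name an extraction technique for step S210; as a minimal sketch, assuming OpenCV and its bundled Haar cascade (an editorial choice, not the patent's), the acquisition step might look like this:

```python
import cv2

def extract_face(photo_path: str):
    """Crop the largest detected face from a photo, as in step S210."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    image = cv2.imread(photo_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                  # no face to recognize
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest face box
    return image[y:y + h, x:x + w]                   # foreground face crop
```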
Step S220, determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the pre-stored deep separable convolution model;
In this step, the deep separable convolution model is a model with at least a two-layer structure, obtained through model training. The training can be understood as taking place in two layers, namely deep convolution training and point-by-point convolution training, realized as follows:
constructing a model training framework of the deep separable convolution model according to a separation convolution algorithm, wherein the model training framework comprises an input channel, an output channel, a deep convolution training unit and a point-by-point convolution training unit, the deep convolution training unit obtains a deep convolution filter through training, and the point-by-point convolution training unit obtains a point-by-point convolution filter through training;
determining a division specification for segmenting the face image based on the number of input channels and output channels of the model training framework, wherein the division specification comprises the length and the width of a small image and the size of a unit convolution kernel of convolution calculation;
acquiring face sample data from a website, and segmenting the face sample data by facial region according to the division specification to obtain a plurality of small images;
sequentially inputting the small images into the deep convolution training unit to extract features, synthesizing the extracted features into feature maps, and outputting M face fragment samples;
inputting the M face fragment samples into the point-by-point convolution training unit, and performing a cross-channel convolution calculation on the face fragment samples by taking the unit convolution kernel as the convolution basis, to obtain face portrait samples;
judging whether the similarity between the face image sample and the corresponding original small image reaches a preset similarity value or not;
if so, modifying the original parameters of the model training framework by taking the division specification as model parameters, to obtain and store the deep separable convolution model;
if not, the division specification is readjusted to process the face sample data until the deep separable convolution model is obtained.
In practical application, the preset division specification of the face image is first obtained; the division specification comprises the length and width of a small image and the size of the unit convolution kernel used in the convolution calculation. The mobile terminal acquires face sample data from a website through the Internet, and the face image is segmented by facial-feature region according to the division specification to obtain a plurality of small images;
then, the small images are sequentially input into the depth convolution filter for feature extraction, the extracted features are synthesized into feature maps, and M small face portrait blocks of size Di × Dj are output;
further, the point-by-point convolution training unit of the deep separable convolution model performs a cross-channel convolution calculation on the small face portrait blocks, with the unit convolution kernel as the convolution basis, to obtain a face portrait sample;
finally, the face portrait sample is compared with the corresponding small image;
if they match, a new deep separable convolution model is generated according to the division specification;
if they do not match, the division specification is readjusted and model training continues.
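A minimal sketch of this train-until-it-reconstructs loop follows. The patent specifies only the control flow, so every callable here and the tuple form of the division specification are hypothetical placeholders:

```python
def train_separable_model(samples, spec, split, depthwise_unit,
                          pointwise_unit, similarity, readjust, target=0.9):
    """Adjust the division specification until reconstruction passes.

    `spec` bundles small-image length/width and the unit-kernel size;
    the callables stand in for the patent's training units.
    """
    while True:
        patches = [split(img, spec) for img in samples]      # cut by facial region
        fragments = [depthwise_unit(p) for p in patches]     # deep conv training
        portraits = [pointwise_unit(f) for f in fragments]   # 1x1 cross-channel
        score = sum(similarity(a, b)
                    for a, b in zip(portraits, patches)) / len(samples)
        if score >= target:        # reconstruction close enough to the originals
            return spec            # spec is frozen as the model parameters
        spec = readjust(spec)      # otherwise change the division and retry
```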
Step S230, inputting the face portrait data into the input channel, and performing a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are represented in the pre-stored deep separable convolution model in the form of convolution kernels;
in this embodiment, the first convolution operation may be understood as an image segmentation operation, specifically, performing feature extraction on the face image data to extract different face features in the face image, where each face feature corresponds to a face fragment.
This step essentially divides the face portrait data into a number of small blocks, acquires at least three feature maps from each block at different receptive-field angles, fuses those feature maps into one feature map per block, and splices the per-block feature maps together, thereby converting the face portrait data into convolution kernels.
In this step, the conversion may also be performed synchronously by the mobile terminal while the user's face picture is being captured; that is, the terminal's camera need not deliver an ordinary photograph but may capture the image directly in convolution form, concretely as a pixel grid. When the face image data is segmented with the dilated convolution, feature maps can be extracted level by level from the pixel grid and then spliced and synthesized into a complete feature map.
Step S240, performing a second convolution operation on the face fragments obtained by the first convolution operation to obtain face portrait patterns, wherein the second convolution operation performs a cross-channel standard convolution calculation on the face fragments by taking a unit convolution kernel as the convolution basis;
In this embodiment, the pre-stored deep separable convolution model includes M input channels, N output channels, a depth convolution filter and a point-by-point convolution filter. The first convolution operation is implemented by the depth convolution filter: the size of the convolution kernel is constructed from the output requirements of the M input channels and N output channels, the face portrait data is segmented by dilated convolution according to that kernel size to obtain M small face portrait blocks, and the M blocks are input into the M input channels respectively. A cross-channel standard convolution calculation is then performed by the point-by-point convolution filter, yielding the face portrait pattern.
In this step, the cross-channel standard convolution calculation of the point-by-point convolution filter is performed in combination with a unit convolution kernel, which is preferably a kernel whose length and width are both 1. Each time one small face portrait block is input, N face portrait patterns are output after the standard convolution calculation.
Step S250, calculating the similarity between the face portrait pattern and the face portrait prestored in the mobile terminal according to a preset joint Bayesian algorithm, and comparing the similarity with a preset threshold value to judge whether they belong to the same face.
In this embodiment, performing a probability calculation on the computed face portrait patterns allows them to be screened further, bringing the patterns closer to the face portrait data and improving recognition precision. In practical application, if the calculated joint Bayesian probability value is judged to be not greater than the preset threshold value, the recognized face portrait data is considered to belong to the same face; otherwise it does not.
In practical application, when the mobile terminal judges through steps S210-S250 whether the recognized face portrait data belongs to the same face, the predetermined threshold refers to the permitted deviation from a user face portrait stored in the mobile terminal in advance for comparison. The deviation value may also be determined by directly comparing the recognized portrait with the stored user portrait; if the deviation is greater than the predetermined threshold, the data is not considered to belong to the same face.
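The patent does not spell out the joint Bayesian formula. The sketch below assumes the standard formulation, a log-likelihood ratio r(x1, x2) = x1ᵀAx1 + x2ᵀAx2 - 2·x1ᵀGx2 with matrices A and G precomputed from training data; this is one common choice, not necessarily the inventors' exact computation:

```python
import numpy as np

def joint_bayesian_score(x1, x2, A, G):
    """Log-likelihood ratio of 'same face' vs 'different faces'.

    Higher scores mean the pair is more likely the same identity.
    """
    return float(x1 @ A @ x1 + x2 @ A @ x2 - 2.0 * (x1 @ G @ x2))

# The patent compares a deviation against the threshold, so with a
# similarity score the comparison direction is reversed:
def same_face(x1, x2, A, G, threshold):
    return joint_bayesian_score(x1, x2, A, G) >= threshold
```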
In this embodiment, when the deep separable convolution model includes M input channels, N output channels, a depth convolution filter and a point-by-point convolution filter, the first convolution operation is implemented by the depth convolution filter in the model. The depth convolution filter simplifies the features of the input face image (i.e., the picture): based on a segmentation process, the large-pixel picture is cut into a picture fused from fragments of a few pixels each. The depth convolution filter has a plurality of input channels, each of which applies the same processing to the picture passing through it; finally, the output results of the input channels can either be output individually or merged into one output.
In this embodiment, the step S220 of determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the deep separable convolution model includes:
detecting the pixel dimensions Dx × Dy of the face portrait data, and extracting the stride coefficient set for each step of the first convolution operation;
subtracting the stride coefficient from the pixel dimensions to obtain the size of the convolution kernel;
wherein M and N are each integers greater than 1, Dx and Dy are respectively the length and width of the face portrait data, and Dx and Dy are positive integers greater than 1. Preferably, Dx and Dy are equal, i.e., the length and width of the input picture are the same.
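As a toy illustration of this rule (assuming a single stride coefficient applied to both dimensions, which the patent leaves implicit):

```python
def kernel_size(dx: int, dy: int, stride: int):
    """Kernel size = pixel dimensions minus the stride coefficient."""
    return dx - stride, dy - stride

# e.g. a 5 x 5 input with a stride coefficient of 2 gives a 3 x 3 kernel
assert kernel_size(5, 5, 2) == (3, 3)
```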
At this time, the first convolution operation specifically proceeds as follows: the face portrait data is input into each of the M input channels, and each input channel divides the face portrait data according to the size of the convolution kernel to obtain M pieces of first face image data, where each piece is a tile set consisting of a plurality of tiles of different sizes;
the tile set is then screened according to the dilation rate and shared weights preset for the first convolution operation, giving a fragment set comprising a number of different face fragments.
Then, the second convolution operation is performed on the result of the first convolution operation: the face fragments output by the M input channels are sequentially input into the point-by-point convolution filter, and the point-by-point convolution filter performs standard convolution calculation on them with the preset N unit convolution kernels of size 1 × 1, obtaining the face portrait pattern of size Df × Dg × N, where Df is the length of the face portrait pattern, Dg is the width, and both are positive integers greater than zero.
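A minimal PyTorch sketch of this point-by-point stage (the channel counts and spatial shape are illustrative; the patent fixes only the 1 × 1 kernel size and the counts M and N):

```python
import torch
import torch.nn as nn

M, N = 3, 8                              # input / output channel counts
fragments = torch.randn(1, M, 24, 24)    # Df x Dg face fragments, M channels

pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)   # N unit kernels
pattern = pointwise(fragments)           # cross-channel mix -> (1, N, Df, Dg)
assert pattern.shape == (1, N, 24, 24)
```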
Preferably, the data of the input channels in the deep separable convolution model is used to set the length and width of the convolution kernel of the first convolution operation; the face portrait data is input into each input channel on that basis, each input channel divides the face portrait data according to the determined kernel length and width, and finally a number of small face portrait blocks of convolution-kernel size are output.
In practical application, the deep separable convolution model is a multi-layer structural model. When the model performs the first convolution operation on image data through dilated convolution, the same convolution kernel can be used with different dilation rates in the same dilated-convolution sharing layer of the model, yielding multiple paths of feature maps, and the multiple paths are finally spliced together as the input feature map of the next layer. The dilated-convolution sharing layer is similar to an Inception structure in that the parameter-shared dilated convolution layers obtain feature maps of different receptive fields within the same layer; unlike Inception, however, the method is specifically adapted to dilated convolution, so fewer weight parameters are used, model parameters are reduced and calculation is accelerated while the same accuracy is ensured. Here, the dilated convolution operation extracts the feature maps of the face portrait data according to the pixel color levels.
Further, in the above embodiment, the same 3 × 3 convolution kernel is used across the multiple convolution paths, i.e., the weights of the convolution kernels are shared. This allows the multi-path convolution within the same layer to capture the same features at different scales. It can also be understood as regularization: the multi-path, multi-scale features captured by the in-layer convolution are constrained by the shared weights, so that the same kind of feature is captured on the same feature map at different sampling rates.
In practical application, this step is implemented by the depth convolution filter in the deep separable convolution model, with the filter's image segmentation performed by dilated convolution: the depth convolution filter obtains feature maps by applying different dilation rates to different regions of the face portrait data, preferably acquiring feature maps at three different dilation rates, and then splices and fuses all the feature maps acquired from the same region to obtain the convolution kernel (i.e., the small face portrait block) corresponding to that region of the portrait.
Segmenting the face portrait data by dilated convolution in this step gives multi-scale recognition capability over the portrait data, while the weight-sharing approach greatly reduces the possibility of overfitting. Realizing the multi-path shared-weight convolution as a depthwise convolution achieves the effect of multi-scale recognition at a parameter count of the same order of magnitude, and improves precision.
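A sketch of such a shared-weight, multi-dilation layer in PyTorch; the three dilation rates and fusion by channel-wise concatenation are assumptions drawn from the description above, not fixed by the patent:

```python
import torch
import torch.nn.functional as F

def shared_dilated_conv(x: torch.Tensor, weight: torch.Tensor,
                        rates=(1, 2, 3)) -> torch.Tensor:
    """Apply ONE shared 3x3 kernel at several dilation rates, then fuse.

    x: (B, C, H, W) input; weight: (C, 1, 3, 3) depthwise kernel.
    """
    groups = x.shape[1]                  # depthwise: one kernel per channel
    maps = [F.conv2d(x, weight, padding=r, dilation=r, groups=groups)
            for r in rates]              # same weights, three receptive fields
    return torch.cat(maps, dim=1)        # splice the feature maps channel-wise
```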
In this embodiment, the result of the first convolution operation may also be output in the form of a synthesized picture, specifically:
the face portrait data is input into the input channel, and the input channel divides the face portrait data according to the size of the convolution kernel and the stride coefficient to obtain M image blocks of different sizes, the image block obtained in each input channel being selected according to the dilation rate and shared weight preset for the first convolution operation;
the M image blocks are fused to obtain a face portrait element map of size Di × Dj × M,
where Di is the length of an image block, Dj is its width, and Di and Dj are positive integers greater than zero.
A further convolution calculation is then performed on the face portrait element map: it is input into the point-by-point convolution filter, and the point-by-point convolution filter performs standard convolution calculation on it with the preset N unit convolution kernels of size 1 × 1, obtaining the face portrait pattern of size Do × Dp × N, where Do is the length of the face portrait pattern, Dp is the width, and both are positive integers greater than zero.
In the following, a specific example is taken. The convolution process of the first convolution operation is implemented by the depth convolution filters in the deep separable convolution model; one depth convolution filter is provided in each input channel, each depth convolution filter is responsible for only one channel, the input channels do not affect one another, and each channel is convolved by only one convolution kernel. A 5 × 5 pixel image is used as the example:
For a 5 × 5 pixel, three-channel color input picture (shape 5 × 5 × 3), the depth convolution filter first performs the first convolution operation; after the operation, the three-channel image generates 3 feature maps, as shown in FIG. 3.
The second convolution operation is implemented by the point-by-point convolution filter. Once the depth convolution filter has finished and the extracted feature maps (i.e., the small face portrait blocks, equal in number to the input channels) have been formed, they are further combined to generate new feature maps. On receiving the feature maps output by the depth convolution filter, the point-by-point convolution filter performs convolution calculation in the standard manner, using kernels of size 1 × 1 × M, where M is the number of input channels. This convolution therefore makes a weighted combination of the previous step's feature maps in the depth direction to generate new feature maps; there are as many output feature maps as there are such convolution kernels, as shown in FIG. 4.
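Putting both stages together for the 5 × 5 × 3 example gives a standard depthwise separable block; a PyTorch sketch follows, where the choice of N = 4 output channels is arbitrary:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 5, 5)          # 5 x 5 pixel, three-channel picture

depthwise = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
pointwise = nn.Conv2d(3, 4, kernel_size=1, bias=False)   # N = 4 unit kernels

fragments = depthwise(x)             # 3 per-channel feature maps (FIG. 3)
pattern = pointwise(fragments)       # depth-direction weighted mix (FIG. 4)
assert pattern.shape == (1, 4, 5, 5)
```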
In practical application, when performing image segmentation, 5 landmarks (the centers of the two eyes, the two mouth-corner points, and the tip of the nose) are extracted from each face image. The picture blocks around these landmarks are put into a MobileNet for feature extraction; the MobileNet network structure is as described above. Five CNNs are trained, each outputting a feature of length 128, giving 5 × 128 features; PCA reduces each face's 128 × 5-dimensional feature to a length of 256, the joint Bayesian probability value of the 256-dimensional features is calculated, and it is compared with the preset threshold value to judge whether the faces are the same face.
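A sketch of that descriptor pipeline, assuming already-trained per-landmark feature extractors and a PCA already fitted with 256 components; the landmark names and module interfaces are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA

LANDMARKS = ("left_eye", "right_eye", "mouth_left", "mouth_right", "nose_tip")

def face_descriptor(crops: dict, nets: dict, pca: PCA) -> np.ndarray:
    """5 landmark crops -> 5 x 128 MobileNet features -> 256-d PCA vector."""
    feats = [nets[name](crops[name]) for name in LANDMARKS]  # each 128-d
    stacked = np.concatenate(feats)            # 640-d joint description
    return pca.transform(stacked[None])[0]     # reduce to 256 dimensions
```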
In summary, a feature map is extracted from the face portrait data to be recognized through dilated convolution, a standard convolution calculation is then performed through the deep separable convolution model to obtain the face portrait pattern, a joint Bayesian probability value is calculated for the face portrait pattern, and whether the data belongs to the same face is judged from that probability value, thereby realizing face recognition. This simplifies the features of the face portrait data, greatly reduces the time and space complexity of the whole network, reduces the parameters and the amount of calculation, improves the rate at which the device performs face recognition, and improves the user experience.
As shown in FIG. 5, the following is the recognition flow when a picture is recognized on the basis of the face recognition method provided by the embodiment of the present invention.
Step S510, acquiring the input picture to be distinguished, and determining the convolution kernel size D_k × D_k × 1 of the convolution.
Step S520, inputting the picture into each input channel in the deep separable convolution model, and performing feature simplification.
In this step, feature simplification convolves each input channel with one convolution kernel of size D_k × D_k × 1; M convolution kernels are used in total, and M such operations yield M feature maps of size D_f × D_f × 1. These feature maps are learned from different input channels respectively and are independent of one another.
Because the model has M input channels, D_f × D_f output values must be calculated per channel, each calculation costing D_k × D_k operations, and the cycle is performed M times; the computation amount of this step is therefore D_k × D_k × M × D_f × D_f.
Step S530, taking the obtained M feature maps as the input of M channels, and performing standard convolution with N convolution kernels of size 1 × 1 × M to obtain an output of size D_f × D_f × N.
In this step, the computation amount of the point-by-point convolution over the M feature maps follows the standard-convolution formula with D_k = 1: the computation amount is 1 × 1 × M × N × D_f × D_f.
Step S540, calculating the joint Bayesian probability value based on the image obtained by the standard convolution, and judging from the probability value whether it belongs to the same face stored in the device.
Based on this, an ordinary standard convolution would require D_k × D_k × M × N × D_f × D_f calculations in total; with a 3 × 3 convolution kernel, the depthwise separable convolution therefore reduces the computation roughly 9-fold (precisely, to a fraction 1/N + 1/D_k² of the standard cost) compared with standard convolution.
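The saving is easy to check numerically, using the cost formulas derived in steps S520 and S530 (the example channel and feature-map sizes are arbitrary):

```python
def standard_cost(dk: int, m: int, n: int, df: int) -> int:
    return dk * dk * m * n * df * df

def separable_cost(dk: int, m: int, n: int, df: int) -> int:
    depthwise = dk * dk * m * df * df        # step S520
    pointwise = 1 * 1 * m * n * df * df      # step S530
    return depthwise + pointwise

# e.g. 3x3 kernels, M=32 input channels, N=64 outputs, 56x56 feature map
print(standard_cost(3, 32, 64, 56) / separable_cost(3, 32, 64, 56))  # ~7.9x
```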
In order to solve the above problem, an embodiment of the present invention further provides a face recognition device. Referring to FIG. 6, FIG. 6 is a schematic diagram of the functional modules of the face recognition device provided by the embodiment of the present invention. In this embodiment, the device comprises: an acquisition module 61, a first convolution module 62, a second convolution module 63 and a recognition module 64;
the acquisition module 61 is used for acquiring face portrait data to be recognized;
a first convolution module 62, configured to determine the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the pre-stored deep separable convolution model; and to input the face portrait data into the input channel and perform a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are represented in the deep separable convolution model in the form of convolution kernels;
a second convolution module 63, configured to perform a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern, where the second convolution operation performs a cross-channel standard convolution calculation on the face fragments by taking a unit convolution kernel as the convolution basis;
and a recognition module 64, configured to calculate the similarity between the face portrait pattern and a face portrait pre-stored in the mobile terminal according to a preset joint Bayesian algorithm, and to compare the similarity with a preset threshold value to determine whether they belong to the same face.
Since the face recognition device is based on the same embodiments as the face recognition method of the present invention, its embodiment is not described again in detail here.
The invention also provides a computer readable storage medium.
In this embodiment, the computer-readable storage medium stores a face recognition program, and the face recognition program, when executed by the processor, implements the steps of the face recognition method described in any one of the above embodiments. The method implemented when the face recognition program is executed by the processor may refer to each embodiment of the face recognition method of the present invention, and thus, redundant description is not repeated.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM), and includes instructions for causing a terminal (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The present invention is described in connection with the accompanying drawings, but the invention is not limited to the above embodiments, which are only illustrative and not restrictive. Those skilled in the art can make various changes without departing from the spirit and scope of the invention as defined by the appended claims, and all changes that come within the meaning and range of equivalency of the specification, drawings and attached claims are intended to be embraced therein.

Claims (10)

1. A face recognition method is applied to a mobile terminal, and is characterized by comprising the following steps:
acquiring face portrait data to be recognized;
determining the size of a convolution kernel of the first convolution operation according to the number of input channels and output channels in a pre-stored deep separable convolution model;
inputting the face portrait data into the input channel, and performing a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are represented in the deep separable convolution model in the form of convolution kernels;
performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain face portrait patterns, wherein the second convolution operation performs a cross-channel standard convolution calculation on the face fragments by taking a unit convolution kernel as the convolution basis;
and calculating the similarity between the face portrait pattern and a face portrait prestored in the mobile terminal according to a preset combined Bayesian algorithm, and comparing the similarity with a preset threshold value to identify whether the face portrait pattern and the face portrait in the mobile terminal belong to the same person.
2. The method of face recognition according to claim 1, wherein the deep separable convolution model comprises: m input channels, N output channels and a point-by-point convolution filter;
the step of determining the size of the convolution kernel of the first convolution operation according to the number of the input channels and the output channels in the pre-stored deep separable convolution model comprises:
detecting the pixel dimensions Dx × Dy of the face portrait data, and extracting the stride coefficient set for each step of the first convolution operation;
subtracting the stride coefficient from the pixel dimensions to obtain the size of the convolution kernel;
wherein M and N are each integers greater than 1, Dx and Dy are respectively the length and width of the face portrait data, and Dx and Dy are positive integers greater than 1.
3. The method of claim 2, wherein the step of inputting the face image data into the input channel and performing a first convolution operation on the face image data according to the convolution kernel size to obtain a face image fragment set comprises:
the face portrait data is input into each of the M input channels, and each input channel divides the face portrait data according to the size of the convolution kernel to obtain M pieces of first face image data, wherein each piece of first face image data is a tile set consisting of a plurality of tiles of different sizes;
and screening the tile set according to the dilation rate and shared weights preset for the first convolution operation, to obtain a fragment set comprising a plurality of different face fragments.
4. The face recognition method according to claim 3, wherein the step of performing the second convolution operation on the face fragments obtained through the first convolution operation to obtain the face portrait pattern comprises:
sequentially inputting the face fragments output by the M input channels into the pointwise convolution filter, the pointwise convolution filter performing a standard convolution calculation on the face fragments according to N preset unit convolution kernels of size 1 × 1 to obtain the face portrait pattern of size Df × Dg × N, wherein Df is the length of the face portrait pattern, Dg is the width of the face portrait pattern, and both Df and Dg are positive integers greater than zero.
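The pointwise filter of claim 4 can be sketched as a single 1 × 1 convolution whose N unit kernels each span all M channels; the shapes below are illustrative:

```python
import torch
import torch.nn as nn

M, N = 8, 16                           # illustrative channel counts
fragments = torch.randn(1, M, 56, 56)  # face fragments from the M input channels

# N unit (1 x 1) kernels, each spanning all M channels.
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)
pattern = pointwise(fragments)
print(pattern.shape)  # torch.Size([1, 16, 56, 56]), i.e. Df x Dg x N with Df = Dg = 56
```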
5. The face recognition method according to claim 2, wherein the step of inputting the face portrait data into the input channels and performing the first convolution operation on the face portrait data according to the convolution kernel size to obtain the fragment set of the face portrait comprises:
inputting the face portrait data into the input channels, so that the input channels divide the face portrait data according to the convolution kernel size and the stride coefficient to obtain M image blocks of different sizes, wherein the image block obtained in each input channel is selected according to the dilation rate and shared weight preset in the first convolution operation;
fusing the M image blocks to obtain a face portrait element map of size Di × Dj × M;
wherein Di is the length of the image blocks, Dj is the width of the image blocks, and Di and Dj are positive integers greater than zero.
6. The face recognition method according to claim 5, wherein the step of performing the second convolution operation on the face fragments obtained through the first convolution operation to obtain the face portrait pattern comprises:
inputting the face portrait element map into the pointwise convolution filter, so that the pointwise convolution filter performs a standard convolution calculation on the face portrait element map according to N preset unit convolution kernels of size 1 × 1 to obtain the face portrait pattern of size Do × Dp, wherein Do is the length of the face portrait pattern, Dp is the width of the face portrait pattern, and both Do and Dp are positive integers greater than zero.
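Claims 5 and 6 describe the variant in which each channel's block is selected under a dilation rate before fusion into the Di × Dj × M element map; a sketch, assuming PyTorch's dilation argument stands in for the claimed dilation-rate selection:

```python
import torch
import torch.nn as nn

M, N = 8, 16
portrait = torch.randn(1, M, 112, 112)

# Depthwise pass with a dilation rate of 2; the kernel weights act as the
# shared weights applied within each channel.
depthwise = nn.Conv2d(M, M, kernel_size=3, dilation=2, groups=M, bias=False)
element_map = depthwise(portrait)  # Di x Dj x M element map
print(element_map.shape)           # torch.Size([1, 8, 108, 108])

# Pointwise pass with N unit kernels gives the Do x Dp face portrait pattern.
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)
print(pointwise(element_map).shape)  # torch.Size([1, 16, 108, 108])
```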
7. The face recognition method according to any one of claims 1 to 6, wherein the pre-stored depthwise separable convolution model is trained by:
constructing a model training framework for the depthwise separable convolution model according to a separable convolution algorithm, wherein the model training framework comprises input channels, output channels, a depthwise convolution training unit and a pointwise convolution training unit, the depthwise convolution training unit yielding a depthwise convolution filter through training, and the pointwise convolution training unit yielding a pointwise convolution filter through training;
determining a division specification for segmenting face images based on the number of input channels and output channels of the model training framework, wherein the division specification comprises the length and width of each small image and the size of the unit convolution kernel used in the convolution calculation;
acquiring face sample data from a website, and segmenting the face sample data into a plurality of small images according to the division specification and the facial regions;
sequentially inputting the small images into the depthwise convolution training unit to extract features, synthesizing the extracted features into a feature map, and outputting M face fragment samples;
inputting the M face fragment samples into the pointwise convolution training unit, and performing a convolution calculation on the face fragment samples with the unit convolution kernel as the convolution basis to obtain face portrait samples;
judging whether the similarity between each face portrait sample and the corresponding original small image reaches a preset similarity value;
if so, modifying the original parameters of the model training framework by taking the division specification as the model parameters, to obtain and store the depthwise separable convolution model;
if not, adjusting the division specification and reprocessing the face sample data until the depthwise separable convolution model is obtained.
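A compact rendering of claim 7's training loop. The claim fixes neither the similarity measure nor the optimizer, so cosine similarity and Adam are stand-ins, and all names below are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableTrainingUnits(nn.Module):
    """Depthwise and pointwise training units of the model framework (sketch)."""
    def __init__(self, channels=3, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2  # keep sizes equal for the similarity check
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=pad, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def train(model, small_images, epochs=10, target_sim=0.9, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        sims = []
        for patch in small_images:            # segmented per the division spec
            sample = model(patch)             # face portrait sample
            sim = F.cosine_similarity(sample.flatten(1), patch.flatten(1)).mean()
            (1.0 - sim).backward()            # push similarity toward the target
            opt.step(); opt.zero_grad()
            sims.append(sim.item())
        if sum(sims) / len(sims) >= target_sim:
            return model                      # claim 7's "if so" branch: store model
    return model  # "if not": caller adjusts the division spec and retrains
```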
8. A face recognition apparatus, characterized in that the face recognition apparatus comprises:
the acquisition module is used for acquiring face portrait data to be recognized;
the first convolution module is used for determining the size of the convolution kernel for the first convolution operation according to the number of input channels and output channels in the pre-stored depthwise separable convolution model; inputting the face portrait data into the input channels, and performing the first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are represented in the depthwise separable convolution model in the form of convolution kernels;
the second convolution module is used for performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern, wherein the second convolution operation performs a cross-channel standard convolution calculation on the face fragments with a unit convolution kernel as the convolution basis;
and the recognition module is used for calculating the similarity between the face portrait pattern and a face portrait prestored in the mobile terminal according to a preset joint Bayesian algorithm, and comparing the similarity with a preset threshold value to identify whether the face portrait pattern and the prestored face portrait belong to the same person.
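The joint Bayesian comparison used by the recognition module follows the standard formulation: the similarity is the log-likelihood ratio of the two feature vectors under the same-person and different-person Gaussian hypotheses. A sketch computed directly from those joint Gaussians; Sμ (identity covariance) and Sε (within-person covariance) are assumed to have been estimated offline on mean-subtracted features, and all names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def joint_bayesian_score(x1, x2, S_mu, S_eps):
    """log P(x1, x2 | same person) - log P(x1, x2 | different persons)."""
    d = x1.shape[0]
    x = np.concatenate([x1, x2])
    top = S_mu + S_eps
    # Same-person hypothesis: the shared identity couples the two features.
    cov_same = np.block([[top, S_mu], [S_mu, top]])
    # Different-person hypothesis: the two features are independent.
    cov_diff = np.block([[top, np.zeros((d, d))], [np.zeros((d, d)), top]])
    mean = np.zeros(2 * d)
    return (multivariate_normal.logpdf(x, mean, cov_same)
            - multivariate_normal.logpdf(x, mean, cov_diff))

# Hypothetical usage: declare a match when the score clears the preset threshold.
# same_person = joint_bayesian_score(probe, gallery, S_mu, S_eps) > threshold
```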
9. A face recognition apparatus, characterized in that the face recognition apparatus comprises: a memory, a processor, and a face recognition program stored on the memory and executable on the processor, wherein the face recognition program, when executed by the processor, implements the steps of the face recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a face recognition program which, when executed by a processor, implements the steps of the face recognition method according to any one of claims 1 to 7.
CN201910990736.0A 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium Active CN110929569B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910990736.0A CN110929569B (en) 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium
PCT/CN2020/118412 WO2021073418A1 (en) 2019-10-18 2020-09-28 Face recognition method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910990736.0A CN110929569B (en) 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110929569A true CN110929569A (en) 2020-03-27
CN110929569B CN110929569B (en) 2023-10-31

Family

ID=69849110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910990736.0A Active CN110929569B (en) 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110929569B (en)
WO (1) WO2021073418A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553904B (en) * 2021-06-16 2024-04-16 北京百度网讯科技有限公司 Training method and device for face anti-counterfeiting model and electronic equipment
CN113420643B (en) * 2021-06-21 2023-02-10 西北工业大学 Lightweight underwater target detection method based on depth separable cavity convolution
CN113609909B (en) * 2021-07-05 2024-05-31 深圳数联天下智能科技有限公司 Apple muscle sagging recognition model training method, recognition method and related device
CN113628103B (en) * 2021-08-26 2023-09-29 深圳万兴软件有限公司 High-granularity cartoon face generation method based on multistage loss and related components thereof
CN113870102B (en) * 2021-12-06 2022-03-08 深圳市大头兄弟科技有限公司 Animation method, device, equipment and storage medium of image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565433B2 (en) * 2017-03-30 2020-02-18 George Mason University Age invariant face recognition using convolutional neural networks and set distances
CN110929569B (en) * 2019-10-18 2023-10-31 平安科技(深圳)有限公司 Face recognition method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018054283A1 (en) * 2016-09-23 2018-03-29 北京眼神科技有限公司 Face model training method and device, and face authentication method and device
CN108960001A (en) * 2017-05-17 2018-12-07 富士通株式会社 Method and apparatus of the training for the image processing apparatus of recognition of face
CN108764336A (en) * 2018-05-28 2018-11-06 北京陌上花科技有限公司 For the deep learning method and device of image recognition, client, server
CN110321872A (en) * 2019-07-11 2019-10-11 京东方科技集团股份有限公司 Facial expression recognizing method and device, computer equipment, readable storage medium storing program for executing

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021073418A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Face recognition method and apparatus, device, and storage medium
CN111680597A (en) * 2020-05-29 2020-09-18 北京百度网讯科技有限公司 Face recognition model processing method, device, equipment and storage medium
CN111680597B (en) * 2020-05-29 2023-09-01 北京百度网讯科技有限公司 Face recognition model processing method, device, equipment and storage medium
CN112183449A (en) * 2020-10-15 2021-01-05 上海汽车集团股份有限公司 Driver identity verification method and device, electronic equipment and storage medium
CN112183449B (en) * 2020-10-15 2024-03-19 上海汽车集团股份有限公司 Driver identity verification method and device, electronic equipment and storage medium
CN112597885A (en) * 2020-12-22 2021-04-02 北京华捷艾米科技有限公司 Face living body detection method and device, electronic equipment and computer storage medium
CN112800874A (en) * 2021-01-14 2021-05-14 上海汽车集团股份有限公司 Face detection and recognition method and related device
CN113111879A (en) * 2021-04-30 2021-07-13 上海睿钰生物科技有限公司 Cell detection method and system
CN113111879B (en) * 2021-04-30 2023-11-10 上海睿钰生物科技有限公司 Cell detection method and system
CN113887544A (en) * 2021-12-07 2022-01-04 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113887544B (en) * 2021-12-07 2022-02-15 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
US11770642B2 (en) 2021-12-14 2023-09-26 National Tsing Hua University Image sensor integrated with convolutional neural network computation circuit

Also Published As

Publication number Publication date
CN110929569B (en) 2023-10-31
WO2021073418A1 (en) 2021-04-22

Similar Documents

Publication Publication Date Title
CN110929569B (en) Face recognition method, device, equipment and storage medium
CN109325437B (en) Image processing method, device and system
CN108805047B (en) Living body detection method and device, electronic equipment and computer readable medium
CN108717524B (en) Gesture recognition system based on double-camera mobile phone and artificial intelligence system
EP3992846A1 (en) Action recognition method and apparatus, computer storage medium, and computer device
CN108197618B (en) Method and device for generating human face detection model
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN111444365B (en) Image classification method, device, electronic equipment and storage medium
CN112101359B (en) Text formula positioning method, model training method and related device
CN110674759A (en) Monocular face in-vivo detection method, device and equipment based on depth map
CN112883983B (en) Feature extraction method, device and electronic system
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111753782A (en) False face detection method and device based on double-current network and electronic equipment
CN109523558A (en) A kind of portrait dividing method and system
CN110619334A (en) Portrait segmentation method based on deep learning, architecture and related device
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN112651333A (en) Silence living body detection method and device, terminal equipment and storage medium
CN113191216A (en) Multi-person real-time action recognition method and system based on gesture recognition and C3D network
CN112200817A (en) Sky region segmentation and special effect processing method, device and equipment based on image
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN112668675B (en) Image processing method and device, computer equipment and storage medium
US9311523B1 (en) Method and apparatus for supporting object recognition
CN112633103A (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant