CN110929569B - Face recognition method, device, equipment and storage medium

Info

Publication number
CN110929569B
Authority
CN
China
Prior art keywords
convolution
face
point
portrait
size
Prior art date
Legal status
Active
Application number
CN201910990736.0A
Other languages
Chinese (zh)
Other versions
CN110929569A (en)
Inventor
王健宗 (Wang Jianzong)
贾雪丽 (Jia Xueli)
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910990736.0A priority Critical patent/CN110929569B/en
Publication of CN110929569A publication Critical patent/CN110929569A/en
Priority to PCT/CN2020/118412 priority patent/WO2021073418A1/en
Application granted granted Critical
Publication of CN110929569B publication Critical patent/CN110929569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a face recognition method. The method performs two convolution operations on the face portrait data to be recognized according to the input channels and output channels of a pre-stored deep separable convolution model, then calculates the similarity of the face portrait patterns obtained after the convolution operations, and judges whether the face portrait data belong to the same face based on that similarity. The invention also provides a face recognition device, equipment and a computer-readable storage medium. Because the two convolution operations simplify the features of the face portrait data, the running time of the whole face recognition network and the time and space complexity of the recognition features are greatly reduced, the parameter count and computation performed by the terminal during recognition are cut, terminal consumption is lowered and operating efficiency is improved.

Description

Face recognition method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a face recognition method, apparatus, device, and storage medium.
Background
With the continuous development of face recognition technology, and especially of deep-learning recognition models, more and more applications are being deployed in everyday scenarios. Face recognition is a major component of the computer vision field, has become a focus of deep learning research, and has reached relatively good accuracy. That accuracy, however, is basically achieved by stacking convolution kernels: ever deeper convolution layers bring a large increase in parameter count, so that although the accuracy of the network improves to some extent, its running time is greatly prolonged.
As the carrier of much of daily life, the mobile phone performs more and more face recognition, but current recognition technology relies on large networks. Such networks are suited to deployment on a server and place high computing-power demands on the device, while the data processing capability of current mobile phone terminals is limited; if existing face recognition methods were applied directly on a mobile phone, its running speed and user experience would be greatly affected. It is therefore important to develop a neural network that can run on a mobile phone.
Disclosure of Invention
The main purpose of the invention is to provide a face recognition method, device, equipment and storage medium, aiming to solve the technical problem that the excessive running cost of existing face recognition affects the running speed of the terminal and degrades the user experience.
In order to achieve the above object, the present invention provides a face recognition method, which includes the steps of:
acquiring face portrait data to be identified;
determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in a pre-stored deep separable convolution model;
inputting the face portrait data into the input channels and performing a first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments and the face fragments are represented in convolution-kernel form in the deep separable convolution model;
performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern, wherein the second convolution operation is to use a unit convolution kernel as a convolution basis to perform cross-channel standard convolution calculation on the face fragments;
and calculating the similarity between the face portrait pattern and the face portrait stored in the mobile terminal in advance according to a preset joint Bayesian algorithm, and comparing the similarity with a preset threshold value to identify whether the face portrait pattern and the face portrait in the mobile terminal belong to the same person.
Optionally, if the deep separable convolution model includes M input channels, N output channels and a point-by-point convolution filter,
the step of determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the pre-stored deep separable convolution model comprises the following steps:
detecting the pixel dimensions Dx × Dy of the face portrait data, and extracting the step size coefficient set for each pass of the first convolution operation;
subtracting the step size coefficient from the pixel dimensions to obtain the convolution kernel size;
wherein M and N are integers larger than 1, Dx and Dy are the length and width of the face portrait data, and Dx and Dy are positive integers larger than 1.
Optionally, the step of inputting the face portrait data to the input channel and performing a first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait includes:
the face portrait data are respectively input into the M input channels, and each input channel divides the face portrait data according to the convolution kernel size to obtain M pieces of first face image data, wherein the first face image data are a tile set formed by a plurality of tiles of different sizes;
and screening the tile set according to the preset dilation (void) rate and shared weights of the first convolution operation to obtain a fragment set comprising a plurality of different face fragments.
Optionally, the step of performing a second convolution operation on the face fragments obtained by the first convolution operation to obtain a face portrait pattern includes:
sequentially inputting the face fragments output by the M input channels into the point-by-point convolution filter, so that the point-by-point convolution filter performs standard convolution calculation with N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Df × Dg × N, wherein Df is the length of the face portrait pattern, Dg is the width of the face portrait pattern, and Df and Dg are positive integers greater than zero.
Optionally, the step of inputting the face portrait data to the input channel and performing a first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait includes:
inputting the face portrait data into the input channels, so that the face portrait data are divided in each input channel according to the convolution kernel size and the step size coefficient to obtain M tiles of different sizes, wherein the tiles obtained in each input channel are selected according to the preset dilation (void) rate and shared weights of the first convolution operation;
fusing the M tiles to obtain a face portrait element map of size Di × Dj × M;
wherein Di is the length of a tile, Dj is the width of a tile, and Di and Dj are positive integers greater than zero.
Optionally, the step of performing a second convolution operation on the face fragments obtained by the first convolution operation to obtain a face portrait pattern includes:
inputting the face portrait element map into the point-by-point convolution filter, so that the point-by-point convolution filter performs standard convolution calculation with N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Do × Dp × N, wherein Do is the length of the face portrait pattern, Dp is the width of the face portrait pattern, and Do and Dp are positive integers greater than zero.
Optionally, the pre-stored deep separable convolution model is trained by:
according to a separation convolution algorithm, constructing a model training framework of the deep separable convolution model, wherein the model training framework comprises an input channel, an output channel, a deep convolution training unit and a point-by-point convolution training unit, the deep convolution training unit obtains a deep convolution filter through training, and the point-by-point convolution training unit obtains a point-by-point convolution filter through training;
determining a division specification for dividing the face images based on the number of input channels and output channels of the model training framework, wherein the division specification comprises the length and width of the small images and the size of the unit convolution kernel used in the convolution calculation;
acquiring face sample data from a website, and dividing the face sample data according to the division specification and the facial-feature regions to obtain a plurality of small images;
sequentially inputting the small images into the deep convolution training unit to extract features, synthesizing the extracted features into a feature map, and outputting M human face fragment samples;
inputting the M face fragment samples into the point-by-point convolution training unit, and performing cross-channel convolution calculation on the face fragment samples with the unit convolution kernel as the convolution basis to obtain face portrait samples;
judging whether the similarity between the face portrait sample and the corresponding original small image reaches a preset similarity value or not;
if so, taking the division specification as a model parameter, modifying the original parameters of the model training framework accordingly, and obtaining and storing the deep separable convolution model;
and if not, readjusting the division specification and reprocessing the face sample data until the deep separable convolution model is obtained.
In addition, to achieve the above object, the present invention also provides a face recognition apparatus including:
the acquisition module is used for acquiring face portrait data to be identified;
the first convolution module is used for determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in a pre-stored deep separable convolution model; inputting the face portrait data into the input channel, and performing a first convolution operation on the face portrait data according to the size of the convolution kernel to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are displayed in a form of the convolution kernel in the deep separable convolution model;
the second convolution module is used for performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern, wherein the second convolution operation performs cross-channel standard convolution calculation on the face fragments with a unit convolution kernel as the convolution basis;
and the identification module is used for calculating the similarity between the face portrait pattern and the face portrait stored in the mobile terminal in advance according to a preset joint Bayesian algorithm, and comparing the similarity with a preset threshold value to identify whether the face portrait pattern and the face portrait in the mobile terminal belong to the same person.
In another embodiment of the present invention, if the deep separable convolution model includes: m input channels, N output channels, and a point-wise convolution filter;
the first convolution module comprises a calculation unit configured to detect the pixel dimensions Dx × Dy of the face portrait data, extract the step size coefficient set for each pass of the first convolution operation, and subtract the step size coefficient from the pixel dimensions to obtain the convolution kernel size, wherein M and N are integers larger than 1, Dx and Dy are the length and width of the face portrait data, and Dx and Dy are positive integers larger than 1.
In another embodiment of the present invention, the first convolution module further includes a depth convolution filtering unit, configured to input the face portrait data into the M input channels, where each input channel divides the face portrait data according to the convolution kernel size to obtain M pieces of first face image data, the first face image data being a tile set formed by a plurality of tiles of different sizes, and to screen the tile set according to the preset dilation (void) rate and shared weights of the first convolution operation to obtain a fragment set comprising a plurality of different face fragments.
In another embodiment of the present invention, the second convolution module includes a point-by-point convolution filtering unit, configured to sequentially input the face fragments output by the M input channels into the point-by-point convolution filter, where the point-by-point convolution filter performs standard convolution calculation on the face fragments with N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Df × Dg × N, where Df is the length of the face portrait pattern, Dg is the width of the face portrait pattern, and Df and Dg are both positive integers greater than zero.
In another embodiment of the present invention, the depth convolution filtering unit is further configured to input the face portrait data into the input channels, so that the face portrait data are divided in each input channel according to the convolution kernel size and the step size coefficient to obtain M tiles of different sizes, where the tiles obtained in each input channel are selected according to the preset dilation (void) rate and shared weights of the first convolution operation, and to fuse the M tiles into a face portrait element map of size Di × Dj × M, where Di is the length of a tile, Dj is the width of a tile, and Di and Dj are positive integers greater than zero.
In another embodiment of the present invention, the point-by-point convolution filtering unit is further configured to input the face portrait element map into the point-by-point convolution filter, so that the point-by-point convolution filter performs standard convolution calculation on the face portrait element map with N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Do × Dp × N, where Do is the length of the face portrait pattern, Dp is the width of the face portrait pattern, and Do and Dp are both positive integers greater than zero.
In another embodiment of the present invention, the first convolution module is further configured to train the deep separable convolution model by:
according to a separation convolution algorithm, constructing a model training framework of the deep separable convolution model, wherein the model training framework comprises an input channel, an output channel, a deep convolution training unit and a point-by-point convolution training unit, the deep convolution training unit obtains a deep convolution filter through training, and the point-by-point convolution training unit obtains a point-by-point convolution filter through training;
determining a division specification for dividing the face images based on the number of input channels and output channels of the model training framework, wherein the division specification comprises the length and width of the small images and the size of the unit convolution kernel used in the convolution calculation;
acquiring face sample data from a website, and dividing the face sample data according to the division specification and the facial-feature regions to obtain a plurality of small images;
sequentially inputting the small images into the deep convolution training unit to extract features, synthesizing the extracted features into a feature map, and outputting M human face fragment samples;
inputting the M face fragment samples into the point-by-point convolution training unit, and performing cross-channel convolution calculation on the face fragment samples with the unit convolution kernel as the convolution basis to obtain face portrait samples;
judging whether the similarity between the face portrait sample and the corresponding original small image reaches a preset similarity value or not;
if so, taking the division specification as a model parameter, modifying the original parameters of the model training framework accordingly, and obtaining and storing the deep separable convolution model;
and if not, readjusting the division specification and reprocessing the face sample data until the deep separable convolution model is obtained.
In addition, to achieve the above object, the present invention also provides face recognition equipment, including: a memory, a processor, and a face recognition program stored on the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the face recognition method according to any one of the embodiments described above.
In addition, in order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a face recognition program which, when executed by a processor, implements the steps of the face recognition method according to any one of the above.
According to the face recognition method provided by the invention, two convolution operations are performed on the face portrait data to be recognized according to the input channels and output channels of the pre-stored deep separable convolution model; the similarity is then calculated for the face portrait pattern obtained after the convolution operations, and whether the data belong to the same face is judged based on that similarity. Because the two convolution operations simplify the features of the face portrait data, the running time of the whole face recognition network and the time and space complexity of the recognition features are greatly reduced, the parameter count and computation performed by the terminal during recognition are cut, terminal consumption is lowered and operating efficiency is improved.
Drawings
FIG. 1 is a schematic structural diagram of the operating environment of a mobile terminal according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the face recognition method provided by the present invention;
FIG. 3 is a schematic diagram of the feature maps generated by the first (depthwise) convolution operation provided by the invention;
FIG. 4 is a schematic diagram of the feature maps generated after the point-by-point convolution of the face image;
FIG. 5 is a schematic flow chart of the recognition process of the face recognition method provided by the invention;
FIG. 6 is a schematic diagram of the functional modules of the face recognition device according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a face recognition device, which may be a plug-in in a mobile terminal, for executing the face recognition method provided by the embodiments of the invention. As shown in fig. 1, fig. 1 is a schematic structural diagram of the operating environment of a mobile terminal according to an embodiment of the present invention.
As shown in fig. 1, the mobile terminal includes: a processor 101 such as a CPU, a communication bus 102, a user interface 103, a network interface 104 and a memory 105. The communication bus 102 is used to enable connection and communication between these components. The user interface 103 may comprise a display screen (Display) and an input unit such as a keyboard (Keyboard), and the network interface 104 may optionally comprise a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 105 may be a high-speed RAM memory or a non-volatile memory, such as disk storage. The memory 105 may alternatively be a storage system separate from the aforementioned processor 101.
It will be appreciated by those skilled in the art that the hardware structure of the mobile terminal shown in fig. 1 does not constitute a limitation of the face recognition apparatus or equipment, which may include more or fewer components than those illustrated, combine certain components, or arrange the components differently.
As shown in fig. 1, the memory 105, as one type of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a face recognition program. The operating system is a program that manages and controls the face recognition equipment and the software resources called in the memory, and supports the running of the face recognition program and other software and/or programs.
In the hardware architecture of the mobile terminal shown in fig. 1, the network interface 104 is mainly used for accessing a network; the user interface 103 is mainly used for receiving the face portrait data to be recognized and parameters such as the requirements for recognizing a face; and the processor 101 may be used to call the face recognition program stored in the memory 105 and perform the operations of the following embodiments of the face recognition method.
Based on the hardware structure of the mobile terminal above, the invention provides a face recognition method mainly applied to small terminal devices such as mobile phones, iPads, cameras and the like. Referring to fig. 2, fig. 2 is a flow chart of the face recognition method provided by an embodiment of the present invention. In this embodiment, the face recognition method specifically includes the following steps:
Step S210, obtaining face portrait data to be identified;
In this step, the face portrait data are specifically captured by the camera of a smartphone: after a photo is taken, a face image is extracted from it by a portrait extraction technique, and the extracted face image serves as the face portrait data to be recognized. In practice, the extraction can be realized by distinguishing the foreground image from the background image of the photo.
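By way of illustration only, the following Python sketch crops a face from a photo with OpenCV's bundled Haar cascade detector. The patent only states that a portrait extraction technique distinguishing foreground from background is used, so this particular detector, the file name photo.jpg and the single-face assumption are illustrative choices, not the patented method.

    import cv2

    # Illustrative only: any portrait-extraction technique could stand here.
    img = cv2.imread("photo.jpg")                      # assumed input photo
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    x, y, w, h = faces[0]                              # assume one face was found
    portrait = img[y:y + h, x:x + w]                   # face portrait data to be recognized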
In this embodiment, when the recognition processing of the convolution calculation is performed on the face portrait data, the calculation is generally performed with convolution kernels; that is, the face portrait data are converted into convolution-kernel form for feature map extraction.
Step S220, determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in a pre-stored deep separable convolution model;
In this step, the deep separable convolution model is a model with at least a two-layer structure, obtained through model training. The training can be understood as two layers, namely depth convolution training and point-by-point convolution training, implemented as follows:
According to a separation convolution algorithm, constructing a model training framework of the deep separable convolution model, wherein the model training framework comprises an input channel, an output channel, a deep convolution training unit and a point-by-point convolution training unit, the deep convolution training unit obtains a deep convolution filter through training, and the point-by-point convolution training unit obtains a point-by-point convolution filter through training;
determining a division specification for dividing the face image based on the number of input channels and output channels of the model training frame, wherein the division specification comprises the length and the width of a small image and the size of a unit convolution kernel of convolution calculation;
acquiring face sample data from a website, and dividing the face sample data according to the division specification and the facial-feature regions to obtain a plurality of small images;
sequentially inputting the small images into the deep convolution training unit to extract features, synthesizing the extracted features into a feature map, and outputting M human face fragment samples;
inputting the M face fragment samples into the point-by-point convolution training unit, and performing cross-channel convolution calculation on the face fragment samples with the unit convolution kernel as the convolution basis to obtain face portrait samples;
judging whether the similarity between the face portrait samples and the corresponding original small images reaches a preset similarity value;
if so, taking the division specification as a model parameter, modifying the original parameters of the model training framework accordingly, and obtaining and storing the deep separable convolution model;
and if not, readjusting the division specification and reprocessing the face sample data until the deep separable convolution model is obtained.
In practical application, the preset division specification of the face images is acquired first; the division specification is the length and width of the small images and the size of the unit convolution kernel used in the convolution calculation. The mobile terminal obtains face sample data from a website through the Internet, and the face images are divided according to the division specification and the facial-feature regions to obtain a plurality of small images;
then, the small images are sequentially input into the depth convolution filter for feature extraction, the extracted features are synthesized into feature maps, and M face portrait patches of size Di × Dj are output;
further, the training unit of the point-by-point convolution filter in the deep separable convolution model performs the cross-channel convolution calculation on the face portrait patches with the unit convolution kernels to obtain face portrait samples;
finally, each face portrait sample is compared with the corresponding small image;
if the comparison is consistent, a new deep separable convolution model is generated according to the division specification;
if the comparison is inconsistent, the division specification is readjusted and model training is repeated, as in the sketch below.
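This train-validate-adjust loop can be summarized in the following Python control-flow sketch. All callables passed in (the division, depthwise and point-by-point training units and the similarity measure) are hypothetical stand-ins for the units named in the text, and the adjustment rule is a placeholder assumption.

    def train_separable_model(samples, spec, divide, dw_train, pw_train,
                              similarity, target=0.9, max_rounds=10):
        """Sketch of the training loop; every argument is a stand-in."""
        for _ in range(max_rounds):
            patches = divide(samples, spec)            # split by facial-feature regions
            fragments = dw_train(patches)              # M face fragment samples
            portraits = pw_train(fragments)            # face portrait samples
            if similarity(portraits, patches) >= target:
                return spec                            # spec becomes a model parameter
            spec = dict(spec, patch_size=spec["patch_size"] - 1)  # readjust and retry
        raise RuntimeError("division specification never reached the target similarity")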
Step S230, inputting the face portrait data into the input channels, and performing a first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments and the face fragments are represented in convolution-kernel form in the pre-stored deep separable convolution model;
In this embodiment, the first convolution operation can be understood as an image segmentation operation: features of the face portrait data are extracted, the different face features in the portrait are obtained, and each face feature corresponds to one face fragment.
The face portrait data are divided into a plurality of small regional tiles; for each tile, at least three feature maps are obtained under different receptive fields, these feature maps are fused with one another to obtain the feature map corresponding to that tile, and the feature maps corresponding to all tiles are spliced so that the face portrait data are converted into convolution-kernel form.
In this step, the conversion may also be obtained directly and synchronously while the mobile terminal collects the user's face photo; that is, the camera of the mobile terminal need not be a conventional camera but may capture in an image-convolution mode, specifically in pixel-grid form. When the face image data are segmented by hole (dilated) convolution, the feature maps may be extracted at the level of the pixel grid and then spliced into a complete feature map. The feature maps need not all be obtained at the same size; they may be obtained by enlarging or reducing the size obtained the first time.
Step S240, performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern, wherein the second convolution operation is to perform cross-channel standard convolution calculation on the face fragments by taking a unit convolution kernel as a convolution basis;
In this embodiment, the pre-stored deep separable convolution model includes M input channels, N output channels, a depth convolution filter and a point-by-point convolution filter. The first convolution operation is implemented by the depth convolution filter: the convolution kernel size is established from the output requirements of the M input channels and N output channels, and the face portrait data are segmented by hole convolution according to that kernel size. After M face portrait patches are obtained, they are respectively input into the M input channels, and the point-by-point convolution filter then performs the cross-channel standard convolution calculation to obtain the face portrait pattern.
In this step, when the point-by-point convolution filter performs the cross-channel standard convolution calculation, it must be combined with a unit convolution kernel, preferably a convolution kernel whose length and width are both 1. Each input face portrait patch yields N face portrait patterns after the standard convolution calculation.
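A minimal PyTorch sketch of the two operations just described: a depthwise convolution (groups equal to the channel count, so each channel is filtered independently) followed by a 1×1 point-by-point convolution across channels. The channel counts M and N, the kernel size and the input resolution are illustrative assumptions.

    import torch
    import torch.nn as nn

    M, N, K = 32, 64, 3          # assumed input channels, output channels, kernel size

    depthwise = nn.Conv2d(M, M, kernel_size=K, padding=K // 2, groups=M, bias=False)
    pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)  # 1x1 cross-channel convolution

    x = torch.randn(1, M, 112, 112)      # face portrait data, one image
    fragments = depthwise(x)             # first convolution: per-channel face fragments
    pattern = pointwise(fragments)       # second convolution: face portrait pattern
    print(pattern.shape)                 # torch.Size([1, 64, 112, 112])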
Step S250, calculating the similarity between the face portrait pattern and the face portrait pre-stored in the mobile terminal according to a preset joint Bayesian algorithm, and comparing the similarity with a preset threshold to judge whether the face portrait pattern and the stored face portrait belong to the same face.
In this embodiment, probability calculation is applied to the computed face portrait pattern, which further screens the pattern, makes it closer to the face portrait data and improves recognition accuracy. In practical application, if the calculated joint Bayesian probability value is not larger than the preset threshold, the recognized face portrait data are considered to belong to the same face; otherwise they do not belong to the same face.
In practical application, when the face portrait data recognized in steps S210-S250 are used in the mobile terminal to judge whether they belong to the same face, the predetermined threshold refers to the allowed deviation from the user's face image stored in advance in the mobile terminal for comparison. Of course, the comparison may also be made directly against the stored face image to determine the deviation; if the deviation is larger than the predetermined threshold, the face portrait data are considered not to belong to the same face.
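For concreteness, the joint Bayesian comparison can be sketched as below. In the usual formulation the score is a log-likelihood ratio r(x1, x2) = x1'Ax1 + x2'Ax2 - 2*x1'Gx2, with matrices A and G estimated offline from training data; the patent does not give this derivation, so the matrices, the feature length and the comparison direction here are assumptions for illustration.

    import numpy as np

    def joint_bayesian_score(x1, x2, A, G):
        # Log-likelihood ratio; in this convention a larger value means
        # the two features are more likely to come from the same person.
        return x1 @ A @ x1 + x2 @ A @ x2 - 2 * x1 @ G @ x2

    d = 256                              # feature length after PCA, per the description
    A = np.eye(d) * 0.01                 # placeholder matrices, illustration only
    G = np.eye(d) * 0.02
    x1, x2 = np.random.randn(d), np.random.randn(d)

    threshold = 0.0                      # the preset threshold mentioned in the text
    same_face = joint_bayesian_score(x1, x2, A, G) > threshold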
In this embodiment, when the deep separable convolution model includes M input channels, N output channels, a depth convolution filter and a point-by-point convolution filter, the first convolution operation is implemented by the depth convolution filter in the model. The depth convolution filter performs feature simplification on the input face image (i.e., the picture): after segmentation, a picture with many pixels can be understood as a picture fused from fragments of a plurality of pixels. Several input channels are arranged in the depth convolution filter, each performing the same processing on the picture that passes through it, and the output results of the input channels may finally be output separately or synthesized before output.
In this embodiment, for step S220, the step of determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the deep separable convolution model includes:
detecting the pixel dimensions Dx × Dy of the face portrait data, and extracting the step size coefficient set for each pass of the first convolution operation;
subtracting the step size coefficient from the pixel dimensions to obtain the convolution kernel size;
wherein M and N are integers larger than 1, Dx and Dy are the length and width of the face portrait data, and Dx and Dy are positive integers larger than 1. Preferably, Dx and Dy are equal, i.e. the length and width of the input picture are the same.
At this time, the first convolution operation is specifically performed as follows: the face portrait data are respectively input into the M input channels, and each input channel divides the face portrait data according to the convolution kernel size to obtain M pieces of first face image data, wherein the first face image data are a tile set formed by a plurality of tiles of different sizes;
the tile set is then screened according to the preset dilation (void) rate and shared weights of the first convolution operation to obtain a fragment set comprising a plurality of different face fragments.
Then, based on the result of the first convolution operation, the second convolution operation is performed: the face fragments output by the M input channels are sequentially input into the point-by-point convolution filter, and the point-by-point convolution filter performs standard convolution calculation on the face fragments with N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Df × Dg × N, where Df is the length of the face portrait pattern, Dg is the width of the face portrait pattern, and Df and Dg are positive integers greater than zero.
Preferably, the length and width of the convolution kernel of the first convolution operation are taken as the data of the input channels in the deep separable convolution model, the face portrait data are input into each input channel based on the length and width, each input channel divides the face portrait data according to the determined length and width of the convolution kernel, and finally a plurality of face portrait small blocks with the size of the convolution kernel are output.
In practical application, the deep separable convolution model is a multi-layer structural model. When the model segments the image data in the first convolution operation by hole (dilated) convolution, the same hole-convolution sharing layer uses the same convolution kernel at different dilation rates to obtain multiple feature maps, and these feature maps are finally spliced together as the input feature map of the next layer. Specifically, the hole-convolution sharing layer resembles an Inception structure: a multi-path dilated convolution layer with shared parameters obtains feature maps of different receptive fields within the same layer. Unlike Inception, however, it achieves this with hole convolution, so fewer weight parameters are used, model parameters are reduced and calculation speed is increased while the same precision is maintained. Here, the hole convolution operation extracts feature maps from the face image data at the pixel level.
Further, in the above example, the same 3×3 convolution kernel is used in the multi-path convolution layers, i.e. the weights of the convolution kernels are shared. This lets the multi-path convolutions within the same layer capture the same type of features at different scales. It can also be understood as regularization: by sharing weights, the multi-path, multi-scale features captured by intra-layer convolution are constrained, and the same weights capture the same type of features from the same feature map at different sampling rates.
In practical application, this step is realized by the depth convolution filter in the deep separable convolution model, whose image segmentation is implemented by hole convolution. Specifically, the depth convolution filter obtains feature maps for different regions of the face image data at different dilation rates, preferably three feature maps at different dilation rates per region, and then splices and fuses all feature maps obtained from the same region into the convolution kernel (i.e., the face portrait patch) corresponding to that region of the image.
In this step, dividing the face portrait data by hole convolution provides multi-scale recognition ability for the portrait data, and the weight-sharing scheme also greatly reduces the possibility of overfitting. The multi-path shared-weight convolution is realized in depthwise convolution form, so the effect of multi-scale recognition is achieved with a parameter count of the same magnitude while precision is improved.
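The shared-weight multi-dilation layer described above can be sketched in a few lines of PyTorch: one 3×3 kernel bank is reused at several dilation rates and the resulting feature maps are concatenated as the next layer's input. The channel count and the dilation rates (1, 2, 4) are illustrative assumptions, not values fixed by the patent.

    import torch
    import torch.nn.functional as F

    weight = torch.randn(32, 32, 3, 3)       # a single shared 3x3 kernel bank

    def shared_dilated_layer(x, rates=(1, 2, 4)):
        # Same weights, different receptive fields; padding keeps the spatial size.
        outs = [F.conv2d(x, weight, padding=r, dilation=r) for r in rates]
        return torch.cat(outs, dim=1)        # spliced into the next layer's input

    x = torch.randn(1, 32, 56, 56)
    print(shared_dilated_layer(x).shape)     # torch.Size([1, 96, 56, 56])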
In this embodiment, the output of the first convolution operation may also take the form of a synthesized picture, specifically:
inputting the face portrait data into the input channels, where each input channel divides the face portrait data according to the convolution kernel size and the step size coefficient to obtain M tiles of different sizes, the tiles obtained in each input channel being selected according to the preset dilation (void) rate and shared weights of the first convolution operation;
fusing the M tiles to obtain a face portrait element map of size Di × Dj × M;
where Di is the length of a tile, Dj is the width of a tile, and Di and Dj are positive integers greater than zero.
A further convolution calculation is then performed on the face portrait element map: the element map is input into the point-by-point convolution filter, and the point-by-point convolution filter performs standard convolution calculation on it with N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Do × Dp × N, where Do is the length of the face portrait pattern, Dp is the width of the face portrait pattern, and Do and Dp are positive integers greater than zero.
The following takes a specific example. The first convolution operation is implemented by the depth convolution filters in the deep separable convolution model: each input channel is provided with a depth convolution filter, each depth convolution filter is responsible for exactly one channel, the input channels do not affect one another, and one channel is convolved by only one convolution kernel. Consider the image data of a 5×5-pixel picture:
For a 5×5-pixel, three-channel colour input picture (shape 5×5×3), the depth convolution filter first performs the first convolution operation, and the three-channel image generates 3 feature maps after the operation, as shown in fig. 3.
The second convolution operation then combines the feature maps extracted by the depth convolution filter into feature patterns (i.e., face portrait patches), whose number equals the number of input channels, and further combines these to generate new feature maps; this is implemented by the point-by-point convolution filter. After receiving the feature maps output by the depth convolution filter, the point-by-point convolution filter performs convolution in the standard way, with convolution kernels of size 1×1×M, where M is the number of input channels. This convolution weights and combines the feature maps of the previous step in the depth direction to generate new feature maps; there are as many output feature maps as there are convolution kernels, as shown in fig. 4.
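The 5×5×3 example can be checked shape-by-shape in PyTorch; the 3×3 depthwise kernel and N = 4 output channels are assumptions added for illustration.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 5, 5)                        # 5x5-pixel, three-channel picture
    dw = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
    pw = nn.Conv2d(3, 4, kernel_size=1, bias=False)    # N = 4 kernels of size 1x1x3

    maps = dw(x)       # 3 feature maps, one per channel -> shape (1, 3, 5, 5)
    out = pw(maps)     # depth-direction weighted combination -> shape (1, 4, 5, 5)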
In practical application, when the image is divided, 5 landmarks are extracted for each face image (the two eye centres, the two mouth corners and the nose tip). The picture patches around these landmarks are put into a MobileNet for feature extraction, the MobileNet network structure being as described above. Five CNNs are then trained, each outputting a feature of length 128, giving 5 × 128 output features; the 128 × 5-dimensional feature of each face is reduced to 256 dimensions by PCA, the joint Bayesian probability value of the 256-dimensional features is calculated, and the value is compared with a preset threshold to judge whether the faces are the same.
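The five-patch pipeline can be sketched as follows. The patch size, the random projections standing in for the five trained CNNs, the synthetic landmarks and the PCA training set are all placeholders; only the dimensions (5 patches × 128 features, reduced to 256 by PCA) come from the text.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    PATCH = 32                                         # assumed patch size

    def extract_patch(image, pt):
        y, x = pt
        return image[y:y + PATCH, x:x + PATCH].reshape(-1)

    # Stand-ins for the five trained CNNs: fixed projections to 128-d embeddings.
    cnns = [rng.standard_normal((PATCH * PATCH, 128)) for _ in range(5)]

    def face_feature(image, landmarks):
        emb = [extract_patch(image, pt) @ W for pt, W in zip(landmarks, cnns)]
        return np.concatenate(emb)                     # 5 x 128 = 640-d raw feature

    # Synthetic data purely to make the PCA step runnable.
    pts = [(10, 10), (10, 60), (60, 20), (60, 50), (40, 35)]   # eyes, mouth corners, nose
    train = np.stack([face_feature(rng.standard_normal((128, 128)), pts)
                      for _ in range(300)])
    feat_256 = PCA(n_components=256).fit(train).transform(train[:1])   # 256-d feature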
A feature map of the face portrait data to be recognized is extracted through hole convolution, standard convolution calculation is then performed through the deep separable convolution model to obtain the face portrait pattern, the joint Bayesian probability value of the face portrait pattern is further calculated, and whether the patterns belong to the same face is judged based on that probability value, thereby realizing face recognition. Realizing face recognition in this way simplifies the features of the face portrait data, greatly reduces the time and space complexity of the whole network, cuts the parameter count and computation, improves the rate at which the equipment executes face recognition and improves the user experience.
As shown in fig. 5, when a picture is recognized according to the face recognition method provided by the embodiment of the present invention, the recognition process is as follows.
Step S510, the input picture to be distinguished is obtained, and the convolution kernel size D_K × D_K × 1 of the convolution is determined.
Step S520, the picture is input into each input channel of the deep separable convolution model to simplify its features.
In this step, the feature simplification convolves each input channel with one D_K × D_K × 1 convolution kernel; M convolution kernels are used in total, and M operations yield M feature maps of size D_F × D_F × 1. The feature maps are learned from the different input channels and are independent of one another.
Since there are M input channels, D_F × D_F output values must be computed per channel, each costing D_K × D_K multiplications, and the cycle repeats M times, so the depthwise computation is D_K × D_K × D_F × D_F × M.
Step S530, the obtained M feature maps are used as the input of M channels, and standard convolution is performed with N convolution kernels of size 1 × 1 × M to obtain an output of D_F × D_F × N.
In this step, the computation of the point-by-point convolution over the M feature maps follows the standard convolution formula with D_K = 1, and amounts to 1 × 1 × M × N × D_F × D_F.
Step S540, based on the pictures obtained by standard convolution, a joint Bayesian probability value is calculated, and whether the pictures belong to the same face in the equipment is judged according to the probability value.
Based on this, the total computation is much larger if standard conventional convolution is used; with a 3×3 convolution kernel, this separable scheme reduces the computation roughly 9-fold compared with standard convolution.
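The arithmetic behind this claim follows directly from the two computation amounts given above; the layer dimensions below are example values.

    # Standard vs. depthwise separable convolution cost for one example layer.
    D_K, M, N, D_F = 3, 32, 64, 56                     # illustrative layer dimensions

    standard = D_K * D_K * M * N * D_F * D_F           # ordinary convolution
    separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F
    print(standard / separable)                        # ~7.9 here; approaches 9 as N grows

    # In general: separable / standard = 1/N + 1/D_K**2, so a 3x3 kernel
    # gives close to a 9x reduction once N is large.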
To solve the above problems, an embodiment of the present invention further provides a face recognition device. Referring to fig. 6, fig. 6 is a schematic diagram of the functional modules of the face recognition device according to the embodiment of the present invention. In this embodiment, the device includes: an acquisition module 61, a first convolution module 62, a second convolution module 63 and an identification module 64;
The acquisition module 61 is used for acquiring face portrait data to be identified;
a first convolution module 62, configured to determine the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the pre-stored deep separable convolution model, input the face portrait data into the input channels, and perform a first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait, where the fragment set comprises at least two face fragments and the face fragments are represented in convolution-kernel form in the deep separable convolution model;
a second convolution module 63, configured to perform a second convolution operation on the face fragments obtained by the first convolution operation to obtain a face portrait pattern, where the second convolution operation performs cross-channel standard convolution calculation on the face fragments with a unit convolution kernel as the convolution basis;
and a recognition module 64, configured to calculate the similarity between the face portrait pattern and the face portrait stored in advance in the mobile terminal according to a preset joint Bayesian algorithm, and compare the similarity with a preset threshold to determine whether the face portrait pattern and the stored face portrait belong to the same face.
The embodiments of the face recognition device of the present invention are not described in further detail here, because they correspond in detail to the embodiments of the face recognition method of the present invention.
The invention also provides a computer readable storage medium.
In this embodiment, the computer-readable storage medium stores a face recognition program, which when executed by a processor, implements the steps of the face recognition method described in any one of the above embodiments. The method implemented when the face recognition program is executed by the processor may refer to various embodiments of the face recognition method of the present invention, and thus will not be described in detail.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is preferred. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server or a network device, etc.) to perform the methods of the embodiments of the present invention.
While the embodiments of the present invention have been described above with reference to the drawings, the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art may make many modifications without departing from the spirit of the present invention and the scope of the appended claims, and all such modifications, as well as equivalent structures or equivalent process changes made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present invention.

Claims (10)

1. A face recognition method applied to a mobile terminal, characterized by comprising the following steps:
acquiring face portrait data to be identified;
determining the convolution kernel size of a first convolution operation according to the number of input channels and output channels in a pre-stored deep separable convolution model, the deep separable convolution model comprising: M input channels, N output channels, and a point-by-point convolution filter;
inputting the face portrait data into the input channels, and performing a first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments and the face fragments are represented in convolution-kernel form in the deep separable convolution model;
performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern, wherein the second convolution operation performs cross-channel standard convolution calculation on the face fragments with a unit convolution kernel as the convolution basis;
according to a preset joint Bayesian algorithm, calculating the similarity between the face portrait pattern and a face portrait stored in the mobile terminal in advance, and comparing the similarity with a preset threshold value to identify whether the face portrait pattern and the face portrait in the mobile terminal belong to the same person or not;
the step of determining the convolution kernel size of the first convolution operation according to the number of input channels and output channels in the pre-stored deep separable convolution model comprises the following steps:
detecting the pixel dimensions Dx × Dy of the face portrait data, and extracting the step size coefficient set for each pass of the first convolution operation;
subtracting the step size coefficient from the pixel dimensions to obtain the convolution kernel size;
wherein M and N are integers larger than 1, Dx and Dy are the length and width of the face portrait data, and Dx and Dy are positive integers larger than 1;
the step of performing a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern comprises the following steps:
sequentially inputting the face fragments output by the M input channels into the point-by-point convolution filter, wherein the point-by-point convolution filter performs standard convolution calculation on the face fragments with N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Df × Dg × N, wherein Df is the length of the face portrait pattern, Dg is the width of the face portrait pattern, and Df and Dg are positive integers greater than zero.
2. The face recognition method of claim 1, wherein the step of inputting the face portrait data into the input channels and performing the first convolution operation on the face portrait data according to the convolution kernel size to obtain the fragment set of the face portrait comprises:
inputting the face portrait data into the M input channels respectively, wherein the input channels divide the face portrait data according to the convolution kernel size to obtain M pieces of first face image data, the first face image data being a tile set composed of a plurality of tiles of different sizes;
and screening the tile set according to a preset void rate and shared weights in the first convolution operation to obtain a fragment set comprising a plurality of different face fragments.
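The "void rate" of claim 2 corresponds to what most frameworks call a dilation rate: the kernel samples the input with gaps, enlarging the receptive field while its shared weights stay the same size. A hedged sketch, where the dilation value of 2 is an assumed example:

    import torch
    import torch.nn as nn

    M = 3
    # dilation=2 inserts one gap between kernel taps; padding=2 keeps the
    # spatial size unchanged for a 3x3 kernel at stride 1.
    dilated_depthwise = nn.Conv2d(M, M, kernel_size=3, padding=2, dilation=2,
                                  groups=M, bias=False)

    face = torch.randn(1, M, 112, 112)
    fragments = dilated_depthwise(face)
    print(fragments.shape)  # torch.Size([1, 3, 112, 112])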
3. The face recognition method of claim 1, wherein the step of inputting the face portrait data into the input channels and performing the first convolution operation on the face portrait data according to the convolution kernel size to obtain the fragment set of the face portrait comprises:
inputting the face portrait data into the input channels, so that the input channels divide the face portrait data according to the convolution kernel size and the step size coefficient to obtain M tiles of different sizes, wherein the tile obtained in each input channel is selected by that input channel according to the preset void rate and shared weights in the first convolution operation;
fusing the M tiles to obtain a face portrait element map of size Di × Dj × M;
wherein Di is the length of a tile, Dj is the width of a tile, and Di and Dj are positive integers greater than zero.
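Fusing the M tiles into a Di × Dj × M element map can be read as stacking along a channel axis, assuming the selected tiles are first brought to a common Di × Dj size; a minimal sketch with assumed dimensions:

    import torch

    M, Di, Dj = 3, 56, 56
    tiles = [torch.randn(Di, Dj) for _ in range(M)]  # one tile per input channel
    element_map = torch.stack(tiles, dim=-1)         # fuse into Di x Dj x M
    print(element_map.shape)                         # torch.Size([56, 56, 3])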
4. The face recognition method according to claim 3, wherein the step of performing the second convolution operation on the face fragments obtained through the first convolution operation to obtain the face portrait pattern comprises:
inputting the face portrait element map into the pointwise convolution filter, so that the pointwise convolution filter performs a standard convolution calculation according to N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Do × Dp × N, wherein Do is the length of the face portrait pattern, Dp is the width of the face portrait pattern, and Do and Dp are positive integers greater than zero.
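A useful way to see why the unit convolution kernels of claim 4 are cheap: a 1×1 convolution over a Di × Dj × M map is exactly one M→N linear map applied at every pixel. The sketch below verifies that equivalence numerically (all shapes are the assumed examples used above):

    import torch
    import torch.nn as nn

    M, N, Di, Dj = 3, 8, 56, 56
    element_map = torch.randn(1, M, Di, Dj)

    pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)
    out_conv = pointwise(element_map)    # pattern of size Do x Dp x N (Do = Di, Dp = Dj here)

    # The same computation as a per-pixel matrix multiply with the N x M weights.
    w = pointwise.weight.view(N, M)      # drop the 1x1 spatial dimensions
    out_mm = torch.einsum('nm,bmhw->bnhw', w, element_map)
    print(torch.allclose(out_conv, out_mm, atol=1e-6))  # True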
5. The face recognition method of any one of claims 1-4, wherein the pre-stored depthwise separable convolution model is trained by:
constructing, according to a separable convolution algorithm, a model training framework of the depthwise separable convolution model, wherein the model training framework comprises input channels, output channels, a depthwise convolution training unit and a pointwise convolution training unit, the depthwise convolution training unit being trained to obtain a depthwise convolution filter, and the pointwise convolution training unit being trained to obtain the pointwise convolution filter;
determining, based on the number of input channels and output channels of the model training framework, a division specification for dividing face images, wherein the division specification comprises the length and width of the small images and the size of the unit convolution kernel used in the convolution calculation;
acquiring face sample data from a website, and dividing the face sample data according to the division specification and the facial feature regions to obtain a plurality of small images;
sequentially inputting the small images into the depthwise convolution training unit to extract features, synthesizing the extracted features into a feature map, and outputting M face fragment samples;
inputting the M face fragment samples into the pointwise convolution training unit, and performing a conversion convolution calculation on the face fragment samples with the unit convolution kernel as the convolution basis to obtain face portrait samples;
judging whether the similarity between each face portrait sample and the corresponding original small image reaches a preset similarity value;
if yes, taking the division specification as a model parameter, modifying the original parameters of the model training framework, and obtaining and storing the depthwise separable convolution model;
and if not, adjusting the division specification and processing the face sample data again until the depthwise separable convolution model is obtained.
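Claim 5's training procedure reduces to a loop: build the depthwise and pointwise training units for a candidate division specification, reconstruct the small images, and accept the specification once reconstructions are similar enough to the originals, otherwise adjust and retry. A heavily simplified sketch; the similarity measure, threshold, starting kernel size, and adjustment rule are all illustrative stand-ins, not values from the patent:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def build_units(M, k):
        # depthwise training unit followed by a pointwise unit (mapping back
        # to M channels so output and original can be compared directly)
        depthwise = nn.Conv2d(M, M, kernel_size=k, padding=k // 2, groups=M, bias=False)
        pointwise = nn.Conv2d(M, M, kernel_size=1, bias=False)
        return nn.Sequential(depthwise, pointwise)

    def similarity(a, b):
        # cosine similarity of flattened images; a stand-in for the patent's
        # comparison between the face portrait sample and the original image
        return F.cosine_similarity(a.flatten(1), b.flatten(1)).mean().item()

    M, threshold, k = 3, 0.9, 7
    sample = torch.randn(1, M, 112, 112)  # stand-in for acquired face sample data

    while k >= 3:
        model = build_units(M, k)
        reconstructed = model(sample)
        if similarity(reconstructed, sample) >= threshold:
            break   # division specification accepted and stored as a model parameter
        k -= 2      # otherwise adjust the division specification and retry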
6. A face recognition device, characterized in that the face recognition device comprises:
an acquisition module, configured to acquire face portrait data to be identified;
a first convolution module, configured to determine a convolution kernel size of a first convolution operation according to the number of input channels and output channels in a pre-stored depthwise separable convolution model, wherein the depthwise separable convolution model comprises: M input channels, N output channels, and a pointwise convolution filter; and to input the face portrait data into the input channels and perform the first convolution operation on the face portrait data according to the convolution kernel size to obtain a fragment set of the face portrait, wherein the fragment set comprises at least two face fragments, and the face fragments are represented in the form of convolution kernels in the depthwise separable convolution model;
a second convolution module, configured to perform a second convolution operation on the face fragments obtained through the first convolution operation to obtain a face portrait pattern, wherein the second convolution operation performs a cross-channel standard convolution calculation on the face fragments with a unit convolution kernel as the convolution basis;
a recognition module, configured to calculate, according to a preset joint Bayesian algorithm, the similarity between the face portrait pattern and a face portrait pre-stored in the mobile terminal, and to compare the similarity with a preset threshold to identify whether the face portrait pattern and the face portrait in the mobile terminal belong to the same person;
wherein the first convolution module comprises a calculation unit, configured to detect the pixel values Dx and Dy of the face portrait data, extract the step size coefficient set for each pass of the first convolution operation, and subtract the step size coefficient from the pixel value to obtain the convolution kernel size, wherein M and N are integers greater than 1, Dx and Dy are respectively the length and width of the face portrait data, and Dx and Dy are positive integers greater than 1;
and the second convolution module comprises a pointwise convolution filtering unit, configured to sequentially input the face fragments output by the M input channels into the pointwise convolution filter, wherein the pointwise convolution filter performs a standard convolution calculation according to N preset unit convolution kernels of size 1×1 to obtain the face portrait pattern of size Df × Dg × N, wherein Df is the length of the face portrait pattern, Dg is the width of the face portrait pattern, and Df and Dg are positive integers greater than zero.
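The joint Bayesian comparison performed by the recognition module is commonly computed as a log-likelihood ratio r(x1, x2) = x1ᵀA x1 + x2ᵀA x2 − 2·x1ᵀG x2, where A and G are matrices estimated offline from training data. The sketch below shows only the verification step; A, G, the feature dimension, and the threshold are random or assumed placeholders, not trained values:

    import numpy as np

    def joint_bayesian_score(x1, x2, A, G):
        # log-likelihood ratio of "same person" vs "different persons"
        return float(x1 @ A @ x1 + x2 @ A @ x2 - 2.0 * (x1 @ G @ x2))

    d = 128  # assumed feature dimension
    rng = np.random.default_rng(0)
    A = rng.standard_normal((d, d)); A = (A + A.T) / 2  # placeholder for trained A
    G = rng.standard_normal((d, d)); G = (G + G.T) / 2  # placeholder for trained G

    pattern = rng.standard_normal(d)   # features of the face portrait pattern
    stored = rng.standard_normal(d)    # features of the pre-stored face portrait
    threshold = 0.0                    # preset threshold (assumed)

    same_person = joint_bayesian_score(pattern, stored, A, G) > threshold
    print(same_person)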
7. The face recognition device of claim 6, wherein the first convolution module further comprises:
a depthwise convolution filtering unit, configured to input the face portrait data into the M input channels respectively, wherein the input channels divide the face portrait data according to the convolution kernel size to obtain M pieces of first face image data, the first face image data being a tile set composed of a plurality of tiles of different sizes;
and to screen the tile set according to the preset void rate and shared weights in the first convolution operation to obtain a fragment set comprising a plurality of different face fragments.
8. The face recognition device of claim 7, wherein the depthwise convolution filtering unit is further configured to:
input the face portrait data into the input channels, so that the input channels divide the face portrait data according to the convolution kernel size and the step size coefficient to obtain M tiles of different sizes, wherein the tile obtained in each input channel is selected by that input channel according to the preset void rate and shared weights in the first convolution operation;
and fuse the M tiles to obtain a face portrait element map of size Di × Dj × M;
wherein Di is the length of a tile, Dj is the width of a tile, and Di and Dj are positive integers greater than zero.
9. An electronic device, characterized in that the electronic device comprises: a memory, a processor, and a face recognition program stored in the memory and executable on the processor, wherein the face recognition program, when executed by the processor, implements the steps of the face recognition method according to any one of claims 1-5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a face recognition program which, when executed by a processor, implements the steps of the face recognition method according to any one of claims 1-5.
CN201910990736.0A 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium Active CN110929569B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910990736.0A CN110929569B (en) 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium
PCT/CN2020/118412 WO2021073418A1 (en) 2019-10-18 2020-09-28 Face recognition method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910990736.0A CN110929569B (en) 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110929569A CN110929569A (en) 2020-03-27
CN110929569B CN110929569B (en) 2023-10-31

Family

ID=69849110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910990736.0A Active CN110929569B (en) 2019-10-18 2019-10-18 Face recognition method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110929569B (en)
WO (1) WO2021073418A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929569B (en) * 2019-10-18 2023-10-31 平安科技(深圳)有限公司 Face recognition method, device, equipment and storage medium
CN111680597B (en) * 2020-05-29 2023-09-01 北京百度网讯科技有限公司 Face recognition model processing method, device, equipment and storage medium
CN112183449B (en) * 2020-10-15 2024-03-19 上海汽车集团股份有限公司 Driver identity verification method and device, electronic equipment and storage medium
CN112597885A (en) * 2020-12-22 2021-04-02 北京华捷艾米科技有限公司 Face living body detection method and device, electronic equipment and computer storage medium
CN112800874A (en) * 2021-01-14 2021-05-14 上海汽车集团股份有限公司 Face detection and recognition method and related device
CN113111879B (en) * 2021-04-30 2023-11-10 上海睿钰生物科技有限公司 Cell detection method and system
CN113553904B (en) * 2021-06-16 2024-04-16 北京百度网讯科技有限公司 Training method and device for face anti-counterfeiting model and electronic equipment
CN113420643B (en) * 2021-06-21 2023-02-10 西北工业大学 Lightweight underwater target detection method based on depth separable cavity convolution
CN113609909B (en) * 2021-07-05 2024-05-31 深圳数联天下智能科技有限公司 Apple muscle sagging recognition model training method, recognition method and related device
CN113628103B (en) * 2021-08-26 2023-09-29 深圳万兴软件有限公司 High-granularity cartoon face generation method based on multistage loss and related components thereof
CN113870102B (en) * 2021-12-06 2022-03-08 深圳市大头兄弟科技有限公司 Animation method, device, equipment and storage medium of image
CN113887544B (en) * 2021-12-07 2022-02-15 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
TWI786992B (en) 2021-12-14 2022-12-11 國立清華大學 Image sensor integrated with convolutional neural network computation circuit

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018054283A1 (en) * 2016-09-23 2018-03-29 北京眼神科技有限公司 Face model training method and device, and face authentication method and device
CN108764336A (en) * 2018-05-28 2018-11-06 北京陌上花科技有限公司 For the deep learning method and device of image recognition, client, server
CN108960001A (en) * 2017-05-17 2018-12-07 富士通株式会社 Method and apparatus of the training for the image processing apparatus of recognition of face
CN110321872A (en) * 2019-07-11 2019-10-11 京东方科技集团股份有限公司 Facial expression recognizing method and device, computer equipment, readable storage medium storing program for executing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565433B2 (en) * 2017-03-30 2020-02-18 George Mason University Age invariant face recognition using convolutional neural networks and set distances
CN110929569B (en) * 2019-10-18 2023-10-31 平安科技(深圳)有限公司 Face recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2021073418A1 (en) 2021-04-22
CN110929569A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN110929569B (en) Face recognition method, device, equipment and storage medium
CN106778928B (en) Image processing method and device
CN107993216B (en) Image fusion method and equipment, storage medium and terminal thereof
CN108717524B (en) Gesture recognition system based on double-camera mobile phone and artificial intelligence system
CN110674759A (en) Monocular face in-vivo detection method, device and equipment based on depth map
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN112101359B (en) Text formula positioning method, model training method and related device
CN112465709B (en) Image enhancement method, device, storage medium and equipment
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111753782A (en) False face detection method and device based on double-current network and electronic equipment
WO2022194079A1 (en) Sky region segmentation method and apparatus, computer device, and storage medium
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN112651333A (en) Silence living body detection method and device, terminal equipment and storage medium
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN112200817A (en) Sky region segmentation and special effect processing method, device and equipment based on image
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN111179287A (en) Portrait instance segmentation method, device, equipment and storage medium
CN110766631A (en) Face image modification method and device, electronic equipment and computer readable medium
CN111539420B (en) Panoramic image saliency prediction method and system based on attention perception features
CN114862716A (en) Image enhancement method, device and equipment for face image and storage medium
CN112633103A (en) Image processing method and device and electronic equipment
CN112949571A (en) Method for identifying age, and training method and device of age identification model
CN112200816A (en) Method, device and equipment for segmenting region of video image and replacing hair
CN111414992A (en) Method and apparatus for performing convolution calculation on image using convolution neural network
CN113196279A (en) Face attribute identification method and electronic equipment

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant