CN114120430B - Mask face recognition method based on double-branch weight fusion homology self-supervision

Mask face recognition method based on double-branch weight fusion homology self-supervision

Info

Publication number
CN114120430B
CN114120430B (application number CN202210090466.XA)
Authority
CN
China
Prior art keywords: model, mask, branch, face recognition, sub
Prior art date
Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202210090466.XA
Other languages: Chinese (zh)
Other versions: CN114120430A (en)
Inventors: 王东 (Wang Dong), 肖传宝 (Xiao Chuanbao), 王月平 (Wang Yueping)
Current Assignee (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list): Hangzhou Moredian Technology Co ltd
Original Assignee: Hangzhou Moredian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Moredian Technology Co ltd filed Critical Hangzhou Moredian Technology Co ltd
Priority: CN202210090466.XA
Publication of application: CN114120430A
Application granted
Publication of granted patent: CN114120430B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a mask face recognition method based on double-branch weight fusion homology self-supervision. The method comprises the following steps: dynamically adjusting the learning rate and the online mask augmentation ratio with a cosine-curve alternating optimization method, and training a basic face recognition model to obtain a trained bottom-layer feature-sharing network model; training the left branch sub-model and the right branch sub-model of a face recognition double-branch model with a double-branch weight fusion homology supervision loss function and the face recognition Arcface loss function respectively, and performing weight fusion on the trained left and right branch sub-models to obtain a high-level semantic fusion network model; and splicing the trained bottom-layer feature-sharing network model with the high-level semantic fusion network model to obtain the final face recognition prediction model. The application thereby addresses the industry problem of raising the pass rate of masked-face recognition: it substantially increases the recognition pass rate for masked faces, improves recognition accuracy, and reduces cost.

Description

Mask face recognition method based on double-branch weight fusion homology self-supervision
Technical Field
The application relates to the technical field of face recognition, in particular to a mask face recognition method based on double-branch weight fusion homology self-supervision.
Background
At present, wearing a mask has become the norm in daily life and work as an important epidemic-prevention measure, and, under the premise of a fixed false-recognition rate, raising the pass rate for masked faces remains a difficult problem in the industry. For example, with a mask occluding most of the face, a general face recognition system cannot accurately and quickly meet the throughput requirements of systems such as airport real-name ticket checking, company attendance, or community access control, which poses a new challenge to face-recognition-based urban informatization systems. A solution to these problems is therefore urgently needed.
In fact, recognizing a face image with a worn mask is essentially a visual occlusion problem. In the related art, one line of attack is on the data side: crudely adding a certain proportion of masked samples. Although this can improve masked-face recognition accuracy, the gain is very limited, and the recognition accuracy for non-masked faces drops considerably. On the method side, practitioners have also tried deep-learning attention mechanisms, enlarging the network's computation, multi-model feature weighting, and the like, but these have proved unable to solve the occlusion problem in practice: they improve masked-face recognition accuracy only marginally, remain far from the target accuracy, and are poorly compatible across masked and non-masked scenes. Moreover, their large computational cost increases recognition latency on edge computing devices and degrades the user experience.
At present, no effective solution has been proposed in the related art for the low recognition pass rate when recognizing masked faces.
Disclosure of Invention
The embodiments of the present application provide a mask face recognition method based on double-branch weight fusion homology self-supervision, so as to at least solve the problem in the related art of a low recognition pass rate when recognizing masked faces.
In a first aspect, an embodiment of the present application provides a mask face recognition method based on double-branch weight fusion homology self-supervision, where the method includes:
dynamically adjusting the learning rate and the online mask augmentation ratio by a cosine curve alternating optimization method, and training a basic face recognition model to obtain a trained bottom layer feature sharing network model;
respectively training a left branch sub-model and a right branch sub-model in a face recognition double-branch model by a double-branch weight fusion homology supervision loss function and a face recognition Arcface loss function, and performing weight fusion on the trained left branch sub-model and right branch sub-model to obtain a high-level semantic fusion network model;
and splicing the trained bottom layer feature sharing network model and the high-layer semantic fusion network model to obtain a final face recognition prediction model.
In some embodiments, the dynamically adjusting the learning rate and the online mask augmentation ratio by a cosine curve alternating optimization method, and training the basic face recognition model includes:
dividing the learning rate of the cosine curve and the online mask augmentation ratio into different stages, and dynamically setting the parameters of the cosine curve learning rate and the online mask augmentation ratio of the different stages;
and training the basic face recognition model through the face recognition Arcface loss function.
In some embodiments, a mask is attached to the face image with an image-editing tool to obtain a masked-face image;
and different ratios of the face images to the masked-face images are preset to obtain the online mask augmentation ratio data.
In some embodiments, training the left branch sub-model and the right branch sub-model in the face recognition double-branch model with the double-branch weight fusion homology supervision loss function and the face recognition Arcface loss function comprises:
training the left branch sub-model through the face recognition Arcface loss function to obtain a trained left branch sub-model, and fixing network parameters of the left branch sub-model;
and training the right branch sub-model through the double-branch weight fusion homology supervision loss function and the face recognition Arcface loss function to obtain the trained right branch sub-model.
In some of these embodiments, training the right branch sub-model with the double-branch weight fusion homology supervision loss function and the face recognition Arcface loss function comprises:
and acquiring the output characteristics of the trained left branch submodel, training the output characteristics of the left branch submodel and the output characteristics of the right branch submodel by fusing a homology supervision loss function through the double branch weight, and training the output characteristics of the right branch submodel through a face recognition Arcface loss function.
In some embodiments, weight fusing the trained left branch submodel with the trained right branch submodel comprises:
and (4) performing weight fusion of different proportions on the trained left branch sub-model and right branch sub-model by self-defining weighting parameters, and adjusting different scene priorities.
In some of these embodiments, after deriving the final face recognition prediction model, the method includes:
and carrying out prediction recognition on the mask face through the final face recognition prediction model, and outputting to obtain a recognition result.
In a second aspect, the present application provides a mask face recognition system based on dual-branch weight fusion homology self-supervision, including:
the basic training module is used for dynamically adjusting the learning rate and the online mask augmentation ratio by a cosine curve alternating optimization method, training a basic face recognition model and obtaining a trained bottom layer feature sharing network model;
the double-branch training fusion module is used for respectively training a left branch sub-model and a right branch sub-model in the face recognition double-branch model through a double-branch weight fusion homology supervision loss function and a face recognition Arcface loss function, and performing weight fusion on the trained left branch sub-model and right branch sub-model to obtain a high-level semantic fusion network model;
and the splicing module is used for splicing the trained bottom layer feature sharing network model and the high-level semantic fusion network model to obtain a final face recognition prediction model.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the mask face recognition method based on dual-branch weight fusion homology self-supervision as described in the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the mask face recognition method based on dual-branch weight fusion homology self-supervision as described in the first aspect.
Compared with the related art, the mask face recognition method based on double-branch weight fusion homology self-supervision provided by the embodiments of this application dynamically adjusts the learning rate and the online mask augmentation ratio with a cosine-curve alternating optimization method, and trains a basic face recognition model to obtain a trained bottom-layer feature-sharing network model; trains the left branch sub-model and the right branch sub-model of a face recognition double-branch model with a double-branch weight fusion homology supervision loss function and the face recognition Arcface loss function respectively, and weight-fuses the trained left and right branch sub-models to obtain a high-level semantic fusion network model; and splices the trained bottom-layer feature-sharing network model with the high-level semantic fusion network model to obtain the final face recognition prediction model.
According to the method, the basic face recognition model is trained with an online fine-grained masked-face data augmentation method based on cosine-curve alternating optimization, which resolves the poor compatibility between masked and non-masked recognition accuracy that arises in the related art from offline coarse data augmentation. The application then proposes a double-branch weight fusion homology supervision training loss function, which constrains the output feature values of the two branches to keep a small two-norm distance and preserves the high-level semantic homology of the two branches, thereby ensuring the success of the subsequent weight-fusion step and improving the accuracy gain rate. Furthermore, by weighting and fusing the weights of the deep convolutional network models, the invention achieves high-accuracy recognition for both the masked-face and the non-masked-face scene without any increase in computation, and the scene priority can be adjusted by setting the weighting coefficient. Finally, the bottom-layer feature-sharing model and the high-level semantic fusion network model are spliced to obtain the final face recognition prediction model; the spliced model is trained with the bottom-layer-feature-sharing, high-level-semantic double-branch deep convolutional network training method so that the models for both scenes converge efficiently, which, compared with other multi-model feature-fusion approaches, greatly reduces computation and time cost in the training stage and improves efficiency.
Therefore, through the above scheme, the application solves the industry problem of raising the pass rate of masked-face recognition: it substantially increases the recognition pass rate for masked faces, improves recognition accuracy, and reduces cost.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic application environment diagram of a mask face recognition method based on double-branch weight fusion homology self-supervision according to an embodiment of the present application;
fig. 2 is a flowchart of a mask face recognition method based on dual-branch weight fusion homology self-supervision according to an embodiment of the present application;
fig. 3 is a schematic diagram of a cosine curve alternating optimization mask face data augmentation curve according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating an effect of an online mask data augmentation method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network structure of a face recognition dual-branch model according to an embodiment of the present application;
fig. 6 is a block diagram of a mask face recognition system based on dual-branch weight fusion homology self-supervision according to an embodiment of the present application;
fig. 7 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The mask face recognition method based on double-branch weight fusion homology self-supervision provided by this application can be applied in the environment shown in fig. 1. Fig. 1 is a schematic application environment diagram of the method according to an embodiment of the present application. The terminal 11 and the server 10 communicate with each other via a network. The server 10 dynamically adjusts the learning rate and the online mask augmentation ratio with a cosine-curve alternating optimization method and trains a basic face recognition model to obtain a trained bottom-layer feature-sharing network model; trains the left and right branch sub-models of a face recognition double-branch model with a double-branch weight fusion homology supervision loss function and the face recognition Arcface loss function respectively, and weight-fuses the trained left and right branch sub-models to obtain a high-level semantic fusion network model; and splices the trained bottom-layer feature-sharing network model with the high-level semantic fusion network model to obtain the final face recognition prediction model, performs masked-face recognition with this prediction model, and outputs the recognition result for display on the terminal 11. The terminal 11 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device; the server 10 may be implemented as an independent server or a server cluster composed of multiple servers.
It should be noted that the method and device can be applied to various mask-based epidemic-prevention scenes, such as subway face-scan security checks, community access control, and epidemic-prevention inspections, and also to other scenes in which the face is occluded; they are not limited to occlusion by a worn mask.
Preferably, the following embodiments explain the present application mainly in terms of the masked-face scene.
The present embodiment provides a mask face recognition method based on dual-branch weight fusion homology self-supervision, and fig. 2 is a flowchart of the mask face recognition method based on dual-branch weight fusion homology self-supervision according to the embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
step S201, dynamically adjusting a learning rate and an online mask augmentation ratio by a cosine curve alternating optimization method, and training a basic face recognition model to obtain a trained bottom layer feature sharing network model;
Fig. 3 is a schematic diagram of the cosine-curve alternating-optimization masked-face data augmentation curves according to an embodiment of the present application. As shown in fig. 3, this embodiment preferably divides the cosine-curve learning rate and the online mask augmentation ratio into different stages and dynamically sets their parameters per stage. Specifically, training is divided into four consecutive stages: a basic stage (Enhance Base), a low mask ratio stage (Low Mask Ratio), a high mask ratio stage (High Mask Ratio), and a balanced mask ratio stage (Balance & More Epoch). The learning rates of the four stages can be set empirically in turn, as shown by the black bold line in fig. 3: the basic stage is held constant at 0.01; the low mask ratio stage starts at 0.075 and decays to 0.01 along a cosine curve; the high mask ratio stage starts at 0.075 and rises to 0.12 along a cosine curve; and the final balanced mask ratio stage is held constant at 0.075. Training the model with the learning rates of these four stages effectively constructs a training process that gradually approaches the global optimum. The total number of training epochs can be set by the user; the optimal setting in this embodiment is 60 epochs.
Note that the cosine-curve function used in this embodiment for both the cosine-curve learning rate and the online mask augmentation ratio is given by formula (1):

value(t) = value_end + (value_start - value_end) * (1 + cos(π * t / T)) / 2    (1)

where T is the total number of training steps, t is the current training step, cos denotes the cosine, value_start is the initial learning rate (or initial mask-fit ratio), value_end is the stage's target learning rate (or target mask-fit ratio), and value(t) is the resulting current learning rate (or mask-fit ratio).
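As a minimal sketch, the cosine transition of formula (1) can be written directly. The per-stage endpoint values below are read from the description of fig. 3; the exact split of the 60 epochs across the four stages is not given by the patent and is therefore left out.

```python
import math

# Illustrative per-stage learning-rate endpoints from the description of Fig. 3.
STAGES = [
    # (name, lr_start, lr_end)
    ("Enhance Base", 0.010, 0.010),
    ("Low Mask Ratio", 0.075, 0.010),
    ("High Mask Ratio", 0.075, 0.120),
    ("Balance & More Epoch", 0.075, 0.075),
]

def cosine_value(t: int, T: int, value_start: float, value_end: float) -> float:
    """Formula (1): anneal from value_start to value_end over T steps along a
    half cosine; the same function drives both the learning rate and the
    online mask augmentation ratio."""
    return value_end + 0.5 * (value_start - value_end) * (1.0 + math.cos(math.pi * t / T))
```

At t = 0 the function returns value_start and at t = T it returns value_end, so the low-mask stage decays 0.075 → 0.01 while the high-mask stage rises 0.075 → 0.12 with the same formula.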
Fig. 4 is a schematic diagram of the effect of the online mask data augmentation method according to an embodiment of the present application. As shown in fig. 4, a mask image and a face image are obtained. The mask image is produced by photographing an appropriate number of masks of different styles prepared in advance and converting the photographs into images with transparent backgrounds. The mask image is then attached to the face image with an image-editing tool such as Photoshop, fixed to the region below the eyes, so that a realistic masked-face image is obtained by approximate simulation, as shown in fig. 4. Finally, different ratios of original face images to simulated masked-face images are preset, yielding the different online mask augmentation ratio data.
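The paste-below-the-eyes simulation can be sketched with plain NumPy alpha compositing. The placement anchor (the lower half of the face crop) and the image sizes are illustrative assumptions, not values from the patent:

```python
import random
import numpy as np

def augment_with_mask(face: np.ndarray, mask_rgba: np.ndarray, ratio: float) -> np.ndarray:
    """With probability `ratio` (the current online mask augmentation ratio),
    alpha-composite a transparent-background mask image onto the region below
    the eyes, approximating a real masked face.
    face: HxWx3 uint8; mask_rgba: hxwx4 uint8 with an alpha channel."""
    if random.random() >= ratio:
        return face  # leave this sample un-augmented
    out = face.astype(np.float32)
    h, w = mask_rgba.shape[:2]
    # Hypothetical placement: anchor the mask at the lower half of the crop
    # (the area below the eyes), horizontally centred.
    y0 = face.shape[0] // 2
    x0 = (face.shape[1] - w) // 2
    alpha = mask_rgba[:, :, 3:4].astype(np.float32) / 255.0
    region = out[y0:y0 + h, x0:x0 + w]          # view into `out`
    region[:] = alpha * mask_rgba[:, :, :3] + (1.0 - alpha) * region
    return out.astype(np.uint8)
```

Calling this per sample inside the data loader, with `ratio` driven by formula (1), gives the online (rather than offline) fine-grained augmentation the method relies on.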
Further, after all the above parameters are set, this embodiment trains the basic face recognition Model_A with the general face recognition Arcface loss function to obtain the trained bottom-layer feature-sharing network model Shared_Net_Stage. The Arcface loss function is computed as formula (2):

L_Arcface = -(1/N) * Σ_{i=1..N} log( e^{s·cos(θ_{y_i} + m)} / ( e^{s·cos(θ_{y_i} + m)} + Σ_{j=1..n, j≠y_i} e^{s·cos θ_j} ) )    (2)

where i indexes the i-th sample, whose class is y_i; N is the batch size; n is the number of classes; m is the angular penalty value; s is the scale hyper-parameter of the vector modulus; and θ_j is the angle between the feature vector and the j-th column of the fully connected layer's weight matrix.
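Formula (2) can be sketched in NumPy as follows. This is the standard additive-angular-margin form, with the margin applied only to the ground-truth class; the s and m defaults below are common choices, not values fixed by the patent:

```python
import numpy as np

def arcface_loss(features: np.ndarray, weights: np.ndarray, labels,
                 s: float = 64.0, m: float = 0.5) -> float:
    """Formula (2): additive angular margin softmax loss.
    features: (N, d) L2-normalised embeddings; weights: (d, n) L2-normalised
    class centres (the FC weight matrix); labels: (N,) class indices;
    s: scale of the vector modulus; m: angular penalty value."""
    labels = np.asarray(labels)
    cos = np.clip(features @ weights, -1.0, 1.0)   # cos(theta_j), shape (N, n)
    theta = np.arccos(cos)
    target = np.zeros_like(cos, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    # add the margin m only to the ground-truth class angle
    logits = s * np.where(target, np.cos(theta + m), cos)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[target].mean())
```

Because the margin shrinks the target-class logit, the loss with m > 0 is never smaller than the plain softmax loss on the same batch, which is what forces tighter angular class clusters.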
It should be noted that the basic face recognition model used in this embodiment may be any of several classical face recognition models, such as a convolutional neural network, a residual network (ResNet), a DenseNet, and the like; this embodiment imposes no particular limitation.
Training the basic face recognition model with the online fine-grained masked-face data augmentation method based on cosine-curve alternating optimization resolves the poor compatibility between masked and non-masked recognition accuracy caused by offline coarse data augmentation in the related art, and benefits the subsequent model training;
step S202, respectively training a left branch sub-model and a right branch sub-model in a face recognition double-branch model through a double-branch weight fusion homology supervision loss function and a face recognition Arcface loss function, and performing weight fusion on the trained left branch sub-model and right branch sub-model to obtain a high-level semantic fusion network model;
Fig. 5 is a schematic network structure diagram of the face recognition double-branch model according to an embodiment of the present application. As shown in fig. 5, the network portion above the Branching Point is the trained network module obtained by training the basic recognition Model_A in step S201, i.e., the bottom-layer feature-sharing network model Shared_Net_Stage. This portion does not participate in back-propagation during the subsequent double-branch training; that is, its network parameters are fixed. The network portion below the Branching Point is the double-branch model, comprising the left branch sub-model Sub_Model_nonMask adapted to the non-mask scene and the right branch sub-model Sub_Model_Mask adapted to the mask scene; the network structures of the two branches are identical.
Preferably, as shown in fig. 5, training the double-branch model requires two successive trainings of the two branches, First Train and then Next Train: the left branch sub-model is trained first, and the feature vectors output by the trained left branch then participate in the subsequent training of the right branch sub-model. Specifically, the left branch sub-model is first trained with the general face recognition Arcface loss function, with the online mask augmentation switched off, i.e., its ratio value set to 0.0, and a user-defined number of training epochs. Since this is only fine-tuning of a branch, a large number of epochs is unnecessary, so the number can be set to 1/n of the epochs in step S201; for example, given the optimal setting of 60 epochs in step S201, the optimal number here is 6 epochs. After training, the trained left branch sub-model is obtained and its network parameters are fixed;
then, the output features of the trained left branch sub-model are obtained; the output features of the left and right branch sub-models are trained with the double-branch weight fusion homology supervision loss function, while the output features of the right branch sub-model are also trained with the face recognition Arcface loss function. Here the online mask augmentation ratio is fixed at 0.1 and the number of training epochs equals that of the left branch. After training, the trained right branch sub-model is obtained.
Training the double-branch model through the above steps constrains the output feature values Features_nonMask and Features_Mask of the two branches to keep a small two-norm distance, preserving the high-level semantic homology of the two branches' weights, which ensures the success of the subsequent weight-fusion step and improves the accuracy gain rate.
It should be noted that the double-branch weight fusion Homology Supervised Loss adopted in this embodiment is computed as formula (3):

L2 = ||Features_nonMask - Features_Mask||    (3)

where Features_nonMask is the feature vector of the left branch sub-model adapted to the non-mask scene, Features_Mask is the feature vector of the right branch sub-model adapted to the mask scene, and || · || denotes the Euclidean distance between the features.
In addition, the training loss function of the left branch sub-model is formula (2), and the training loss function of the right branch sub-model is formula (4):

L_right = L_Arcface + λ * L2    (4)

where λ is a hyper-parameter for balancing the learning bias.
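Formulas (3) and (4) are simple enough to state directly. The value λ = 0.1 below is illustrative, since the patent does not disclose the constant:

```python
import numpy as np

def homology_supervised_loss(features_nonmask: np.ndarray, features_mask: np.ndarray) -> float:
    """Formula (3): Euclidean (two-norm) distance between the fixed
    left-branch features and the trainable right-branch features."""
    return float(np.linalg.norm(features_nonmask - features_mask))

def right_branch_loss(arcface_term: float, features_nonmask: np.ndarray,
                      features_mask: np.ndarray, lam: float = 0.1) -> float:
    """Formula (4): the right branch is supervised by the Arcface term plus
    the lambda-weighted homology term that keeps the two branches'
    high-level semantics close."""
    return arcface_term + lam * homology_supervised_loss(features_nonmask, features_mask)
```

During Next Train only the right branch receives gradients; the left-branch features act as a fixed anchor, so minimising the homology term pulls the right branch's semantics toward the frozen left branch.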
Further, after the double-branch model is trained, weight fusion is performed on the trained left branch sub-model Sub_Model_nonMask and right branch sub-model Sub_Model_Mask to obtain the high-level semantic fusion network model Merged_Net_Stage. The trained left and right branch sub-models are weight-fused in different proportions through a user-defined weighting parameter, which adjusts the scene priority. For example, if the model is deployed in a hospital scene, the weighting parameter can be set above 0.5, increasing the network weight proportion of the right branch sub-model adapted to the mask scene and thus the priority of the mask scene, so as to suit the hospital scene; conversely, for a non-mask scene, the weighting parameter can be set below 0.5. With this method of weighting and fusing the weights of the deep convolutional network models, this embodiment achieves high-accuracy recognition for both the masked-face and the non-masked-face scene without any increase in computation, while the scene priority can be adjusted by setting the weighting coefficient.
Note that, the weight fusion calculation formula in this embodiment is shown in the following formula (5):
Params_Merged = (1 - α) * Params_nonMask + α * Params_Mask, 0.0 < α < 1.0 (5)
the method comprises the steps of obtaining a high-level semantic fusion network module, obtaining a Parmas _ Merged, obtaining a Parmas _ nomMask, obtaining a weighting parameter, and obtaining a weighting parameter.
And step S203, splicing the trained bottom layer feature sharing network model and the high-layer semantic fusion network model to obtain a final face recognition prediction model.
Preferably, in this embodiment, the bottom-layer feature-sharing network model Shared_Net_Stage part of the base face recognition Model_A trained in step S201 and the high-level semantic fusion network model Merged_Net_Stage obtained in step S202 are spliced front-to-back to obtain the final face recognition prediction model.
In some embodiments, after the final face recognition prediction model is obtained, a mask face image is predicted and recognized by the final face recognition prediction model and a recognition result is output; the output recognition result can represent either a face wearing a mask or a regular face.
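The front-to-back splice can be sketched as simple function composition: the shared bottom stage feeds the merged top stage (an illustrative sketch; the builder name and the scalar stand-in stages are assumptions, real stages would be convolutional network modules):

```python
def build_predictor(shared_net_stage, merged_net_stage):
    """Splice the trained bottom-layer feature-sharing stage in front of the
    merged high-level semantic stage to form the final prediction model."""
    def predict(image):
        low_level = shared_net_stage(image)   # shared low-level features
        return merged_net_stage(low_level)    # fused high-level embedding
    return predict

# Stand-in stages: a scalar "image" is doubled, then offset by one.
predictor = build_predictor(lambda x: 2 * x, lambda f: f + 1)
embedding = predictor(3)
```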
Through steps S201 to S203, this embodiment integrates multiple algorithms and training methods to address the industrial problem of improving the pass rate of mask face recognition. The recognition pass rate of masked faces is improved, and in the operation of practical face recognition products in epidemic prevention scenes such as hospitals and public places, the present application can greatly improve the recognition accuracy of masked faces.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The present embodiment further provides a mask face recognition system based on dual-branch weight fusion homology self-supervision, which is used to implement the foregoing embodiments and preferred embodiments, and the description of the system is omitted. As used hereinafter, the terms "module," "unit," "subunit," and the like may implement a combination of software and/or hardware for a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 6 is a block diagram of a mask face recognition system based on two-branch weight fusion homology self-supervision according to an embodiment of the present application, and as shown in fig. 6, the system includes a basic training module 61, a two-branch training fusion module 62, and a splicing module 63:
the basic training module 61 is used for dynamically adjusting the learning rate and the online mask augmentation ratio by a cosine curve alternating optimization method, training a basic face recognition model and obtaining a trained bottom layer feature sharing network model; the double-branch training fusion module 62 is used for training a left branch sub-model and a right branch sub-model in the face recognition double-branch model respectively through a double-branch weight fusion homology supervision loss function and a face recognition Arcface loss function, and performing weight fusion on the trained left branch sub-model and right branch sub-model to obtain a high-level semantic fusion network model; and the splicing module 63 is configured to splice the trained bottom-layer feature sharing network model and the high-layer semantic fusion network model to obtain a final face recognition prediction model.
Through the above system, this embodiment integrates multiple algorithms and training methods to address the industrial problem of improving the pass rate of mask face recognition; the recognition pass rate of masked faces is improved, and in the operation of practical face recognition products in epidemic prevention scenes such as hospitals and public places, the recognition accuracy of masked faces can be greatly improved.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
Note that each of the modules may be a functional module or a program module, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In addition, in combination with the mask face recognition method based on dual-branch weight fusion homology self-supervision in the above embodiments, the embodiments of the present application may provide a storage medium for implementation. The storage medium has a computer program stored thereon; when executed by a processor, the computer program implements the mask face recognition method based on dual-branch weight fusion homology self-supervision.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to realize a mask face recognition method based on double-branch weight fusion homology self-supervision. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, fig. 7 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present application, and as shown in fig. 7, there is provided an electronic device, which may be a server, and an internal structure diagram of which may be as shown in fig. 7. The electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor is used for providing calculation and control capabilities, the network interface is used for being connected and communicated with an external terminal through a network, the internal memory is used for providing an environment for an operating system and the running of a computer program, the computer program is executed by the processor to realize the mask face recognition method based on the double-branch weight fusion homology self-supervision, and the database is used for storing data.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is a block diagram of only a portion of the architecture associated with the subject application, and does not constitute a limitation on the electronic devices to which the subject application may be applied, and that a particular electronic device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A mask face recognition method based on double-branch weight fusion homology self-supervision is characterized by comprising the following steps:
dynamically adjusting the learning rate and the online mask augmentation ratio by a cosine curve alternating optimization method, and training a basic face recognition model to obtain a trained bottom layer feature sharing network model;
training a left branch sub-model and a right branch sub-model in the face recognition double-branch model respectively through a double-branch weight fusion homology supervision loss function and a face recognition Arcface loss function, which specifically comprises: firstly, training the left branch sub-model through the face recognition Arcface loss function to obtain the output features of the trained left branch sub-model, then training the output features of the left branch sub-model and the output features of the right branch sub-model through the double-branch weight fusion homology supervision loss function, and training the output features of the right branch sub-model through the face recognition Arcface loss function; performing weight fusion on the trained left branch sub-model and right branch sub-model to obtain a high-level semantic fusion network model, wherein the double-branch model comprises a left branch sub-model Sub_Model_nonMask adapted to the non-mask scene and a right branch sub-model Sub_Model_Mask adapted to the mask scene, and the network structures of the left branch and the right branch are completely the same; the calculation formula of the double-branch weight fusion homology supervision loss function is shown as the following formula:
L2 = ||Features_nonMask - Features_Mask||
wherein Features_nonMask is the feature vector of the left branch sub-model adapted to the non-mask scene, Features_Mask is the feature vector of the right branch sub-model adapted to the mask scene, and the operator || · || represents the Euclidean distance between the features;
and splicing the trained bottom layer feature sharing network model and the high-layer semantic fusion network model to obtain a final face recognition prediction model.
2. The method of claim 1, wherein the dynamically adjusting the learning rate and the on-line mask augmentation ratio by a cosine curve alternating optimization method and training the base face recognition model comprises:
dividing the learning rate of the cosine curve and the online mask augmentation ratio into different stages, and dynamically setting the parameters of the learning rate of the cosine curve and the online mask augmentation ratio in different stages;
and training the basic face recognition model through the face recognition Arcface loss function.
3. The method of claim 2, wherein:
a mask is overlaid onto the face image by an image overlay method to obtain a mask face image;
different ratios of face images to mask face images are preset to obtain the online mask augmentation ratio data.
4. The method of claim 1, wherein weight fusing the trained left branch submodel with the right branch submodel comprises:
and (4) performing weight fusion of different proportions on the trained left branch sub-model and right branch sub-model by self-defining weighting parameters, and adjusting different scene priorities.
5. The method of claim 1, wherein after obtaining the final face recognition prediction model, the method comprises:
and carrying out prediction recognition on the mask face through the final face recognition prediction model, and outputting to obtain a recognition result.
6. A mask face recognition system based on double-branch weight fusion homology self-supervision is characterized by comprising:
the basic training module is used for dynamically adjusting the learning rate and the online mask augmentation ratio by a cosine curve alternating optimization method, training a basic face recognition model and obtaining a trained bottom layer feature sharing network model;
the double-branch training fusion module is used for respectively training a left branch sub-model and a right branch sub-model in the face recognition double-branch model through a double-branch weight fusion homology supervision loss function and a face recognition Arcface loss function, which specifically comprises: training the left branch sub-model through the face recognition Arcface loss function to obtain the output features of the trained left branch sub-model, training the output features of the left branch sub-model and the output features of the right branch sub-model through the double-branch weight fusion homology supervision loss function, training the output features of the right branch sub-model through the face recognition Arcface loss function, and performing weight fusion on the trained left branch sub-model and right branch sub-model to obtain a high-level semantic fusion network model, wherein the double-branch model comprises a left branch sub-model Sub_Model_nonMask adapted to the non-mask scene and a right branch sub-model Sub_Model_Mask adapted to the mask scene, and the network structures of the left branch and the right branch are completely the same; the calculation formula of the double-branch weight fusion homology supervision loss function is shown as the following formula:
L2 = ||Features_nonMask - Features_Mask||
wherein Features_nonMask is the feature vector of the left branch sub-model adapted to the non-mask scene, Features_Mask is the feature vector of the right branch sub-model adapted to the mask scene, and the operator || · || represents the Euclidean distance between the features;
and the splicing module is used for splicing the trained bottom layer feature sharing network model and the high-level semantic fusion network model to obtain a final face recognition prediction model.
7. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to execute the mask face recognition method based on the dual-branch weight fusion homology self-supervision according to any one of claims 1 to 5.
8. A storage medium having a computer program stored therein, wherein the computer program is configured to execute the mask face recognition method based on dual-branch weight fusion homology self-supervision according to any one of claims 1 to 5 when running.
CN202210090466.XA 2022-01-26 2022-01-26 Mask face recognition method based on double-branch weight fusion homology self-supervision Active CN114120430B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210090466.XA CN114120430B (en) 2022-01-26 2022-01-26 Mask face recognition method based on double-branch weight fusion homology self-supervision


Publications (2)

Publication Number Publication Date
CN114120430A CN114120430A (en) 2022-03-01
CN114120430B true CN114120430B (en) 2022-04-22

Family

ID=80361485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210090466.XA Active CN114120430B (en) 2022-01-26 2022-01-26 Mask face recognition method based on double-branch weight fusion homology self-supervision

Country Status (1)

Country Link
CN (1) CN114120430B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549938B (en) * 2022-04-25 2022-09-09 广州市玄武无线科技股份有限公司 Model training method, image information management method, image recognition method and device
CN115116122B (en) * 2022-08-30 2022-12-16 杭州魔点科技有限公司 Mask identification method and system based on double-branch cooperative supervision
CN117095447B (en) * 2023-10-18 2024-01-12 杭州宇泛智能科技有限公司 Cross-domain face recognition method and device, computer equipment and storage medium
CN117576766B (en) * 2024-01-16 2024-04-26 杭州魔点科技有限公司 Cross-space-time compatibility unsupervised self-learning face recognition method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934197A (en) * 2019-03-21 2019-06-25 深圳力维智联技术有限公司 Training method, device and the computer readable storage medium of human face recognition model
CN111639535A (en) * 2020-04-29 2020-09-08 深圳英飞拓智能技术有限公司 Face recognition method and device based on deep learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11636681B2 (en) * 2018-11-21 2023-04-25 Meta Platforms, Inc. Anticipating future video based on present video
CN113537066B (en) * 2021-07-16 2022-09-09 烽火通信科技股份有限公司 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment
CN113688785A (en) * 2021-09-10 2021-11-23 深圳市同为数码科技股份有限公司 Multi-supervision-based face recognition method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN114120430A (en) 2022-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant