CN113076813B - Training method and device for mask face feature recognition model - Google Patents


Info

Publication number
CN113076813B
CN113076813B (application CN202110272296.2A)
Authority
CN
China
Prior art keywords
video
feature
facial
frame sequence
differential
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110272296.2A
Other languages
Chinese (zh)
Other versions
CN113076813A (en)
Inventor
许二赫 (Xu Erhe)
陈彪 (Chen Biao)
孙虹 (Sun Hong)
陈益强 (Chen Yiqiang)
卢旺 (Lu Wang)
于汉超 (Yu Hanchao)
杨晓东 (Yang Xiaodong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuanwu Hospital
Institute of Computing Technology of CAS
Original Assignee
Xuanwu Hospital
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuanwu Hospital, Institute of Computing Technology of CAS filed Critical Xuanwu Hospital
Priority to CN202110272296.2A priority Critical patent/CN113076813B/en
Publication of CN113076813A publication Critical patent/CN113076813A/en
Application granted granted Critical
Publication of CN113076813B publication Critical patent/CN113076813B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method and device for a mask face feature recognition model. The method comprises the following steps: acquiring a sample facial feature video and a corresponding evaluation label, the sample facial feature video being recorded while a user performs actions according to a set rule; extracting image frames of the sample facial feature video to form a frame sequence; performing a differential operation on adjacent frames in the frame sequence to obtain differential images; extracting a feature matrix of each differential image and combining the feature matrices of the differential images in frame-sequence order to obtain a video feature matrix; and training the mask face feature recognition model with the video feature matrix and the corresponding evaluation label. Compared with prior-art methods that determine mask face features by extracting features directly from facial images, the method involves far less computation, allows the mask face feature recognition model to be built quickly, and achieves good accuracy.

Description

Training method and device for mask face feature recognition model
Technical Field
The application relates to the technical field of machine learning, in particular to a mask face feature recognition model training method and device.
Background
A mask face (masked facies) refers to a facial state in which expression remains stiff and reduced, even when deliberately produced, because facial muscle activity is suppressed. Although the mask face cannot by itself serve as a direct diagnostic basis for Parkinson's disease, a large number of existing cases show a strong association between the mask face and Parkinson's disease and other neurodegenerative diseases, so it can serve as a primary screening indicator for Parkinson's disease and related conditions.
With the popularization of intelligent terminals such as smartphones, health-monitoring tasks that originally required professionals can now be performed by having the intelligent terminal process the acquired data. For example, a mask face can be recognized by applying a deep learning algorithm to a facial video of the user shot on the intelligent terminal; in this case, the core problem is the applicability and accuracy of the algorithm that processes the acquired data.
Approaches have been proposed in which an intelligent terminal captures the user's face to obtain a sample facial feature video and a related algorithm model is built by deep learning. However, the core of such an algorithm performs feature recognition on every frame of the video, so the computational load is large; in practical applications, the motion state of the subject while the feature video is being shot also has a great influence on the recognition result.
Disclosure of Invention
Based on the problems found by analysis of the prior art, the application provides a mask face feature recognition model training method and device and a mask face recognition method.
In one aspect, the present application provides a mask face feature recognition model training method, including:
acquiring a sample facial feature video and a corresponding evaluation label; the sample facial feature video is formed by a user executing operation according to a set rule;
extracting image frames of the sample facial feature video to form a frame sequence;
according to the frame sequence, carrying out differential operation on adjacent frames in the frame sequence to obtain a differential image;
extracting a feature matrix of each differential image; combining the feature matrixes of the differential images according to the frame sequence to obtain a video feature matrix;
and training the mask face feature recognition model by adopting the video feature matrix and the corresponding evaluation label.
Optionally, extracting image frames of the sample facial feature video to form a frame sequence includes: dividing the sample facial feature video according to the set rule to obtain sample sub-videos;
extracting image frames of each sample sub-video to form a corresponding frame sequence;
and combining the feature matrices of the differential images according to the frame sequence to obtain the video feature matrix includes: combining the feature matrices of the differential images according to the arrangement order of the sample sub-videos and the corresponding frame sequences to obtain the video feature matrix.
Optionally, the set rule includes at least two facial actions and an execution time for each facial action;
dividing the sample facial feature video according to the set rule to obtain the sample sub-videos includes: dividing the sample facial feature video according to the facial actions and the corresponding execution times to obtain the sample sub-videos.
Optionally, the facial action includes closing both eyes, relaxing and looking straight ahead, smiling and exposing teeth.
Optionally, extracting image frames of the sample facial feature video to form a frame sequence includes:
extracting a facial image area of the image frame;
the facial image areas are combined in the order of the image frames to form the sequence of frames.
Optionally, the feature matrix of the differential image includes at least two feature parameters;
combining the feature matrices of the differential images according to the frame sequence to obtain the video feature matrix includes:
extracting the same characteristic parameters of each characteristic matrix according to the frame sequence, and combining to form a same parameter vector;
and combining the same-parameter vectors to obtain the video feature matrix.
In another aspect, the present application provides a method for face recognition, including:
acquiring a facial feature video to be evaluated; the facial feature video to be evaluated is a video formed by a user executing operation according to a set rule;
extracting image frames of the facial feature video to be evaluated to form a frame sequence;
according to the frame sequence, carrying out differential operation on adjacent frames in the frame sequence to obtain a differential image;
extracting a feature matrix of each differential image; combining the feature matrixes of the differential images according to the frame sequence to obtain a video feature matrix;
and processing the video feature matrix by using the mask face feature recognition model obtained by the method to obtain a mask face degree evaluation result.
In yet another aspect, the present application provides a mask face recognition model training device, including:
the source data acquisition unit is used for acquiring the sample facial feature video and the corresponding evaluation tag; the sample facial feature video is formed by a user executing operation according to a set rule;
a frame extraction unit for extracting image frames of the sample facial feature video to form a frame sequence;
the differential processing unit is used for carrying out differential operation on adjacent frames in the frame sequence according to the frame sequence to obtain a differential image;
a feature determining unit, configured to extract a feature matrix of each of the differential images; combining the feature matrixes of the differential images according to the frame sequence to obtain a video feature matrix;
and the model training unit is used for training the mask face feature recognition model by adopting the video feature matrix and the corresponding evaluation label.
In the mask face feature recognition model training method described above, adjacent frames in the frame sequence are differenced to obtain differential images, feature matrices are extracted from the differential images, and a feature matrix characterizing the sample facial feature video is then obtained from the feature matrices of the differential images.
Because the feature matrix characterizing the sample facial feature video is determined from the differential images, it represents how the video content changes over time, that is, the delay and controllability of the subject's facial expression changes while following the set rule, and it therefore characterizes the subject's facial features. Compared with prior-art methods that extract features directly from facial images to determine mask face features, the feature matrix characterizing the sample facial feature video contains far less data, so the method reduces computation, allows the mask face feature recognition model to be built quickly, and achieves good accuracy.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
For a clearer description of embodiments of the present application or of the solutions of the prior art, reference will be made below to the accompanying drawings, which are used in the description of embodiments or of the prior art, and from which it is obvious to a person skilled in the art that other drawings can be obtained without inventive effort;
FIG. 1 is a flowchart of a mask face feature recognition model training method provided in an embodiment of the present application;
FIG. 2 is a flowchart of a mask face recognition method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a mask face recognition device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
wherein: the system comprises an 11-source data acquisition unit, a 12-frame extraction unit, a 13-differential processing unit, a 14-characteristic determination unit, a 15-model training unit, a 21-processor, a 22-memory, a 23-communication interface and a 24-bus system.
Detailed Description
In order that the above objects, features and advantages of the present application may be more clearly understood, a further description of the aspects of the present application will be provided below. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the application.
The embodiment of the application provides a training method for a mask face feature recognition model, in which training data are acquired in a new way and then used to train the model.
Fig. 1 is a flowchart of a mask face feature recognition model training method according to an embodiment of the present application. As shown in fig. 1, the method provided in the embodiment of the present application includes steps S101 to S105.
S101: and acquiring a sample facial feature video and a corresponding evaluation label.
The sample facial feature video may be acquired by a tester using a mobile terminal such as a smart phone, or may be acquired using a dedicated image acquisition device, and the embodiment of the present application is not particularly limited.
In the embodiment of the application, the sample facial feature video is a video formed by a tester executing facial actions according to a set rule; the setting rules include parameters such as the type of the face action of the tester, the execution time of the face action, and the like. When acquiring a sample facial feature video, a tester must autonomously control facial muscles to perform corresponding actions strictly according to set rules.
In the embodiment of the application, the setting rule may include one facial action and corresponding execution time, and may also include a plurality of facial actions and corresponding execution times.
In one specific application, the set rule includes three facial actions: closing both eyes, relaxing and looking straight ahead, and smiling and exposing the teeth. The three facial actions are designed around features of the mask face and are used to observe the degree of facial stiffness and slowness of the tester (i.e., the subject): closing both eyes is used to observe how well the face relaxes, relaxing and looking straight ahead is used to observe the eye region, and smiling and exposing the teeth is used to observe the mouth corners and the eye region.
For each facial action to be captured properly and to reflect the aforementioned degree of facial stiffness and slowness of change, each facial action should be held for an appropriate execution time. In one application of the embodiment of the present application, the execution time of each facial action is 5 s, and the three facial actions are performed consecutively.
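For illustration only, this example set rule can be written down as a small configuration; the following Python sketch assumes the action names and the SET_RULE identifier, which are not part of the application:

```python
# Hypothetical encoding of the example "set rule": three facial actions, each
# held for 5 s and performed back to back while the video is recorded.
SET_RULE = [
    ("close_both_eyes", 5.0),          # observe how well the face relaxes
    ("relax_and_look_straight", 5.0),  # observe the eye region
    ("smile_and_expose_teeth", 5.0),   # observe mouth corners and eye region
]
TOTAL_DURATION = sum(seconds for _, seconds in SET_RULE)  # 15 s of guided video
```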
In the embodiment of the application, the evaluation label is given by a professional who views the sample facial feature video or directly observes the corresponding facial features of the subject. In one application, the evaluation labels may include five grades: normal, slight, mild, moderate and severe. Normal corresponds to a normal facial expression; slight corresponds to a reduced blink frequency; mild corresponds to a reduced blink frequency together with reduced lower-face expression (for example, spontaneous smiling with the lips not parted); moderate corresponds to the lips sometimes being parted while the mouth is at rest; and severe corresponds to the lips being parted most of the time while the mouth is at rest.
S102: image frames of a sample facial feature video are extracted to form a sequence of frames.
In this embodiment of the present application, after the sample facial feature video is acquired, it needs to be split into frames to form a frame sequence for subsequent processing.
In practical applications, the strategy for forming the frame sequence may differ according to the sampling frequency of the sample facial feature video. If the sampling frequency is low, every image frame of the sample facial feature video can be used directly as a frame in the frame sequence; if the sampling frequency is high, interval sampling can be used so that only part of the frames of the sample facial feature video are taken as the frame sequence.
In practice, while the subject records the sample facial feature video, the image acquisition device (for example, the user's smartphone) may shift relative to the user's face, so the position of the face within the frame moves.
In order to extract the useful information and exclude unnecessary information, step S102 may determine the user's face region in each image frame, keep the pixels of the face region as the content of each frame in the frame sequence, and discard the other regions. For example, in one application of the embodiment of the present application, the face region of each image frame may be located by edge recognition or a deep learning method, and the face region may be scaled to 64×64 pixels by interpolation or pixel merging so as to meet the consistency requirement of subsequent processing.
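As a rough illustration of this step, not the application's prescribed implementation, the following Python sketch uses OpenCV's bundled Haar cascade in place of the edge-recognition or deep-learning face locator mentioned above, samples every `step`-th frame, and resizes each face region to 64×64; all function and variable names are assumptions:

```python
import cv2

# Assumed face detector: OpenCV's bundled frontal-face Haar cascade.
FACE_DETECTOR = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_frame_sequence(video_path, step=1, size=(64, 64)):
    """Return grayscale face crops, one per sampled frame, resized to `size`."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:  # interval sampling when the frame rate is high
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = FACE_DETECTOR.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces):
                x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest face
                frames.append(cv2.resize(gray[y:y + h, x:x + w], size))
        index += 1
    cap.release()
    return frames
```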
In a practical application of the embodiment of the present application, if the set rule of step S101 includes three actions, that is, if the sample facial feature video contains video content for several actions, then in step S102 the sample facial feature video may first be divided according to the set rule (that is, according to the execution time of each facial action in the set rule) to obtain sample sub-videos, and each sample sub-video is then split into frames to form the frame sequence corresponding to each facial action. Of course, in other embodiments, the frame sequence may be determined first and then split according to the set rule.
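A minimal sketch of this splitting step, assuming the actions are performed back to back and that `fps` is the effective frame rate of the extracted frame sequence; the helper name and signature are illustrative, and `set_rule` is a list of (action_name, seconds) pairs such as the one sketched earlier:

```python
def split_by_set_rule(frames, fps, set_rule):
    """Split an in-order list of frames into one sub-sequence per facial action."""
    sub_sequences, start = [], 0
    for _, seconds in set_rule:
        end = start + int(round(seconds * fps))
        sub_sequences.append(frames[start:end])
        start = end
    return sub_sequences
```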
In the embodiment of the application, in order to improve processing efficiency, if the sample facial feature video is a color video it can be converted into a grayscale video before the frame sequence is acquired; alternatively, the frame sequence can be acquired first and its content then converted to grayscale.
S103: and carrying out differential operation on adjacent frames in the frame sequence according to the frame sequence to obtain a differential image.
In step S103, performing the differential operation on adjacent frames in the frame sequence means subtracting the gray values of corresponding pixels of two adjacent frames and taking the absolute value of the difference as the gray value of the corresponding pixel in the differential image. Since the differential operation captures the change between adjacent frames, the differential image reflects the change of the user's facial features.
Specifically, if there are k frame sequences, then for the m-th frame sequence the differential image of the t-th image frame is computed pixel by pixel as $D_t^m(i,j)=\left|f_{t+1}^m(i,j)-f_t^m(i,j)\right|$, where $f_t^m(i,j)$ is the gray value of pixel $(i,j)$ in the t-th image frame of the m-th frame sequence.
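A one-line Python sketch of this differential operation, assuming the frames are the grayscale NumPy arrays produced by the earlier sketch (the helper name is illustrative):

```python
import cv2

# Absolute difference of adjacent grayscale frames, matching |f_{t+1} - f_t| above.
def difference_images(frames):
    return [cv2.absdiff(frames[t + 1], frames[t]) for t in range(len(frames) - 1)]
```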
S104: extracting a feature matrix of each differential image; and combining the feature matrixes of the differential images according to the frame sequence to obtain a video feature matrix.
After the differential image is determined, feature extraction may be performed on the differential image to determine a feature matrix characterizing the differential image.
In this embodiment of the present application, the feature matrix of a differential image may include the following parameters: information entropy, maximum, minimum, mean, variance, skewness, kurtosis, median, range, first quartile, third quartile, interquartile range and correlation coefficient; the feature matrix of the differential image is constructed from these parameters. In practice, the feature matrix of a differential image may simply be a feature vector in which the parameters are arranged in a set order.
Let $p_i$ be the proportion of pixels at gray level $i$ in a differential image; the information entropy is then $H=-\sum_i p_i\log p_i$.
Let $x_i$ denote the gray values of all pixels in a differential image, $n$ their number, $\mu$ their mean and $\sigma$ their standard deviation. Then the maximum is $\max_i x_i$, the minimum is $\min_i x_i$, the mean is $\mu=\frac{1}{n}\sum_i x_i$, the variance is $\sigma^2=\frac{1}{n}\sum_i (x_i-\mu)^2$, the skewness is $E\!\left[\left(\tfrac{x-\mu}{\sigma}\right)^3\right]$, the kurtosis is $E\!\left[\left(\tfrac{x-\mu}{\sigma}\right)^4\right]$, the range is $\mathrm{Ptp}=\max-\min$, and the interquartile range is $DQ=Q_3-Q_1$. The correlation coefficient between adjacent differential images is based on $\mathrm{Cov}(x_t,x_{t+1})=E(x_t x_{t+1})-E(x_t)E(x_{t+1})$.
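The per-differential-image statistics can be sketched as follows; the exact estimators (NumPy/SciPy defaults, a 256-bin histogram and a base-2 logarithm for the entropy) are assumptions, and the correlation coefficient between adjacent differential images, which follows the covariance formula above, is left out of this per-image vector:

```python
import numpy as np
from scipy import stats

def diff_image_features(diff_img):
    """Feature vector of one differential image: entropy plus order statistics."""
    x = diff_img.astype(np.float64).ravel()
    hist, _ = np.histogram(x, bins=256, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))
    q1, q3 = np.percentile(x, [25, 75])
    return np.array([
        entropy, x.max(), x.min(), x.mean(), x.var(),
        stats.skew(x), stats.kurtosis(x), np.median(x),
        np.ptp(x), q1, q3, q3 - q1,
    ])
```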
After the feature matrix of each differential image is obtained, the feature matrices are combined in order to obtain the video feature matrix. In practice, the video feature matrix can be obtained from the feature matrices of the differential images by extracting the same feature parameter from each feature matrix, ordering those values according to the order of the video frames to obtain the corresponding feature vector, and combining the feature vectors of all parameters into the video feature matrix.
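A sketch of this combination step: stacking the per-differential-image feature vectors in temporal order and transposing gives a matrix with one row per feature parameter, i.e. one same-parameter vector per row; the function names are illustrative:

```python
import numpy as np

def video_feature_matrix(diff_images, feature_fn):
    """Combine per-differential-image feature vectors into the video feature matrix."""
    per_image = np.stack([feature_fn(d) for d in diff_images])  # shape (T-1, n_params)
    return per_image.T                                          # shape (n_params, T-1)
```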
In some applications of the embodiments of the present application, when the sample facial feature video is split into several sample sub-videos, the video feature matrix of each sample sub-video may be calculated first, and the feature matrices of the sample sub-videos then combined into the feature matrix of the sample facial feature video.
S105: and training a mask face feature recognition model by adopting the video feature matrix and the corresponding evaluation label.
Step S105 trains the mask face feature recognition model, after the video feature matrices and the corresponding facial evaluation labels have been determined, on the association between them. In this embodiment of the present application, the mask face feature recognition model may be any model widely used in machine learning, for example a support vector machine model.
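As an illustration of step S105 only, the sketch below trains a scikit-learn support vector machine on flattened video feature matrices; the split ratio, kernel, hyperparameters and label encoding are assumptions, not values given in the application:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_mask_face_model(video_feature_matrices, labels):
    """Fit an SVM on flattened video feature matrices; labels are the five grades
    (e.g. 0=normal ... 4=severe)."""
    X = np.stack([m.ravel() for m in video_feature_matrices])
    y = np.asarray(labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    model = SVC(kernel="rbf", C=1.0)
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    return model
```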
As the above description of steps S101-S105 shows, in the mask face feature recognition model training method provided in the embodiments of the present application, adjacent frames in the frame sequence are differenced to obtain differential images, feature matrices are extracted from the differential images, and a feature matrix characterizing the sample facial feature video is obtained from them. Because this feature matrix is determined from the differential images, it represents how the video content changes over time, reflecting the delay and controllability of the subject's facial expression changes while following the set rule, and thus characterizes the subject's facial features.
Compared with prior-art methods that determine mask face features by extracting features directly from facial images, this approach reduces computation, allows the mask face feature recognition model to be built quickly, and achieves good accuracy.
In addition, in some specific applications of the embodiment of the application, the sample facial feature video is shot while the subject performs guided actions according to the set rule, and the set rule itself corresponds to the test indicators of interest; the feature matrix of the sample facial feature video therefore also corresponds to those test indicators, which makes the mask face feature recognition model more accurate.
After the mask face feature recognition model is obtained, it can be embedded in an APP and the APP distributed to the corresponding user side; the user side processes the acquired facial feature video into a video feature matrix and feeds the matrix to the model to obtain the corresponding classification result.
Based on the mask face feature recognition model obtained with the above training method, the embodiment of the application further provides a mask face recognition method. Fig. 2 is a flowchart of the mask face recognition method according to an embodiment of the present application. As shown in fig. 2, the mask face recognition method provided in the embodiment of the present application includes steps S201 to S205.
S201: acquiring a facial feature video to be evaluated; the facial feature video to be evaluated is a video formed by a user performing an operation according to a set rule.
S202: and extracting image frames of the face evaluation video to be evaluated to form a frame sequence.
S203: and carrying out differential operation on adjacent frames in the frame sequence according to the frame sequence to obtain a differential image.
S204: extracting a feature matrix of each differential image; and combining the feature matrixes of the differential images according to the frame sequence to obtain a video feature matrix.
Steps S201 to S204 are performed in substantially the same way as steps S101 to S104, except that the video processed is the facial feature video to be evaluated and no evaluation label needs to be determined; for the specific operation of these steps, refer to the foregoing description, which is not repeated here.
S205: and processing the video feature matrix by adopting a mask face feature recognition model to obtain a mask face degree evaluation result.
In step S205, the video feature matrix determined in step S204 is input into the mask face feature recognition model determined above, and a mask face degree evaluation result is obtained. When the evaluation labels used for the model are normal, slight, mild, moderate and severe, the mask face degree evaluation result is one of these labels.
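Tying the pieces together, a hypothetical end-to-end sketch of steps S201-S205 that reuses the illustrative helpers from the earlier sketches; none of these names come from the application:

```python
import numpy as np

def evaluate_mask_face(video_path, model, fps, set_rule):
    """Process a to-be-evaluated video and return the model's grade prediction."""
    frames = extract_frame_sequence(video_path)
    parts = split_by_set_rule(frames, fps, set_rule)
    mats = [video_feature_matrix(difference_images(p), diff_image_features)
            for p in parts]
    x = np.concatenate([m.ravel() for m in mats]).reshape(1, -1)
    return model.predict(x)[0]  # one of the five grade labels
```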
In addition to the foregoing mask face feature recognition model training method, the embodiment of the present application also provides a mask face feature recognition model training device. Fig. 3 is a schematic structural diagram of the device according to an embodiment of the present application; as shown in fig. 3, the mask face feature recognition model training device includes a source data acquisition unit 11, a frame extraction unit 12, a differential processing unit 13, a feature determination unit 14 and a model training unit 15.
The source data acquisition unit 11 is configured to acquire a sample facial feature video and a corresponding evaluation tag.
In the embodiment of the application, the sample facial feature video is a video formed by a tester executing facial actions according to a set rule; the setting rules include parameters such as the type of the face action of the tester, the execution time of the face action, and the like. When acquiring a sample facial feature video, a tester must autonomously control facial muscles to perform corresponding actions strictly according to set rules.
In the embodiment of the application, the setting rule may include one facial action and corresponding execution time, and may also include a plurality of facial actions and corresponding execution times.
In one specific application, the set rule includes three facial actions: closing both eyes, relaxing and looking straight ahead, and smiling and exposing the teeth. The three facial actions are designed around features of the mask face and are used to observe the degree of facial stiffness and slowness of the tester (i.e., the subject): closing both eyes is used to observe how well the face relaxes, relaxing and looking straight ahead is used to observe the eye region, and smiling and exposing the teeth is used to observe the mouth corners and the eye region.
For each facial action to be captured properly and to reflect the aforementioned degree of facial stiffness and slowness of change, each facial action should be held for an appropriate execution time. In one application of the embodiment of the present application, the execution time of each facial action is 5 s, and the three facial actions are performed consecutively.
In the embodiment of the application, the evaluation label is given by a professional who views the sample facial feature video or directly observes the corresponding facial features of the subject. In one application, the evaluation labels may include five grades: normal, slight, mild, moderate and severe. Normal corresponds to a normal facial expression; slight corresponds to a reduced blink frequency; mild corresponds to a reduced blink frequency together with reduced lower-face expression (for example, spontaneous smiling with the lips not parted); moderate corresponds to the lips sometimes being parted while the mouth is at rest; and severe corresponds to the lips being parted most of the time while the mouth is at rest.
The frame extraction unit 12 is used to extract image frames of a sample facial feature video to form a frame sequence.
In practice, the strategy by which the frame extraction unit 12 forms the frame sequence may differ according to the sampling frequency of the sample facial feature video. For example, if the sampling frequency is low, every image frame of the sample facial feature video may be taken directly as a frame in the frame sequence; if the sampling frequency is high, interval sampling may be used to take part of the frames of the sample facial feature video as the frame sequence.
In order to extract the useful information and exclude unnecessary information, the frame extraction unit may also determine the user's face region in each image frame, keep the pixels of the face region as the content of each frame in the frame sequence, and discard the other regions.
The difference processing unit 13 is configured to perform a difference operation on adjacent frames in the frame sequence according to the frame sequence, so as to obtain a difference image.
The feature determining unit 14 is configured to extract a feature matrix of each of the differential images; and combining the feature matrixes of the differential images according to the frame sequence to obtain a video feature matrix.
In practice, the video feature matrix is obtained from the feature matrices of the differential images by extracting the same feature parameter from each feature matrix, ordering those values according to the order of the video frames to obtain the corresponding feature vector, and combining the feature vectors of all parameters into the video feature matrix.
In some applications of the embodiments of the present application, when the sample facial feature video is split into several sample sub-videos, the video feature matrix of each sample sub-video may also be calculated first, and the feature matrices of the sample sub-videos then combined into the feature matrix of the sample facial feature video.
The model training unit 15 is configured to train the mask face feature recognition model by using the video feature matrix and the corresponding evaluation label.
In this embodiment of the present application, the mask face feature recognition model may be a model widely adopted in the machine learning field, for example, may be a support vector machine model or the like.
The training method for the mask face recognition model can simplify calculation, realize rapid establishment of the mask face feature recognition model and achieve good accuracy.
Based on the foregoing inventive concept, the present application further provides an electronic device. Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device comprises at least one processor 21, at least one memory 22 and at least one communication interface 23. The communication interface 23 is used for information transmission with external devices.
The various components of the electronic device are coupled together by a bus system 24. It will be appreciated that the bus system 24 is used to enable connection and communication between these components. In addition to the data bus, the bus system 24 includes a power bus, a control bus and a status signal bus. For clarity of illustration, the various buses are all labeled as the bus system 24 in fig. 4.
It will be appreciated that the memory 22 in this embodiment may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. In some embodiments, the memory 22 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system and application programs.
The operating system includes various system programs, such as a framework layer, a core library layer and a driver layer, for implementing various basic services and handling hardware-based tasks. The applications include various application programs, such as a media player and a browser, for implementing various application services. A program implementing the mask face feature recognition model training method provided by the embodiment of the present disclosure may be included among the applications.
In the embodiment of the present disclosure, the processor 21 is configured to execute each step of the training method for facial feature recognition model provided in the embodiment of the present disclosure by calling a program or an instruction stored in the memory 22, specifically, a program or an instruction stored in an application program.
The mask face feature recognition model training method provided by the embodiment of the present disclosure may be applied to the processor 21 or implemented by the processor 21. The processor 21 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 21 or by instructions in the form of software. The processor 21 may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), an off-the-shelf programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The steps of the mask face feature recognition model training method provided by the embodiment of the disclosure may be directly embodied in the execution of a hardware decoding processor or in the combined execution of hardware and software units in the decoding processor. The software elements may be located in a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory 22 and the processor 21 reads the information in the memory 22 and in combination with its hardware performs the steps of the method.
The embodiments of the present disclosure further provide a non-transitory computer readable storage medium, where the non-transitory computer readable storage medium stores a program or instructions, and the program or instructions cause a computer to execute the steps of the embodiments of the training method for facial feature recognition model of a mask, so that the repeated description is avoided.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A mask face feature recognition model training method, comprising:
acquiring a sample facial feature video and a corresponding evaluation label; the sample facial feature video is formed by a user executing operation according to a set rule;
dividing the sample facial feature video according to the set rule to obtain sample sub-videos, wherein the set rule comprises at least two facial actions and an execution time for each facial action, and the dividing comprises: dividing the sample facial feature video according to the facial actions and the corresponding execution times to obtain the sample sub-videos;
extracting image frames of each sample sub-video to form a corresponding frame sequence, including: extracting a facial image area of the image frame; combining the facial image areas in the order of the image frames to form the frame sequence;
according to the frame sequence, carrying out differential operation on adjacent frames in the frame sequence to obtain a differential image;
extracting a feature matrix of each differential image, and combining the feature matrices of the differential images according to the frame sequence to obtain a video feature matrix, wherein the combining comprises: combining the feature matrices of the differential images according to the arrangement order of the sample sub-videos and the corresponding frame sequences to obtain the video feature matrix;
and training the mask face feature recognition model by adopting the video feature matrix and the corresponding evaluation label.
2. The mask face feature recognition model training method of claim 1, wherein the facial motion comprises closing both eyes, relaxing and looking straight ahead and smiling and exposing teeth.
3. The method for training a facial feature recognition model of a mask according to any one of claims 1-2,
the feature matrix of the differential image comprises at least two feature parameters;
combining the feature matrices of the differential images according to the frame sequence to obtain the video feature matrix comprises:
extracting the same characteristic parameters of each characteristic matrix according to the frame sequence, and combining to form a same parameter vector;
and combining the same-parameter vectors to obtain the video feature matrix.
4. A method of face recognition, comprising:
acquiring a facial feature video to be evaluated; the facial feature video to be evaluated is a video formed by a user executing operation according to a set rule;
extracting image frames of the facial feature video to be evaluated to form a frame sequence;
according to the frame sequence, carrying out differential operation on adjacent frames in the frame sequence to obtain a differential image;
extracting a feature matrix of each differential image; combining the feature matrixes of the differential images according to the frame sequence to obtain a video feature matrix;
processing the video feature matrix by using the mask face feature recognition model obtained by the method according to any one of claims 1-3 to obtain a mask face degree evaluation result.
5. A mask face recognition model training device, comprising:
the source data acquisition unit is used for acquiring the sample facial feature video and the corresponding evaluation tag; the sample facial feature video is formed by a user executing operation according to a set rule;
a frame extraction unit for dividing the sample facial feature video according to the set rule to obtain sample sub-videos, wherein the set rule comprises at least two facial actions and an execution time for each facial action, the dividing comprising: dividing the sample facial feature video according to the facial actions and the corresponding execution times to obtain the sample sub-videos; and for extracting the image frames of each sample sub-video to form a corresponding frame sequence, comprising: extracting a facial image area of each image frame; and combining the facial image areas in the order of the image frames to form the frame sequence;
the differential processing unit is used for carrying out differential operation on adjacent frames in the frame sequence according to the frame sequence to obtain a differential image;
a feature determining unit, configured to extract a feature matrix of each of the differential images and to combine the feature matrices of the differential images according to the frame sequence to obtain a video feature matrix, wherein the combining comprises: combining the feature matrices of the differential images according to the arrangement order of the sample sub-videos and the corresponding frame sequences to obtain the video feature matrix;
and the model training unit is used for training the mask face recognition model by adopting the video feature matrix and the corresponding evaluation label.
6. An electronic device comprising a processor and a memory;
the processor is configured to perform the steps of the mask face feature recognition model training method according to any one of claims 1 to 3 by invoking a program or instructions stored in the memory.
7. A computer-readable storage medium storing a program or instructions that cause a computer to perform the steps of the mask face feature recognition model training method according to any one of claims 1 to 3.
CN202110272296.2A 2021-03-12 2021-03-12 Training method and device for mask face feature recognition model Active CN113076813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110272296.2A CN113076813B (en) 2021-03-12 2021-03-12 Training method and device for mask face feature recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110272296.2A CN113076813B (en) 2021-03-12 2021-03-12 Training method and device for mask face feature recognition model

Publications (2)

Publication Number Publication Date
CN113076813A CN113076813A (en) 2021-07-06
CN113076813B true CN113076813B (en) 2024-04-12

Family

ID=76612319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110272296.2A Active CN113076813B (en) 2021-03-12 2021-03-12 Training method and device for mask face feature recognition model

Country Status (1)

Country Link
CN (1) CN113076813B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463100A (en) * 2014-11-07 2015-03-25 重庆邮电大学 Intelligent wheelchair man-machine interaction system and method based on facial expression recognition mode
CN107808113A (en) * 2017-09-13 2018-03-16 华中师范大学 A kind of facial expression recognizing method and system based on difference depth characteristic
CN108830237A (en) * 2018-06-21 2018-11-16 北京师范大学 A kind of recognition methods of human face expression
CN109190487A (en) * 2018-08-07 2019-01-11 平安科技(深圳)有限公司 Face Emotion identification method, apparatus, computer equipment and storage medium
CN109190582A (en) * 2018-09-18 2019-01-11 河南理工大学 A kind of new method of micro- Expression Recognition
CN109614927A (en) * 2018-12-10 2019-04-12 河南理工大学 Micro- Expression Recognition based on front and back frame difference and Feature Dimension Reduction
CN109784230A (en) * 2018-12-29 2019-05-21 中国科学院重庆绿色智能技术研究院 A kind of facial video image quality optimization method, system and equipment
CN109934158A (en) * 2019-03-11 2019-06-25 合肥工业大学 Video feeling recognition methods based on local strengthening motion history figure and recursive convolution neural network
CN110110653A (en) * 2019-04-30 2019-08-09 上海迥灵信息技术有限公司 The Emotion identification method, apparatus and storage medium of multiple features fusion
CN110363764A (en) * 2019-07-23 2019-10-22 安徽大学 A kind of driving license type information integrality detection method based on inter-frame difference
CN110532950A (en) * 2019-08-29 2019-12-03 中国科学院自动化研究所 Video feature extraction method, micro- expression recognition method based on micro- expression video
CN110569702A (en) * 2019-02-14 2019-12-13 阿里巴巴集团控股有限公司 Video stream processing method and device
CN110717423A (en) * 2019-09-26 2020-01-21 安徽建筑大学 Training method and device for emotion recognition model of facial expression of old people
CN111353452A (en) * 2020-03-06 2020-06-30 国网湖南省电力有限公司 Behavior recognition method, behavior recognition device, behavior recognition medium and behavior recognition equipment based on RGB (red, green and blue) images
CN111476178A (en) * 2020-04-10 2020-07-31 大连海事大学 Micro-expression recognition method based on 2D-3D CNN
CN111539290A (en) * 2020-04-16 2020-08-14 咪咕文化科技有限公司 Video motion recognition method and device, electronic equipment and storage medium
CN111814615A (en) * 2020-06-28 2020-10-23 湘潭大学 Parkinson non-contact intelligent detection method based on instruction video
CN111860414A (en) * 2020-07-29 2020-10-30 中国科学院深圳先进技术研究院 Method for detecting Deepfake video based on multi-feature fusion
CN111898703A (en) * 2020-08-14 2020-11-06 腾讯科技(深圳)有限公司 Multi-label video classification method, model training method, device and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9378576B2 (en) * 2013-06-07 2016-06-28 Faceshift Ag Online modeling for real-time facial animation
EP3465615A4 (en) * 2016-06-01 2020-05-06 The Ohio State Innovation Foundation System and method for recognition and annotation of facial expressions


Also Published As

Publication number Publication date
CN113076813A (en) 2021-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Xu Erhe, Chen Biao, Sun Hong, Chen Yiqiang, Lu Wang, Yu Hanchao, Yang Xiaodong
Inventor before: Xu Erhe
GR01 Patent grant