Disclosure of Invention
To address the problems identified through analysis of the prior art schemes, the present application provides a method and a device for training a mask face feature recognition model, and a mask face recognition method.
In one aspect, the application provides a method for training a mask face feature recognition model, comprising:
acquiring a sample facial feature video and a corresponding evaluation label; the sample facial feature video is a video formed by a user executing operation according to a set rule;
extracting image frames of the sample facial feature video to form a frame sequence;
according to the frame sequence, carrying out differential operation on adjacent frames in the frame sequence to obtain a differential image;
extracting a feature matrix of each difference image; combining the feature matrices of the difference images according to the frame sequence to obtain a video feature matrix;
and training the mask face feature recognition model by adopting the video feature matrix and the corresponding evaluation label.
Optionally, extracting image frames of the sample facial feature video to form a frame sequence comprises: dividing the sample facial feature video according to the set rule to obtain sample sub-videos;
extracting image frames of each sample sub-video to form a corresponding frame sequence;
and combining the feature matrices of the difference images according to the frame sequences to obtain the video feature matrix comprises: combining the feature matrix of each difference image according to the arrangement order of the sample sub-videos and the corresponding frame sequences to obtain the video feature matrix.
Optionally, the set rule includes at least two facial actions and the execution time of each facial action;
dividing the sample facial feature video according to the set rule to obtain the sample sub-videos comprises: dividing the sample facial feature video according to the facial actions and the corresponding execution times to obtain the sample sub-videos.
Optionally, the facial actions include closing both eyes, relaxing and looking straight ahead, and smiling with teeth exposed.
Optionally, extracting image frames of the sample facial feature video to form a frame sequence comprises:
extracting a face image region of the image frame;
combining the facial image regions in the order of the image frames to form the frame sequence.
Optionally, the feature matrix of the difference image includes at least two feature parameters;
combining the feature matrices of the difference images according to the frame sequence to obtain the video feature matrix comprises:
extracting the same feature parameter from each feature matrix according to the frame sequence and combining them to form a same-parameter vector;
and combining the same-parameter vectors to obtain the video feature matrix.
In another aspect, the present application provides a mask face recognition method, including:
acquiring a facial feature video to be evaluated; the to-be-evaluated facial feature video is a video formed by a user executing operation according to a set rule;
extracting image frames of the facial feature video to be evaluated to form a frame sequence;
according to the frame sequence, carrying out differential operation on adjacent frames in the frame sequence to obtain a differential image;
extracting a feature matrix of each difference image; combining the feature matrices of the difference images according to the frame sequence to obtain a video feature matrix;
and processing the video feature matrix by using the mask face feature recognition model obtained by the above method to obtain a mask face degree evaluation result.
In another aspect, the present application provides a mask face recognition model training device, including:
the source data acquisition unit is used for acquiring a sample facial feature video and a corresponding evaluation label; the sample facial feature video is a video formed by a user executing operation according to a set rule;
a frame extraction unit for extracting image frames of the sample facial feature video to form a frame sequence;
the difference processing unit is used for carrying out difference operation on adjacent frames in the frame sequence according to the frame sequence to obtain a difference image;
a feature determination unit configured to extract a feature matrix of each of the difference images, and to combine the feature matrices of the difference images according to the frame sequence to obtain a video feature matrix;
and the model training unit is used for training the mask face feature recognition model by adopting the video feature matrix and the corresponding evaluation label.
According to the mask face feature recognition model training method, after a difference image is obtained by performing difference operation on adjacent frames in a frame sequence, the difference image is processed to extract a feature matrix, and the feature matrix representing a sample face feature video is obtained based on the feature matrix of the difference image.
Because the feature matrix representing the sample facial feature video is determined based on the difference images, it captures how the video content changes over time, that is, the delay and the degree of control with which the tester's facial expression changes according to the set rule, and thereby reflects the tester's mask face characteristics. Compared with prior-art methods that determine mask face features by extracting features directly from facial images, this method reduces the amount of computation, enables the mask face feature recognition model to be established quickly, and achieves better accuracy.
Detailed Description
In order that the above-mentioned objects, features and advantages of the present application may be more clearly understood, the solution of the present application will be further described below. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways than those described herein; it is to be understood that the embodiments described in this specification are only some embodiments of the present application and not all embodiments.
The embodiment of the application provides a mask face feature recognition model training method, in which the training data for training the mask face feature recognition model are obtained by a new method and the model is trained using these training data.
Fig. 1 is a flowchart of a mask face feature recognition model training method according to an embodiment of the present disclosure. As shown in fig. 1, the method provided by the embodiment of the present application includes steps S101 to S105.
S101: and acquiring a sample facial feature video and a corresponding evaluation label.
The sample facial feature video may be acquired by a tester using a mobile terminal such as a smart phone, or may be acquired by using a dedicated image acquisition device, which is not particularly limited in the embodiments of the present application.
In the embodiment of the application, the sample facial feature video is a video formed by a tester executing facial actions according to a set rule; the set rules comprise parameters such as the type of the facial action of the tester, the execution time of the facial action and the like. When obtaining a sample facial feature video, a tester must autonomously control facial muscles to perform corresponding actions according to set rules.
In the embodiment of the present application, the setting rule may include one facial action and corresponding execution time, or may include a plurality of facial actions and corresponding execution times.
In one specific application, the set rule includes three facial actions: closing both eyes, relaxing and looking straight ahead, and smiling with teeth exposed. All three facial actions are designed around characteristics of the mask face and are used to observe the degree of facial stiffness and the sluggishness of facial changes of the tester (i.e., the test subject), wherein: closing both eyes is used to observe the degree of facial relaxation of the tester, relaxing and looking straight ahead is used to observe the eye condition of the tester, and smiling with teeth exposed is used to observe the mouth corners and eye condition of the tester.
In order to capture each facial action reasonably and to reflect the degree of facial stiffness and the sluggishness of facial changes, each facial action should be held for an appropriate length of time. In one application of the embodiment of the present application, the execution time of each facial action is 5 s, and the three facial actions are executed consecutively.
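As a concrete illustration of such a set rule, the sketch below encodes the three-action example as a list of (action, duration) pairs; the action identifiers are hypothetical names introduced here for illustration and are not terms defined by the application.

```python
# Illustrative sketch of a set rule: three facial actions, 5 s each, executed
# consecutively. The identifiers are assumed names, not defined by the application.
SET_RULE = [
    ("close_both_eyes", 5.0),             # observe facial relaxation
    ("relax_look_straight_ahead", 5.0),   # observe the eye condition
    ("smile_exposing_teeth", 5.0),        # observe mouth corners and eyes
]

TOTAL_DURATION = sum(duration for _, duration in SET_RULE)  # 15 s of continuous video
```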
In the embodiment of the application, the evaluation label is obtained by a professional viewing the sample facial feature video or observing the corresponding facial features of the test subject. In one application, the evaluation labels may include five grades: normal, slight, mild, moderate, and severe. Normal characterizes a normal facial expression; slight corresponds to a tester whose blink frequency is reduced; mild corresponds to a tester whose blink frequency is reduced and whose lower facial expression is also reduced (for example, the lips are not parted during a spontaneous smile); moderate corresponds to a tester whose lips may be parted when the mouth is at rest; severe corresponds to a tester whose lips are parted most of the time when the face is immobile.
S102: image frames of a sample facial feature video are extracted to form a sequence of frames.
In the embodiment of the present application, after the sample facial feature video is obtained, it needs to be subjected to framing processing to form a frame sequence for subsequent use.
In practical applications, the strategy for forming the frame sequence may differ according to the sampling frequency of the sample facial feature video, as illustrated in the sketch below. If the sampling frequency is low, all image frames of the sample facial feature video can be used directly as frames in the frame sequence; if the sampling frequency is high, interval sampling can be used to extract a subset of the frames of the sample facial feature video as the frame sequence.
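A minimal sketch of this framing step, assuming OpenCV (`cv2`) is used to read the video; the function name and the `step` parameter are illustrative and not part of the application:

```python
import cv2

def extract_frame_sequence(video_path, step=1):
    """Read a video and keep every `step`-th frame.

    step=1 keeps all frames (low sampling frequency); a larger step performs
    interval sampling when the sampling frequency is high.
    """
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```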
In practical application, when the tester acquires the sample facial feature video, the image acquisition device (such as the user's smartphone) may shift relative to the user's face, so that the position of the face within the image frames moves.
In order to extract valid information and exclude unnecessary information, in step S102 the face region of the user in each image frame may be determined, the face region extracted and the other regions discarded, and the extracted region used as the pixel content of the corresponding frame in the frame sequence. For example, in one application of the embodiment of the present application, the face region of each image frame may be determined by an edge recognition method or a deep learning method and resized to 64 × 64 pixels by interpolation or pixel merging, so as to meet the consistency requirement of subsequent processing.
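As one possible realization of this face-cropping step (the application does not prescribe a specific detector), the sketch below uses OpenCV's bundled Haar cascade as a stand-in for the edge-recognition or deep-learning method; the function name is hypothetical:

```python
import cv2

# Haar cascade shipped with opencv-python, used here only as an example detector.
_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame, size=(64, 64)):
    """Return the face region resized to `size`, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])  # keep the largest face
    return cv2.resize(frame[y:y + h, x:x + w], size, interpolation=cv2.INTER_AREA)
```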
In practical application of the embodiment of the present application, if the set rule in step S101 includes three actions, that is, the sample facial feature video includes the video content of a plurality of actions, then in step S102 the sample facial feature video may first be divided according to the set rule (that is, according to the execution time of each facial action in the set rule) to obtain sample sub-videos, and each sample sub-video is subjected to framing processing to form the frame sequence corresponding to each facial action; a sketch of this splitting is given below. Of course, in other embodiments, the frame sequence may be determined first and then divided according to the set rule.
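A minimal sketch of splitting a frame list into per-action sub-sequences by execution time, assuming the `SET_RULE` structure and a frame rate `fps` as introduced in the earlier sketches; the function name is illustrative:

```python
def split_by_rule(frames, fps, set_rule):
    """Split a frame list into per-action sub-sequences using the execution
    time of each facial action in the set rule ((action, seconds) pairs)."""
    sub_videos, start = {}, 0
    for action, seconds in set_rule:
        end = start + int(round(seconds * fps))
        sub_videos[action] = frames[start:end]
        start = end
    return sub_videos
```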
In the embodiment of the application, in order to improve the processing efficiency, under the condition that the sample facial feature video is a color video, the sample facial feature video can be converted into a black-and-white gray-scale video, and then a frame sequence is obtained; or after acquiring the frame sequence, performing gray scale conversion on the content in the frame sequence to form a black-and-white gray scale video.
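As a small illustration of the gray-scale conversion (assuming OpenCV and the `frames` list from the earlier sketch):

```python
import cv2

# Convert each color frame to a single-channel gray-scale image before differencing.
gray_frames = [cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) for frame in frames]
```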
S103: and carrying out difference operation on adjacent frames in the frame sequence according to the frame sequence to obtain a difference image.
In step S103, the difference operation performed on adjacent frames in the frame sequence subtracts the gray values of the corresponding pixel points of two adjacent frames in the frame sequence and takes the absolute value of the result as the gray value of the corresponding pixel in the difference image. It can be seen that the difference operation determines the change between adjacent frames, so the difference image reflects the change of the user's facial features.
Specifically, if there are k frame sequences, then for the m-th frame sequence the difference value at pixel point (i, j) between the t-th image frame and the (t+1)-th image frame is calculated as

$D_t^m(i, j) = \left| f_{t+1}^m(i, j) - f_t^m(i, j) \right|$,

where $f_t^m(i, j)$ denotes the gray value of pixel $(i, j)$ in the t-th image frame of the m-th frame sequence.
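A minimal sketch of this differencing step, assuming OpenCV and the `gray_frames` list from the earlier sketches:

```python
import cv2

def difference_images(gray_frames):
    """Absolute difference of each pair of adjacent frames in one frame sequence."""
    return [
        cv2.absdiff(gray_frames[t + 1], gray_frames[t])
        for t in range(len(gray_frames) - 1)
    ]
```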
S104: extracting a feature matrix of each difference image; and combining the feature matrices of the difference images according to the frame sequence to obtain a video feature matrix.
After determining the difference image, feature extraction may be performed on the difference image to determine a feature matrix characterizing the difference image.
In this embodiment of the application, the feature matrix of the difference image may include the following parameters: information entropy, maximum value, minimum value, average value, variance, skewness, kurtosis, median, range, first quartile (Q1), third quartile (Q3), interquartile difference, and correlation coefficient; the feature matrix of the difference image is constructed from these parameters. In practical applications, the feature matrix of the difference image may be a single feature vector in which the parameters are arranged in a set order.
Suppose the proportion of pixels at each gray value $i$ in a certain difference image is $p_i$; then the information entropy is expressed as $H = -\sum_i p_i \log p_i$.

Suppose the gray values of all $n$ pixels in a certain difference image are $x_i$; then the maximum value is $\max = \max_i x_i$, the minimum value is $\min = \min_i x_i$, the average value is $\mu = \frac{1}{n} \sum_i x_i$, the variance is $\sigma^2 = \frac{1}{n} \sum_i (x_i - \mu)^2$, the skewness is $S = E\left[\left(\frac{x - \mu}{\sigma}\right)^3\right]$, the kurtosis is $K = E\left[\left(\frac{x - \mu}{\sigma}\right)^4\right]$, the range is $\mathrm{Ptp} = \max - \min$, and the interquartile difference is $DQ = Q_3 - Q_1$. The correlation coefficient between adjacent difference images is $\mathrm{Cov}(x_t, x_{t+1}) = E(x_t x_{t+1}) - E(x_t)E(x_{t+1})$, where $x_t$ and $x_{t+1}$ denote the gray values of the t-th and (t+1)-th difference images.
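A hedged sketch of computing these per-difference-image statistics with NumPy and SciPy; the function name, the fixed feature order, and the handling of the correlation term when no next difference image is available are assumptions made here for illustration:

```python
import numpy as np
from scipy import stats

def feature_vector(diff_image, next_diff_image=None):
    """Statistical features of one difference image, in a fixed (assumed) order."""
    x = diff_image.astype(np.float64).ravel()
    counts = np.bincount(diff_image.ravel(), minlength=256)
    p = counts[counts > 0] / x.size
    entropy = -np.sum(p * np.log2(p))                     # information entropy H
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    if next_diff_image is not None:
        y = next_diff_image.astype(np.float64).ravel()
        corr = np.mean(x * y) - np.mean(x) * np.mean(y)   # covariance-style term from the text
    else:
        corr = 0.0                                        # assumption: no successor available
    return np.array([
        entropy, x.max(), x.min(), x.mean(), x.var(),
        stats.skew(x), stats.kurtosis(x, fisher=False),   # fourth standardized moment
        median,
        x.max() - x.min(),                                # range (Ptp)
        q1, q3, q3 - q1,                                  # Q1, Q3, interquartile difference
        corr,
    ])
```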
After the feature matrix of each difference image is obtained, the feature matrices of the difference images are combined in order to obtain the video feature matrix. In practical application, when obtaining the video feature matrix from the feature matrices of the difference images, the same feature parameter in the feature matrix of each difference image may be extracted, the parameters ordered according to the ordering of the video frames to obtain the corresponding feature vector, and the feature vectors corresponding to all the parameters combined into the video feature matrix.
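Continuing the sketches above (and reusing the assumed `feature_vector` helper), one way to assemble the video feature matrix so that each row is a same-parameter vector over time might be:

```python
import numpy as np

def video_feature_matrix(diff_images):
    """Stack per-difference-image feature vectors so that each row holds one
    feature parameter ordered by frame (a same-parameter vector)."""
    vectors = [
        feature_vector(diff_images[t],
                       diff_images[t + 1] if t + 1 < len(diff_images) else None)
        for t in range(len(diff_images))
    ]
    return np.stack(vectors, axis=1)  # shape: (num_parameters, num_difference_images)
```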
In some applications of the embodiment of the application, when the sample facial feature video is split into the plurality of sample sub-videos, the video feature matrix of each sample sub-video may be calculated first, and then the feature matrices of each sample sub-video may be combined into the feature matrix of the sample facial feature video.
S105: and training a mask face feature recognition model by adopting the video feature matrix and the corresponding evaluation label.
Step S105 trains the mask face feature recognition model based on the association between the video feature matrix and the corresponding evaluation label, after both have been determined. In the embodiment of the present application, the mask face feature recognition model may be a model widely used in the field of machine learning, for example a support vector machine model.
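As a hedged sketch of this training step with a support vector machine (scikit-learn's `SVC` is used here as one possible implementation; `video_matrices` and `labels` stand for the video feature matrices and evaluation labels prepared in steps S101-S104, and the label encoding is an assumption):

```python
import numpy as np
from sklearn.svm import SVC

# Flatten each video feature matrix into one sample vector.
X = np.array([matrix.ravel() for matrix in video_matrices])
y = np.array(labels)  # e.g. 0=normal, 1=slight, 2=mild, 3=moderate, 4=severe (assumed)

model = SVC(kernel="rbf", C=1.0)
model.fit(X, y)
```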
As can be seen from the foregoing description and analysis in steps S101 to S105, in the method for training a mask face feature recognition model provided in the embodiment of the present application, after a difference image is obtained by performing a difference operation on adjacent frames in a frame sequence, the difference image is processed to extract a feature matrix, and a feature matrix representing a sample face feature video is obtained based on the feature matrix of the difference image. Because the feature matrix of the video representing the facial features of the sample is determined based on the differential image, the change condition of the video content along with time is represented, the delay degree and the controllable degree of the change of the facial expression of the tester according to the set rule are reflected, and the facial mask characteristics of the tester can be further reflected.
Compared with prior-art methods that determine mask face features by extracting features directly from facial images, this method reduces the amount of computation, enables the mask face feature recognition model to be established quickly, and achieves better accuracy.
In addition, in some specific applications of the embodiment of the application, the sample facial feature video is a video formed by filming a tester performing actions according to the set rule, and the set rule corresponds to the relevant test index features, so the feature matrix corresponding to the sample facial feature video also corresponds to those test index features, making the mask face feature recognition model more accurate.
After the mask face feature recognition model is obtained, the model can be packaged in an application (APP) and the application distributed to the corresponding user terminals; a user terminal processes the collected facial feature video to obtain a video feature matrix, which is used as the model input to obtain the corresponding classification result.
Based on the mask face feature recognition model obtained by the training method provided above, the embodiment of the application further provides a mask face recognition method. Fig. 2 is a flowchart of a mask face recognition method according to an embodiment of the present disclosure. As shown in fig. 2, the mask face recognition method provided by the embodiment of the present application includes steps S201 to S205.
S201: acquiring a facial feature video to be evaluated; the to-be-evaluated facial feature video is a video formed by a user executing operation according to a set rule.
S202: image frames of the facial feature video to be evaluated are extracted to form a frame sequence.
S203: and carrying out difference operation on adjacent frames in the frame sequence according to the frame sequence to obtain a difference image.
S204: extracting a feature matrix of each difference image; and combining the feature matrices of the difference images according to the frame sequence to obtain a video feature matrix.
The execution of steps S201-S204 is substantially the same as that of steps S101-S104, except that the facial feature video to be evaluated is processed and no evaluation label needs to be determined; for the specific operation of these steps, refer to the foregoing description, which is not repeated here.
S205: and processing the video characteristic matrix by adopting a mask face characteristic identification model to obtain a mask face degree evaluation result.
In step S205, the video feature matrix determined in step S204 is input into the mask face feature recognition model determined above, and the mask face degree evaluation result is obtained. In the case where the evaluation labels used for the model are normal, slight, mild, moderate, and severe, the mask face degree evaluation result is one of these labels.
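Continuing the earlier sketches (and reusing the assumed `difference_images`, `video_feature_matrix`, and trained `model` objects), the evaluation step might look like:

```python
# Build the feature matrix for the video to be evaluated and classify it.
feature_matrix = video_feature_matrix(difference_images(gray_frames))
prediction = model.predict(feature_matrix.ravel().reshape(1, -1))[0]
# `prediction` maps back to one of the evaluation grades (assumed encoding),
# e.g. 0=normal, 1=slight, 2=mild, 3=moderate, 4=severe.
```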
In addition to the aforementioned training method for the mask face feature recognition model, the embodiment of the present application further provides a mask face feature recognition model training device. Fig. 3 is a schematic structural diagram of a mask face feature recognition model training device according to an embodiment of the present application; as shown in fig. 3, the training device includes a source data acquisition unit 11, a frame extraction unit 12, a difference processing unit 13, a feature determination unit 14, and a model training unit 15.
The source data acquiring unit 11 is used for acquiring a sample facial feature video and a corresponding evaluation label.
In the embodiment of the application, the sample facial feature video is a video formed by a tester executing facial actions according to a set rule; the set rules comprise parameters such as the type of the facial action of the tester, the execution time of the facial action and the like. When obtaining a sample facial feature video, a tester must autonomously control facial muscles to perform corresponding actions according to set rules.
In the embodiment of the present application, the setting rule may include one facial action and corresponding execution time, or may include a plurality of facial actions and corresponding execution times.
In one specific application, the set rule includes three facial actions: closing both eyes, relaxing and looking straight ahead, and smiling with teeth exposed. All three facial actions are designed around characteristics of the mask face and are used to observe the degree of facial stiffness and the sluggishness of facial changes of the tester (i.e., the test subject), wherein: closing both eyes is used to observe the degree of facial relaxation of the tester, relaxing and looking straight ahead is used to observe the eye condition of the tester, and smiling with teeth exposed is used to observe the mouth corners and eye condition of the tester.
In order to capture each facial action reasonably and to reflect the degree of facial stiffness and the sluggishness of facial changes, each facial action should be held for an appropriate length of time. In one application of the embodiment of the present application, the execution time of each facial action is 5 s, and the three facial actions are executed consecutively.
In the embodiment of the application, the evaluation label is obtained by a professional viewing the sample facial feature video or observing the corresponding facial features of the test subject. In one application, the evaluation labels may include five grades: normal, slight, mild, moderate, and severe. Normal characterizes a normal facial expression; slight corresponds to a tester whose blink frequency is reduced; mild corresponds to a tester whose blink frequency is reduced and whose lower facial expression is also reduced (for example, the lips are not parted during a spontaneous smile); moderate corresponds to a tester whose lips may be parted when the mouth is at rest; severe corresponds to a tester whose lips are parted most of the time when the face is immobile.
The frame extraction unit 12 is configured to extract image frames of the sample facial feature video to form a frame sequence.
In practical applications, the strategy used by the frame extraction unit 12 to form the frame sequence may differ according to the sampling frequency of the sample facial feature video. For example, if the sampling frequency is low, all image frames of the sample facial feature video may be used directly as frames in the frame sequence; if the sampling frequency is high, interval sampling may be used to extract a subset of the frames of the sample facial feature video as the frame sequence.
In order to extract valid information and exclude unnecessary information, the frame extraction unit 12 may further determine the face region of the user in each image frame, extract the face region and discard the other regions, and use the extracted region as the pixel content of the corresponding frame in the frame sequence.
The difference processing unit 13 is configured to perform difference operation on adjacent frames in the frame sequence according to the frame sequence to obtain a difference image.
The feature determination unit 14 is configured to extract a feature matrix of each of the difference images, and to combine the feature matrices of the difference images according to the frame sequence to obtain a video feature matrix.
In practical application, when obtaining the video feature matrix from the feature matrices of the difference images, the same feature parameter in the feature matrix of each difference image may be extracted, the parameters ordered according to the ordering of the video frames to obtain the corresponding feature vector, and the feature vectors corresponding to all the parameters combined into the video feature matrix.
In some applications of the embodiment of the application, when the sample facial feature video is split into a plurality of sample sub-videos, the video feature matrix of each sample sub-video may be calculated first, and the feature matrices of the sample sub-videos then combined into the feature matrix of the sample facial feature video.
The model training unit 15 is configured to train the mask face feature recognition model by using the video feature matrix and the corresponding evaluation labels.
In the embodiment of the present application, the mask face feature recognition model may be a model widely used in the field of machine learning, and may be, for example, a support vector machine model.
The training method for the mask face recognition model provided by the embodiment of the application can simplify calculation, realize quick establishment of the mask face feature recognition model and achieve better accuracy.
Based on the same inventive concept, the application also provides an electronic device. Fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 4, the electronic device comprises at least one processor 21, at least one memory 22 and at least one communication interface 23; the communication interface 23 is used for information transmission with an external device.
The various components in the electronic device are coupled together by a bus system 24. Understandably, the bus system 24 is used to enable connective communication between these components. In addition to a data bus, the bus system 24 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, the various buses are labeled as bus system 24 in fig. 4.
It will be appreciated that the memory 22 in this embodiment may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. In some embodiments, memory 22 stores elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic tasks and processing hardware-based tasks. The application programs include various application programs such as a media player (MediaPlayer), a Browser (Browser), etc. for implementing various application tasks. The program for implementing the mask face feature recognition model training method provided by the embodiment of the disclosure may be included in an application program.
In the embodiment of the present disclosure, the processor 21 is configured to execute the steps of the training method for the mask face feature recognition model provided in the embodiment of the present disclosure by calling a program or an instruction stored in the memory 22, which may be specifically a program or an instruction stored in an application program.
The mask face feature recognition model training method provided by the embodiment of the disclosure may be applied to the processor 21, or implemented by the processor 21. The processor 21 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated hardware logic circuits or by software instructions in the processor 21. The processor 21 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of the mask face feature recognition model training method provided by the embodiment of the disclosure can be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software units in the decoding processor. The software units may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 22, and the processor 21 reads the information in the memory 22 and performs the steps of the method in combination with its hardware.
The embodiments of the present disclosure further provide a non-transitory computer-readable storage medium, where a program or an instruction is stored in the non-transitory computer-readable storage medium, where the program or the instruction causes a computer to execute the steps of the mask face feature recognition model training method in each embodiment, and in order to avoid repeated description, the steps are not repeated here.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present application and are presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.