WO2021143667A1 - Facial expression analysis method and system, and facial expression-based satisfaction analysis method and system - Google Patents


Info

Publication number
WO2021143667A1
Authority
WO
WIPO (PCT)
Prior art keywords
facial expression
expression
face
spectrogram
frame
Prior art date
Application number
PCT/CN2021/071233
Other languages
French (fr)
Chinese (zh)
Inventor
郭明坤
Original Assignee
北京灵汐科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2021143667A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content

Definitions

  • This application relates to the technical field of data analysis, for example, to a method and system for analyzing facial expressions and a method and system for analyzing facial expression satisfaction.
  • In related technologies, facial expression analysis is mostly performed by training a neural network model on training data and corresponding annotation information, then feeding the object to be predicted into the trained model to obtain the analysis result.
  • Facial expressions fluctuate, however: a face does not stay happy, calm, or angry every second, which makes the analysis results so obtained inaccurate.
  • This application provides a facial expression analysis method and system and a facial expression satisfaction analysis method and system. They use the complete video information of the user's expression and fully consider the fluctuation of the expression, so the user's real emotion can be determined and the user's satisfaction can be determined accurately.
  • This application provides a method for analyzing facial expressions, including: acquiring a facial expression video clip to be analyzed, and acquiring the picture stream in the video clip; determining the facial expression spectrogram corresponding to the picture stream according to the facial expression index of each frame in the picture stream; determining a reference line corresponding to the face in a natural state according to the facial expression spectrogram, and determining the natural emotional region of the face in the natural state based on the reference line; and, with the natural emotional region as a reference, dividing the facial expression spectrogram into multiple emotional regions corresponding to different expressions.
  • This application also provides a facial expression satisfaction analysis method which, after applying the aforementioned facial expression analysis method, further includes: within each time period of the facial expression video clip to be analyzed, analyzing the multiple emotional regions corresponding to different expressions in the facial expression spectrogram to determine the user's satisfaction.
  • This application also provides a facial expression analysis system, which adopts the aforementioned facial expression analysis method and includes:
  • a picture acquisition module, configured to acquire the facial expression video clip to be analyzed and acquire the picture stream in the video clip;
  • an expression spectrum module, configured to determine the facial expression spectrogram corresponding to the picture stream according to the facial expression index of each frame of the picture stream;
  • an expression reference module, configured to determine a reference line corresponding to the face in a natural state according to the facial expression spectrogram, and to determine the natural emotional region of the face in the natural state based on the reference line;
  • an expression partition module, configured to divide the facial expression spectrogram into multiple emotional regions corresponding to different expressions, with the natural emotional region as a reference.
  • This application also provides a facial expression satisfaction analysis system, which adopts the aforementioned facial expression analysis system and includes:
  • a picture acquisition module, configured to acquire the facial expression video clip to be analyzed and acquire the picture stream in the video clip;
  • an expression spectrum module, configured to determine the facial expression spectrogram corresponding to the picture stream according to the facial expression index of each frame of the picture stream;
  • an expression reference module, configured to determine a reference line corresponding to the face in a natural state according to the facial expression spectrogram, and to determine the natural emotional region of the face in the natural state based on the reference line;
  • an expression partition module, configured to divide the facial expression spectrogram into multiple emotional regions corresponding to different expressions, with the natural emotional region as a reference;
  • a satisfaction calculation module, configured to analyze, within each time period of the facial expression video clip to be analyzed, the multiple emotional regions corresponding to different expressions in the facial expression spectrogram to determine the user's satisfaction.
  • The present application also provides an electronic device including a memory and a processor. The memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the facial expression analysis method and the facial expression satisfaction analysis method.
  • This application also provides a computer-readable storage medium storing a computer program that is executed by a processor to implement the facial expression analysis method and the facial expression satisfaction analysis method.
  • FIG. 1 is a schematic flowchart of a method for analyzing facial expressions according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for analyzing facial expression satisfaction according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a human facial expression spectrogram corresponding to a picture stream according to an embodiment of the disclosure
  • FIG. 4 is a schematic diagram of a reference interval for determining a reference line according to an embodiment of the disclosure
  • FIG. 5 is a schematic diagram of multiple emotional regions according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of the frequency and percentage of the three emotions (positive, natural, and negative) in each time period according to an embodiment of the present disclosure.
  • If directional indications (such as up, down, left, right, front, back) are involved in the embodiments of the present disclosure, they are only used to explain the relative positional relationship and movement of components in a specific posture (as shown in the figures); when that posture changes, the directional indication changes accordingly.
  • In addition, in the description of the present disclosure, the terms used are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The terms "including" and/or "comprising" are used to specify the presence of the stated elements, steps, operations, and/or components, but do not exclude the presence or addition of one or more other elements, steps, operations, and/or components.
  • The terms "first", "second", etc. may be used to describe various elements; they do not denote an order or limit these elements, and are only used to distinguish one element from another. "Plurality" means two or more.
  • The drawings are used for illustration purposes only, to depict the embodiments of the present disclosure.
  • As shown in FIG. 1, a facial expression analysis method according to an embodiment of the present disclosure includes:
  • S1: Obtain a facial expression video clip to be analyzed, and obtain the picture stream in the video clip.
  • When acquiring the picture stream, frames may be extracted frame by frame (that is, every frame is extracted), at fixed intervals (for example, one frame per second), or as key frames (that is, I-frames extracted according to changes in the picture) to obtain the picture stream in the facial expression video clip.
  • In this embodiment, when acquiring the picture stream, the facial expression video clip may be divided into multiple video sub-segments; at least one frame is extracted from each sub-segment, randomly or at fixed positions, and the picture stream is determined from the extracted frames.
  • By collecting the complete video information containing the user's expression, this embodiment fully considers the fluctuation of the facial expression and avoids the one-sidedness and inaccuracy of determining the user's emotion from a single frame.
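  • As an illustration of the fixed-interval sampling strategy described above, the following is a minimal Python sketch using OpenCV; the patent does not prescribe an implementation, and the function name and parameters are assumptions for illustration.

```python
import cv2  # OpenCV, used here only for video decoding

def sample_frames(video_path, seconds_per_frame=1.0):
    """Extract one frame per `seconds_per_frame` seconds from a video clip
    (the fixed-interval strategy described above)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unreported
    step = max(1, int(round(fps * seconds_per_frame)))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of the clip
        if idx % step == 0:
            frames.append(frame)  # keep this frame for the picture stream
        idx += 1
    cap.release()
    return frames
```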
  • S2: Determine the facial expression spectrogram corresponding to the picture stream according to the facial expression index of each frame in the picture stream.
  • For example, the facial expression index ranges from 0 to 100. The higher the index, the more positive the expression (more inclined toward emotions such as happiness and surprise); the lower the index, the more negative the expression (more inclined toward emotions such as anger and fear); an index near the middle indicates that the expression is in a natural state.
  • For example, according to the timestamp of each frame in the picture stream, the facial expression spectrogram is generated from each frame's facial expression index and the corresponding time information. The generated spectrogram intuitively displays the fluctuation of the user's expression over time.
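  • The spectrogram is effectively a time series of per-frame expression indices. A minimal sketch of assembling it, assuming a scoring function of the kind described in S21-S23 below (the names here are illustrative, not from the original):

```python
def build_spectrogram(frames, timestamps, expression_index):
    """Pair each frame's facial expression index (0-100) with its timestamp.

    `expression_index` is assumed to be a callable mapping one frame to a
    0-100 score; its construction is described in steps S21-S23.
    """
    return [(t, expression_index(f)) for t, f in zip(timestamps, frames)]
```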
  • In an optional implementation, before analyzing the facial expression index of each frame in the picture stream, the method further includes: performing face detection on each frame of the picture stream to obtain the face image in each frame, and then analyzing the facial expression index of each frame to obtain the facial expression spectrogram.
  • When performing face detection, the face detection algorithm may be, for example, a Multi-Task Convolutional Neural Network (MTCNN), a Single Shot MultiBox Detector (SSD), or a target detection algorithm such as You Only Look Once v3 (YOLOv3); it is not limited to those listed and can be chosen according to need. By analyzing each frame, one facial expression index is output.
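  • As one possible realization of the detection step, a sketch using the MTCNN implementation from the third-party facenet-pytorch package (any of the detectors named above would serve equally well):

```python
from PIL import Image
from facenet_pytorch import MTCNN  # pip install facenet-pytorch

detector = MTCNN()  # multi-task cascaded CNN face detector

def crop_face(frame):
    """Detect a face in one frame (a PIL image) and return the cropped
    face image, or None if no face is found."""
    boxes, _ = detector.detect(frame)  # boxes: Nx4 [x1, y1, x2, y2] or None
    if boxes is None:
        return None
    x1, y1, x2, y2 = (int(v) for v in boxes[0])  # most prominent face first
    return frame.crop((x1, y1, x2, y2))

# Example: face = crop_face(Image.open("frame_0001.jpg"))
```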
  • In an optional implementation, S2 includes:
  • S21: Divide the face image into multiple regions, each containing multiple key feature points used to determine the facial expression index.
  • Optionally, facial feature point recognition is performed on the face image to obtain multiple feature points, and the key feature points used to determine the facial expression index are identified from them; according to these key feature points, the face image is divided into multiple regions, each containing multiple key feature points. Directly using all feature points to determine the facial expression index would increase the amount of computation; identifying key feature points reduces computation while preserving the accuracy of the index. For example, a face image may have 106 feature points, from which the key feature points used to determine the facial expression index are identified, such as key feature points of the mouth, eyes, and eyebrows. The feature points may be recognized with a trained neural network model, though the method is not limited to this and the recognition method can be selected and adjusted as appropriate.
  • In an optional implementation, key feature point recognition may also be performed directly on the face image to obtain the multiple key feature points used to determine the facial expression index.
  • For example, where the key feature points are those of the mouth, eyes, and eyebrows, a trained neural network model can process the face image: the face image is input into the trained model, which outputs the mouth, eye, and eyebrow key feature points in the multi-frame face images.
  • Alternatively, the face image can be divided into multiple regions according to the locations of reference facial organs, and the key feature points of each region can be extracted to obtain the key feature points contained in the multiple regions. For example, with the mouth, eyes, and eyebrows as reference facial organs, three regions of the face image are obtained, and the images of the three regions are input into a mouth key-feature-point detection model, an eye key-feature-point detection model, and an eyebrow key-feature-point detection model, respectively, to obtain the key feature points contained in the multiple regions.
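  • A sketch of grouping landmarks into mouth, eye, and eyebrow regions; the index ranges below are hypothetical, since the numbering depends on the particular 106-point landmark model used:

```python
import numpy as np

# Hypothetical index ranges into a 106-point landmark array; the real
# ranges depend on the landmark model's numbering convention.
REGIONS = {
    "mouth": range(84, 104),
    "eyes": range(52, 72),
    "eyebrows": range(33, 51),
}

def split_regions(landmarks):
    """landmarks: a (106, 2) array of (x, y) points for one face image.
    Returns only the key feature points of each region, discarding the
    rest to reduce the later angle computations."""
    pts = np.asarray(landmarks)
    return {name: pts[list(idx)] for name, idx in REGIONS.items()}
```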
  • S22: For each frame, determine the key feature points contained in the multiple regions, determine the expression scores corresponding to the multiple regions, and determine the frame's facial expression index according to those expression scores.
  • Optionally, for each region, at least one included angle between lines connecting key feature points in the region is determined, and the expression score of the region is determined from the at least one included angle; a weight is determined for each region; and the facial expression index of each frame is determined from the expression scores and weights of the multiple regions.
  • For example, each region corresponds to a weight, the weights may differ between regions, and the weights of all regions sum to 1. Each region contains at least one included angle, each included angle corresponds to an expression score (for example, on a 100-point scale), and the scores are weighted by angle and region to obtain the facial expression index. Since a face region contains many feature points, computing the angle between every pair of feature points would increase the amount of computation; computing angles only between lines of key feature points reduces it.
  • There can be many lines between the key feature points of a region, and target lines can be selected for the angle calculation, such as the angle between lines of adjacent key feature points, or the angle between the lines joining the key feature points at the two ends and a middle key feature point. In this way, the amount of computation is reduced and processing efficiency improved while the accuracy of the facial expression index is preserved.
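  • A sketch of the scoring arithmetic this passage describes: the included angle at a middle key point between the lines to two other key points, and a weighted sum over regions whose weights total 1. The weight values and the angle-to-score mapping are illustrative assumptions, not values from the original:

```python
import numpy as np

def included_angle(p_mid, p_a, p_b):
    """Angle in degrees at p_mid between the lines p_mid->p_a and p_mid->p_b."""
    v1 = np.asarray(p_a, dtype=float) - np.asarray(p_mid, dtype=float)
    v2 = np.asarray(p_b, dtype=float) - np.asarray(p_mid, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Illustrative region weights; as stated above, they must sum to 1.
WEIGHTS = {"mouth": 0.5, "eyes": 0.3, "eyebrows": 0.2}

def frame_expression_index(region_scores):
    """region_scores: per-region expression scores on a 100-point scale,
    e.g. derived from the included angles of that region's key points."""
    return sum(WEIGHTS[r] * s for r, s in region_scores.items())
```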
  • Alternatively, the contour information of the facial organs contained in the multiple regions may be determined from the key feature points contained in those regions, and the expression score of each region may be determined from that contour information respectively.
  • S23: Obtain the facial expression spectrogram corresponding to the picture stream according to the facial expression indexes of all frames.
  • After the weighted calculation, the facial expression index corresponding to each frame is obtained, and the facial expression spectrogram corresponding to the picture stream is then obtained, as shown in FIG. 3.
  • S3: Determine a reference line corresponding to the face in a natural state according to the facial expression spectrogram, and determine the natural emotional region of the face in the natural state based on the reference line.
  • In an optional implementation, S3 includes:
  • S31: Determine the first interval in which the facial expression index appears most frequently in the facial expression spectrogram. Optionally, an interval width may be preset, and the first interval is determined according to that width.
  • S32: Determine the reference line (baseline) corresponding to the face in a natural state according to the first interval. The first interval, where the facial expression index appears most frequently, most truly reflects the current user's expression state in the natural state, so the baseline determined from it can accurately reflect that state.
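  • A sketch of the frequency scan: slide a fixed-width interval along the index axis, count the frames falling inside, and take the center of the densest interval as the baseline. The width of 20 and the 30-60 clamp follow the examples given elsewhere in this description:

```python
def find_baseline(indices, width=20, lo_clamp=30, hi_clamp=60):
    """indices: per-frame facial expression indices (0-100).
    Scans from bottom to top for the interval containing the most indices."""
    best_start, best_count = 0, -1
    for start in range(0, 101 - width):
        count = sum(start <= i < start + width for i in indices)
        if count > best_count:
            best_start, best_count = start, count
    baseline = best_start + width / 2  # horizontal center of the interval
    # Clamp per S32 to avoid the all-positive / all-negative special cases.
    return min(max(baseline, lo_clamp), hi_clamp)
```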
  • Optionally, S32 includes:
  • if the facial expression index corresponding to the horizontal center line of the first interval is greater than a first threshold and less than a second threshold, determining the horizontal center line as the reference line corresponding to the face in a natural state;
  • if the facial expression index corresponding to the horizontal center line is less than or equal to the first threshold, determining the horizontal line corresponding to the first threshold as the reference line;
  • if the facial expression index corresponding to the horizontal center line is greater than or equal to the second threshold, determining the horizontal line corresponding to the second threshold as the reference line.
  • For example, to avoid the special case of all-positive or all-negative emotion throughout, the baseline can be constrained to lie between 30 and 60: if the measured baseline is above 60 it is set to 60, and if below 30 it is set to 30. The first and second thresholds of the baseline can be adjusted adaptively and are not limited to these values.
  • For example, a region extending 15 above and 15 below the baseline (total width 30) can represent the expression in the natural state, that is, the natural emotional region of the face in the natural state. The width of the natural emotional region can be adjusted adaptively and is not limited to these values.
  • As shown in FIG. 5, in the facial expression spectrogram, the area above the natural emotional region can be determined as the positive emotional region, and the area below it as the negative emotional region. Through this reference region, the user's emotions can be stratified, avoiding the special case of all-positive or all-negative emotion throughout.
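  • With the baseline and the example band of 15 above and below it, each frame's index can be assigned to one of the three emotional regions; a minimal sketch:

```python
def emotion_region(index, baseline, half_width=15):
    """Map one facial expression index to an emotional region, using the
    natural band of baseline +/- half_width."""
    if index > baseline + half_width:
        return "positive"
    if index < baseline - half_width:
        return "negative"
    return "natural"
```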
  • This embodiment uses the complete video information of the user's facial expressions, fully considers the fluctuation of expressions and the differences between individual users' natural states, and obtains the baseline corresponding to the natural state from frequency analysis, so the user's true emotions can be determined. The user's emotions are stratified through the reference region, and the time periods of the user's video clip are weighted; by jointly considering the user's emotion type and the time weights, the user's satisfaction can be determined more accurately.
  • As shown in FIG. 2, a facial expression satisfaction analysis method, after applying the aforementioned facial expression analysis method, further includes: S5: within each time period of the facial expression video clip to be analyzed, analyze the multiple emotional regions corresponding to different expressions in the facial expression spectrogram to determine the user's satisfaction.
  • Optionally, S5 includes:
  • S51: Divide the facial expression video clip to be analyzed into multiple time periods, and calculate the proportions of different expressions in the multiple time periods according to the facial expression spectrogram.
  • S52: Determine the weights corresponding to the multiple time periods.
  • S53: Determine the satisfaction result according to the proportions of different expressions in the multiple time periods and the weights corresponding to those time periods.
  • Because emotions at different times contribute differently to overall satisfaction, the time weights need to differ. For example, the weights can be set so that emotions in the first 20% of the time carry 10% of the weight, emotions in the last 10% carry 60%, and the middle part carries the remaining 30%. The time weights can be adjusted as appropriate.
  • Optionally, the proportion of each emotional region in the facial expression spectrogram is counted separately for each time period. Within each time period, the weight of each emotional region is determined from its proportion in the spectrogram; the weights of the time periods and the weights of the emotional regions within each period are then combined in a weighted calculation to obtain the user's satisfaction coefficient. For example, the satisfaction coefficient can be obtained by subtracting the weighted value of negative emotion from the weighted value of positive emotion.
  • By weighting the time periods of the user's video clip and combining the proportion of each emotional region in each period, the user's emotion type and the time weights are considered jointly, and the user's satisfaction can be determined more accurately.
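  • A sketch of the weighted calculation just described, reusing the emotion_region sketch above: the clip is split into early, middle, and late periods with the example 10%/30%/60% time weights, and the coefficient is the weighted positive proportion minus the weighted negative proportion. The period boundaries (first 20%, last 10%) follow the example:

```python
def satisfaction_coefficient(spectrogram, baseline):
    """spectrogram: list of (timestamp, index) pairs in time order."""
    n = len(spectrogram)
    # (start fraction, end fraction, time weight), per the example above.
    periods = [(0.0, 0.2, 0.1), (0.2, 0.9, 0.3), (0.9, 1.0, 0.6)]
    coeff = 0.0
    for start, end, w in periods:
        segment = spectrogram[int(start * n):int(end * n)]
        if not segment:
            continue  # clip too short to populate this period
        labels = [emotion_region(i, baseline) for _, i in segment]
        pos = labels.count("positive") / len(labels)
        neg = labels.count("negative") / len(labels)
        coeff += w * (pos - neg)
    return coeff
```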
  • In the example of FIG. 6, the user's positive emotions gradually increase over time while the negative emotions gradually decrease.
  • Optionally, the satisfaction result can be determined from the satisfaction coefficient: when the coefficient is greater than or equal to a satisfaction threshold, the result is "satisfied"; when it is less than or equal to a dissatisfaction threshold, the result is "dissatisfied"; when it lies between the two, the result is "fair". For example, a coefficient of at least 0.1 can be treated as satisfied, at most -0.1 as dissatisfied, and between -0.1 and 0.1 as fair. Both thresholds can be adjusted adaptively.
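  • Mapping the coefficient to a verdict with the example thresholds of 0.1 and -0.1:

```python
def satisfaction_result(coeff, satisfied=0.1, dissatisfied=-0.1):
    """Map a satisfaction coefficient to a verdict using the example thresholds."""
    if coeff >= satisfied:
        return "satisfied"
    if coeff <= dissatisfied:
        return "dissatisfied"
    return "fair"
```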
  • As shown, the facial expression analysis system according to an embodiment includes a picture acquisition module, an expression spectrum module, an expression reference module, and an expression partition module.
  • The picture acquisition module is configured to acquire the facial expression video clip to be analyzed and to acquire the picture stream in the video clip.
  • When acquiring the picture stream, frames may be extracted frame by frame (that is, every frame is extracted), at fixed intervals (for example, one frame per second), or as key frames (that is, I-frames extracted according to changes in the picture) to obtain the picture stream in the facial expression video clip.
  • Alternatively, the video clip can be divided into multiple video sub-segments; at least one frame is extracted from each sub-segment, randomly or at fixed positions, and the picture stream is determined from the extracted frames. By using the complete video, the fluctuation of the facial expression is fully considered, avoiding the one-sidedness and inaccuracy of determining the user's emotion from a single frame.
  • The expression spectrum module is configured to determine the facial expression spectrogram corresponding to the picture stream according to the facial expression index of each frame of the picture stream.
  • For example, the facial expression index ranges from 0 to 100: the higher the index, the more positive the expression (more inclined toward emotions such as happiness and surprise); the lower the index, the more negative the expression (more inclined toward emotions such as anger and fear); an index near the middle indicates a natural state.
  • The facial expression spectrogram is generated from each frame's facial expression index and the corresponding time information, and intuitively displays the fluctuation of the user's expression over time.
  • In an optional implementation, the system further includes a face detection module configured to perform face detection on each frame of the picture stream to obtain the face image in each frame; the facial expression index of each frame is then analyzed to obtain the facial expression spectrogram.
  • The face detection algorithm may be, for example, MTCNN, SSD, or YOLOv3, but is not limited to those listed and can be selected as required. By analyzing each frame, one facial expression index is output.
  • Optionally, the expression spectrum module may include a face region division module, an expression index determination module, and an expression spectrogram determination module.
  • The face region division module is configured to divide the face image into multiple regions, each containing multiple key feature points used to determine the facial expression index.
  • Optionally, the face region division module performs facial feature point recognition on the face image to obtain multiple feature points, identifies from them the key feature points used to determine the facial expression index, and divides the face image into multiple regions accordingly, each containing multiple key feature points. Directly using all feature points to determine the index would increase the amount of computation; key feature point recognition reduces it while preserving accuracy. For example, a face image may have 106 feature points, from which the key feature points used to determine the index are identified, such as those of the mouth, eyes, and eyebrows. The key feature points can be recognized with a trained neural network model, though the recognition method is not limited to this and can be selected and adjusted as appropriate.
  • In an optional implementation, the face region division module may also perform key feature point recognition directly on the face image to obtain the multiple key feature points used to determine the facial expression index.
  • For example, where the key feature points are those of the mouth, eyes, and eyebrows, a trained neural network model can process the face image: the face image is input into the trained model, which outputs the mouth, eye, and eyebrow key feature points in the multi-frame face images.
  • Alternatively, the face region division module can divide the face image into multiple regions according to the locations of reference facial organs and extract the key feature points of each region to obtain the key feature points contained in the multiple regions. For example, with the mouth, eyes, and eyebrows as reference facial organs, three regions of the face image are obtained, and the images of the three regions are input into a mouth key-feature-point detection model, an eye key-feature-point detection model, and an eyebrow key-feature-point detection model, respectively, to obtain the key feature points contained in the multiple regions.
  • The expression index determination module is configured to determine, for each frame, the key feature points contained in the multiple regions, determine the expression scores corresponding to the regions, and determine the frame's facial expression index from those scores.
  • Optionally, the expression index determination module determines at least one included angle between lines of key feature points in each region and derives each region's expression score from it; determines a weight for each region; and determines the frame's facial expression index from the regions' expression scores and weights. For example, each region corresponds to a weight, the weights may differ between regions and sum to 1, each region contains at least one included angle, each included angle corresponds to an expression score (for example, on a 100-point scale), and the angles and regions are weighted to obtain the facial expression index.
  • Since a face region contains many feature points, computing the angle between every pair of feature points would increase the amount of computation, so angles are computed only between lines of key feature points. There can be many lines between a region's key feature points, and target lines can be selected for the angle calculation, such as the angle between lines of adjacent key feature points, or between the lines joining the end key feature points to a middle key feature point. This reduces computation and improves processing efficiency while preserving the accuracy of the index.
  • Alternatively, the expression index determination module may determine the contour information of the facial organs in the multiple regions from the key feature points they contain, and determine each region's expression score from that contour information respectively.
  • The expression spectrogram determination module is configured to obtain the facial expression spectrogram corresponding to the picture stream according to the facial expression indexes of all frames. After the weighted calculation of the facial expression index, the module obtains the index corresponding to each frame and then the spectrogram corresponding to the picture stream, as shown in FIG. 3.
  • The expression reference module is configured to determine a reference line corresponding to the face in a natural state according to the facial expression spectrogram, and to determine the natural emotional region of the face in the natural state based on the reference line.
  • Optionally, the expression reference module includes a frequency interval determination module, a reference line determination module, and a natural emotional region determination module.
  • The frequency interval determination module is configured to determine the first interval in which the facial expression index appears most frequently in the facial expression spectrogram. Optionally, an interval width may be preset, and the first interval is determined according to that width.
  • The reference line determination module is configured to determine the reference line corresponding to the face in a natural state according to the first interval. The first interval, where the index appears most frequently, most truly reflects the current user's expression state in the natural state, so the baseline determined from it can accurately reflect that state.
  • Optionally, the reference line determination module operates as follows:
  • if the facial expression index corresponding to the horizontal center line of the first interval is greater than a first threshold and less than a second threshold, the horizontal center line is determined as the reference line corresponding to the face in a natural state;
  • if the facial expression index corresponding to the horizontal center line is less than or equal to the first threshold, the horizontal line corresponding to the first threshold is determined as the reference line;
  • if the facial expression index corresponding to the horizontal center line is greater than or equal to the second threshold, the horizontal line corresponding to the second threshold is determined as the reference line.
  • As shown in FIG. 4, the interval width can be set, for example, to 20. When determining the baseline, this interval is scanned from bottom to top over the facial expression spectrogram to find the interval in which index values appear most frequently; the baseline is the horizontal center of that interval. A schematic of the baseline is shown in FIG. 3.
  • For example, to avoid the special case of all-positive or all-negative emotion throughout, the baseline can be constrained to lie between 30 and 60: if the measured baseline is above 60 it is set to 60, and if below 30 it is set to 30. The first and second thresholds of the baseline can be adjusted adaptively and are not limited to these values.
  • The natural emotional region determination module is configured to take the reference line as the center and use the second interval, within a set width above and below the reference line, as the natural emotional region of the face in the natural state. For example, a region extending 15 above and 15 below (total width 30) can represent the expression in the natural state. The width of the natural emotional region can be adjusted adaptively and is not limited to these values.
  • The expression partition module is configured to divide the facial expression spectrogram into multiple emotional regions corresponding to different expressions, with the natural emotional region as a reference. As shown in FIG. 5, the area above the natural emotional region can be determined as the positive emotional region and the area below it as the negative emotional region. Through this reference region, the user's emotions can be stratified, avoiding the special case of all-positive or all-negative emotion throughout.
  • The facial expression satisfaction analysis system adopts the aforementioned facial expression analysis system, with the difference that it further includes a satisfaction calculation module.
  • The satisfaction calculation module is configured to analyze, within each time period of the facial expression video clip to be analyzed, the multiple emotional regions corresponding to different expressions in the facial expression spectrogram to determine the user's satisfaction.
  • Optionally, the satisfaction calculation module includes:
  • a time-period expression proportion calculation module, configured to divide the facial expression video clip to be analyzed into multiple time periods and calculate the proportions of different expressions in each period according to the facial expression spectrogram;
  • a time-period weight determination module, configured to determine the weights corresponding to the multiple time periods;
  • a satisfaction result calculation module, configured to determine the satisfaction result from the proportions of different expressions in the multiple periods and the corresponding period weights.
  • For example, the weights can be set so that emotions in the first 20% of the time carry 10% of the weight, those in the last 10% carry 60%, and the middle part carries the remaining 30%; the time weights can be adjusted as appropriate.
  • Optionally, the proportion of each emotional region in the facial expression spectrogram is counted separately for each time period. Within each period, the weight of each emotional region is determined from its proportion; the period weights and region weights are then combined in a weighted calculation to obtain the user's satisfaction coefficient. For example, the coefficient can be obtained by subtracting the weighted value of negative emotion from the weighted value of positive emotion.
  • In the example of FIG. 6, the user's positive emotions gradually increase over time while the negative emotions gradually decrease.
  • Optionally, the satisfaction result can be determined from the satisfaction coefficient: when the coefficient is greater than or equal to a satisfaction threshold, the result is "satisfied"; when it is less than or equal to a dissatisfaction threshold, the result is "dissatisfied"; when it lies between the two, the result is "fair". For example, a coefficient of at least 0.1 can be treated as satisfied, at most -0.1 as dissatisfied, and between -0.1 and 0.1 as fair. Both thresholds can be adjusted adaptively.
  • In summary, the facial expression analysis method and system and the facial expression satisfaction analysis method and system provided in the present disclosure use the complete video information of the user's expressions, fully consider the fluctuation of facial expressions and the differences between users' natural states, and obtain the baseline corresponding to the natural state from frequency analysis, so the user's true emotions can be determined. The reference interval corresponding to the natural state is set according to the user's reference line, avoiding the case of all-positive or all-negative expressions throughout. The user's emotions are stratified through the reference region, the time periods of the user's video clip are weighted, and the emotion type and time weights are considered jointly, so the user's satisfaction can be determined more accurately.
  • The present disclosure also relates to an electronic device, such as a server or a terminal.
  • The electronic device includes: at least one processor; a memory communicatively connected with the at least one processor; and a communication component communicatively connected with the memory, which receives and sends data under the control of the processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to implement the facial expression analysis method and the facial expression satisfaction analysis method in the foregoing embodiments.
  • the memory as a non-volatile computer-readable storage medium, can be configured to store non-volatile software programs, non-volatile computer-executable programs, and modules.
  • the processor executes multiple functional applications and data processing of the device by running non-volatile software programs, instructions, and modules stored in the memory, that is, realizing the aforementioned facial expression analysis method and facial expression satisfaction analysis method.
  • the memory may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function; the data storage area can store a list of options and the like.
  • the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory may optionally include a memory remotely arranged with respect to the processor, and these remote memories may be connected to an external device through a network. Examples of the aforementioned networks include the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • One or more modules are stored in the memory, and when executed by one or more processors, the facial expression analysis method and the facial expression satisfaction analysis method in any of the foregoing embodiments are executed.
  • The above products can execute the facial expression analysis method and the facial expression satisfaction analysis method provided by the embodiments of this application, and have the functional modules and effects corresponding to those methods.
  • The present disclosure also relates to a computer-readable storage medium storing a computer-readable program for a computer to execute part or all of the above facial expression analysis method and facial expression satisfaction analysis method.
  • That is, the program is stored in a storage medium and includes multiple instructions to enable a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include: USB flash disks, mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A facial expression analysis method and system. The method comprises: obtaining a facial expression video clip to be analyzed, and obtaining a picture stream in the facial expression video clip; determining, according to a facial expression index of each frame of picture in the picture stream, a facial expression spectrogram corresponding to the picture stream; determining, according to the facial expression spectrogram, a reference line corresponding to the face in a natural state, and determining, on the basis of the reference line, a natural emotion area of the face in the natural state; and by taking the natural emotion area as a reference, dividing the facial expression spectrogram into a plurality of emotion areas corresponding to different expressions. Also provided are a facial expression-based satisfaction analysis method and system.

Description

Facial expression analysis method and system and facial expression satisfaction analysis method and system

This application claims priority to Chinese patent application No. 202010033040.1, filed with the Chinese Patent Office on January 13, 2020, the entire content of which is incorporated herein by reference.
技术领域Technical field
本申请涉及数据分析技术领域,例如涉及一种人脸表情分析方法和系统及人脸表情满意度分析方法和系统。This application relates to the technical field of data analysis, for example, to a method and system for analyzing facial expressions and a method and system for analyzing facial expression satisfaction.
背景技术Background technique
相关技术中,在进行人脸表情分析时,大多只是根据训练数据以及对应的标注信息训练神经网络模型,把待预测的对象输入训练好的神经网络模型,得到人脸表情分析结果。而人脸表情是波动的,即脸部不会每一秒都一直持续保持着高兴、平静或者愤怒,这使得得到的人脸表情分析结果并不准确。In related technologies, when performing facial expression analysis, most of them only train a neural network model based on training data and corresponding annotation information, and input the object to be predicted into the trained neural network model to obtain the facial expression analysis result. The facial expressions are fluctuating, that is, the face does not keep happy, calm, or angry every second, which makes the obtained facial expression analysis results inaccurate.
发明内容Summary of the invention
本申请提供一种人脸表情分析方法和系统及人脸表情满意度分析方法和系统,利用了用户表情的完整视频信息,充分考虑了表情的波动,能够确定用户真实的情绪,并能够准确地确定用户的满意度。This application provides a facial expression analysis method and system and a facial expression satisfaction analysis method and system. The complete video information of the user’s expression is used, and the fluctuation of the expression is fully considered. The real emotion of the user can be determined and accurately Determine user satisfaction.
本申请提供了一种人脸表情分析方法,包括:This application provides a method for analyzing facial expressions, including:
获取待分析的人脸表情视频片段,并获取所述人脸表情视频片段中的图片流;Acquiring a face expression video clip to be analyzed, and acquiring a picture stream in the face expression video clip;
根据所述图片流中每帧画面的人脸表情指数,确定图片流所对应的人脸表情频谱图;Determine the face expression spectrogram corresponding to the picture stream according to the facial expression index of each frame of the picture in the picture stream;
根据所述人脸表情频谱图,确定人脸在自然状态下所对应的基准线,并基于所述基准线确定人脸在自然状态下的自然情绪区域;Determine the reference line corresponding to the human face in the natural state according to the face expression spectrogram, and determine the natural emotional area of the human face in the natural state based on the reference line;
以所述自然情绪区域为基准,将人脸表情频谱图划分为对应不同表情的多个情绪区域。Using the natural emotional region as a reference, the human facial expression spectrogram is divided into multiple emotional regions corresponding to different expressions.
本申请还提供了一种人脸表情满意度分析方法,在采用前述的一种人脸表情分析方法之后,还包括:在待分析的人脸表情视频片段中的每个时间段内,对人脸表情频谱图中对应不同表情的多个情绪区域进行分析计算,确定用户的满意度。This application also provides a facial expression satisfaction analysis method. After adopting the aforementioned facial expression analysis method, it also includes: in each time period in the facial expression video clip to be analyzed, The facial expression spectrogram corresponding to multiple emotional regions of different expressions is analyzed and calculated to determine the user's satisfaction.
本申请还提供了一种人脸表情分析系统,采用前述所述的一种人脸表情分析方法,包括:This application also provides a facial expression analysis system, which adopts the aforementioned facial expression analysis method, including:
图片采集模块,设置为获取待分析的人脸表情的视频片段,并获取所述人脸表情视频片段中的图片流;The picture acquisition module is configured to acquire a video segment of the facial expression to be analyzed, and acquire the picture stream in the facial expression video segment;
表情频谱模块,设置为根据图片流中每帧画面的人脸表情指数,确定图片流所对应的人脸表情频谱图;The expression spectrum module is set to determine the facial expression spectrum map corresponding to the picture stream according to the facial expression index of each frame of the picture stream;
表情基准模块,设置为根据所述人脸表情频谱图,确定人脸在自然状态下所对应的基准线,并基于所述基准线确定人脸在自然状态下的自然情绪区域;An expression reference module, configured to determine a reference line corresponding to the human face in a natural state according to the facial expression spectrogram, and determine the natural emotional area of the human face in the natural state based on the reference line;
表情分区模块,设置为以所述自然情绪区域为基准,将人脸表情频谱图划分为对应不同表情的多个情绪区域。The expression partition module is set to divide the facial expression spectrogram into multiple emotional regions corresponding to different expressions based on the natural emotional region.
本申请还提供了一种人脸表情满意度分析系统,采用前述的一种人脸表情分析系统,包括:This application also provides a facial expression satisfaction analysis system, which adopts the aforementioned facial expression analysis system, including:
图片采集模块,设置为获取待分析的人脸表情的视频片段,并获取所述人脸表情视频片段中的图片流;The picture acquisition module is configured to acquire a video segment of the facial expression to be analyzed, and acquire the picture stream in the facial expression video segment;
表情频谱模块,设置为根据所述图片流中每帧画面的人脸表情指数,确定所述图片流所对应的人脸表情频谱图;An expression spectrum module, configured to determine the facial expression spectrum map corresponding to the picture stream according to the facial expression index of each frame of the picture stream;
表情基准模块,设置为根据所述人脸表情频谱图,确定人脸在自然状态下所对应的基准线,并基于所述基准线确定所述人脸在所述自然状态下的自然情绪区域;An expression reference module, configured to determine a reference line corresponding to a human face in a natural state according to the facial expression spectrogram, and determine the natural emotional region of the human face in the natural state based on the reference line;
表情分区模块,设置为以所述自然情绪区域为基准,将所述人脸表情频谱图划分为对应不同表情的多个情绪区域;An expression partition module, configured to divide the facial expression spectrogram into a plurality of emotional regions corresponding to different expressions based on the natural emotional region;
满意度计算模块,设置为在待分析的人脸表情视频片段中的每个时间段内,对人脸表情频谱图中对应不同表情的多个情绪区域进行分析计算,确定用户的满意度。The satisfaction calculation module is configured to analyze and calculate multiple emotional regions corresponding to different expressions in the facial expression spectrogram within each time period in the facial expression video clip to be analyzed to determine the user's satisfaction.
本申请还提供了一种电子设备,包括存储器和处理器,所述存储器设置为存储一条或多条计算机指令,其中,所述一条或多条计算机指令被处理器执行以实现所述的人脸表情分析方法和人脸表情满意度分析方法。The present application also provides an electronic device, including a memory and a processor, the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to realize the face Expression analysis method and facial expression satisfaction analysis method.
本申请还提供了一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行以实现所述的人脸表情分析方法和人脸表情满意度分析方法。This application also provides a computer-readable storage medium that stores a computer program that is executed by a processor to realize the facial expression analysis method and the facial expression satisfaction analysis method.
附图说明Description of the drawings
图1为本公开一实施例所述的一种人脸表情分析方法的流程示意图;FIG. 1 is a schematic flowchart of a method for analyzing facial expressions according to an embodiment of the present disclosure;
图2为本公开一实施例所述的一种人脸表情满意度分析方法的流程示意图;FIG. 2 is a schematic flowchart of a method for analyzing facial expression satisfaction according to an embodiment of the present disclosure;
图3为本公开一实施例所述的图片流所对应的人脸表情频谱图的示意图;3 is a schematic diagram of a human facial expression spectrogram corresponding to a picture stream according to an embodiment of the disclosure;
图4为本公开一实施例所述的用于确定基准线的基准区间的示意图;4 is a schematic diagram of a reference interval for determining a reference line according to an embodiment of the disclosure;
图5为本公开一实施例所述的多个情绪区域的示意图;FIG. 5 is a schematic diagram of multiple emotional regions according to an embodiment of the present disclosure;
图6为本公开一实施例所述的每个时间段的积极、自然和消极三种情绪频次和百分比的示意图。Fig. 6 is a schematic diagram of the frequency and percentage of three emotions, positive, natural, and negative in each time period according to an embodiment of the present disclosure.
具体实施方式Detailed ways
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行描述,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。The technical solutions in the embodiments of the present disclosure will be described below in conjunction with the accompanying drawings in the embodiments of the present disclosure. The described embodiments are only a part of the embodiments of the present disclosure, rather than all the embodiments.
若本公开实施例中有涉及方向性指示(诸如上、下、左、右、前、后……),则该方向性指示仅用于解释在一个特定姿态(如附图所示)下多个部件之间的相对位置关系和运动情况等,在该特定姿态发生改变时,则该方向性指示也相应地随之改变。If there are directional indications (such as up, down, left, right, front, back...) in the embodiments of the present disclosure, the directional indications are only used to explain the multiple directions in a specific posture (as shown in the figure). The relative positional relationship and movement of the components, etc., when the specific posture changes, the directional indication also changes accordingly.
另外,在本公开的描述中,所用术语仅用于说明目的,并非旨在限制本公开的范围。术语“包括”和/或“包含”用于指定所述元件、步骤、操作和/或组件的存在,但并不排除存在或添加一个或多个其他元件、步骤、操作和/或组件的情况。术语“第一”、“第二”等可能用于描述多种元件,不代表顺序,且不对这些元件起限定作用。此外,在本公开的描述中,“多个”的含义是两个及两个以上。这些术语仅用于区分一个元素和另一个元素。附图仅出于说明的目的用来描绘本公开所述实施例。In addition, in the description of the present disclosure, the terms used are for illustrative purposes only, and are not intended to limit the scope of the present disclosure. The terms "including" and/or "including" are used to specify the existence of the described elements, steps, operations and/or components, but do not exclude the presence or addition of one or more other elements, steps, operations and/or components . The terms "first", "second", etc. may be used to describe various elements, do not represent an order, and do not limit these elements. In addition, in the description of the present disclosure, "plurality" means two or more. These terms are only used to distinguish one element from another. The drawings are used for illustration purposes only to depict the embodiments of the present disclosure.
本公开实施例所述的一种人脸表情分析方法,如图1所示,包括:A face expression analysis method according to an embodiment of the present disclosure, as shown in FIG. 1, includes:
S1,获取待分析的人脸表情视频片段,并获取人脸表情视频片段中的图片流。S1: Obtain a face expression video clip to be analyzed, and obtain a picture stream in the face expression video clip.
一种可选的实施方式中,在获取图片流时,可以通过逐帧(即每一帧都抽取出来)、固定间隔抽帧(例如每一秒抽一帧)或抽取关键帧(即按照画面的变化抽取i帧)的方式获取人脸表情视频片段中的图片流。本实施例中,在获取 视频片段中的图片流时,可以将人脸表情视频片段进行分割,得到多个视频子片段,随机或固定抽取每个视频子片段中的至少一帧,根据抽取的多个画面帧,确定人脸表情视频片段中的图片流。本实施例通过采集包含用户表情的完整视频信息,充分考虑了人脸表情的波动,避免单独通过一帧图像来确定用户情绪的片面性及不准确性。In an alternative embodiment, when acquiring the picture stream, you can extract frames frame by frame (that is, extract each frame), frame at fixed intervals (for example, one frame per second), or extract key frames (that is, according to the picture The image stream in the facial expression video segment is obtained by extracting i-frames). In this embodiment, when acquiring the picture stream in the video segment, the facial expression video segment can be divided to obtain multiple video sub-segments, and at least one frame of each video sub-segment is randomly or fixedly extracted, and according to the extracted Multiple picture frames determine the picture stream in the facial expression video clip. In this embodiment, by collecting complete video information including the user's facial expression, the fluctuation of the facial expression is fully considered, and the one-sidedness and inaccuracy of determining the user's emotion are avoided through a single frame of image.
S2,根据图片流中每帧画面的人脸表情指数,确定图片流所对应的人脸表情频谱图。S2: Determine a face expression spectrogram corresponding to the picture stream according to the facial expression index of each frame of the picture stream.
举例来说,人脸表情指数范围为0~100,指数越高表示表情越偏积极,即更倾向于高兴和惊喜等情绪。指数越低表示表情越偏消极,即更倾向于愤怒和害怕等情绪。指数靠近中间表示表情处于自然状态。例如,按照图片流中每帧画面的时间戳信息,根据每帧画面的人脸表情指数以及对应的时间信息,生成人脸表情频谱图。通过生成的人脸表情频谱图能直观的显示用户表情的波动过程。For example, the facial expression index ranges from 0 to 100. The higher the index, the more positive the expression, that is, the more inclined to emotions such as happiness and surprise. The lower the index, the more negative the expression, that is, the more inclined to emotions such as anger and fear. The index near the middle indicates that the expression is in a natural state. For example, according to the time stamp information of each frame of the picture in the picture stream, a face expression spectrogram is generated according to the facial expression index of each frame and the corresponding time information. The generated facial expression spectrogram can intuitively display the fluctuation process of the user's expression.
In an optional implementation manner, before the facial expression index of each frame in the picture stream is analyzed, the method further includes: performing face detection on each frame in the picture stream to obtain the face image in each frame, and then analyzing the facial expression index of each frame to obtain the facial expression spectrogram.
When performing face detection, the face detection algorithm may be, for example, a Multi-Task Convolutional Neural Network (MTCNN), a Single Shot MultiBox Detector (SSD), or a target detection algorithm such as You Only Look Once V3 (YOLOv3). The algorithms are not limited to those listed and may be selected as required. By analyzing each frame, one facial expression index is output.
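The disclosure leaves the detector open, so purely as a lightweight stand-in for MTCNN, SSD or YOLOv3, the sketch below crops the face image from a frame with OpenCV's bundled Haar-cascade detector; the helper name and the largest-box heuristic are assumptions:

```python
import cv2

# OpenCV ships a pre-trained frontal-face Haar cascade; any of the detectors
# named in the disclosure (MTCNN, SSD, YOLOv3) could be substituted here.
_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame):
    """Return the largest detected face region of a frame, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])  # largest box
    return frame[y:y + h, x:x + w]
```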
In an optional implementation manner, S2 includes the following sub-steps.
S21: Divide the face image into multiple regions, each of which contains multiple key feature points used to determine the facial expression index.
Optionally, facial feature point recognition is performed on the face image to obtain multiple feature points of the face image, and the key feature points used to determine the facial expression index are identified from these feature points; the face image is then divided into multiple regions according to the key feature points, each region containing multiple key feature points used to determine the facial expression index. Using all feature points directly to determine the facial expression index would increase the amount of computation; by identifying key feature points, the amount of computation is reduced while the accuracy of the facial expression index is preserved. For example, a face image may carry 106 feature points, from which the key feature points used to determine the facial expression index are identified, such as key feature points of the mouth, eyes and eyebrows. The multiple feature points may be recognized with a trained neural network model. The method is not limited to the above; the feature point recognition method may be selected and adjusted adaptively.
In an optional implementation manner, key feature point recognition may also be performed directly on the face image to obtain the multiple key feature points used to determine the facial expression index. For example, where the key feature points are those of the mouth, eyes and eyebrows, a trained neural network model may be used to process the face image: the face image is input into the trained neural network model to obtain the mouth, eye and eyebrow key feature points in multiple frames of face images.
Optionally, the face image may also be divided into multiple regions according to the regions of reference facial organs, and key feature points may be extracted from the face image of each region to obtain the key feature points contained in each region. For example, where the reference facial organs include the mouth, eyes and eyebrows, three regions of the face image are obtained, and the images of the three regions may be input into a mouth key feature point detection model, an eye key feature point detection model and an eyebrow key feature point detection model, respectively, to obtain the key feature points contained in each region.
S22: For each frame, determine the key feature points contained in the multiple regions, determine the expression scores corresponding to the multiple regions, and determine the facial expression index of the frame according to the expression scores corresponding to the multiple regions.
Optionally, at least one included angle between lines connecting the key feature points in each region is determined, and the expression score corresponding to each region is determined according to the at least one included angle; the weight corresponding to each region is determined; and the facial expression index of each frame is determined according to the expression scores and weights corresponding to the multiple regions. For example, each region corresponds to a weight, the weights of different regions may differ, and the weights of all regions sum to 1; each region contains at least one included angle, each included angle corresponds to an expression score (for example, on a 100-point scale), and the facial expression index is obtained by a weighted calculation over the angles and regions. Since there are many feature points in the face region, computing the angle between every pair of connecting lines would increase the amount of computation; after the key feature points used for the weighted calculation of the facial expression index are selected, the angles can be computed directly on these key feature points, reducing the amount of computation. There can be many lines between the key feature points of a region, and target lines may be selected for the angle computation, such as the angle between the lines of adjacent key feature points, or the angle between the lines from the key feature points at the two ends to a middle key feature point. In this way, the amount of computation is reduced and the processing efficiency is improved while the accuracy of the facial expression index is preserved.
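A minimal sketch of this weighted calculation follows; the region names, the per-region weights and the mapping from an included angle to a 0-100 score are assumed for illustration, since the disclosure leaves these choices to the implementer:

```python
import numpy as np

def included_angle(p0, p1, p2):
    """Angle (degrees) at p1 between the lines p1->p0 and p1->p2."""
    v1, v2 = np.asarray(p0) - np.asarray(p1), np.asarray(p2) - np.asarray(p1)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Illustrative only: the disclosure requires the weights to sum to 1 but does
# not fix their values or the angle-to-score mapping.
REGION_WEIGHTS = {"mouth": 0.5, "eyes": 0.3, "eyebrows": 0.2}

def expression_index(region_angles, angle_to_score):
    """Weighted sum of per-region scores, each region averaged over its angles."""
    index = 0.0
    for region, angles in region_angles.items():
        score = sum(angle_to_score(a) for a in angles) / len(angles)
        index += REGION_WEIGHTS[region] * score
    return index  # a facial expression index on the 0-100 scale
```

Because the weights sum to 1, the resulting index stays on the same 0-100 scale as the per-region scores.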
In an optional implementation manner, the contour information of the facial organs contained in the multiple regions may also be determined according to the key feature points contained in the multiple regions, and the expression scores corresponding to the multiple regions may be determined separately according to this contour information.
S23: Obtain the facial expression spectrogram corresponding to the picture stream according to the facial expression indices of all frames.
After the above weighted calculation, the facial expression index corresponding to each frame is obtained, and the facial expression spectrogram corresponding to the picture stream is then obtained, as shown in FIG. 3.
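Since the spectrogram of the disclosure is simply the facial expression index plotted against time, a minimal sketch of its assembly (the function names are assumptions) is:

```python
def build_spectrogram(indexed_frames, index_of_frame):
    """Return the spectrogram as a list of (timestamp, facial expression index)
    pairs, ordered by the timestamp of each frame in the picture stream."""
    series = [(ts, index_of_frame(frame)) for ts, frame in indexed_frames]
    series.sort(key=lambda point: point[0])  # order by time information
    return series
```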
S3: Determine the baseline corresponding to the face in its natural state according to the facial expression spectrogram, and determine the natural emotion region of the face in the natural state based on the baseline.
Since each person's expression in the natural state is different (some people naturally have a sullen face, while others naturally appear to be smiling), the baseline of each person's natural state differs. By finding the baseline, the user's true emotion can be determined more accurately against the user's own baseline, which effectively improves the accuracy of facial expression recognition.
In an optional implementation manner, S3 includes the following sub-steps.
S31: In the facial expression spectrogram, determine the first interval in which the facial expression index appears most frequently.
A person is in a natural state most of the time while receiving a service, so the horizontal line through the most frequent points can serve as the baseline. However, the values of the individual points cannot be exactly the same, so a reference interval (that is, the first interval), namely the interval with the highest frequency, needs to be found. For example, an interval width may be preset, and the first interval in which the facial expression index appears most frequently in the facial expression spectrogram is determined according to that interval width.
S32: Determine the baseline corresponding to the face in the natural state according to the first interval.
The first interval, in which the facial expression index appears most frequently in the facial expression spectrogram, reflects the current user's expression state in the natural state fairly truthfully, so the baseline determined from it can accurately reflect the current user's expression in the natural state.
Optionally, S32 includes:
determining the horizontal center line of the first interval;
if the facial expression index corresponding to the horizontal center line is greater than a first threshold and less than a second threshold, determining the horizontal center line as the baseline corresponding to the face in the natural state;
if the facial expression index corresponding to the horizontal center line is less than or equal to the first threshold, determining the horizontal line corresponding to the first threshold as the baseline corresponding to the face in the natural state; and
if the facial expression index corresponding to the horizontal center line is greater than or equal to the second threshold, determining the horizontal line corresponding to the second threshold as the baseline corresponding to the face in the natural state.
As shown in FIG. 4, the interval width may be set to 20, for example. In the process of determining the baseline, this interval is used to scan the facial expression spectrogram from bottom to top to find the interval with the highest frequency; the baseline is the horizontal center of that interval. A schematic diagram of the determined baseline is shown in FIG. 3.
As shown in FIG. 3, to avoid the degenerate case in which the whole clip is read as positive or negative emotion, the baseline may, for example, be constrained to lie between 30 and 60: if the measured baseline is above 60 it is set to 60, and if it is below 30 it is set to 30. The set values of the first threshold and the second threshold of the baseline can be adjusted adaptively and are not limited to the above values.
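Putting S31 and S32 together, a sketch of the baseline search might look as follows, using the interval width of 20 and the 30-60 clamp from the example; the step size of the bottom-up scan is an assumption:

```python
import numpy as np

def find_baseline(indices, width=20, lo=30, hi=60):
    """Scan the spectrogram bottom-up with a window of the given width, take
    the horizontal center of the most frequent interval, and clamp it to
    [lo, hi]. The width of 20 and the 30-60 clamp follow the example above."""
    indices = np.asarray(indices, dtype=float)
    best_start, best_count = 0.0, -1
    for start in np.arange(0.0, 100.0 - width + 1.0):   # bottom-to-top scan
        count = int(np.sum((indices >= start) & (indices < start + width)))
        if count > best_count:
            best_start, best_count = start, count
    center = best_start + width / 2.0                   # horizontal center line
    return min(max(center, lo), hi)                     # thresholded baseline
```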
S33: Taking the baseline as the center, take the second interval within a set width above and below the baseline as the natural emotion region of the face in the natural state.
As shown in FIG. 4, after the baseline is obtained, the region extending 15 above and 15 below the baseline, with a total width of 30, may, for example, represent the facial expression being in the natural state, that is, the natural emotion region of the face in the natural state. The width of the natural emotion region can be adjusted adaptively and is not limited to the above values.
S4: Using the natural emotion region as a reference, divide the facial expression spectrogram into multiple emotion regions corresponding to different expressions.
As shown in FIG. 5, in the facial expression spectrogram, the region above the natural emotion region can be determined as the positive emotion region, and the region below it as the negative emotion region. The reference region, that is, the natural emotion region, makes it possible to stratify the user's emotions and avoids reading the whole clip as positive or negative emotion.
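A sketch of this partitioning, using the half-width of 15 from the example above (the labels and the function name are illustrative), is:

```python
def classify_emotions(spectrogram, baseline, half_width=15):
    """Label each (timestamp, index) point of the spectrogram as 'positive',
    'natural' or 'negative' relative to the natural emotion region, which is
    the band of the given half-width around the baseline (15 in the example)."""
    labeled = []
    for ts, idx in spectrogram:
        if idx > baseline + half_width:
            label = "positive"      # above the natural emotion region
        elif idx < baseline - half_width:
            label = "negative"      # below the natural emotion region
        else:
            label = "natural"       # inside the natural emotion region
        labeled.append((ts, idx, label))
    return labeled
```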
This embodiment uses the complete video information of the user's expressions, fully accounts for the fluctuation of expressions and for the differences between individual users' natural states, and obtains the baseline corresponding to the natural state based on frequency analysis, so that the user's true emotion can be determined. The reference interval corresponding to the natural state is set according to the user's baseline, avoiding the case in which the whole clip is read as positive or negative expression. The user's emotions are stratified through the reference region, and weights are assigned to the time periods of the user's video clip; by jointly considering the user's emotion types and the time weights, the user's satisfaction can be determined more accurately.
A facial expression satisfaction analysis method according to an embodiment of the present disclosure, as shown in FIG. 2, further includes, after the aforementioned facial expression analysis method: S5, within each time period of the facial expression video clip to be analyzed, analyzing and computing the multiple emotion regions corresponding to different expressions in the facial expression spectrogram to determine the user's satisfaction.
In an optional implementation manner, S5 includes the following sub-steps.
S51: Divide the facial expression video clip to be analyzed into multiple time periods, and calculate the proportions of different expressions in each time period according to the facial expression spectrogram.
S52: Determine the weights corresponding to the multiple time periods.
S53: Determine the satisfaction result according to the proportions of different expressions in the multiple time periods and the weights corresponding to the multiple time periods.
Since "arriving in tears and leaving with a smile" is certainly more satisfying than "arriving with a smile and leaving in tears", different emotions need to carry different weights over time. For example, the weights may be set as follows: emotions in the first 20% of the time carry 10% of the weight, emotions in the last 10% of the time carry 60% of the weight, and the middle part carries the remaining 30%. The time weights can be adjusted as appropriate. After the multiple emotion regions are obtained by the method described in step S4, the proportion of each emotion region in the facial expression spectrogram is counted separately for each time period. Within each time period, the weight of each emotion region is determined by its proportion in the facial expression spectrogram, and the user's satisfaction coefficient is obtained by a weighted calculation over the weight of each time period and the weight of each emotion region within it. The user satisfaction coefficient can be obtained by subtracting the weighted value of the negative emotions from the weighted value of the positive emotions. Assigning weights to the time periods of the user's video clip, combined with the proportion of each emotion region within each time period, jointly considers the user's emotion types and the time weights, so that the user's satisfaction can be determined more accurately.
As shown in FIG. 6, in this embodiment it can be seen that the user's positive emotions gradually increase and the negative emotions gradually decrease. Following the example shown in FIG. 6, the user's satisfaction coefficient can be calculated as (31%*10% + 28%*30% + 37%*60%) - (31%*10% + 22%*30% + 11%*60%) = 0.174.
Optionally, the satisfaction result can be determined according to the satisfaction coefficient. For example, when the satisfaction coefficient is greater than or equal to a satisfaction threshold, the satisfaction result is "satisfied"; when it is less than or equal to a dissatisfaction threshold, the result is "unsatisfied"; when it is greater than the dissatisfaction threshold and less than the satisfaction threshold, the result is "neutral". For example, a coefficient greater than or equal to 0.1 may be taken as satisfied, less than or equal to -0.1 as unsatisfied, and between -0.1 and 0.1 as neutral. Both the satisfaction threshold and the dissatisfaction threshold can be adjusted adaptively.
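The following sketch reproduces the FIG. 6 calculation; the time weights, the emotion proportions and the 0.1/-0.1 thresholds are the example values from the text, not fixed parameters of the disclosure:

```python
# Time-period weights from the example: first 20% of the time, the middle
# 70%, and the last 10%, weighted 10%/30%/60% respectively.
TIME_WEIGHTS = [0.10, 0.30, 0.60]

def satisfaction(positive, negative, weights=TIME_WEIGHTS,
                 satisfied=0.1, unsatisfied=-0.1):
    """Weighted positive share minus weighted negative share, then thresholds."""
    coeff = sum(w * p for w, p in zip(weights, positive)) \
          - sum(w * n for w, n in zip(weights, negative))
    if coeff >= satisfied:
        return coeff, "satisfied"
    if coeff <= unsatisfied:
        return coeff, "unsatisfied"
    return coeff, "neutral"

# Reproduces the worked example: positive shares (0.31, 0.28, 0.37) and
# negative shares (0.31, 0.22, 0.11) give a coefficient of 0.174 -> "satisfied".
print(satisfaction([0.31, 0.28, 0.37], [0.31, 0.22, 0.11]))
```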
A facial expression analysis system according to an embodiment of the present disclosure includes a picture acquisition module, an expression spectrum module, an expression reference module and an expression partitioning module.
The picture acquisition module is configured to obtain the facial expression video clip to be analyzed, and to obtain the picture stream from the facial expression video clip.
In an optional implementation manner, the picture stream may be obtained frame by frame (that is, every frame is extracted), by sampling frames at a fixed interval (for example, one frame per second), or by extracting key frames (that is, extracting I-frames according to changes in the picture). In this embodiment, when obtaining the picture stream, the video clip may be divided into multiple video sub-segments, at least one frame is extracted from each video sub-segment, either randomly or at a fixed position, and the picture stream of the facial expression video clip is determined from the extracted frames. By collecting the complete video information containing the user's expressions, this embodiment fully accounts for the fluctuation of facial expressions and avoids the one-sidedness and inaccuracy of determining the user's emotion from a single frame.
The expression spectrum module is configured to determine the facial expression spectrogram corresponding to the picture stream according to the facial expression index of each frame in the picture stream.
For example, the facial expression index ranges from 0 to 100. A higher index indicates a more positive expression, that is, a tendency toward emotions such as happiness and surprise; a lower index indicates a more negative expression, that is, a tendency toward emotions such as anger and fear; an index near the middle indicates that the expression is in a natural state. For example, the facial expression spectrogram may be generated from the facial expression index of each frame and the corresponding time information, ordered by the timestamp of each frame in the picture stream. The generated facial expression spectrogram intuitively displays how the user's expression fluctuates over time.
In an optional implementation manner, the system further includes a face detection module configured to, before the facial expression index of each frame in the picture stream is analyzed, perform face detection on each frame in the picture stream to obtain the face image in each frame, after which the facial expression index of each frame is analyzed to obtain the facial expression spectrogram.
When performing face detection, the face detection algorithm may be, for example, MTCNN, SSD or YOLOv3, but is not limited to those listed and may be selected as required. By analyzing each frame, one facial expression index is output.
In an optional implementation manner, the expression spectrum module may include a face region division module, an expression index determination module and an expression spectrogram determination module.
The face region division module is configured to divide the face image into multiple regions, each of which contains multiple key feature points used to determine the facial expression index.
In an optional implementation manner, the face region division module is configured to: perform facial feature point recognition on the face image to obtain multiple feature points of the face image, and identify from these feature points the key feature points used to determine the facial expression index; and divide the face image into multiple regions according to the key feature points, each region containing multiple key feature points used to determine the facial expression index. Using all feature points directly to determine the facial expression index would increase the amount of computation; by identifying key feature points, the amount of computation is reduced while the accuracy of the facial expression index is preserved. For example, a face image may carry 106 feature points, from which the key feature points used to determine the facial expression index are identified, such as key feature points of the mouth, eyes and eyebrows. The key feature points may be recognized with a trained neural network model. The method is not limited to the above; the feature point recognition method may be selected and adjusted adaptively.
In an optional implementation manner, the face region division module may also perform key feature point recognition directly on the face image to obtain the multiple key feature points used to determine the facial expression index. For example, where the key feature points are those of the mouth, eyes and eyebrows, a trained neural network model may be used to process the face image: the face image is input into the trained neural network model to obtain the mouth, eye and eyebrow key feature points in multiple frames of face images.
Optionally, the face region division module may also divide the face image into multiple regions according to the regions of reference facial organs, and extract key feature points from the face image of each region to obtain the key feature points contained in each region. For example, where the reference facial organs include the mouth, eyes and eyebrows, three regions of the face image are obtained, and the images of the three regions may be input into a mouth key feature point detection model, an eye key feature point detection model and an eyebrow key feature point detection model, respectively, to obtain the key feature points contained in each region.
The expression index determination module is configured to determine, for each frame, the key feature points contained in the multiple regions, determine the expression scores corresponding to the multiple regions, and determine the facial expression index of the frame according to the expression scores corresponding to the multiple regions.
In an optional implementation manner, the expression index determination module is configured to: determine at least one included angle between lines connecting the key feature points in each region, and determine the expression score corresponding to each region according to the at least one included angle; determine the weight corresponding to each region; and determine the facial expression index of each frame according to the expression scores and weights corresponding to the multiple regions. For example, each region corresponds to a weight, each region has a different weight, and the weights of all regions sum to 1; each region contains at least one included angle, each included angle corresponds to an expression score (for example, on a 100-point scale), and the facial expression index is obtained by a weighted calculation over the angles and regions. Since there are many feature points in the face region, computing the angle between every pair of connecting lines would increase the amount of computation; after the key feature points used for the weighted calculation of the facial expression index are selected, the angles can be computed directly on these key feature points, reducing the amount of computation. There can be many lines between the key feature points of a region, and target lines may be selected for the angle computation, such as the angle between the lines of adjacent key feature points, or the angle between the lines from the key feature points at the two ends to a middle key feature point. In this way, the amount of computation is reduced and the processing efficiency is improved while the accuracy of the facial expression index is preserved.
In an optional implementation manner, the expression index determination module may also determine the contour information of the facial organs contained in the multiple regions according to the key feature points contained in the multiple regions, and determine the expression scores corresponding to the multiple regions separately according to this contour information.
The expression spectrogram determination module is configured to obtain the facial expression spectrogram corresponding to the picture stream according to the facial expression indices of all frames. After the weighted calculation of the facial expression index, the expression spectrogram determination module obtains the facial expression index corresponding to each frame, and then the facial expression spectrogram corresponding to the picture stream, as shown in FIG. 3.
The expression reference module is configured to determine the baseline corresponding to the face in its natural state according to the facial expression spectrogram, and to determine the natural emotion region of the face in the natural state based on the baseline.
Since each person's expression in the natural state is different (some people naturally have a sullen face, while others naturally appear to be smiling), the baseline of each person's natural state differs. By finding the baseline, the user's true emotion can be determined more accurately against the user's own baseline, which effectively improves the accuracy of facial expression recognition.
In an optional implementation manner, the expression reference module includes a frequency interval determination module, a baseline determination module and a natural emotion region determination module.
The frequency interval determination module is configured to determine, in the facial expression spectrogram, the first interval in which the facial expression index appears most frequently.
A person is in a natural state most of the time while receiving a service, so the horizontal line through the most frequent points can serve as the baseline. However, the values of the individual points cannot be exactly the same, so a reference interval (that is, the first interval), namely the interval with the highest frequency, needs to be found. For example, an interval width may be preset, and the first interval in which the facial expression index appears most frequently in the facial expression spectrogram is determined according to that interval width.
The baseline determination module is configured to determine the baseline corresponding to the face in the natural state according to the first interval.
The first interval, in which the facial expression index appears most frequently in the facial expression spectrogram, reflects the current user's expression state in the natural state fairly truthfully, so the baseline determined from it can accurately reflect the current user's expression in the natural state.
In an optional implementation manner, the baseline determination module is configured to:
determine the horizontal center line of the first interval;
if the facial expression index corresponding to the horizontal center line is greater than a first threshold and less than a second threshold, determine the horizontal center line as the baseline corresponding to the face in the natural state;
if the facial expression index corresponding to the horizontal center line is less than or equal to the first threshold, determine the horizontal line corresponding to the first threshold as the baseline corresponding to the face in the natural state; and
if the facial expression index corresponding to the horizontal center line is greater than or equal to the second threshold, determine the horizontal line corresponding to the second threshold as the baseline corresponding to the face in the natural state.
As shown in FIG. 4, the interval width may be set to 20, for example. In the process of determining the baseline, this interval is used to scan the facial expression spectrogram from bottom to top to find the interval with the highest frequency; the baseline is the horizontal center of that interval. A schematic diagram of the baseline is shown in FIG. 3.
As shown in FIG. 3, to avoid the degenerate case in which the whole clip is read as positive or negative emotion, the baseline may, for example, be constrained to lie between 30 and 60: if the measured baseline is above 60 it is set to 60, and if it is below 30 it is set to 30. The set values of the first threshold and the second threshold of the baseline can be adjusted adaptively and are not limited to the above values.
The natural emotion region determination module is configured to take the baseline as the center and take the second interval within a set width above and below the baseline as the natural emotion region of the face in the natural state.
As shown in FIG. 4, after the baseline is obtained, the region extending 15 above and 15 below the baseline, with a total width of 30, may, for example, represent the expression being in the natural state, that is, the natural emotion region of the face in the natural state. The width of the natural emotion region can be adjusted adaptively and is not limited to the above values.
The expression partitioning module is configured to divide the facial expression spectrogram into multiple emotion regions corresponding to different expressions, using the natural emotion region as a reference. As shown in FIG. 5, in the facial expression spectrogram, the region above the natural emotion region can be determined as the positive emotion region, and the region below it as the negative emotion region. The reference region, that is, the natural emotion region, makes it possible to stratify the user's emotions and avoids reading the whole clip as positive or negative emotion.
A facial expression satisfaction analysis system according to an embodiment of the present disclosure adopts the aforementioned facial expression analysis system, with the difference that it further includes a satisfaction calculation module. The satisfaction calculation module is configured to, within each time period of the facial expression video clip to be analyzed, analyze and compute the multiple emotion regions corresponding to different expressions in the facial expression spectrogram to determine the user's satisfaction.
In an optional implementation manner, the satisfaction calculation module includes the following modules.
The time-period expression proportion calculation module is configured to divide the facial expression video clip to be analyzed into multiple time periods and to calculate, according to the facial expression spectrogram, the proportions of different expressions in each time period.
The time-period weight determination module is configured to determine the weights corresponding to the multiple time periods.
The satisfaction result calculation module is configured to determine the satisfaction result according to the proportions of different expressions in the multiple time periods and the weights corresponding to the multiple time periods.
Since "arriving in tears and leaving with a smile" is certainly more satisfying than "arriving with a smile and leaving in tears", different emotions need to carry different weights over time. For example, the weights may be set as follows: emotions in the first 20% of the time carry 10% of the weight, emotions in the last 10% of the time carry 60% of the weight, and the middle part carries the remaining 30%. The time weights can be adjusted as appropriate. After the emotion regions are obtained by the aforementioned expression partitioning module, the proportion of each emotion region in the facial expression spectrogram is counted separately for each time period. Within each time period, the weight of each emotion region is determined by its proportion in the facial expression spectrogram, and the user's satisfaction is obtained by a weighted calculation over the weight of each time period and the weight of each emotion region within it. The user satisfaction can be obtained by subtracting the weighted value of the negative emotions from the weighted value of the positive emotions. Assigning weights to the time periods of the user's video clip, combined with the proportion of each emotion region within each time period, jointly considers the user's emotion types and the time weights, so that the user's satisfaction can be determined more accurately.
As shown in FIG. 6, in this embodiment it can be seen that the user's positive emotions gradually increase and the negative emotions gradually decrease. Following the example shown in FIG. 6, the user's satisfaction coefficient can be calculated as (31%*10% + 28%*30% + 37%*60%) - (31%*10% + 22%*30% + 11%*60%) = 0.174.
Optionally, the satisfaction result can be determined according to the satisfaction coefficient. For example, when the satisfaction coefficient is greater than or equal to a satisfaction threshold, the satisfaction result is "satisfied"; when it is less than or equal to a dissatisfaction threshold, the result is "unsatisfied"; when it is greater than the dissatisfaction threshold and less than the satisfaction threshold, the result is "neutral". For example, a coefficient higher than 0.1 may be taken as satisfied, lower than -0.1 as unsatisfied, and between -0.1 and 0.1 as neutral. Both the satisfaction threshold and the dissatisfaction threshold can be adjusted adaptively.
The facial expression analysis method and system and the facial expression satisfaction analysis method and system provided by the present disclosure use the complete video information of the user's expressions, fully account for the fluctuation of facial expressions and for the differences between individual users' natural states, and obtain the baseline corresponding to the natural state based on frequency analysis, so that the user's true emotion can be determined. The reference interval corresponding to the natural state is set according to the user's baseline, avoiding the case in which the whole clip is read as positive or negative expression. The user's emotions are stratified through the reference region, and weights are assigned to the time periods of the user's video clip; by jointly considering the user's emotion types and the time weights, the user's satisfaction can be determined more accurately.
The present disclosure also relates to an electronic device, including a server, a terminal and the like. The electronic device includes: at least one processor; a memory communicatively connected to the at least one processor; and a communication component communicatively connected to the memory, the communication component receiving and sending data under the control of the processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to implement the facial expression analysis method and the facial expression satisfaction analysis method of the above embodiments.
In an optional implementation manner, the memory, as a non-volatile computer-readable storage medium, can be configured to store non-volatile software programs, non-volatile computer-executable programs and modules. The processor executes the various functional applications and data processing of the device by running the non-volatile software programs, instructions and modules stored in the memory, thereby implementing the above facial expression analysis method and facial expression satisfaction analysis method.
The memory may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store a list of options and the like. In addition, the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device or another non-volatile solid-state storage device. In some embodiments, the memory may optionally include memories arranged remotely with respect to the processor, and these remote memories may be connected to an external device through a network. Examples of such networks include the Internet, corporate intranets, local area networks, mobile communication networks and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the facial expression analysis method and the facial expression satisfaction analysis method of any of the above embodiments.
The above products can execute the facial expression analysis method and the facial expression satisfaction analysis method provided by the embodiments of the present application, and have the functional modules and effects corresponding to the executed methods. For technical details not described in this embodiment, refer to the facial expression analysis method and the facial expression satisfaction analysis method provided by the embodiments of the present application.
The present disclosure also relates to a computer-readable storage medium storing a computer-readable program, the computer-readable program being used by a computer to execute some or all of the above facial expression analysis method and facial expression satisfaction analysis method.
All or some of the steps in the methods of the above embodiments can be completed by instructing the relevant hardware through a program. The program is stored in a storage medium and includes multiple instructions to enable a device (which may be a single-chip microcomputer, a chip or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include media that can store program code, such as a Universal Serial Bus flash disk (USB flash disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Numerous details are set forth in the description provided herein. The embodiments of the present disclosure can, however, be implemented without these details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Although some embodiments described herein include some features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present disclosure and to form different embodiments. For example, in the claims, any one of the claimed embodiments can be used in any combination.

Claims (16)

1. A facial expression analysis method, comprising:
    obtaining a facial expression video clip to be analyzed, and obtaining a picture stream from the facial expression video clip;
    determining a facial expression spectrogram corresponding to the picture stream according to a facial expression index of each frame in the picture stream;
    determining a baseline corresponding to a face in a natural state according to the facial expression spectrogram, and determining a natural emotion region of the face in the natural state based on the baseline; and
    dividing the facial expression spectrogram into a plurality of emotion regions corresponding to different expressions, using the natural emotion region as a reference.
2. The method according to claim 1, wherein the picture stream in the facial expression video clip is obtained frame by frame, by sampling frames at a fixed interval, or by extracting key frames.
3. The method according to claim 2, wherein obtaining the picture stream in the facial expression video clip comprises:
    dividing the facial expression video clip into a plurality of video sub-segments, and extracting at least one frame from each video sub-segment, either randomly or at a fixed position; and
    determining the picture stream of the facial expression video clip according to the plurality of extracted frames.
4. The method according to claim 1, further comprising: performing face detection on each frame in the picture stream to obtain a face image in each frame.
5. The method according to claim 4, wherein determining the facial expression spectrogram corresponding to the picture stream according to the facial expression index of each frame in the picture stream comprises:
    dividing the face image into a plurality of regions, each region containing a plurality of key feature points used to determine the facial expression index;
    for each frame, determining the key feature points contained in the plurality of regions, determining expression scores corresponding to the plurality of regions, and determining the facial expression index of the frame according to the expression scores corresponding to the plurality of regions; and
    obtaining the facial expression spectrogram corresponding to the picture stream according to the facial expression indices of all frames.
6. The method according to claim 5, wherein dividing the face image into a plurality of regions comprises:
    performing facial feature point recognition on the face image to obtain a plurality of feature points of the face image;
    identifying, from the plurality of feature points, a plurality of key feature points used to determine the facial expression index; and
    dividing the face image into a plurality of regions according to the plurality of key feature points.
7. The method according to claim 5, wherein, for each frame, determining the key feature points contained in the plurality of regions, determining the expression scores corresponding to the plurality of regions, and determining the facial expression index of the frame according to the expression scores corresponding to the plurality of regions comprises:
    determining at least one included angle between lines connecting the key feature points in each region, and determining the expression score corresponding to each region according to the at least one included angle;
    determining a weight corresponding to each region; and
    determining the facial expression index of each frame according to the expression scores corresponding to the plurality of regions and the weights corresponding to the plurality of regions.
8. The method according to claim 1, wherein determining the baseline corresponding to the face in the natural state according to the facial expression spectrogram, and determining the natural emotion region of the face in the natural state based on the baseline, comprises:
    determining, in the facial expression spectrogram, a first interval in which the facial expression index appears most frequently;
    determining the baseline corresponding to the face in the natural state according to the first interval; and
    taking, with the baseline as a center, a second interval within a set width above and below the baseline as the natural emotion region of the face in the natural state.
9. The method according to claim 8, wherein determining the baseline corresponding to the face in the natural state according to the first interval comprises:
    determining a horizontal center line of the first interval;
    in a case where the facial expression index corresponding to the horizontal center line is greater than a first threshold and less than a second threshold, determining the horizontal center line as the baseline corresponding to the face in the natural state;
    in a case where the facial expression index corresponding to the horizontal center line is less than or equal to the first threshold, determining the horizontal line corresponding to the first threshold as the baseline corresponding to the face in the natural state; and
    in a case where the facial expression index corresponding to the horizontal center line is greater than or equal to the second threshold, determining the horizontal line corresponding to the second threshold as the baseline corresponding to the face in the natural state.
10. The method according to claim 1, wherein dividing the facial expression spectrogram into a plurality of emotion regions corresponding to different expressions using the natural emotion region as a reference comprises: determining, in the facial expression spectrogram, a region above the natural emotion region as a positive emotion region, and a region below the natural emotion region as a negative emotion region.
11. A facial expression satisfaction analysis method, comprising, after performing the facial expression analysis method according to any one of claims 1-10: within each time period of a facial expression video clip to be analyzed, analyzing and computing a plurality of emotion regions corresponding to different expressions in the facial expression spectrogram to determine a user's satisfaction.
  12. 如权利要求11所述的方法,其中,所述在待分析的人脸表情视频片段中的每个时间段内,对所述人脸表情频谱图中对应不同表情的多个情绪区域进行分析计算,确定用户的满意度,包括:The method of claim 11, wherein, in each time period in the facial expression video clip to be analyzed, multiple emotional regions corresponding to different expressions in the facial expression spectrogram are analyzed and calculated , To determine user satisfaction, including:
    将所述待分析的人脸表情视频片段划分为多个时间段,根据所述人脸表情频谱图,分别计算所述多个时间段内不同表情所占的比例;Divide the face expression video segment to be analyzed into multiple time periods, and calculate the proportions of different expressions in the multiple time periods according to the face expression spectrogram;
    确定多个时间段对应的权重;Determine the weights corresponding to multiple time periods;
    根据多个时间段内不同表情所占的比例以及所述多个时间段对应的权重,确定满意度结果。The satisfaction result is determined according to the proportions of different expressions in the multiple time periods and the weights corresponding to the multiple time periods.
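Claim 12 mirrors the per-frame weighting one level up: per-period expression proportions are combined with per-period weights. A sketch under the assumption that satisfaction is driven by the proportion of positive-emotion frames; the claims do not fix the exact formula:

```python
def satisfaction(period_labels, period_weights, positive="positive"):
    """Weighted satisfaction over time periods.

    period_labels:  one list of per-frame emotion labels per time period
    period_weights: one weight per period, assumed to sum to 1
    """
    total = 0.0
    for labels, weight in zip(period_labels, period_weights):
        if not labels:  # defensively skip an empty period
            continue
        positive_ratio = sum(1 for label in labels if label == positive) / len(labels)
        total += weight * positive_ratio
    return total
```

Weighting later periods more heavily is one plausible choice (the expression at the end of an interaction often says more about the outcome), but the claim leaves the weights open.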
  13. A facial expression analysis system, comprising:
    a picture acquisition module, configured to acquire a video clip of a facial expression to be analyzed and acquire a picture stream in the facial expression video clip;
    an expression spectrum module, configured to determine, according to a facial expression index of each frame in the picture stream, a facial expression spectrogram corresponding to the picture stream;
    an expression reference module, configured to determine, according to the facial expression spectrogram, a reference line corresponding to a human face in a natural state, and to determine, based on the reference line, a natural emotional region of the human face in the natural state; and
    an expression partition module, configured to divide, with the natural emotional region as a reference, the facial expression spectrogram into multiple emotional regions corresponding to different expressions.
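Wired together, the modules of claim 13 form a straight pipeline. A sketch that composes the helpers above; decoding frames into per-frame indices (the picture acquisition and expression spectrum steps) is assumed to have happened upstream:

```python
def analyze_expression_clip(frame_indices):
    """Pipeline sketch: spectrogram values in; baseline, band, and labels out.

    frame_indices plays the role of the facial expression spectrogram;
    natural_baseline_band() and classify_frame() are the sketches above,
    acting as the expression reference and expression partition modules.
    """
    baseline, band = natural_baseline_band(frame_indices)       # expression reference
    labels = [classify_frame(i, band) for i in frame_indices]   # expression partition
    return baseline, band, labels
```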
  14. A facial expression satisfaction analysis system based on the facial expression analysis system according to claim 13, comprising:
    a picture acquisition module, configured to acquire a video clip of a facial expression to be analyzed and acquire a picture stream in the facial expression video clip;
    an expression spectrum module, configured to determine, according to a facial expression index of each frame in the picture stream, a facial expression spectrogram corresponding to the picture stream;
    an expression reference module, configured to determine, according to the facial expression spectrogram, a reference line corresponding to a human face in a natural state, and to determine, based on the reference line, a natural emotional region of the human face in the natural state;
    an expression partition module, configured to divide, with the natural emotional region as a reference, the facial expression spectrogram into multiple emotional regions corresponding to different expressions; and
    a satisfaction calculation module, configured to analyze and calculate, in each time period in the facial expression video clip to be analyzed, the multiple emotional regions corresponding to different expressions in the facial expression spectrogram, so as to determine the user's satisfaction.
  15. An electronic device, comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions, when executed by the processor, implement the method according to any one of claims 1-12.
  16. A computer-readable storage medium, storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-12.
PCT/CN2021/071233 2020-01-13 2021-01-12 Facial expression analysis method and system, and facial expression-based satisfaction analysis method and system WO2021143667A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010033040.1A CN113111690B (en) 2020-01-13 2020-01-13 Facial expression analysis method and system and satisfaction analysis method and system
CN202010033040.1 2020-01-13

Publications (1)

Publication Number Publication Date
WO2021143667A1 (en)

Family

ID=76708830

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071233 WO2021143667A1 (en) 2020-01-13 2021-01-12 Facial expression analysis method and system, and facial expression-based satisfaction analysis method and system

Country Status (2)

Country Link
CN (1) CN113111690B (en)
WO (1) WO2021143667A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743252B (en) * 2022-06-10 2022-09-16 中汽研汽车检验中心(天津)有限公司 Feature point screening method, device and storage medium for head model
CN117131099A (en) * 2022-12-14 2023-11-28 广州数化智甄科技有限公司 Emotion data analysis method and device in product evaluation and product evaluation method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190384967A1 (en) * 2018-06-19 2019-12-19 Beijing Kuangshi Technology Co., Ltd. Facial expression detection method, device and system, facial expression driving method, device and system, and storage medium
CN109447001A (en) * 2018-10-31 2019-03-08 深圳市安视宝科技有限公司 A kind of dynamic Emotion identification method
CN109886110A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 Micro- expression methods of marking, device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850247A (en) * 2021-12-01 2021-12-28 环球数科集团有限公司 Tourism video emotion analysis system fused with text information
CN113850247B (en) * 2021-12-01 2022-02-08 环球数科集团有限公司 Tourism video emotion analysis system fused with text information

Also Published As

Publication number Publication date
CN113111690B (en) 2024-01-30
CN113111690A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
WO2021143667A1 (en) Facial expression analysis method and system, and facial expression-based satisfaction analysis method and system
US10909356B2 (en) Facial tracking method and apparatus, storage medium, and electronic device
CN109522815B (en) Concentration degree evaluation method and device and electronic equipment
CN105005777B (en) Audio and video recommendation method and system based on human face
US9852327B2 (en) Head-pose invariant recognition of facial attributes
US9076030B2 (en) Liveness detection
US9104907B2 (en) Head-pose invariant recognition of facial expressions
JP5287333B2 (en) Age estimation device
US20140143183A1 (en) Hierarchical model for human activity recognition
WO2020042542A1 (en) Method and apparatus for acquiring eye movement control calibration data
CN108875452A (en) Face identification method, device, system and computer-readable medium
CN106056064A (en) Face recognition method and face recognition device
KR20190020779A (en) Ingestion Value Processing System and Ingestion Value Processing Device
CN108198130B (en) Image processing method, image processing device, storage medium and electronic equipment
CN113179421B (en) Video cover selection method and device, computer equipment and storage medium
CN110232331B (en) Online face clustering method and system
CN111860091A (en) Face image evaluation method and system, server and computer readable storage medium
CN110866139A (en) Cosmetic treatment method, device and equipment
Augereau et al. Estimation of English skill with a mobile eye tracker
CN113436735A (en) Body weight index prediction method, device and storage medium based on face structure measurement
CN111860057A (en) Face image blurring and living body detection method and device, storage medium and equipment
KR101145672B1 (en) A smile analysis system for smile self-training
US20210158565A1 (en) Pose selection and animation of characters using video data and training techniques
RU2768797C1 (en) Method and system for determining synthetically modified face images on video
KR20210019182A (en) Device and method for generating job image having face to which age transformation is applied

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21741141
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21741141
    Country of ref document: EP
    Kind code of ref document: A1