WO2021244352A1 - Method and apparatus for determining local area that affects degree of facial aging - Google Patents


Info

Publication number: WO2021244352A1
Authority: WIPO (PCT)
Prior art keywords: facial image, facial, area, degree, apparent age
Application number: PCT/CN2021/095753
Other languages: French (fr), Chinese (zh)
Inventors: WANG Sijia (汪思佳), WANG Fudi (王馥迪), DU Siyuan (杜思源)
Original assignee: Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences (中国科学院上海营养与健康研究所)
Application filed by Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences
Publication of WO2021244352A1

Classifications

    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to facial aging prediction technology, and in particular to techniques for determining the local areas that affect the degree of facial aging.
  • Facial aging refers to a complex biological process in which facial morphology and structure change over time. With improving quality of life, people are increasingly concerned about facial aging, which is not only a criterion for judging health in biomedicine but also a general concern of society. Therefore, accurately identifying the facial areas that drive aging, so as to help individuals delay or mitigate it, has great research and application value.
  • The purpose of this application is to provide a method and device for determining the local area that affects the degree of facial aging, which can accurately assess the degree of influence of local facial areas on facial aging.
  • This application discloses a method for determining the local area that affects the degree of facial aging, including: acquiring a first facial image of an object; inputting the first facial image into an apparent age prediction model of a human face to obtain a first apparent age; performing image processing on the first facial image to change a predetermined number of pixels and/or a predetermined area in it, obtaining a second facial image; and determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or regions in the second facial image on the first apparent age.
  • the image processing adopts a method selected from the following group:
  • pixel derivation method, area masking method, or a combination thereof.
  • the predetermined number of pixels are all pixels of the first facial image.
  • the predetermined area is selected from one or more of the following group:
  • eye area, cheek area, mouth area, forehead area.
  • when the image processing adopts a pixel derivation method, performing image processing on the first facial image to obtain a second facial image further includes: adding Gaussian noise to the predetermined number of pixels of the first facial image.
  • in that case, determining the degree of influence of the changed pixels and/or regions on the first apparent age further includes: using the apparent age prediction model to take derivatives over the second facial image, obtaining a derivative value for each pixel, and calculating from those derivative values the degree of influence of the changed pixels on the first apparent age.
  • when the image processing adopts a region masking method, performing image processing on the first facial image to obtain a second facial image further includes: covering the predetermined area with the average pixel value of the first facial image.
  • in that case, determining the degree of influence further includes: inputting the second facial image into the apparent age prediction model to obtain a second apparent age, comparing the second apparent age with the first apparent age, and calculating from the comparison result the degree of influence of the predetermined area on the apparent age of the human face.
  • performing image processing on the first facial image with the region masking method and covering the predetermined region with the average pixel value further includes: dividing the first facial image into a plurality of local regions and sequentially covering each local region with the average pixel value of the first facial image, obtaining a corresponding second facial image for each covered local region.
  • for example, the first facial image is divided into four local areas, the eye area, cheek area, mouth area, and forehead area, and the region masking method sequentially covers each local area with the pixel average of the first facial image, obtaining a corresponding second facial image for each covered area.
  • the apparent age prediction model is obtained by a method including the following steps: quantifying, through a facial perception experiment, the age distribution, average age, or median age of facial sample images as deep learning training labels to build a training sample set; and training a convolutional neural network model with the training sample set to obtain the apparent age prediction model.
  • the convolutional neural network model is a ResNet18 model.
  • the application also discloses a device for determining the local area that affects the degree of facial aging, including:
  • An image acquisition module for acquiring a first facial image of an object
  • An image processing module configured to perform image processing on the first facial image, and change a predetermined number of pixels and/or a predetermined area in the first facial image to obtain a second facial image;
  • An age prediction module configured to input the first facial image into an apparent age prediction model of a human face to obtain the first apparent age
  • an influence degree determination module configured to determine, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or regions in the second facial image on the first apparent age.
  • the application also discloses a device for determining the local area that affects the degree of facial aging, including:
  • Memory for storing computer executable instructions
  • the processor is used to implement the steps in the method described above when executing the computer-executable instructions.
  • the present application also discloses a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the steps in the method described above are implemented.
  • a facial perception experiment is selected to quantify the facial aging phenotype as a whole and, combined with deep learning and visualization methods, to locate the main areas affecting overall facial aging; this objectively and effectively evaluates the degree of impact of local facial areas on facial aging.
  • visualization methods such as pixel derivation method and/or area masking method can more accurately and objectively locate the facial area that affects facial aging, and provide a more scientific basis for decision-making in the fields of medical treatment and cosmetology.
  • for example, if one embodiment discloses the feature combination A+B+C and another discloses A+B+D+E, where features C and D are equivalent technical means playing the same role and feature E can technically be combined with feature C, then the A+B+C+D solution should not be regarded as recorded because it is technically infeasible, while the A+B+C+E solution should be deemed to have been recorded.
  • Fig. 1 is a schematic flowchart of a method for determining a local area that affects the degree of facial aging according to a first embodiment of the present application.
  • FIG. 2 shows a schematic diagram of the preprocessing flow of facial image data in an embodiment of the present application.
  • Figure A is a schematic diagram of the identification of 106 facial key points in the facial area
  • Figure B is a schematic diagram of calculating the position of the central axis of the face using a regression model
  • Figure C is a schematic diagram of rotating the facial image according to the tilt angle to align the face vertically
  • Figures D and E are schematic diagrams of cropping the image according to the mandibular point, the left and right cheek points, and the upper forehead point.
  • Fig. 3 shows a schematic diagram of facial area division according to an embodiment of the present application.
  • FIG. 4 shows a schematic diagram of the variation curve of the variance of 1000 samples with the number of evaluators in an embodiment of the present application.
  • Fig. 5 shows a schematic diagram of the deep learning, visualization and verification process in an embodiment of the present application.
  • Figure 6 shows a schematic diagram of the ResNet18 network structure in an embodiment of the present application.
  • Figure a is the basic module of the residual network, which establishes a shortcut link from input to output
  • Figure b is the network structure of ResNet18, in which the dotted lines mark connections where the number of feature channels doubles.
  • FIG. 7 shows a schematic diagram of a comparison of training effects using three different deep learning models and three different training tags in an embodiment of the present application.
  • FIG. 8 shows a schematic diagram of the division and visualization of the facial area in an embodiment of the present application.
  • Figure a is a schematic diagram of the division of the facial area, showing the four regions of the face
  • Figure b is a schematic diagram of the second facial image obtained based on the pixel derivation method
  • Figure d is the heat map of the aging degree of the four facial parts, obtained by aggregating the pixel derivation results of Figure b over each part
  • Figure c is the heat map of the aging degree of the four parts obtained from the results of the area masking method.
  • FIG. 9 shows a schematic flow chart of using the pixel derivation method in an embodiment of the present application.
  • FIG. 10 shows a schematic diagram of the process of adopting the area covering method in an embodiment of the present application.
  • Figure 11 shows a schematic diagram of the consistency check in an embodiment of the present application.
  • Figure A is an example of deep learning ranking results, and the numbers represent the order of importance of different regions
  • Figures B-D show the ranking results of the eye movement experiment (or manual evaluation), with the bold dashed box marking the main contrast area; Figure B represents the case where the order of all four regions is exactly the same.
  • Figure C shows the case where only the most important region is consistent, and Figure D the case where one method's most important region matches the other's second most important region.
  • FIG. 12 shows a schematic diagram of the structure of the device for determining the local area that affects the degree of facial aging according to the second embodiment of the present application.
  • Visualization: displaying the basis of neural network decision-making in the form of images or pictures.
  • the first embodiment of the present application relates to a method for determining the local area that affects the degree of facial aging.
  • the process is shown in Figure 1.
  • the method includes the following steps:
  • step 101: a first facial image of an object is acquired;
  • step 102: the first facial image is input into the apparent age prediction model of the human face to obtain the first apparent age;
  • step 103: image processing is performed on the first facial image, and a predetermined number of pixels and/or predetermined regions in the first facial image are changed to obtain a second facial image;
  • step 104: according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or regions in the second facial image on the first apparent age is determined.
  • the first facial image in step 101 may be a whole facial image or a partial facial image.
  • the apparent age prediction model is obtained in advance through the following steps 1 and 2:
  • in step 1, a perception experiment is used to quantify the age distribution, average age, or median age of the facial sample images as deep learning training labels, obtaining the training sample set; in step 2, the training sample set is used to train a convolutional neural network model to obtain the apparent age prediction model.
  • the age distribution is used as the training label.
  • the convolutional neural network model may be, but is not limited to, a VGG 16 model, a ResNet 18 model, or a ResNet 50 model.
  • the convolutional neural network model is a ResNet18 model.
  • the image processing may adopt a pixel derivation method, an area covering method, or a combination thereof.
  • the predetermined number of pixels are all pixels of the first facial image.
  • the predetermined area is a sub-area of m1 pixels × m2 pixels, where m1 and m2 are each independently a positive integer of 1-1000, preferably 2-500, more preferably 3-250, and most preferably 5-100.
  • the predetermined area is 0.01%-25% of the entire facial area, preferably 0.1-10%, more preferably 1-5%.
  • the predetermined area is selected from one or more of the following group: eye area, cheek area, mouth area, and forehead area.
  • step 103 can be further implemented as the following step: image processing is performed on the first facial image by using a pixel derivation method, and Gaussian noise is added to the predetermined number of pixels to obtain a second facial image. For example, random Gaussian noise may be added to all pixels of the first facial image, although this is not limiting.
  • this step 104 can be further implemented as the following steps: use the apparent age prediction model to take derivatives over the second facial image, obtaining a derivative value for each pixel of the second facial image, and calculate from those derivative values the degree of influence of the changed pixels on the first apparent age.
  • in step 1, the first facial image is divided into multiple local areas, and for each local area the sum of the derivative values of all its pixels is computed and used as the weight coefficient of that area's influence on overall facial aging; in step 2, based on each local area's influence weight coefficient, the areas are annotated on the first facial image to obtain a third facial image.
  • step 103 can be further implemented as the following step: image processing is performed on the first facial image by using a region masking method, and the predetermined area is covered with the average pixel value of the first facial image to obtain the second facial image.
  • this step 104 can be further implemented as the following steps: input the second facial image into the apparent age prediction model of the face to obtain the second apparent age, compare the second apparent age with the first apparent age, and calculate from the comparison result the degree of influence of the predetermined area on the apparent age of the human face.
  • the above-mentioned "adopting a region masking method to perform image processing on the first facial image, and using the average pixel value of the first facial image to cover the predetermined area to obtain a second facial image" is further implemented as: the first facial image is divided into a plurality of local areas, and the region masking method sequentially covers each local area with the pixel average of the first facial image, obtaining a corresponding second facial image for each covered local area.
  • for example, the first facial image can be divided into four local areas, the eye area, cheek area, mouth area, and forehead area; the region masking method then sequentially covers each local area with the pixel average of the first facial image to obtain a corresponding second facial image for each covered area.
  • after the above step of dividing the first facial image into multiple local regions and sequentially covering each with the average pixel value, the method further includes steps 1 and 2: in step 1, for each local area, the difference between its corresponding second apparent age and the first apparent age is computed and used as the weight coefficient of that area's influence on overall facial aging; in step 2, based on each area's influence weight coefficient, the areas are annotated on the first facial image to obtain a third facial image.
  • a deep learning network such as ResNet 18 is used to build a facial aging evaluation system, and a deep learning visualization method such as pixel derivation or facial masking is used to locate the main areas of facial aging.
  • the specific plan is as follows:
  • first, the face++ software was used to identify 106 facial key points in the face area (Figure 2A, https://www.faceplusplus.com/). Then, from the position coordinates of the eyes, nose, and mouth among the key points, a regression model was used to calculate the position of the central axis of the face (Figure 2B, red solid line) and its inclination angle relative to the vertical direction (Figure 2B, red dashed line). The facial image was then rotated by this tilt angle to align the face vertically (Figure 2C). Finally, the image was cropped according to the mandibular point, the left and right cheek points, and the upper forehead point; the cropping results are shown in Figures 2D and 2E. The cropped images were used for subsequent experiments and analysis.
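  • the alignment step above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: it estimates the tilt angle from two eye keypoints rather than from a regression over all 106 face++ points, and the function names are assumptions.

```python
import numpy as np

def tilt_angle(left_eye, right_eye):
    """Angle (degrees) between the inter-eye line and the horizontal.

    Rotating the image by -angle makes the face vertical. Two eye
    keypoints stand in for the regression over all 106 points.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return np.degrees(np.arctan2(dy, dx))

def rotate_points(points, angle_deg, center):
    """Rotate keypoint coordinates by angle_deg around center."""
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (np.asarray(points, dtype=float) - center) @ rot.T + center
```

  • in practice the same rotation would be applied to the image raster itself; rotating the keypoints by the negative tilt angle brings the two eyes onto a horizontal line, which is the alignment criterion.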
  • the evaluators used a unified display device to assess perceived age; before the experiment they did not know the ages of the samples or the age distribution of the data set. Each evaluator observed all 5,768 sample pictures and predicted and recorded an age for each sample. In addition, 1,014 sample photos were selected (500 men and 514 women); for these, the evaluator also indicated, according to the regions in Figure 3, which facial areas they focused on when judging the sample's age, ticking them in the evaluation form (an example is shown in Table 1; multiple areas can be selected).
  • simulation data: 10,000 perceived-age values were generated for 1,000 evaluators.
  • the 1,000 simulated perceived-age values were repeatedly resampled, and the variance of the perceived age was calculated for each sampling, where X_i represents the i-th sampling and n represents the number of panelists selected.
  • deep learning used the 5,768 sample photos as training data, with the perceived age evaluated in the perception experiment quantifying facial aging as the training label; deep learning visualization was then used to locate the main areas that affect facial aging.
  • the training data set has only 5,768 samples; with so little training data, learning of the network parameters suffers and overfitting is likely. Therefore, before the actual training process, we followed standard deep learning data augmentation practice and enhanced the training data set in two ways: mirroring and cropping.
  • mirror augmentation reflects a picture about the Y axis, so each original picture also yields its mirror image; crop augmentation takes five crops per picture, at the four corners and the center, producing five different sub-images. Together these expand the training data set tenfold.
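  • the tenfold augmentation described above can be sketched as follows; the array layout (H x W image) and the function name are assumptions.

```python
import numpy as np

def augment(img, crop):
    """Ten-fold augmentation: the original and its horizontal mirror,
    each cropped at the four corners and the center (crop x crop).

    `img` is an H x W (or H x W x C) array; crop must not exceed
    either spatial dimension.
    """
    h, w = img.shape[:2]
    anchors = [(0, 0), (0, w - crop), (h - crop, 0),
               (h - crop, w - crop), ((h - crop) // 2, (w - crop) // 2)]
    out = []
    for view in (img, img[:, ::-1]):          # original + Y-axis mirror
        for y, x in anchors:
            out.append(view[y:y + crop, x:x + crop])
    return out                                 # 2 views x 5 crops = 10
```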
  • the convolutional neural network models VGG 16, ResNet 18 and ResNet 50 are selected for age prediction.
  • ResNet18 is one of the five main models of the ResNet network structure, which mainly includes three parts: Input, Output and Intermediate Convolution (Stages).
  • one of the main difficulties in training a deep neural network is that as the number of layers increases, the parameters become difficult to optimize and network degradation occurs, that is, the trained model fits the data even worse than models with fewer layers.
  • ResNet establishes a shortcut link from shallow features to deep features by introducing a residual module, using the neural network to model the residual between deep and shallow features rather than the deep features directly.
  • as a result, gradients propagate back more effectively when the backpropagation algorithm optimizes the network parameters, which largely resolves the degradation problem of deep networks.
  • for comparison, this project also uses VGG16 and ResNet50 to measure training performance on the same samples.
  • the network parameters are initialized using the model pre-trained on ImageNet, the batch size is set to 64, the learning rate is set to 0.001, and a total of 100 epochs are trained.
  • Figure 6(a) is the basic module of the residual network, which establishes a shortcut link from input to output, where weight layer refers to the weight layer, and relu is an activation function.
  • Figure 6(b) shows the network structure of ResNet18, where dotted lines mark stages at which the number of feature channels doubles.
  • solid-line connections indicate the channel count is unchanged; for example, "3×3 conv, 64" means 64 convolution kernels of size 3×3 are used for convolution;
  • dashed-line connections indicate the channel count changes; for example, "3×3 conv, 128, /2" means 128 convolution kernels of size 3×3 are used, "/2" indicates the feature map is downsampled by 2, and the channel count doubles from the previous stage's 64 to 128.
  • the evaluation result distribution, evaluation median, and evaluation mean of 22 age evaluators were used as labels for deep learning model training.
  • the error variance between the predicted result and the real result is used as the effect evaluation criterion to select the model.
  • the project uses a 10-fold cross-validation method to obtain predictions for the training data set: the sample data is divided into 10 parts, each time 9 parts are used for training and the resulting model predicts the ages of the remaining part, and this is repeated over all 10 folds.
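  • the 10-fold split can be sketched as follows; only the index bookkeeping is shown (the model training itself is omitted), and the function name is an assumption.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k disjoint folds; each fold serves once
    as the held-out prediction set while the other k-1 folds train.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds if f is not folds[i] for j in f]
        splits.append((train, test))
    return splits
```

  • cycling through the k splits yields an out-of-fold prediction for every sample, which is how the predicted perceptual ages of all training samples are obtained.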
  • the deep learning model can be used to obtain the perceptual age data predicted by all samples.
  • the model evaluation uses three training labels of 22 evaluators' age distribution, age mean and age median, and three deep learning models of VGG 16, ResNet 18, and ResNet 50.
  • the correlation coefficient is calculated by Pearson's correlation coefficient, and the P values in the table are all less than 0.001.
  • the results show that the ResNet18 model with the age distribution as the training label produced the best results (average difference 2.27 years, correlation coefficient 0.96; see Figure 7).
  • the ResNet 18 model trained with the age distribution as the training label is used for the subsequent process.
  • the pixel derivation method and the area masking method are selected to realize visualization, and the face is divided into four regions: forehead, eyes, mouth and cheeks based on the facial anatomy and 106 facial feature points automatically calibrated by face++ (Figure 8a).
  • the core idea of the pixel derivation method is to take the derivative of the predicted perceived age with respect to each pixel value, and use the magnitude of the derivative to measure how important each pixel of the picture is to the perceived age.
  • because the neural network is a highly nonlinear mapping, a small number of pixels usually have very large derivatives, which makes visualization difficult; therefore random noise is added to the picture and the results of multiple derivations are averaged to obtain smoother visualization results.
  • denoting the sensitivity mask by M_c, the pixel derivation method adds noise to each pixel separately, calculates the derivative of the perceived age with respect to the noised pixel, and averages over repetitions to obtain the smoothed sensitivity:
  • M̂_c(x) = (1/n) Σ_{i=1}^{n} M_c(x + N(0, σ²))
  • where n refers to the number of calculations, x refers to the original pixel value, and N(0, σ²) is Gaussian noise.
  • in step 901, the first facial image is input to the trained ResNet18 model to output the corresponding first apparent age; in step 902, n copies of the first facial image are made and random Gaussian noise is added to each, giving n corresponding second facial images; in step 903, for each second facial image, the derivative of the trained ResNet18 model's output with respect to each pixel in the image is calculated; in step 904, the derivation results of the n second facial images are averaged and visualized (as shown in Figure 8b; the brighter the color, the more important the point); in step 905, the sum of the derivatives over each facial area (mouth, forehead, eyes, cheeks) is computed and the areas are ranked by their sums, with a higher sum indicating higher importance and aging degree, and each area is marked with a color shade indicating its aging degree (as shown in Figure 8d; the redder the color, the more aging).
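  • the pixel derivation flow of steps 901-905 can be sketched as follows. This is a sketch under stated assumptions: a generic differentiable `predict` function stands in for the trained ResNet18, and derivatives are taken by finite differences here, where automatic differentiation would be used in practice.

```python
import numpy as np

def smoothed_saliency(predict, img, n=25, sigma=0.1, eps=1e-4, seed=0):
    """Average |d age / d pixel| over n noisy copies of the image
    (steps 902-904). `predict` maps an image array to a scalar age.
    """
    rng = np.random.default_rng(seed)
    acc = np.zeros_like(img, dtype=float)
    for _ in range(n):
        noisy = img + rng.normal(0.0, sigma, img.shape)
        grad = np.zeros_like(noisy)
        for i in np.ndindex(img.shape):        # finite difference per pixel
            bump = noisy.copy()
            bump[i] += eps
            grad[i] = (predict(bump) - predict(noisy)) / eps
        acc += np.abs(grad)
    return acc / n

def region_scores(saliency, region_masks):
    """Sum the saliency inside each named region mask (step 905)."""
    return {name: float(saliency[mask].sum())
            for name, mask in region_masks.items()}
```

  • sorting `region_scores` from largest to smallest reproduces the per-region importance ranking used for the heat map of Figure 8d.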
  • the regional occlusion rule is to occlude each region with the mean value of all pixels in the data set, and evaluate the importance of each region with the prediction difference before and after the occlusion.
  • the average pixel value is used to cover each of the four regions in turn, the network then predicts the age of the covered picture, and the age difference after versus before occlusion is obtained. If the age difference is negative, the occluded area increases the overall aging of the face; if positive, the occluded area reduces it. The larger the absolute difference, the stronger the effect.
  • in step 1001, the first facial image (unoccluded) is input to the trained ResNet18 model to output the corresponding first apparent age; in step 1002, the four local areas of the first facial image (mouth, forehead, eyes, and cheeks) are each occluded and filled with the image's average pixel value to obtain the corresponding second facial images; in step 1003, these four occluded images are input into the trained ResNet18 model to obtain the corresponding second apparent ages; in step 1004, the difference between the apparent age predictions after and before occlusion is used as the aging degree of each area, and the areas are ranked accordingly (as shown in Figure 8c, where the value corresponding to the color is the predicted age difference after versus before occlusion; the stronger the blue, the more aging, and the stronger the red, the younger).
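  • the region masking flow of steps 1001-1004 can be sketched as follows; `predict` again stands in for the trained ResNet18 age model, and the function name is an assumption.

```python
import numpy as np

def occlusion_scores(predict, img, region_masks):
    """Cover each region with the image's mean pixel value, re-predict,
    and report (occluded age - original age) per region. A negative
    difference means the region was making the face look older.
    """
    base_age = predict(img)
    mean_val = img.mean()
    diffs = {}
    for name, mask in region_masks.items():
        covered = img.copy()
        covered[mask] = mean_val               # occlude one region
        diffs[name] = predict(covered) - base_age
    # rank regions from most aging (most negative diff) upward
    ranking = sorted(diffs, key=diffs.get)
    return diffs, ranking
```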
  • Eyelink 1000 eye tracker was used to observe and record the ratio of the staying time of the evaluator's gaze point in each facial area ( Figure 11) to the total gaze time, to objectively quantify the degree of influence of local facial areas on the overall facial aging evaluation.
  • this project designed three consistency test indicators based on the numerical characteristics of the different methods. These indicators, combined with the results of manual evaluation and eye movement experiments, verify the reliability of the deep learning visualization in this application.
  • the indicators are introduced as follows:
  • Top1 matching rate: the ratio of samples in which two methods agree on the most important area (Figure 11C) to the total number of samples. This ratio measures how often the two methods fully agree on the single local area with the most significant impact on overall facial aging.
  • Top2 matching rate: the ratio of samples in which either the most important areas match (Figure 11C) or the most important area of one method matches the second most important area of the other (Figure 11D), to the total number of samples. The higher the ratio, the stronger the consistency between the two methods.
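  • the matching rates can be computed as follows; the exact counting rule for Top2 is one interpretation, since the definition admits more than one reading, and the function name is an assumption.

```python
def match_rates(rank_a, rank_b):
    """Top1 / Top2 matching rates between two per-sample region rankings
    (e.g. deep learning visualization vs eye tracking). Each ranking is
    a list of region names ordered from most to least important.

    Top1: both methods name the same most important region.
    Top2: additionally counts samples whose top two regions match as an
    unordered pair (the swapped case of Figure 11D).
    """
    n = len(rank_a)
    top1 = sum(a[0] == b[0] for a, b in zip(rank_a, rank_b)) / n
    top2 = sum(a[0] == b[0] or set(a[:2]) == set(b[:2])
               for a, b in zip(rank_a, rank_b)) / n
    return top1, top2
```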
  • deep learning visualization uses two methods, pixel derivation and area masking, and the visualization effect is evaluated according to three criteria: the perfect agreement rate, the Top1 matching rate, and the Top2 matching rate.
  • the results show that, compared with the area masking method, the use of pixel derivation obtains better results.
  • the deep learning pixel derivation visualization result has the highest agreement rate with manual evaluation (0.18 vs 0.13), which is about 38% higher than the area masking method.
  • with manual evaluation, the Top1 matching rate is 0.52 and the Top2 matching rate reaches 0.89; with the eye movement experiment, the Top1 matching rate is 0.61 and the Top2 matching rate is 0.85.
  • the second embodiment of the present application relates to a device for determining the local area that affects the degree of facial aging. Its structure is shown in FIG. 12.
  • the device for determining the local area that affects the degree of facial aging includes:
  • An image acquisition module for acquiring a first facial image of an object
  • An image processing module configured to perform image processing on the first facial image, change a predetermined number of pixels and/or a predetermined area in the first facial image, to obtain a second facial image;
  • An age prediction module configured to input the first facial image into the apparent age prediction model of the human face to obtain the first apparent age
  • an influence degree determination module configured to determine, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age.
  • The first embodiment is the method embodiment corresponding to this embodiment.
  • The technical details of the first embodiment can be applied to this embodiment, and the technical details of this embodiment can also be applied to the first embodiment.
  • Each module shown in the above implementation of the device for determining the local area that affects the degree of facial aging can be realized by a program (executable instructions) running on a processor, or by specific logic circuits. If the device for determining the local area that affects the degree of facial aging in the embodiments of the present application is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • The technical solutions of the embodiments of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods in the embodiments of the present application.
  • The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), magnetic disks, optical disks, and other media that can store program code. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
  • The embodiments of the present application also provide a computer-readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the method embodiments of the present application are implemented.
  • Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology.
  • Information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by computing devices. As defined herein, computer-readable storage media do not include transitory media such as modulated data signals and carrier waves.
  • The embodiments of the present application also provide a device for determining a local area that affects the degree of facial aging, which includes a memory for storing computer-executable instructions and a processor; the processor is used to execute the computer-executable instructions in the memory to implement the steps in the foregoing method embodiments.
  • The processor can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc.
  • The aforementioned memory may be read-only memory (ROM), random access memory (RAM), flash memory (Flash), a hard disk, a solid-state disk, etc.
  • The steps of the methods disclosed in the embodiments of the present invention may be directly executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • When an act is said to be performed based on a certain element, it means that the act is performed at least based on that element; this includes both performing the act based only on that element and performing it based on that element together with other elements. Expressions such as "multiple" and "a plurality of" mean two or more.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application relates to facial aging prediction technology. Disclosed are a method and apparatus for determining a local area that affects the degree of facial aging, by which the degree of impact of a local facial area on facial aging can be accurately evaluated. The method comprises: acquiring a first facial image of an object; inputting the first facial image into an apparent age prediction model for human faces to obtain a first apparent age; performing image processing on the first facial image, changing a predetermined number of pixels and/or predetermined areas in the first facial image, to obtain a second facial image; and determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of impact of the changed pixels and/or areas in the second facial image on the first apparent age.

Description

Method and device for determining the local area that affects the degree of facial aging

Technical field
This application relates to facial aging prediction technology, and in particular to techniques for determining the local areas that affect the degree of facial aging.
Background art
Facial aging is a complex biological process in which facial morphology and structure change over time. As quality of life improves, people pay increasing attention to facial aging. Facial aging is not only a criterion for judging health status in biomedicine, but also a matter of broad social concern. Therefore, accurately identifying the facial areas that drive aging, so as to help individuals delay or improve facial aging in a more targeted way, has great research and application value.
Summary of the invention
The purpose of this application is to provide a method and device for determining the local area that affects the degree of facial aging, which can accurately assess the degree of influence of local facial areas on facial aging.
This application discloses a method for determining the local area that affects the degree of facial aging, including:
acquiring a first facial image of an object;
inputting the first facial image into an apparent age prediction model for human faces to obtain a first apparent age;
performing image processing on the first facial image, changing a predetermined number of pixels and/or a predetermined area in the first facial image, to obtain a second facial image; and
determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age.
In a preferred example, the image processing adopts a method selected from the following group:
the pixel derivation method, the area masking method, or a combination thereof.
In a preferred example, the predetermined number of pixels are all the pixels of the first facial image.
In a preferred example, the predetermined area is one or more selected from the following group:
the eye area, the cheek area, the mouth area, and the forehead area.
In a preferred example, the image processing adopts the pixel derivation method;
performing image processing on the first facial image and changing a predetermined number of pixels and/or a predetermined area in the first facial image to obtain a second facial image further includes:
performing image processing on the first facial image using the pixel derivation method, and adding Gaussian noise to the predetermined number of pixels to obtain the second facial image; and
determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age further includes:
using the apparent age prediction model to differentiate with respect to the second facial image to obtain a derivative value for each pixel of the second facial image, and calculating, based on the derivative value of each pixel, the degree of influence of the changed pixels on the first apparent age.
In a preferred example, after obtaining the derivative value for each pixel of the second facial image and calculating the degree of influence of the changed pixels on the first apparent age, the method further includes:
dividing the first facial image into multiple local areas, and summing the derivative values of all pixels in each local area as that local area's influence weight coefficient on the overall degree of facial aging; and
based on the influence weight coefficient of each local area, annotating each local area on the first facial image to obtain a third facial image.
In a preferred example, the image processing adopts the area masking method;
performing image processing on the first facial image and changing a predetermined number of pixels and/or a predetermined area in the first facial image to obtain a second facial image further includes:
performing image processing on the first facial image using the area masking method, and covering the predetermined area with the mean pixel value of the first facial image to obtain the second facial image; and
determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age further includes:
inputting the second facial image into the apparent age prediction model to obtain a second apparent age; and
comparing the second apparent age with the first apparent age, and calculating, based on the comparison result, the degree of influence of the predetermined area on the apparent age of the human face.
In a preferred example, covering the predetermined area with the mean pixel value of the first facial image to obtain the second facial image further includes:
dividing the first facial image into multiple local areas, performing image processing on the first facial image using the area masking method, and covering each local area in turn with the mean pixel value of the first facial image to obtain a corresponding second facial image for each covered local area.
In a preferred example, after obtaining the corresponding second facial image for each covered local area, the method further includes:
separately computing, for each local area, the difference between the corresponding second apparent age and the first apparent age as that local area's influence weight coefficient on the overall degree of facial aging; and
based on the influence weight coefficient of each local area, annotating each local area on the first facial image to obtain a third facial image.
In a preferred example, dividing the first facial image into multiple local areas and covering each in turn further includes:
dividing the first facial image into four local areas, namely the eye area, the cheek area, the mouth area, and the forehead area, performing image processing on the first facial image using the area masking method, and covering each local area in turn with the mean pixel value of the first facial image to obtain a corresponding second facial image for each covered local area.
In a preferred example, the apparent age prediction model is obtained by a method including the following steps:
using a perception experiment to quantify the age distribution, mean age, or median age of facial sample images as deep learning training labels, to obtain a training sample set; and
training a convolutional neural network model with the training sample set to obtain the apparent age prediction model.
In a preferred example, the convolutional neural network model is a ResNet18 model.
This application also discloses a device for determining the local area that affects the degree of facial aging, including:
an image acquisition module, configured to acquire a first facial image of an object;
an image processing module, configured to perform image processing on the first facial image and change a predetermined number of pixels and/or a predetermined area in the first facial image to obtain a second facial image;
an age prediction module, configured to input the first facial image into an apparent age prediction model for human faces to obtain a first apparent age; and
an influence degree determination module, configured to determine, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age.
This application also discloses a device for determining the local area that affects the degree of facial aging, including:
a memory for storing computer-executable instructions; and
a processor for implementing the steps in the method described above when executing the computer-executable instructions.
This application also discloses a computer-readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the steps in the method described above are implemented.
The embodiments of this application include at least the following advantages and beneficial effects:
A facial perception experiment is used to quantify the facial aging phenotype as a whole and, combined with deep learning and visualization methods, to locate the main regions affecting overall facial aging, so that the degree of influence of local facial areas on facial aging can be assessed objectively and effectively.
Further, visualization methods such as the pixel derivation method and/or the area masking method can locate the facial areas that affect facial aging more accurately and objectively, providing a more scientific basis for decision-making in fields such as medicine and cosmetology.
Further, a deep learning model such as ResNet18 is used, with the age distribution or similar quantities as training labels, to accurately simulate human perceived-age experiments and quantify the overall facial aging phenotype, so that the influence of local facial areas on facial aging can be assessed more quickly and effectively.
A large number of technical features are described in the specification of this application, distributed among the various technical solutions; listing all possible combinations of these technical features (i.e., all technical solutions) would make the specification excessively long. To avoid this problem, the technical features disclosed in the summary above, in the embodiments and examples below, and in the drawings may be freely combined with each other to form various new technical solutions (all of which are deemed to be described in this specification), unless such a combination is technically infeasible. For example, suppose one example discloses features A+B+C and another discloses features A+B+D+E, where C and D are equivalent technical means playing the same role, so only one of them can be used at a time, while feature E can technically be combined with feature C. Then the solution A+B+C+D should not be regarded as described because it is technically infeasible, whereas the solution A+B+C+E should be regarded as described.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for determining the local area that affects the degree of facial aging according to the first embodiment of the present application.
Fig. 2 shows the facial image data preprocessing pipeline in an embodiment of the present application. Panel A shows the identification of 106 facial key points in the facial area; panel B shows the position of the facial central axis computed with a regression model; panel C shows the facial image rotated by the tilt angle to align the face vertically; panels D and E show two images cropped according to the chin point, the left and right cheek points, and the upper forehead point.
Fig. 3 shows the facial area division in an embodiment of the present application.
Fig. 4 shows how the sampling variance over 1000 samples changes with the number of evaluators in an embodiment of the present application.
Fig. 5 shows the deep learning, visualization, and verification pipeline in an embodiment of the present application.
Fig. 6 shows the ResNet18 network structure in an embodiment of the present application. Panel a shows the basic residual network module, which establishes a shortcut connection from input to output; panel b shows the network structure of ResNet18, where the dotted lines indicate that the number of features doubles.
Fig. 7 compares the training results of three different deep learning models and three different training labels in an embodiment of the present application.
Fig. 8 shows the division and visualization of the facial area in an embodiment of the present application. Panel a shows the division of the face into four regions; panel b shows a second facial image obtained with the pixel derivation method; panel d shows the pixel derivation results of panel b aggregated into an aging-degree heat map over the four regions; panel c shows the aging-degree heat map over the four regions obtained with the area masking method.
Fig. 9 shows a schematic flowchart of the pixel derivation method in an embodiment of the present application.
Fig. 10 shows a schematic flowchart of the area masking method in an embodiment of the present application.
Fig. 11 shows the consistency check in an embodiment of the present application. Panel A is an example of a deep learning ranking result, where the numbers represent the importance order of the different regions; panels B-D show the ranking results of the eye-movement experiment (or manual evaluation), with the bold dashed boxes marking the regions being compared: panel B shows the order of all four regions fully consistent, panel C shows only the most important region consistent, and panel D shows the most important region consistent with the second most important region.
Fig. 12 shows a schematic diagram of the structure of the device for determining the local area that affects the degree of facial aging according to the second embodiment of the present application.
Detailed description
In the following description, many technical details are presented to help the reader better understand this application. However, those of ordinary skill in the art will understand that the technical solutions claimed in this application can be realized even without these technical details and with various changes and modifications based on the following embodiments.
Explanation of terms:
Visualization: presenting the basis for a neural network's decisions in the form of images or pictures.
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the drawings.
The first embodiment of the present application relates to a method for determining the local area that affects the degree of facial aging. Its flow is shown in Figure 1, and the method includes the following steps:
In step 101, a first facial image of an object is acquired.
In step 102, the first facial image is input into an apparent age prediction model for human faces to obtain a first apparent age.
In step 103, image processing is performed on the first facial image, and a predetermined number of pixels and/or a predetermined area in the first facial image are changed to obtain a second facial image.
In step 104, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age is determined.
Optionally, the first facial image in step 101 may be a whole facial image or a partial facial image.
Optionally, before step 102, the apparent age prediction model is obtained in advance through the following steps (1) and (2). In step (1), a perception experiment is used to quantify the age distribution, mean age, or median age of facial sample images as deep learning training labels, obtaining a training sample set. Step (2) is then executed: a convolutional neural network model is trained with the training sample set to obtain the apparent age prediction model.
Preferably, the age distribution is used as the training label in step (1).
Optionally, the convolutional neural network model may be, but is not limited to, a VGG16, ResNet18, or ResNet50 model. Preferably, the convolutional neural network model is a ResNet18 model.
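For illustration, here is a minimal sketch (our assumption, not the patent's exact procedure; the function name and age range are hypothetical) of how the three candidate training labels could be derived from raters' perceived-age guesses for one face image in the perception experiment:

```python
from collections import Counter
from statistics import mean, median

def make_labels(guesses, min_age=15, max_age=80):
    """Turn one face's rater guesses into the three candidate labels:
    a normalized age distribution over [min_age, max_age], the mean age,
    and the median age."""
    counts = Counter(guesses)
    total = len(guesses)
    # Normalized histogram over the supported age range (distribution label).
    dist = [counts.get(a, 0) / total for a in range(min_age, max_age + 1)]
    return dist, mean(guesses), median(guesses)
```

According to the document, the distribution label is the preferred choice; a network trained against it would output a probability vector over ages rather than a single scalar.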
Optionally, in step 103, the image processing may adopt the pixel derivation method, the area masking method, or a combination thereof.
Optionally, the predetermined number of pixels are all the pixels of the first facial image.
Optionally, the predetermined area is a sub-area of m1 pixels × m2 pixels, where m1 and m2 are each independently a positive integer from 1 to 1000, preferably 2-500, more preferably 3-250, and most preferably 5-100. Optionally, the predetermined area is 0.01%-25% of the entire facial area, preferably 0.1%-10%, more preferably 1%-5%.
Preferably, the predetermined area is one or more selected from the following group: the eye area, the cheek area, the mouth area, and the forehead area.
In one embodiment, step 103 can be further implemented as the following step: image processing is performed on the first facial image using the pixel derivation method, and Gaussian noise is added to the predetermined number of pixels to obtain the second facial image; for example, random Gaussian noise can be added to all pixels of the first facial image, but the method is not limited to this. Further, step 104 can be implemented as the following step: the apparent age prediction model is used to differentiate with respect to the second facial image, obtaining a derivative value for each pixel of the second facial image, and the degree of influence of the changed pixels on the first apparent age is calculated based on the derivative value of each pixel.
Optionally, after the derivative value for each pixel has been obtained and the degree of influence of the changed pixels on the first apparent age has been calculated, the method further includes the following steps (1) and (2). In step (1), the first facial image is divided into multiple local areas, and the derivative values of all pixels in each local area are summed as that local area's influence weight coefficient on the overall degree of facial aging. In step (2), based on the influence weight coefficient of each local area, each local area is annotated on the first facial image to obtain a third facial image.
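A minimal numeric sketch of this pixel derivation procedure (hypothetical code; a real implementation would differentiate the trained network automatically rather than by the finite differences used here, and the image would be a 2-D pixel array rather than a flat list):

```python
import random

def region_weights(image, predict_age, regions, sigma=0.01, eps=1e-4):
    """Per-region influence weights via the pixel derivation idea."""
    # Second facial image: the first image with Gaussian noise on every pixel.
    noisy = [p + random.gauss(0.0, sigma) for p in image]
    # Derivative of the predicted apparent age with respect to each pixel,
    # approximated here by finite differences.
    grads = []
    for i in range(len(noisy)):
        bumped = list(noisy)
        bumped[i] += eps
        grads.append((predict_age(bumped) - predict_age(noisy)) / eps)
    # Sum of the derivative values inside each local area gives that area's
    # influence weight coefficient on the overall degree of facial aging.
    return {name: sum(grads[i] for i in idx) for name, idx in regions.items()}
```

With a toy linear "model" the per-region weights recover the model's coefficients, which illustrates why regions with larger summed derivatives dominate the apparent-age prediction.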
In another embodiment, step 103 can be further implemented as the following step: image processing is performed on the first facial image using the area masking method, and the predetermined area is covered with the mean pixel value of the first facial image to obtain the second facial image. Further, step 104 can be implemented as the following steps: the second facial image is input into the apparent age prediction model for human faces to obtain a second apparent age; the second apparent age is then compared with the first apparent age, and the degree of influence of the predetermined area on the apparent age of the human face is calculated based on the comparison result.
Optionally, the masking step is further implemented as follows: the first facial image is divided into multiple local areas, image processing is performed on the first facial image using the area masking method, and each local area is covered in turn with the mean pixel value of the first facial image to obtain a corresponding second facial image for each covered local area. For example, the first facial image can be divided into four local areas, namely the eye area, the cheek area, the mouth area, and the forehead area, and each of these areas is covered in turn with the mean pixel value of the first facial image to obtain the corresponding second facial images.
Optionally, after obtaining the corresponding second facial image for each covered local area, the method further includes the following steps (1) and (2). In step (1), for each local area, the difference between the corresponding second apparent age and the first apparent age is computed as that local area's influence weight coefficient on the overall degree of facial aging. In step (2), based on the influence weight coefficient of each local area, each local area is annotated on the first facial image to obtain a third facial image.
The present invention is further described below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the present invention, not to limit its scope.
In this embodiment, a deep learning network such as ResNet 18 is used to build an evaluation system for facial aging, and deep learning visualization methods such as pixel derivation or facial masking are used to locate the main areas of facial aging. The specific scheme is as follows:
1. Data collection
To conduct the facial perception experiment, information such as the subject's hairstyle and clothing had to be removed from the images. An automated image preprocessing pipeline was designed according to the experimental requirements, as shown in Figure 2.
First, the Face++ software (https://www.faceplusplus.com/) was used to identify 106 facial landmarks in the facial region (Figure 2A). Then, based on the position coordinates of the eyes, nose, and mouth among these landmarks, a regression model was used to compute the position of the facial midline (Figure 2B, solid red line), along with the tilt angle between the midline and the vertical direction (Figure 2B, dashed red line). The facial image was then rotated by this tilt angle so that the face was aligned vertically (Figure 2C). Finally, the image was cropped according to the chin point, the left and right cheek points, and the upper forehead point; the final cropping results are shown in Figures 2D and 2E. The cropped images were used for the subsequent experiments and analyses.
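The rotation step above can be sketched as follows; estimating the tilt from the two eye centers alone (rather than a regression over all 106 landmarks) is a simplifying assumption for illustration, and the landmark coordinates are invented:

```python
import math

def tilt_angle_deg(left_eye, right_eye):
    """Angle (in degrees) by which the image must be rotated so that the
    inter-ocular line becomes horizontal and the facial midline vertical."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

level = tilt_angle_deg((100, 200), (180, 200))   # eyes level: 0 degrees
tilted = tilt_angle_deg((100, 200), (180, 220))  # right eye 20 px lower: ~14 degrees
```

The image would then be rotated by this angle about the face center with any image library before the cropping step.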
2. Manual evaluation of perceived age
Twenty-two evaluators (10 male and 12 female) were recruited to assess the perceived age of the samples. To minimize experimental error, all evaluators used the same display device for the assessment. Before the experiment, the evaluators knew neither the ages of the samples nor the age distribution of the data set. During the evaluation, each evaluator observed all 5,768 sample images, then predicted and recorded the age of each sample. In addition, 1,014 sample photographs (500 men, 514 women) were selected. For these 1,014 photographs, each evaluator also indicated, with reference to Figure 3, the facial areas consulted when judging a sample's age by checking them in an evaluation form; a sample form is shown in Table 1 below, in which multiple areas may be selected.
Table 1

Picture ID          Perceived age   Eyes   Mouth   Forehead   Cheeks   All regions
15HanTZ0005TB1_F
15HanTZ0010TB1_F
15HanTZ0014TB1_F
15HanTZ0022TB1_F
15HanTZ0023TB1_F

(The form is blank; evaluators fill in the perceived age and check the areas consulted.)
3. Analysis of evaluation quality
Perception experiments were used to quantify overall facial aging and to obtain high-quality training labels for deep learning.
To verify the reliability of the deep learning training labels (perceived age), a simulation analysis of the relationship between the number of evaluators and evaluation quality was first carried out. Simulation parameters were generated from the real data of the 22 evaluators to construct evaluation data containing both systematic and random errors, and the relationship between two statistics (the mean and the standard error) and the number of evaluators was studied to reflect how evaluation quality depends on the number of evaluators.
First, the arithmetic mean of the 22 evaluators' ratings of each sample was taken as the sample's true value; the difference between each evaluator's rating of each sample and the sample's true value was computed, and the arithmetic mean of these differences was taken as that evaluator's systematic error. The data obtained by subtracting the systematic error from each evaluator's rating of each sample was then taken as the evaluator's random error; the standard deviation σ_ij of each evaluator's random error was computed, assuming the random error follows N(0, σ_ij). Based on the resulting simulation parameters (sample true values, systematic errors, random errors, and so on), 10,000 simulated perceived-age data points for 1,000 evaluators were generated. Samples were then drawn from the simulated data for different numbers of evaluators (n_i = 1 to 100), and for the i-th sampling 1,000 simulated perceived-age values were repeatedly drawn in order to compute the variance of the perceived age in each case. The calculation follows formula (1) below, in which X_i denotes the simulated perceived-age value of a subject in the i-th sampling and n denotes the number of evaluators selected.
S² = (1/(n − 1)) · Σ_{i=1}^{n} (X_i − X̄)²    (1)
Figure 4 shows how the variance of the 1,000 random samplings changes with the number of evaluators. As the number of evaluators increases, the decrease in variance gradually levels off. Differentiating this curve locates the inflection point (n = 12), which gives the optimal choice for the number of evaluators: on the premise of ensuring data quality, the minimum number of evaluators is 12, and more evaluators yield better data quality. Our perception experiment used 22 evaluators, far more than this optimum, which fundamentally guarantees the quality of the perception data.
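The evaluator-count simulation can be sketched as below. The error magnitudes (systematic error SD of 2.0 years, per-rater random-error SDs of 1.0 to 4.0 years) are invented placeholders rather than the parameters actually fitted from the 22 evaluators; the sketch only reproduces the qualitative diminishing-returns curve of Figure 4:

```python
import random
import statistics

random.seed(0)
TRUE_AGE = 40.0
N_RATERS = 1000
sys_err = [random.gauss(0.0, 2.0) for _ in range(N_RATERS)]    # per-rater systematic error
rand_sd = [random.uniform(1.0, 4.0) for _ in range(N_RATERS)]  # per-rater random-error SD

def mean_rating_variance(n, reps=1000):
    """Variance, over `reps` resamples, of the mean perceived age
    produced by n randomly chosen simulated raters."""
    means = []
    for _ in range(reps):
        raters = [random.randrange(N_RATERS) for _ in range(n)]
        ratings = [TRUE_AGE + sys_err[i] + random.gauss(0.0, rand_sd[i])
                   for i in raters]
        means.append(sum(ratings) / n)
    return statistics.pvariance(means)

# Variance falls steeply at first, then flattens: diminishing returns
# from adding evaluators.
curve = {n: mean_rating_variance(n) for n in (1, 5, 12, 50)}
```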
4. Obtaining a local aging model by deep learning
As shown in Figure 5, deep learning used the 5,768 sample photographs as training data, with the perceived ages assessed in the perception experiment quantifying facial aging and serving as the training labels; deep learning visualization was then used to locate the main areas that affect facial aging.
(1) Training data set augmentation
Because the training data set contained only 5,768 samples, the limited training data would not only hamper the learning of the network parameters but also invite overfitting. Therefore, before the actual training process, the training data set was augmented in two ways, mirroring and cropping, following conventional deep learning data augmentation methods. Mirror augmentation reflects an image about the Y axis, so one original image yields two mirrored versions; crop augmentation extracts five crops, at the top left, bottom left, top right, bottom right, and center, yielding five different regional images. The training data set can thus be expanded tenfold.
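A minimal sketch of the tenfold augmentation described above; the crop size of 80% of the original image is an assumed value, as the patent does not state the crop ratio:

```python
import numpy as np

def augment(img, crop=0.8):
    """Mirror + five-crop augmentation: one image yields 10 training images."""
    h, w = img.shape[:2]
    ch, cw = int(h * crop), int(w * crop)
    corners = [(0, 0), (0, w - cw), (h - ch, 0), (h - ch, w - cw),
               ((h - ch) // 2, (w - cw) // 2)]  # TL, TR, BL, BR, center
    out = []
    for view in (img, img[:, ::-1]):            # original and its Y-axis mirror
        for top, left in corners:
            out.append(view[top:top + ch, left:left + cw])
    return out

crops = augment(np.zeros((100, 100, 3), dtype=np.uint8))  # 10 crops of 80x80
```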
(2) Deep learning model selection
The convolutional neural network models VGG 16, ResNet 18, and ResNet 50 were selected for age prediction. ResNet 18 is one of the five main models in the ResNet family; it mainly comprises three parts: the input, the output, and the intermediate convolution stages. A major difficulty in training deep neural networks is that, as the number of layers increases, the network parameters become hard to optimize and network degradation appears; that is, the trained model may fit the data even worse than a model with fewer layers. Compared with other commonly used deep learning models, ResNet introduces residual modules that establish shortcut connections from shallow features to deep features, and the network models the residual between deep and shallow features rather than the deep features themselves. Gradients therefore propagate back more effectively when the network parameters are optimized by backpropagation, which largely resolves the degradation problem of deep networks. In addition, to verify the robustness of ResNet 18, VGG 16 and ResNet 50 were used to measure the training effect on the same samples. All three networks were initialized from models pretrained on ImageNet, with a batch size of 64 and a learning rate of 0.001, and trained for 100 epochs.
Figure 6(a) shows the basic module of the residual network, which establishes a shortcut connection from input to output; "weight layer" denotes a weight layer, and ReLU is an activation function. Figure 6(b) shows the network structure of ResNet 18, in which dashed lines indicate that the number of feature channels doubles. A solid-line connection indicates identical channel counts; for example, "3×3 conv, 64" means convolution with 64 kernels of size 3×3. A dashed-line connection indicates different channel counts with doubled features; for example, "3×3 conv, 128, /2" means convolution with 128 kernels of size 3×3, where "/2" indicates that the feature channels are doubled relative to the preceding 64-channel layer.
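The residual module of Figure 6(a) can be sketched with plain matrices; with zero weights the learned residual F(x) vanishes and the block passes x through the shortcut unchanged, which illustrates why gradients flow easily through such networks (dense layers stand in for the convolutional weight layers here):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Basic residual module: two weight layers learn the residual F(x),
    and the shortcut adds the input back, giving relu(F(x) + x)."""
    f = relu(x @ w1) @ w2          # weight layer -> ReLU -> weight layer
    return relu(f + x)             # shortcut connection, then final ReLU

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
# Zero weights => F(x) = 0, so the block reduces to the identity
# (up to the final ReLU): the shortcut path is always available.
out = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```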
During model training, the distribution of the 22 age evaluators' ratings, the rating median, and the rating mean were each used as labels to train the deep learning model. After training, the error variance between predicted and true results was used as the criterion for selecting the model.
In addition, 10-fold cross-validation was used to obtain predictions for the training data set: the sample data was divided into 10 parts, and in each round 9 parts were used to train a model that predicts the ages of the remaining part; repeating this cycle 10 times yields deep-learning predictions of perceived age for all samples.
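The 10-fold split can be sketched as follows, illustrating that each of the 5,768 samples is held out and predicted exactly once across the 10 rounds:

```python
def kfold_indices(n_samples, k=10):
    """Split sample indices into k folds; each fold is held out once and
    predicted by a model trained on the other k-1 folds."""
    folds = [list(range(f, n_samples, k)) for f in range(k)]
    splits = []
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, test))
    return splits

splits = kfold_indices(5768, k=10)
# Across the 10 rounds, every sample index appears in exactly one test fold.
covered = sorted(i for _, test in splits for i in test)
```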
(3) Comparison of deep learning model training effects
Using deep learning to simulate human assessment of facial apparent age achieved a correlation above 96%. The model evaluation used three training labels (the 22 evaluators' age distribution, age mean, and age median) and three deep learning models (VGG 16, ResNet 18, and ResNet 50).
The evaluation results are shown in Table 2 below.
Table 2

[The contents of Table 2 are reproduced as an image in the original document (PCTCN2021095753-appb-000002).]
Note: correlation coefficients were computed as Pearson correlation coefficients; all P values in the table are below 0.001.
The results show that the ResNet 18 model trained with the age distribution as the label gave the best results (mean difference 2.27 years, correlation coefficient 0.96; see Figure 7). The ResNet 18 model trained with the age distribution as the label is therefore preferably used in the subsequent steps.
5. Deep learning visualization to evaluate local facial aging
The pixel derivation method and the area masking method were each used to realize visualization, and the face was divided into four regions (forehead, eyes, mouth, and cheeks) according to facial anatomy and the 106 facial landmarks automatically calibrated by Face++ (Figure 8a).
(1) Pixel derivation method
The core idea of the pixel derivation method (SmoothGrad) is to differentiate the predicted perceived age with respect to the pixel values and to use the magnitude of the derivative to measure how important each pixel of the image is to the perceived age. Because a neural network is a highly nonlinear mapping, a small number of pixels usually have very large derivatives, which makes visualization difficult; random noise is therefore added to the image, and the results of multiple differentiations are averaged to obtain a smoother visualization. To determine the features on which the deep learning model bases its predictions, a feature importance mask of the same size as the original image can be built, whose brightness values correspond to the importance of each pixel; the whole image is called a sensitivity mask. The pixel derivation method adds noise to each pixel, then computes the derivative of the perceived age with respect to the noised pixels and uses this derivative to evaluate the importance of each pixel, according to the following formula (2), where n is the number of calculations, N(0, σ²) denotes Gaussian noise with standard deviation σ, x is the original pixel value, and M_c is the sensitivity:

M̂_c(x) = (1/n) · Σ_{i=1}^{n} M_c(x + N(0, σ²))    (2)
After the importance of each pixel is obtained, the mean of the derivatives over each facial region is computed during training, and the regions are ranked by these means to quantify the degree to which each local facial region affects facial aging. When applying the pixel derivation method, the number of calculations n was set to 10 and the standard deviation σ of the Gaussian noise was set to 0.3.
The flow is shown in Figure 9. In step 901, the first facial image is input to the trained ResNet 18 model, which outputs the corresponding first apparent age. In step 902, the first facial image is copied n times and random Gaussian noise is added to each copy, giving n corresponding second facial images. In step 903, for each second facial image, the derivative of the trained ResNet 18 model's output with respect to each pixel of the image is computed. In step 904, the derivative results of the n second facial images are averaged and visualized (as in Figure 8b, the brighter the color, the more important the point). In step 905, the sums of the derivatives over each local facial region (mouth, forehead, eyes, cheeks) are computed and the regions are ranked by these sums; the higher the sum, the more important the region and the greater its degree of aging, and each local region is annotated with a color shade indicating its degree of aging (as in Figure 8d, the redder the color, the more aged the region).
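Steps 902 to 905 can be sketched as follows; a fixed linear map stands in for the trained ResNet 18 (so its gradient with respect to the image is known in closed form), whereas in practice the per-pixel derivatives would come from backpropagation through the network:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for the trained age model: a linear map whose gradient
# with respect to the image is simply `w` (hypothetical, for illustration).
w = rng.normal(size=(8, 8))

def grad_age(img):
    return w  # d(predicted age)/d(pixels) of the linear stand-in

def smoothgrad(img, n=10, sigma=0.3):
    """Formula (2): average the model gradient over n noisy copies."""
    masks = [grad_age(img + rng.normal(0.0, sigma, img.shape))
             for _ in range(n)]
    return np.mean(masks, axis=0)

img = rng.random((8, 8))
mask = smoothgrad(img)                      # sensitivity mask (step 904)
# Step 905: sum the mask over each region and rank regions by the sum.
region_score = {"top": mask[:4].sum(), "bottom": mask[4:].sum()}
ranking = sorted(region_score, key=region_score.get, reverse=True)
```

With the linear stand-in the averaged mask equals `w` exactly; for a real CNN the averaging is what smooths out the noisy per-pixel gradients.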
(2) Area masking method
In the area masking method, each region is occluded with the mean of all pixels, and the importance of each region is evaluated by the difference between the predictions before and after occlusion. First, a perceived-age estimation model is trained on unoccluded images, and the mean of all pixels of the image is computed. For each facial partition of a sample (Figure 8a), the four regions are covered in turn with the pixel mean; each occluded image is then passed through the network to predict an age, giving the age difference between the occluded and unoccluded predictions. If the age difference is negative, the occluded region increases the overall degree of facial aging; if it is positive, the region decreases it; and the larger the difference, the stronger the effect.
The flow is shown in Figure 10. In step 1001, the first facial image (unoccluded) is input into the trained ResNet 18 model, which outputs the corresponding first apparent age. In step 1002, each of the four local regions of the first facial image (mouth, forehead, eyes, cheeks) is occluded in turn and filled with the image mean, giving the corresponding second facial images. In step 1003, the four occluded images are input into the trained ResNet 18 model to obtain the corresponding second apparent ages. In step 1004, the difference between the apparent age predictions before and after occlusion is taken as the degree of aging of each region, and the regions are ranked accordingly (as in Figure 8c, the value encoded by the color is the predicted age difference between the occluded and unoccluded images; stronger blue indicates a more aged region, stronger red a more youthful one).
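Steps 1001 to 1004 can be sketched as follows, again with a toy linear model standing in for the trained ResNet 18 and two invented rectangular regions in place of the landmark-based facial partitions:

```python
import numpy as np

rng = np.random.default_rng(7)
w = rng.normal(size=(8, 8))

def predict_age(img):
    """Toy stand-in for the trained age predictor (hypothetical)."""
    return float((w * img).sum())

img = rng.random((8, 8))                          # "first facial image"
regions = {"top": (slice(0, 4), slice(None)),     # invented partitions
           "bottom": (slice(4, 8), slice(None))}

first_age = predict_age(img)                      # step 1001
mean_val = img.mean()                             # fill value for the mask
age_diff = {}
for name, sl in regions.items():                  # steps 1002-1003
    occluded = img.copy()
    occluded[sl] = mean_val                       # cover region with pixel mean
    age_diff[name] = predict_age(occluded) - first_age
# Step 1004: a more negative difference marks a region whose appearance
# raises the predicted age more strongly.
ranking = sorted(age_diff, key=age_diff.get)
```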
6. Result verification
To verify the validity of the deep learning visualization results, perception experiment data, eye-tracking experiment data, and local facial aging score data were collected.
(1) Perception experiment data (manual evaluation)
In the perception experiment, for the 1,014 selected photographs, the 22 evaluators observed each picture, predicted the age, and recorded the facial regions on which their judgment was based. On this basis, the number of times each region was selected was counted: for each sample, a region's score is the number of evaluators who checked it, and this value quantifies the degree to which the facial region influences the assessment of facial aging.
(2) Eye-tracking data (machine measurement)
Another 20 evaluators were selected to conduct eye-tracking experiments on two-dimensional facial images of 18 subjects. An EyeLink 1000 eye tracker was used to observe and record the ratio of the dwell time of each evaluator's gaze in each facial partition (Figure 11) to the total gaze time, providing a relatively objective quantification of how local facial regions influence the overall assessment of facial aging.
(3) Consistency test method
To evaluate the consistency among the different methods (manual evaluation, eye-tracking experiment, and deep learning visualization), three consistency test indices were designed based on the numerical characteristics of the different methods. These indices, combined with the results of the manual evaluation and the eye-tracking experiments, were used to verify the reliability of the deep learning visualization of this application. The indices are as follows:
Perfect match rate: the ratio of the number of samples whose region importance rankings match exactly between two methods to the total number of samples (Figure 11B). This ratio indicates complete consistency between the two methods.
Top1 match rate: the ratio of the number of samples whose most important regions match between two methods (Figure 11C) to the total number of samples. This ratio indicates that the two methods agree completely when identifying the local region with the most significant influence on overall facial aging.
Top2 match rate: the ratio of the sum of the number of samples whose most important regions match (Figure 11C) and the number of samples whose most important region in one method matches the second most important region in the other (Figure 11D) to the total number of samples. The higher this ratio, the stronger the consistency between the two methods.
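One plausible reading of the three indices, sketched on made-up per-sample rankings (most important region first):

```python
def match_rates(rank_a, rank_b):
    """rank_a/rank_b: per-sample region rankings from two methods.
    Returns (perfect, top1, top2) match rates."""
    n = len(rank_a)
    perfect = sum(a == b for a, b in zip(rank_a, rank_b)) / n
    top1 = sum(a[0] == b[0] for a, b in zip(rank_a, rank_b)) / n
    top2 = sum(a[0] in b[:2] for a, b in zip(rank_a, rank_b)) / n
    return perfect, top1, top2

a = [["eyes", "mouth", "cheeks", "forehead"],
     ["mouth", "eyes", "cheeks", "forehead"],
     ["cheeks", "forehead", "eyes", "mouth"]]
b = [["eyes", "mouth", "cheeks", "forehead"],    # exact ranking match
     ["eyes", "mouth", "cheeks", "forehead"],    # top-1 of a in b's top-2
     ["forehead", "cheeks", "eyes", "mouth"]]    # top-1 of a in b's top-2
perfect, top1, top2 = match_rates(a, b)
```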
(4) Deep learning visualization locates the key regions that affect facial aging, and the validity of the result was verified
Deep learning visualization employed the two methods of pixel derivation and area masking, and the visualization results were evaluated against the three criteria of perfect match rate, Top1 match rate, and Top2 match rate.
The evaluation results are shown in Table 3 below.
Table 3

[The contents of Table 3 are reproduced as an image in the original document (PCTCN2021095753-appb-000005).]
The results show that pixel derivation achieved better results than the area masking method. In particular, the deep-learning pixel derivation visualization had the highest perfect match rate with the manual evaluation (0.18 vs. 0.13), an improvement of about 38% over the area masking method.
In addition, with deep-learning pixel derivation visualization, the Top1 match rate was 0.52 and the Top2 match rate reached 0.89; its Top1 match rate with the eye-tracking experiment was 0.61 and its Top2 match rate was 0.85.
These results show that the deep learning visualization is highly consistent with both manual methods, which verifies the reliability of the deep learning visualization method.
The second embodiment of the present application relates to an apparatus for determining a local area that affects the degree of facial aging. As shown in Figure 12, the apparatus includes:
an image acquisition module, configured to acquire a first facial image of a subject;
an image processing module, configured to perform image processing on the first facial image, changing a predetermined number of pixels and/or a predetermined area in the first facial image to obtain a second facial image;
an age prediction module, configured to input the first facial image into an apparent age prediction model of the human face to obtain a first apparent age; and
an influence degree determination module, configured to determine, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age.
The first embodiment is the method embodiment corresponding to this embodiment; the technical details of the first embodiment apply to this embodiment, and the technical details of this embodiment likewise apply to the first embodiment.
It should be noted that those skilled in the art will understand that the functions implemented by the modules shown in the above embodiment of the apparatus for determining a local area that affects the degree of facial aging can be understood with reference to the foregoing description of the method for determining a local area that affects the degree of facial aging. The functions of the modules shown in the above apparatus embodiment may be implemented by a program (executable instructions) running on a processor, or by specific logic circuits. If the apparatus for determining a local area that affects the degree of facial aging of the embodiments of the present application is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the method of each embodiment of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, the embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method embodiments of the present application. Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable storage media do not include transitory media, such as modulated data signals and carrier waves.
In addition, the embodiments of the present application further provide an apparatus for determining a local area that affects the degree of facial aging, which includes a memory for storing computer-executable instructions, and a processor; the processor implements the steps of the foregoing method embodiments when executing the computer-executable instructions in the memory. The processor may be a central processing unit ("CPU"), another general-purpose processor, a digital signal processor ("DSP"), an application-specific integrated circuit ("ASIC"), or the like. The aforementioned memory may be a read-only memory ("ROM"), a random access memory ("RAM"), a flash memory, a hard disk, a solid-state drive, or the like. The steps of the methods disclosed in the embodiments of the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
It should be noted that, in the application documents of this patent, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a" does not exclude the existence of additional identical elements in the process, method, article, or device that includes that element. In the application documents of this patent, performing an act "according to" an element means performing the act at least according to that element; this covers two cases: performing the act only according to that element, and performing the act according to that element together with other elements. Expressions such as "multiple" or "a plurality of" include two, and more than two, instances, times, or kinds.
All documents mentioned in this application are deemed to be incorporated into the disclosure of this application in their entirety, so that they may serve as a basis for amendment where necessary. In addition, it should be understood that the foregoing are merely preferred embodiments of this specification and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of protection of one or more embodiments of this specification.

Claims (14)

  1. A method for determining a local area that affects the degree of facial aging, comprising:
    acquiring a first facial image of a subject;
    inputting the first facial image into an apparent age prediction model for human faces to obtain a first apparent age;
    performing image processing on the first facial image to change a predetermined number of pixels and/or a predetermined area in the first facial image, so as to obtain a second facial image; and
    determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age.
  2. The method for determining a local area that affects the degree of facial aging according to claim 1, wherein the image processing adopts a method selected from the group consisting of:
    a pixel derivation method, a region masking method, or a combination thereof.
  3. The method for determining a local area that affects the degree of facial aging according to claim 1, wherein the predetermined number of pixels comprises all the pixels of the first facial image.
  4. The method for determining a local area that affects the degree of facial aging according to claim 1, wherein the predetermined area is one or more selected from the group consisting of:
    an eye area, a cheek area, a mouth area, and a forehead area.
  5. The method for determining a local area that affects the degree of facial aging according to claim 2, wherein the image processing adopts the pixel derivation method;
    the performing image processing on the first facial image to change a predetermined number of pixels and/or a predetermined area in the first facial image, so as to obtain a second facial image, further comprises:
    performing image processing on the first facial image by the pixel derivation method, adding Gaussian noise to the predetermined number of pixels to obtain the second facial image;
    and the determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age further comprises:
    differentiating the second facial image with the apparent age prediction model to obtain a derivative value for each pixel of the second facial image, and calculating, based on the derivative value of each pixel, the degree of influence of the changed pixels on the first apparent age.
  6. The method for determining a local area that affects the degree of facial aging according to claim 5, further comprising, after the differentiating of the second facial image and the calculating of the degree of influence of the changed pixels on the first apparent age:
    dividing the first facial image into a plurality of local areas, and computing, for each local area, the sum of the derivative values of all pixels in that area as the influence weight coefficient of the area on the overall degree of facial aging;
    marking each local area on the first facial image based on its influence weight coefficient, so as to obtain a third facial image.
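The pixel-derivation steps of claims 5-6 correspond to the gradient-saliency technique familiar from deep learning: differentiate the predicted age with respect to every pixel, then aggregate the derivatives per local area. A minimal NumPy sketch under stated assumptions — `toy_age_model` is a fabricated linear stand-in for the apparent age prediction model, and the 4x4 "image" and region split are illustrative placeholders, not the patented configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the apparent age prediction model: age = <W, image> + b.
# For a linear model the per-pixel derivative is exactly W.
W = rng.normal(size=(4, 4))

def toy_age_model(image):
    return float(np.sum(W * image) + 30.0)

def pixel_derivatives(image, eps=1e-4):
    """Numerically differentiate the model with respect to every pixel."""
    grads = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        bumped = image.copy()
        bumped[idx] += eps
        grads[idx] = (toy_age_model(bumped) - toy_age_model(image)) / eps
    return grads

image = rng.uniform(size=(4, 4))
grads = pixel_derivatives(image)

# Claim 6: split the face into local areas and sum each area's derivative
# values as that area's influence weight on the overall degree of aging.
regions = {"top_half": (slice(0, 2), slice(None)),
           "bottom_half": (slice(2, 4), slice(None))}
weights = {name: grads[sl].sum() for name, sl in regions.items()}
```

With a real CNN the per-pixel derivatives would come from one backward pass rather than finite differences; the per-area summation step is unchanged.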
  7. The method for determining a local area that affects the degree of facial aging according to claim 2, wherein the image processing adopts the region masking method;
    the performing image processing on the first facial image to change a predetermined number of pixels and/or a predetermined area in the first facial image, so as to obtain a second facial image, further comprises:
    performing image processing on the first facial image by the region masking method, covering the predetermined area with the pixel mean of the first facial image to obtain the second facial image;
    and the determining, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age further comprises:
    inputting the second facial image into the apparent age prediction model to obtain a second apparent age;
    comparing the second apparent age with the first apparent age, and calculating, based on the comparison result, the degree of influence of the predetermined area on the apparent age of the human face.
  8. The method for determining a local area that affects the degree of facial aging according to claim 7, wherein the performing image processing on the first facial image by the region masking method, covering the predetermined area with the pixel mean of the first facial image to obtain the second facial image, further comprises:
    dividing the first facial image into a plurality of local areas, and performing image processing on the first facial image by the region masking method, covering each local area in turn with the pixel mean of the first facial image to obtain a corresponding second facial image in which that local area is covered;
    and, after obtaining the corresponding second facial image for each covered local area, the method further comprises:
    computing, for each local area, the difference between the corresponding second apparent age and the first apparent age as the influence weight coefficient of the area on the overall degree of facial aging;
    marking each local area on the first facial image based on its influence weight coefficient, so as to obtain a third facial image.
  9. The method for determining a local area that affects the degree of facial aging according to claim 8, wherein the dividing of the first facial image into a plurality of local areas and the covering of each local area in turn with the pixel mean of the first facial image further comprise:
    dividing the first facial image into four local areas, namely an eye area, a cheek area, a mouth area, and a forehead area, and performing image processing on the first facial image by the region masking method, covering each local area in turn with the pixel mean of the first facial image to obtain a corresponding second facial image in which that local area is covered.
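The region-masking steps of claims 7-9 amount to occlusion-sensitivity analysis: cover one area with the image's pixel mean, re-predict, and take the age shift as that area's influence weight. A sketch under stated assumptions — `toy_age_model` is again a fabricated linear stand-in, and the 8x8 "face" with horizontal-band regions is an arbitrary placeholder for the real landmark-based eye/cheek/mouth/forehead split:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))

def toy_age_model(image):
    # Illustrative stand-in for the apparent age prediction model.
    return float(np.sum(W * image) + 30.0)

# Four local areas loosely mirroring claim 9; the coordinates are placeholders.
regions = {
    "forehead": (slice(0, 2), slice(0, 8)),
    "eyes":     (slice(2, 4), slice(0, 8)),
    "cheeks":   (slice(4, 6), slice(0, 8)),
    "mouth":    (slice(6, 8), slice(0, 8)),
}

face = rng.uniform(size=(8, 8))
first_age = toy_age_model(face)       # first apparent age
mean_pixel = face.mean()

influence = {}
for name, sl in regions.items():
    masked = face.copy()
    masked[sl] = mean_pixel           # cover the area with the pixel mean
    second_age = toy_age_model(masked)    # second apparent age
    influence[name] = second_age - first_age   # claim 8's weight coefficient
```

A negative difference means covering the area made the face look younger, i.e. the area was contributing to the aged appearance.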
  10. The method for determining a local area that affects the degree of facial aging according to any one of claims 1-9, wherein the apparent age prediction model is obtained by a method comprising the following steps:
    quantifying, through perception experiments, the age distribution, age mean, or age median of facial sample images as deep learning training labels to obtain a training sample set; and
    training a convolutional neural network model with the training sample set to obtain the apparent age prediction model.
  11. The method for determining a local area that affects the degree of facial aging according to claim 10, wherein the convolutional neural network model is a ResNet18 model.
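Claim 10 derives training labels from perception experiments rather than chronological age. A sketch of just the label-construction step under stated assumptions — the rating matrix is fabricated for illustration, and the ResNet18 training loop itself is omitted:

```python
import numpy as np

# Hypothetical perception-experiment data: rows = facial sample images,
# columns = apparent-age guesses from independent human raters.
ratings = np.array([
    [34, 38, 36, 35],
    [52, 49, 55, 50],
    [27, 25, 26, 30],
], dtype=float)

# Claim 10 allows the age distribution, the age mean, or the age median
# of each image's ratings to serve as its deep-learning training label.
mean_labels = ratings.mean(axis=1)
median_labels = np.median(ratings, axis=1)

# Distribution label: normalized histogram over integer ages 0..100
# (a soft label, usable with a KL or cross-entropy loss when training
# the CNN, instead of a plain regression target).
age_bins = np.arange(0, 101)
dist_labels = np.stack([
    np.histogram(r, bins=np.append(age_bins, 101))[0] / len(r)
    for r in ratings
])
```

Any of the three label arrays could then be paired with the facial images to fit a ResNet18 regressor or classifier head, per claim 11.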
  12. An apparatus for determining a local area that affects the degree of facial aging, comprising:
    an image acquisition module, configured to acquire a first facial image of a subject;
    an image processing module, configured to perform image processing on the first facial image to change a predetermined number of pixels and/or a predetermined area in the first facial image, so as to obtain a second facial image;
    an age prediction module, configured to input the first facial image into an apparent age prediction model for human faces to obtain a first apparent age;
    an influence degree determination module, configured to determine, according to the first apparent age, the apparent age prediction model, and the second facial image, the degree of influence of the changed pixels and/or areas in the second facial image on the first apparent age.
  13. An apparatus for determining a local area that affects the degree of facial aging, comprising:
    a memory for storing computer-executable instructions; and
    a processor, configured to implement the steps of the method according to any one of claims 1 to 11 when executing the computer-executable instructions.
  14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 11.
PCT/CN2021/095753 2020-06-05 2021-05-25 Method and apparatus for determining local area that affects degree of facial aging WO2021244352A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010505109.6A CN113761985A (en) 2020-06-05 2020-06-05 Method and apparatus for determining local regions affecting the degree of facial aging
CN202010505109.6 2020-06-05

Publications (1)

Publication Number Publication Date
WO2021244352A1 (en)

Family

ID=78784947

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095753 WO2021244352A1 (en) 2020-06-05 2021-05-25 Method and apparatus for determining local area that affects degree of facial aging

Country Status (2)

Country Link
CN (1) CN113761985A (en)
WO (1) WO2021244352A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1870047A (en) * 2006-06-15 2006-11-29 西安交通大学 Human face image age changing method based on average face and senile proportional image
CN107315987A (en) * 2016-04-27 2017-11-03 伽蓝(集团)股份有限公司 Assess facial apparent age, the method and its application of facial aging degree
US20170351905A1 (en) * 2016-06-06 2017-12-07 Samsung Electronics Co., Ltd. Learning model for salient facial region detection
CN108140110A (en) * 2015-09-22 2018-06-08 韩国科学技术研究院 Age conversion method based on face's each position age and environmental factor, for performing the storage medium of this method and device
CN110709856A (en) * 2017-05-31 2020-01-17 宝洁公司 System and method for determining apparent skin age

Also Published As

Publication number Publication date
CN113761985A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
Wang et al. Human visual system-based fundus image quality assessment of portable fundus camera photographs
Elangovan et al. Glaucoma assessment from color fundus images using convolutional neural network
US20190191988A1 (en) Screening method for automated detection of vision-degenerative diseases from color fundus images
RU2648836C2 (en) Systems, methods and computer-readable media for identifying when a subject is likely to be affected by medical condition
CN102567734B (en) Specific value based retina thin blood vessel segmentation method
WO2021068781A1 (en) Fatigue state identification method, apparatus and device
WO2021190656A1 (en) Method and apparatus for localizing center of macula in fundus image, server, and storage medium
Hatamizadeh et al. Deep dilated convolutional nets for the automatic segmentation of retinal vessels
CN113782184A (en) Cerebral apoplexy auxiliary evaluation system based on facial key point and feature pre-learning
CN114694236A (en) Eyeball motion segmentation positioning method based on cyclic residual convolution neural network
Li et al. BrainK for structural image processing: creating electrical models of the human head
Feng et al. Using eye aspect ratio to enhance fast and objective assessment of facial paralysis
Jiang et al. Improving the generalizability of infantile cataracts detection via deep learning-based lens partition strategy and multicenter datasets
Maillard et al. A deep residual learning implementation of metamorphosis
Muramatsu Diagnosis of glaucoma on retinal fundus images using deep learning: detection of nerve fiber layer defect and optic disc analysis
Tsietso et al. Multi-Input deep learning approach for breast cancer screening using thermal infrared imaging and clinical data
Wan et al. A novel system for measuring pterygium's progress using deep learning
Vamsi et al. Early Detection of Hemorrhagic Stroke Using a Lightweight Deep Learning Neural Network Model.
Joshi et al. Graph deep network for optic disc and optic cup segmentation for glaucoma disease using retinal imaging
CN106446805A (en) Segmentation method and system for optic cup in eye ground photo
Zhang et al. Critical element prediction of tracheal intubation difficulty: Automatic Mallampati classification by jointly using handcrafted and attention-based deep features
Trotta et al. A neural network-based software to recognise blepharospasm symptoms and to measure eye closure time
WO2021244352A1 (en) Method and apparatus for determining local area that affects degree of facial aging
Goceri et al. Automated Detection of Facial Disorders (ADFD): a novel approach based-on digital photographs
Carrasco Limeros et al. Assessing GAN-Based Generative Modeling on Skin Lesions Images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21818666

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21818666

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.06.2023)