CN114639138A

CN114639138A - Neonatal pain expression recognition method based on a generative adversarial network

Info

Publication number: CN114639138A
Application number: CN202210147904.1A
Authority: CN
Inventors: 潘赟; 赵益晟; 朱怀宇; 陈朔晖
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-02-17
Filing date: 2022-02-17
Publication date: 2022-06-17

Abstract

A method for recognizing neonatal pain expressions based on a generative adversarial network (GAN). A GAN is constructed to learn how to recover an unoccluded, correctly posed face image from neonatal face images with varying poses and occlusion; the latent variables of the generator in the GAN are taken as corrected facial pain features; and a residual network combined with an attention mechanism is constructed to screen and analyze the facial pain features corrected through the zero-sum game, further removing the influence of occlusion and pose change and outputting an accurate pain level. The method aims to improve the accuracy of neonatal pain expression recognition in real environments and to make recognition more robust to occlusion and more adaptable to pose change. It optimizes the extraction of pain features by means of the GAN and screens the pain features through the attention mechanism, thereby effectively addressing neonatal pain expression recognition under occlusion and pose change.

Description

Neonatal pain expression recognition method based on a generative adversarial network

Technical Field

The invention relates to the field of neonatal pain recognition, in particular to a neonatal pain recognition method based on facial expressions.

Background

Neonatal pain not only triggers physiological reactions but can also cause a series of short-term or long-term adverse effects, such as growth retardation, permanent central nervous system injury and mood disorders, and may even increase the risk of future disease. Developing an automated neonatal pain recognition algorithm that provides continuous, objective pain assessment is therefore of great importance to neonatal pain management and healthy development. Automatic recognition of neonatal pain has already been studied, for example in the Chinese patent applications "A method for identifying neonatal pain expression based on a two-channel three-dimensional convolutional neural network" (application No. CN201810145292.6, publication No. CN108363979A), "A method and system for identifying neonatal pain expression based on a deep 3D residual network" (application No. CN201810346075.3, publication No. CN108596069A) and "A method for identifying neonatal pain expression based on a dual-channel convolutional neural network" (application No. CN201910748936.5, publication No. CN111401117A). However, existing studies, including these applications, consider neonatal pain expression recognition only in an ideal (controlled) environment, where deep-learning-based methods have achieved breakthrough progress on unoccluded, correctly posed neonatal face images. In a real (uncontrolled) environment, occlusion and variable head poses make neonatal pain expression recognition far more challenging. The key problem to be solved is therefore how to improve recognition accuracy in real environments and how to make the recognition method robust to occlusion and adaptable to pose changes.

Disclosure of Invention

To overcome the shortcomings of the prior art, and in view of the current lack of a neonatal pain expression recognition method that is robust to facial occlusion and pose change, the invention provides such a method for real-world scenes.

The object of the invention is achieved by the following technical solution:

a method for recognizing neonatal pain expressions based on a generative adversarial network, the method comprising the steps of:

step S1: restoring, by means of the generative adversarial network, an unoccluded, correctly posed face image from neonatal face images that have varying poses and may be occluded;

step S2: taking the latent vector of the generator in the generative adversarial network as the corrected facial pain features for subsequent pain analysis;

step S3: screening and analyzing the corrected facial pain features with a residual network combined with an attention mechanism, so as to further remove the interference of occlusion and pose change and output an accurate pain level result.

Further, the process of step S1 is:

the generative adversarial network consists of a generator and a discriminator. The generator is responsible for producing a corrected face image from an input face image; the discriminator is responsible for learning to distinguish images produced by the generator from the ideal face images (unoccluded and correctly posed) in the guide set, where the guide set g consists of all ideal face images in the training set. Through the zero-sum game between the generator and the discriminator, the generator continuously improves its ability to convert input images into unoccluded, correctly posed face images. During the zero-sum game, the generator and the discriminator are trained with four loss functions, and their parameters are optimized by the error back-propagation algorithm. The four loss functions are as follows:

(1) Symmetry loss function

Symmetry is an inherent feature of a normal face, and the symmetry loss is calculated as:

where H and W denote the height and width of the image, (n, m) indexes the pixels of the image, and |·| denotes the absolute value. Real-world images may not be exactly symmetric at the pixel level, so the symmetry loss is minimized in Laplacian space;
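The symmetry loss formula itself is not reproduced in this text. As a rough illustration only, the sketch below assumes the common form used in TP-GAN-style models: the mean absolute difference between each pixel of the left half and its horizontal mirror. The function names `laplacian` and `symmetry_loss`, the 4-neighbor Laplacian kernel and the edge-replication padding are all assumptions made for this example, not details taken from the patent.

```python
def laplacian(image):
    """4-neighbor Laplacian of a grayscale image (list of rows).

    Pixels outside the image are replaced by the nearest edge pixel
    (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])

    def px(n, m):
        n = min(max(n, 0), height - 1)
        m = min(max(m, 0), width - 1)
        return image[n][m]

    return [
        [4 * px(n, m) - px(n - 1, m) - px(n + 1, m) - px(n, m - 1) - px(n, m + 1)
         for m in range(width)]
        for n in range(height)
    ]


def symmetry_loss(image):
    """Mean |x[n][m] - x[n][W-1-m]| over the left half of the image.

    Assumed form: L_sym = 1 / (H * W/2) * sum_n sum_m |x[n][m] - x[n][W-1-m]|.
    """
    height = len(image)
    if height == 0:
        return 0.0
    width = len(image[0])
    half = width // 2
    if half == 0:
        return 0.0
    total = 0.0
    for row in image:
        for m in range(half):
            total += abs(row[m] - row[width - 1 - m])
    return total / (height * half)


# Loss evaluated in Laplacian space, as the text suggests.
face = [[1, 2, 2, 1], [3, 4, 4, 3]]
print(symmetry_loss(laplacian(face)))
```

A left-right symmetric image gives a loss of zero both in pixel space and after the Laplacian filter, since the filter used here preserves mirror symmetry.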

(2) Adversarial loss function

The discriminator network acts as a supervisor: it is responsible for distinguishing the generated face images from the ideal images and is trained simultaneously with the generator. The discriminator is trained with the following cross-entropy loss function:

L_GAN-Dis(g_i, x′_j) = -log(Dis(g_i)) - log(1 - Dis(x′_j))

where g_i denotes a guide-set image, the subscript GAN-Dis refers to the discriminator, and x′_j is an image generated by the generator;

for the generator, the adversarial loss function is computed as:

L_GAN-Gen(x′_j) = -log(Dis(x′_j))

where the subscript GAN-Gen refers to the generator and Dis denotes the discriminator;
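The two loss expressions above can be evaluated directly once the discriminator outputs are available. The following minimal sketch assumes that Dis(·) returns a probability in (0, 1); the function names and the small constant `eps` (added only to avoid log(0)) are illustrative choices, not part of the patent.

```python
import math


def discriminator_loss(dis_guide, dis_generated, eps=1e-12):
    """L_GAN-Dis = -log(Dis(g_i)) - log(1 - Dis(x'_j))."""
    return -math.log(dis_guide + eps) - math.log(1.0 - dis_generated + eps)


def generator_adversarial_loss(dis_generated, eps=1e-12):
    """L_GAN-Gen = -log(Dis(x'_j))."""
    return -math.log(dis_generated + eps)


# The generator loss falls as the discriminator is fooled more often.
print(generator_adversarial_loss(0.1), generator_adversarial_loss(0.9))
```

When the discriminator cannot tell the two sources apart and outputs 0.5 for both, its loss equals 2·ln 2.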

(3) Identity preservation loss function

Preserving identity is a key part of ideal face generation. A perceptual loss, which aims to maintain perceptual similarity, is adopted to help the pain feature correction module acquire identity preservation capability. The loss function is calculated from the feature maps output by the last two layers of the open-source Light CNN:

where H_l and W_l are the height and width of the feature map of the l-th layer from the end, Ω denotes the feature map, and |·| denotes the absolute value. The identity preservation loss aims to keep the generated image and the original image close in the deep feature space; because a pre-trained Light CNN can classify thousands of identities, it is considered able to capture the facial structures or features most important for identity recognition;
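The identity loss formula is not reproduced in this text. A plausible reading of the definitions above, offered only as a sketch, is a sum over the last two Light CNN layers of the size-normalized absolute difference between the feature maps of the generated and the original image. In the example below the feature maps are passed in as precomputed nested lists; no Light CNN API is assumed, and the function name `identity_preserving_loss` is hypothetical.

```python
def identity_preserving_loss(features_generated, features_original):
    """Sum over layers l of 1/(H_l * W_l) * sum |F_l(x') - F_l(x)|.

    Each argument is a list of 2-D feature maps (one per layer), given as
    lists of rows. The normalization by H_l * W_l is an assumption.
    """
    total = 0.0
    for gen_map, orig_map in zip(features_generated, features_original):
        height, width = len(gen_map), len(gen_map[0])
        layer_sum = sum(
            abs(g - o)
            for gen_row, orig_row in zip(gen_map, orig_map)
            for g, o in zip(gen_row, orig_row)
        )
        total += layer_sum / (height * width)
    return total


# Two hypothetical feature maps standing in for the last two layers.
generated = [[[1.0, 2.0], [3.0, 4.0]], [[1.0]]]
original = [[[0.0, 2.0], [3.0, 2.0]], [[3.0]]]
print(identity_preserving_loss(generated, original))
```

In a real pipeline the two lists would come from running the same pre-trained network on the generated image and on the original image.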

(4) Total variation regularization

To improve the spatial smoothness of the generated image and reduce spike artifacts, a total variation regularizer is employed, which is defined as follows:

where H and W denote the height and width of the image, (n, m) indexes the pixels of the image, x′ is the image generated by the generator, and |·| denotes the absolute value.
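The regularizer's formula is missing from this text. The sketch below uses the standard anisotropic total variation, the sum of absolute differences between vertically and horizontally adjacent pixels, which matches the symbols defined above; whether the patent normalizes the sum is not known, so no normalization is applied here.

```python
def total_variation(image):
    """Sum of |x'[n+1][m] - x'[n][m]| + |x'[n][m+1] - x'[n][m]| over the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    total = 0.0
    for n in range(height):
        for m in range(width):
            if n + 1 < height:  # vertical neighbor
                total += abs(image[n + 1][m] - image[n][m])
            if m + 1 < width:  # horizontal neighbor
                total += abs(image[n][m + 1] - image[n][m])
    return total


print(total_variation([[0, 1], [1, 0]]))
```

A constant image has zero total variation, while isolated spikes raise it sharply, which is why minimizing this term discourages spike artifacts.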

Further, the process of step S3 is:

the facial pain features corrected through the zero-sum game are screened and analyzed by a residual network combined with an attention mechanism. This part of the network uses a conventional residual structure as its trunk, and its core is the attention mechanism combined with that trunk. An attention branch is constructed in parallel with the residual branch; based on a bottom-up top-down structure, it outputs an attention mask of the same size as the feature map of the residual structure, which softly weights the facial features in the residual structure. In the bottom-up top-down structure, "down-sampling" is implemented by a series of convolution and pooling operations, and "up-sampling" by a deconvolution operation. The attention mask output by the attention mechanism acts as a feature selector during forward propagation and as a filter during backward gradient updates. Under the attention mask, the gradient of the facial features is calculated as:

where M denotes the attention branch, T denotes the residual branch, σ denotes the parameters of the attention branch, and φ denotes the parameters of the residual branch.
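The gradient expression is not reproduced in this text. Assuming the attention output is the element-wise product of the mask M(x; σ) and the residual-branch features T(x; φ), as in residual attention networks, the product rule gives the following form, since M does not depend on φ:

```latex
\frac{\partial \, M(x;\sigma)\, T(x;\phi)}{\partial \phi}
  = M(x;\sigma)\, \frac{\partial\, T(x;\phi)}{\partial \phi}
```

Read this way, the mask scales the gradient reaching the residual-branch parameters, which is the sense in which it acts as a filter during backward updates.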

The beneficial effects of the invention are as follows. In a real (uncontrolled) environment, a newborn's face is often occluded or changes pose, producing invisible facial regions, which poses a great challenge to neonatal pain recognition. Existing methods consider pain recognition only on unoccluded, correctly posed neonatal face images. The invention therefore trains a generative adversarial network in a semi-supervised manner to learn how to restore an unoccluded, correctly posed face image from neonatal face images with varying poses and occlusion, and thereby obtains facial pain expression features that are only slightly affected by occlusion and pose change, namely the latent vector of the generator in the generative adversarial network. These features are further screened and filtered by the subsequent attention mechanism, so that the intrinsic neonatal pain expression information is finally obtained and used to complete pain expression recognition, effectively addressing neonatal pain expression recognition under occlusion and pose change.

Drawings

Fig. 1 is a flow chart of the neonatal pain expression recognition method based on a generative adversarial network;

Fig. 2 is an illustration of the pain feature correction module in the pain recognition model;

Fig. 3 is an illustration of the pain level classification module in the pain recognition model.

Detailed Description

The technical solution of the method of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are some, not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.

Referring to Figs. 1 to 3, a method for recognizing neonatal pain expressions based on a generative adversarial network comprises the following steps:

step S1: restoring, by means of the generative adversarial network, an unoccluded, correctly posed face image from neonatal face images that have varying poses and may be occluded; the process is as follows:

(1) Symmetry loss function

(2) Adversarial loss function

L_GAN-Dis(g_i, x′_j) = -log(Dis(g_i)) - log(1 - Dis(x′_j))

for the generator, the adversarial loss function is computed as:

L_GAN-Gen(x′_j) = -log(Dis(x′_j))

where the subscript GAN-Gen refers to the generator and Dis denotes the discriminator;

(3) Identity preservation loss function

where H_l and W_l are the height and width of the feature map of the l-th layer from the end, Ω denotes the feature map, and |·| denotes the absolute value. The identity preservation loss aims to keep the generated image and the original image close in the deep feature space; because a pre-trained Light CNN can classify thousands of identities, it can capture the facial structures or features most important for identity recognition;

(4) Total variation regularization

step S3: screening and analyzing the corrected facial pain features with a residual network combined with an attention mechanism, so as to further remove the interference of occlusion and pose change and output an accurate pain level result; the process is as follows:

The neonatal pain expression recognition method based on a generative adversarial network is implemented in this embodiment as follows:

1): acquiring a neonatal image set with pain level labels in a real environment; the process is as follows:

a handheld electronic device is used to record one-minute videos in a neonatal ward as raw data. To ensure the authenticity and clinical value of the data, video is recorded only when a clinical pain-inducing procedure takes place; the device is not moved to follow the newborn's pose changes, and occlusion is not restricted. After enough raw video data has been collected, nurses with specialized training and pain assessment experience are selected to assess pain levels according to clinical standards. Key frames are selected from each video to form the neonatal image set, and the newborn's pain level is assessed with the Neonatal Infant Pain Scale (NIPS). Four pain states derived from the NIPS score serve as the pain level labels of the image set: no pain (NIPS score 0-1), mild pain (NIPS score 2-3), moderate pain (NIPS score 4-5) and severe pain (NIPS score 6-7). The data set is divided into two subsets, "with occlusion" and "without occlusion", according to whether the face is occluded by other objects (such as medical equipment or the newborn's own limbs). An open-source method in OpenCV is then used to estimate the newborn's face pose and obtain the Tait-Bryan angles (pitch, yaw, roll). When the pitch angle is greater than 30° or the yaw angle is greater than 45°, the facial pose is considered non-ideal, so the data set is further divided into four subsets. The test set of each subset is determined at the customary 7:3 ratio, and the test sets of the subsets are combined into the complete test set;
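The labeling and subset rules described above are simple threshold checks and can be sketched as follows. The function names and subset label strings are hypothetical, and taking the absolute value of the angles (so that rotations in either direction count) is an assumption; the text only states the 30° and 45° thresholds.

```python
def pain_level_from_nips(score):
    """Map a NIPS score (0-7) to one of the four pain level labels."""
    if not 0 <= score <= 7:
        raise ValueError("NIPS score must be between 0 and 7")
    if score <= 1:
        return "no pain"
    if score <= 3:
        return "mild pain"
    if score <= 5:
        return "moderate pain"
    return "severe pain"


def is_ideal_pose(pitch_deg, yaw_deg):
    """Pose is non-ideal when |pitch| > 30 degrees or |yaw| > 45 degrees."""
    return abs(pitch_deg) <= 30 and abs(yaw_deg) <= 45


def subset_name(occluded, pitch_deg, yaw_deg):
    """Assign an image to one of the four occlusion-by-pose subsets."""
    occlusion = "with occlusion" if occluded else "without occlusion"
    pose = "ideal pose" if is_ideal_pose(pitch_deg, yaw_deg) else "non-ideal pose"
    return occlusion + " / " + pose


print(pain_level_from_nips(4), "|", subset_name(True, 10, 50))
```

Only images in the "without occlusion / ideal pose" subset would qualify as ideal face images under this reading.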

2): preprocessing the neonatal image set, specifically face detection, cropping and alignment; the process is as follows:

the neonatal face image is cropped from the neonatal image using the open-source ZFace, which outputs 49 facial landmark points together with the face boundary points. After the face image is obtained, uniform alignment is performed. Because in-plane rotation of the face (roll angle) is easier to correct than the yaw angle or the pitch angle, the face images are brought to a uniform upright orientation by affine transformation: the two-dimensional face image is aligned by a linear transformation under an affine matrix M. Specifically, the affine transformation matrix M is obtained dynamically by computing the coordinates of the key points in the original image and using their correspondence with the coordinates of the key points in the reference face. The calculation is as follows:

where a₁, b₁, c₁, a₂, b₂ and c₂ are the values to be determined in the three-dimensional affine matrix M, (α, β) are the original coordinates, and (u, v) are the transformed coordinates. The three stages of face detection, cropping and alignment in the preprocessing may be replaced by other suitable algorithms.
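The transformation formula is not reproduced in this text. The symbol definitions are consistent with the usual affine mapping u = a₁α + b₁β + c₁ and v = a₂α + b₂β + c₂, and the sketch below assumes that form. It solves for the six unknowns from three key-point correspondences using Cramer's rule; the helper names and the choice of exactly three points are assumptions, and a real pipeline might instead fit M to many landmarks by least squares.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_affine(src_points, dst_points):
    """Return [[a1, b1, c1], [a2, b2, c2]] mapping three (alpha, beta) to (u, v)."""
    system = [[alpha, beta, 1.0] for alpha, beta in src_points]
    d = det3(system)
    if abs(d) < 1e-12:
        raise ValueError("source key points are collinear")
    rows = []
    for k in range(2):  # k = 0 solves for u, k = 1 solves for v
        rhs = [point[k] for point in dst_points]
        coeffs = []
        for col in range(3):
            replaced = [row[:] for row in system]
            for i in range(3):
                replaced[i][col] = rhs[i]
            coeffs.append(det3(replaced) / d)
        rows.append(coeffs)
    return rows


def apply_affine(matrix, point):
    """Apply u = a1*alpha + b1*beta + c1, v = a2*alpha + b2*beta + c2."""
    (a1, b1, c1), (a2, b2, c2) = matrix
    alpha, beta = point
    return (a1 * alpha + b1 * beta + c1, a2 * alpha + b2 * beta + c2)


# Hypothetical key points: a 90-degree rotation plus a shift of (2, 3).
M = solve_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (2, 4), (1, 3)])
print(apply_affine(M, (1, 1)))
```

Once M is known, every pixel coordinate of the cropped face can be mapped in the same way to produce the aligned image.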

3): constructing the pain recognition model, which comprises a pain feature correction module and a pain level classification module; the process is as follows:

as shown in Fig. 2, the pain feature correction module is built mainly on a generative adversarial network. The generative adversarial network consists mainly of a generator and a discriminator; the discriminator is responsible for learning to distinguish images produced by the generator from those in the guide set, where the guide set g comprises all ideal face images in the training set. Inspired by TP-GAN, the generator in the designed network has two paths, which focus on global shape and local detail respectively. For the local path, five pain-related facial landmarks (left eye, right eye, nose, upper mouth and lower mouth) are first detected with the open-source MTCNN. Five regions cropped around these landmarks are fed to five generators in the local path. The global path has a more conventional design and uses a single generator; the information fusion strategy between the two paths is similar to that of TP-GAN. A denoising auto-encoder (DAE) is chosen as the generator. A DAE receives an image corrupted by some form of noise and removes the noise by requiring the output image to resemble an ideal version of the original image. The input x (a face image) can be regarded as an ideal face image corrupted by pose change and partial occlusion, and "denoising" is achieved by requiring the DAE to learn to bring its output image as close as possible to the guide set. To exploit the advantages of both the generative adversarial network and the DAE, four loss functions are used together, expressed as follows:

(1) Symmetry loss function

where H and W denote the height and width of the image, (n, m) indexes the pixels of the image, and |·| denotes the absolute value. Real-world images may not be symmetric at the pixel level, so the symmetry loss is minimized in Laplacian space;

(2) Adversarial loss function

L_GAN-Dis(g_i, x′_j) = -log(Dis(g_i)) - log(1 - Dis(x′_j))

for the generator, the adversarial loss function is computed as:

L_GAN-Gen(x′_j) = -log(Dis(x′_j))

where the subscript GAN-Gen refers to the generator and Dis denotes the discriminator;

(3) Identity preservation loss function

Preserving identity is a key part of ideal face generation. A perceptual loss, which aims to maintain perceptual similarity, is adopted to help the pain feature correction module acquire identity preservation capability. Specifically, the loss function is calculated from the feature maps output by the last two layers of the open-source Light CNN:

where H_l and W_l are the height and width of the feature map of the l-th layer from the end, Ω denotes the feature map, and |·| denotes the absolute value. The identity preservation loss aims to keep the generated image and the original image close in the deep feature space; because a pre-trained Light CNN can classify thousands of identities, it can capture the facial structures or features most important for identity recognition;

(4) Total variation regularization

where H and W denote the height and width of the image, (n, m) indexes the pixels of the image, x′ is the image generated by the generator, and |·| denotes the absolute value;

the pain level classification module analyzes the corrected pain features and outputs the final pain level classification result. The latent vector z of the generator in the pain feature correction module (the output variable of the encoder in the generator) is taken as the corrected pain features and fed into the pain level classification module. As shown in Fig. 3, this part of the network uses a residual structure as its trunk and adds an attention mechanism: an attention branch is constructed in parallel with the residual branch and, based on a bottom-up top-down structure, outputs an attention mask of the same size, which softly weights the facial features. In the bottom-up top-down structure, "down-sampling" is implemented by a series of convolution and pooling operations, and "up-sampling" by deconvolution. The attention mask output by the attention mechanism acts as a feature selector during forward propagation and as a filter during backward gradient updates. Under the attention mask, the gradient of the facial features is calculated as:

where M denotes the attention branch, T denotes the residual branch, σ denotes the parameters of the attention branch, and φ denotes the parameters of the residual branch;
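The soft weighting described above can be illustrated with an element-wise product of a mask and a feature map. The sketch assumes the mask values are obtained by passing the attention-branch output through a sigmoid so that they lie in (0, 1), as in residual attention networks; that normalization, the function names and the toy feature maps are assumptions, not details confirmed by the text.

```python
import math


def sigmoid(value):
    """Numerically stable logistic function."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    e = math.exp(value)
    return e / (1.0 + e)


def apply_attention_mask(trunk_features, mask_logits):
    """Soft weighting: output[n][m] = sigmoid(mask_logits[n][m]) * trunk_features[n][m]."""
    return [
        [sigmoid(logit) * feature for feature, logit in zip(feature_row, logit_row)]
        for feature_row, logit_row in zip(trunk_features, mask_logits)
    ]


# Features at positions with a large mask logit pass through almost
# unchanged, while positions with a very negative logit are suppressed.
features = [[2.0, 4.0, 6.0]]
logits = [[0.0, 50.0, -50.0]]
print(apply_attention_mask(features, logits))
```

Because the same mask multiplies the gradient flowing back into the residual branch, applying `apply_attention_mask` to a gradient map instead of a feature map mimics the filtering role in the backward pass.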

4): training and testing the constructed pain recognition model with the neonatal pain image set; the process is as follows:

the constructed neonatal pain recognition model is trained on the training set of the neonatal image set. For training the pain feature correction module, the loss function of the generator is calculated as:

L_Gen = λ₁L_tv + λ₂L_id + λ₃L_sym + ηL_GAN-Gen

where λ₁, λ₂ and λ₃ are the weights of the corresponding loss terms and, because the generator receives its error signal from the discriminator, a separate parameter η is used as the weight of the adversarial loss. The values are λ₁ = 5×10^-3, λ₂ = 3×10^-2, λ₃ = 0.3 and η = 0.1. The discriminator is trained with the adversarial loss function L_GAN-Dis only. Furthermore, because occlusion is difficult to remove completely in real scenes and the five facial landmarks are important for pain assessment, occlusion removal is strengthened for the output images of the local path. Specifically, after one round of training of the pain feature correction module, a new generative adversarial network is constructed for each of the five DAEs in the local path: the output image of the local path and the image at the corresponding position in the guide set are fed to a new discriminator to train that DAE adversarially. Once training of the pain feature correction module is fully complete, training of the pain level classification module begins, and its parameters are optimized by the error back-propagation algorithm. The loss function of the pain level classification module adds an attention-weight-based constraint term to a mean squared error loss; the gradient calculation under the attention mask is described in detail in step S3. For a test sample in the neonatal image set, the trained neonatal pain recognition model performs pain expression recognition to obtain the corresponding pain level.
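The weighted generator objective can be written out with the parameter values stated in the text. In this small sketch the four individual loss values are assumed to be computed elsewhere and passed in as plain numbers; the constant and function names are illustrative.

```python
# Weights as stated in the text.
LAMBDA_TV = 5e-3   # lambda_1, total variation
LAMBDA_ID = 3e-2   # lambda_2, identity preservation
LAMBDA_SYM = 0.3   # lambda_3, symmetry
ETA_ADV = 0.1      # eta, adversarial term


def generator_loss(l_tv, l_id, l_sym, l_gan_gen):
    """L_Gen = lambda_1*L_tv + lambda_2*L_id + lambda_3*L_sym + eta*L_GAN-Gen."""
    return (LAMBDA_TV * l_tv + LAMBDA_ID * l_id
            + LAMBDA_SYM * l_sym + ETA_ADV * l_gan_gen)


print(generator_loss(1.0, 1.0, 1.0, 1.0))
```

With these weights the symmetry and adversarial terms dominate when all four losses have similar magnitude, although the raw scale of each loss also matters in practice.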

Based on this method, the invention was validated on a neonatal expression data set. Table 1 compares its performance with that of other neonatal pain expression recognition methods; as Table 1 shows, the invention has a significant performance advantage under occlusion and pose change. In addition, an ablation experiment was performed on the contribution of the attention branch to neonatal pain recognition accuracy, and the results are shown in Table 2.

TABLE 1

TABLE 2

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for recognizing neonatal pain expressions based on a generative adversarial network, the method comprising the steps of:

step S1: restoring, by means of the generative adversarial network, an unoccluded, correctly posed face image from neonatal face images that have varying poses and may be occluded;

step S2: taking the latent vector of the generator in the generative adversarial network as the corrected facial features for subsequent pain analysis;

step S3: screening and analyzing the corrected facial features with a residual network combined with an attention mechanism, so as to further remove the interference of occlusion and pose change and output an accurate pain level result.

2. The method for recognizing neonatal pain expressions based on a generative adversarial network according to claim 1, wherein the process of step S1 is:

the generative adversarial network consists of a generator and a discriminator. The generator is responsible for producing a corrected face image from an input face image; the discriminator is responsible for learning to distinguish images produced by the generator from the ideal face images (unoccluded and correctly posed) in the guide set, where the guide set g consists of all ideal face images in the training set. Through the zero-sum game between the generator and the discriminator, the generator continuously improves its ability to convert input images into unoccluded, correctly posed face images. During the zero-sum game, the generator and the discriminator are trained with four loss functions, and their parameters are optimized by the error back-propagation algorithm. The four loss functions are as follows:

(1) Symmetry loss function

where H and W denote the height and width of the image, (n, m) indexes the pixels of the image, and |·| denotes the absolute value. Real-world images do not have exact symmetry at the pixel level, so the symmetry loss is minimized in Laplacian space;

(2) Adversarial loss function

The discriminator network acts as a supervisor: it is responsible for distinguishing the generated face images from the ideal images and is trained together with the generator. The discriminator is trained with the following cross-entropy loss function:

L_GAN-Dis(g_i, x′_j) = -log(Dis(g_i)) - log(1 - Dis(x′_j))

where g_i denotes a guide-set image, the subscript GAN-Dis refers to the discriminator, and x′_j is an image generated by the generator;

for the generator, the adversarial loss function is computed as:

L_GAN-Gen(x′_j) = -log(Dis(x′_j))

where the subscript GAN-Gen refers to the generator and Dis denotes the discriminator;

(3) Identity preservation loss function

(4) Total variation regularization

To improve the spatial smoothness of the generated image and reduce spike artifacts, a total variation regularizer is employed, which is defined as follows:

3. The method for recognizing neonatal pain expressions based on a generative adversarial network according to claim 1 or 2, wherein the process of step S3 is:

the facial pain features corrected through the zero-sum game are screened and analyzed by a residual network combined with an attention mechanism. An attention branch is constructed in parallel with the residual branch; based on a bottom-up top-down structure, it outputs an attention mask of the same size as the feature map in the residual structure, which softly weights the facial features in the residual structure. In the bottom-up top-down structure, "down-sampling" is implemented by a series of convolution and pooling operations, and "up-sampling" by a deconvolution operation. The attention mask output by the attention mechanism acts as a feature selector during forward propagation and as a filter during backward gradient updates, and the gradient of the facial features under the attention mask is calculated as follows: