Disclosure of Invention
The invention aims to provide a noninvasive diabetes risk prediction method that is based on a face image and adopts an attention mechanism. The method can perceive the target information of key facial areas, suppress other useless information, greatly improve the fitting speed and generalization capability of the model, and quickly perform noninvasive and accurate diabetes risk prediction.
In order to achieve this purpose, the technical solution of the invention is as follows:
a method for noninvasive diabetes risk prediction, comprising the following steps:
acquiring or constructing a data set comprising facial images of diabetic patients and healthy people;
preprocessing the image samples in the data set, locating the feature points of each face image, obtaining a plurality of key areas, cropping and splicing the key areas, and labeling them with diabetes diagnosis information to obtain a labeled sample data set;
constructing a residual attention network, performing supervised machine learning on the labeled samples, and obtaining a noninvasive diabetes risk prediction model after training and parameter tuning.
The sample images in the data set are a large number of frontal facial images of subjects collected by a high-definition camera under the same natural environment conditions (illumination, angle, and expression), forming a self-built facial image data set of diabetic patients and healthy people. The specific process is as follows:
A large number of diabetic and healthy subjects are recruited under the following inclusion criteria: all subjects must be between 40 and 90 years of age, have no obvious scars on the facial skin, and wear no makeup on the day of image collection. Diabetic subjects must have received a definite diabetes diagnosis at a secondary or higher-level medical institution. Healthy subjects must have a fasting whole-blood glucose of 3.9-6.1 mmol/L, a 1-hour postprandial blood glucose of 6.7-9.4 mmol/L, a 2-hour postprandial blood glucose of no more than 7.8 mmol/L, or a glycosylated hemoglobin (HbA1c) below 6.5% in a physical examination report within the past three months, and must have no history of diabetes. All subjects show no statistically significant differences in age, gender, or other characteristics.
In a well-lit room, the subject sits at one end of a table with the head fixed on a forehead rest bracket, and a high-definition camera is placed at the other end of the table. The height of the forehead rest bracket is adjusted to ensure that the camera can clearly capture the frontal facial image of the subject. Throughout sample collection, the photographing angle, expression, and external illumination conditions of the subject are kept as consistent as possible.
The 68 feature points are labeled using the pre-trained model 'shape_predictor_68_face_landmarks.dat' in the Dlib toolkit, and OpenCV is used for image processing to draw the 68 points on the face. The key areas are located and cropped according to the coordinates of these 68 feature points. The key areas are distributed so as to avoid the facial organs, including the eyebrows, eyes, nose, and mouth, and each key area is rectangular.
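For illustration only, the landmark step described above can be sketched as follows; the image file names are hypothetical, and the snippet merely shows one common way to obtain and draw the 68 Dlib landmarks with OpenCV, not the exact processing pipeline of the invention.

```python
import cv2
import dlib

# Load the face detector and the pre-trained 68-landmark model.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("subject_face.jpg")            # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

for face in detector(gray, 1):                    # upsample once for robustness
    shape = predictor(gray, face)
    # points[i] is landmark i+1 in the patent's 1-based numbering
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    for (x, y) in points:
        cv2.circle(image, (x, y), 2, (0, 255, 0), -1)

cv2.imwrite("face_with_landmarks.jpg", image)
```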
Four key areas are set: a forehead area (A), a left cheek area (B), a right cheek area (C), and a lower jaw area (D). The specific operation is as follows: first, the face feature point detection method in the machine learning toolkit Dlib is used to label the 68 key point locations on the face sample, and the coordinate of each point is denoted P_i(x, y), i = 1 to 68. The horizontal axis through P_9(x, y) is taken as the abscissa axis and denoted the x-axis, and the vertical axis through P_1(x, y) is taken as the ordinate axis and denoted the y-axis. At the same time, four key square regions of the same size (64 × 64 pixels) are defined according to the coordinate relationships between the feature points and are denoted A, B, C, and D respectively. To accurately locate the position of each key region in the face image, the coordinates of its center point must be determined first; the center-point coordinates of key regions A, B, C, and D are denoted P_A(x, y), P_B(x, y), P_C(x, y), and P_D(x, y).
Key area A is near the forehead region above the central axis of the face. The abscissa of P_A(x, y) is the abscissa of the nose-tip feature point 34, denoted P_34(x); the ordinate is the ordinate of the feature point at the highest point of the eyebrows, denoted P_max-high(y), plus half of the 64-pixel side length of the key area, denoted h. The formula for P_A(x, y) is:
P_A(x, y) = (P_34(x), P_max-high(y) + h)   (1)
Key areas B and C are near the left and right cheeks of the face, respectively. The formulas for P_B(x, y) and P_C(x, y) are:
P_B(x, y) = (P_42(x), P_32(y))   (2)
P_C(x, y) = (P_47(x), P_36(y))   (3)
The abscissa of P_B(x, y) is the abscissa of feature point 42 at the lowest position of the left eye, denoted P_42(x), and its ordinate is the ordinate of feature point 32 at the leftmost side of the nose, denoted P_32(y). The abscissa of P_C(x, y) is the abscissa of feature point 47 at the lowest position of the right eye, denoted P_47(x), and its ordinate is the ordinate of feature point 36 at the rightmost side of the nose, denoted P_36(y).
Key area D is near the central axis below the mouth. The abscissa of P_D(x, y) is the abscissa of feature point 58 at the lowermost end of the mouth, denoted P_58(x); the ordinate lies halfway between feature points 58 and 9, i.e., it is offset from P_58(y) by half of the vertical distance between feature points 58 and 9. The formula for P_D(x, y) is:
P_D(x, y) = (P_58(x), (P_58(y) + P_9(y)) / 2)   (4)
After the coordinates of the center points of the four key areas are determined, the specific coordinates of the four vertices of each square key area can be calculated from the center-point coordinates, using the following formulas:
P_n,upper-left(x, y) = (P_n(x) - h, P_n(y) + h)   (5)
P_n,lower-left(x, y) = (P_n(x) - h, P_n(y) - h)   (6)
P_n,upper-right(x, y) = (P_n(x) + h, P_n(y) + h)   (7)
P_n,lower-right(x, y) = (P_n(x) + h, P_n(y) - h)   (8)
where n represents A, B, C, or D, and h is half of the 64-pixel side length of the key area;
the cropped key areas of each face image are spliced into a face composite image (128 × 128 pixels) in the order A, B, C, D, and the splicing order is kept the same for all samples in the data set.
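The following sketch is one possible implementation of formulas (1)-(8) and the splicing step, not the exact code of the invention. It assumes the `points` list and `image` from the previous sketch, uses standard image coordinates (the y-axis grows downward, so the vertical offset for region A is mirrored relative to formula (1)), and assumes the A, B, C, D patches are arranged as a 2 × 2 grid.

```python
import numpy as np

def P(i):
    """Landmark i in the patent's 1-based numbering (from the previous sketch)."""
    return points[i - 1]

h = 32  # half of the 64-pixel side length of a key region

# Centre points per formulas (1)-(4); y grows downward in image coordinates,
# so "above the eyebrows" means subtracting h for region A.
brow_top_y = min(P(i)[1] for i in range(18, 28))          # highest eyebrow landmark
centers = {
    "A": (P(34)[0], brow_top_y - h),                      # forehead
    "B": (P(42)[0], P(32)[1]),                            # left cheek
    "C": (P(47)[0], P(36)[1]),                            # right cheek
    "D": (P(58)[0], (P(58)[1] + P(9)[1]) // 2),           # below the mouth
}

def crop(img, cx, cy):
    """64 x 64 square patch around (cx, cy), i.e. the vertices of formulas (5)-(8)."""
    return img[cy - h:cy + h, cx - h:cx + h]

patches = [crop(image, cx, cy) for cx, cy in centers.values()]
composite = np.vstack([np.hstack(patches[:2]),            # A | B
                       np.hstack(patches[2:])])           # C | D
assert composite.shape[:2] == (128, 128)
```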
The residual attention network is a 56-layer residual attention network built with the PyTorch machine learning library; the specific network architecture is as follows:
The labeled sample image is input into the residual attention network and passes through a first convolutional layer and a max-pooling layer (one convolution and one max-pooling operation). It then passes through 3 residual units and 3 attention modules arranged alternately, the 3 residual units being denoted the first, second, and third residual units, followed by a fourth residual unit. After an average pooling operation, the features reach a fully connected layer; finally, the normalized exponential function Softmax is applied to the fully connected layer at the end of the residual attention network to perform diabetes risk prediction and output the prediction result.
Each attention module is divided into two branches: one is called the trunk branch and the other the soft mask branch.
The feature map is first preprocessed by one residual unit and then enters the trunk branch and the soft mask branch separately.
The trunk branch mainly comprises 2 residual units connected in series.
The soft mask branch comprises a fast feed-forward sweep step and a top-down feedback step. The feature map undergoes two down-sampling operations to enlarge the receptive field; after the lowest resolution is reached, the same number of up-sampling operations restore the feature map to the size of the original input feature map, forming an attention feature map. Two 1 × 1 convolutional layers follow, and finally a sigmoid activation function yields the mixed-domain attention.
In addition, skip connections are added between the down-sampling and up-sampling paths to fuse feature information from feature maps of different scales. The output of the soft mask branch is first multiplied element-wise with the output of the trunk branch, the result is added element-wise to the output of the trunk branch, and finally the output of the attention module is obtained after p residual units.
The residual units in the attention module adopt a bottleneck structure to reduce the number of parameters. In the bottleneck structure, the first convolution kernel is 1 × 1 with 64 channels; the second convolution kernel is 3 × 3 with 64 channels; the third convolution kernel is 1 × 1 with 256 channels, and the activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 112 × 112.
The first convolutional layer contains 7 × 7 convolution kernels with a stride of 2 × 2 and 64 channels; the padding mode is set to valid, and the output of this convolutional layer is 112 × 112.
The pooling window of the max-pooling layer is 3 × 3 with a stride of 2 × 2, and the feature map output after max pooling is 56 × 56.
The first residual unit adopts a bottleneck structure to reduce the number of parameters (a schematic code sketch of such a bottleneck unit is given after this architecture description). In the bottleneck structure, the first convolution kernel is 1 × 1 with 64 channels; the second convolution kernel is 3 × 3 with 64 channels; the third convolution kernel is 1 × 1 with 256 channels; the activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 56 × 56.
The first attention module follows the first residual unit, and its output size is 56 × 56.
The second residual unit follows the first attention module and also adopts a bottleneck structure with 3 convolutional layers to reduce the number of parameters: the first convolution kernel is 1 × 1 with 128 channels; the second is 3 × 3 with 128 channels; the third is 1 × 1 with 512 channels. The activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 28 × 28.
The second attention module follows the second residual unit, and its output size is 28 × 28.
The third residual unit follows the second attention module and also adopts a bottleneck structure with 3 convolutional layers to reduce the number of parameters: the first convolution kernel is 1 × 1 with 256 channels; the second is 3 × 3 with 256 channels; the third is 1 × 1 with 1024 channels. The activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 14 × 14.
The third attention module follows the third residual unit, and its output size is 14 × 14.
The fourth residual unit follows the third attention module and adopts 3 serially connected bottleneck structures, each with 3 convolutional layers to reduce the number of parameters: in each bottleneck, the first convolution kernel is 1 × 1 with 512 channels; the second is 3 × 3 with 512 channels; the third is 1 × 1 with 2048 channels. The activation function between the convolutional layers is ReLU, and the output size of the fourth residual unit is 7 × 7.
The feature map output by the fourth residual unit is subjected to an average pooling operation with a 7 × 7 pooling window and a stride of 1 × 1; the feature map after average pooling is 1 × 1.
Finally, the normalized exponential function Softmax is applied to the fully connected layer at the end of the residual attention network to predict the diabetes risk.
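As a rough illustration only, a bottleneck residual unit of the kind used by the first through fourth residual units above might be sketched in PyTorch as follows; the class name and the optional 1 × 1 projection shortcut are assumptions made so the sketch is self-contained, and batch normalization is omitted for brevity.

```python
import torch
import torch.nn as nn

class BottleneckUnit(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck with ReLU and an identity shortcut."""

    def __init__(self, in_channels, mid_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        # identity branch; a 1x1 projection is assumed when the shapes differ
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels,
                                      kernel_size=1, stride=stride, bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        # bottleneck output = third convolution output + identity-block output
        return self.relu(out + self.shortcut(x))

# e.g. a unit with the first residual unit's channel widths: 1x1/64, 3x3/64, 1x1/256
first_residual_unit = BottleneckUnit(64, 64, 256)
```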
The invention has the beneficial effects that:
The invention adopts a residual attention network to construct a noninvasive diabetes risk prediction model. The network is constructed by stacking multiple attention modules, combining an end-to-end training mode with a feed-forward network architecture. These modules provide attention-aware functionality. The visual attention mechanism is a brain signal-processing mechanism unique to human vision: the target area requiring attention is obtained by rapidly scanning the global image, the target information of interest is then acquired in a focused manner, and other useless information is suppressed. The attention mechanism greatly improves the efficiency and accuracy of machine-vision information processing. Compared with a traditional residual network model, the residual attention mechanism can achieve fine-grained feature matching while strengthening the influence of the target area and suppressing the influence of non-target areas, which helps improve the fitting speed and generalization capability of the model. In addition, the invention adopts a noninvasive testing method: the frontal facial image of the subject is analyzed by the model to predict the subject's future risk of developing diabetes. The method is fast and low-cost, supports large-scale screening and remote diagnosis and treatment, helps to quickly screen diabetic patients in high-incidence populations, and reminds them to control their blood glucose as early as possible to avoid diabetic complications.
The method is used for diabetes risk prediction: whether a subject suffers from diabetes is distinguished through the face image. By taking processed key facial regions as input samples, the residual attention network acquires key-feature perception capability on top of a traditional residual network. Compared with a traditional residual network model, this network can achieve fine-grained feature matching while strengthening the influence of the target region and suppressing the influence of non-target regions, improving the fitting speed and generalization capability of the model. The method supports large-scale screening and remote diagnosis and treatment, is fast and low-cost, and can be operated by non-professional personnel.
A self-built diabetes face database is used as the basic experimental sample (the applicant's unit is a specialized diabetes hospital with abundant diabetic patients and cases; complete facial sample data sets of diabetic patients are currently rare worldwide). With this self-built facial sample data set of diabetic patients and healthy people, diabetes risk prediction is performed using a residual attention neural network model. In the published literature on noninvasive diabetes risk prediction, there is no existing method that performs noninvasive diabetes risk prediction with face images and a residual attention network; this is the first time that supervised machine learning is performed on facial images of diabetic patients using a residual attention network model and that diabetes risk prediction is carried out by such a deep learning method.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
The invention provides a noninvasive diabetes risk prediction method based on a face image and a residual attention network, which belongs to the field of intelligent medical treatment and mainly comprises the following steps, as shown in figure 1:
In step S1, subjects are recruited to construct a data set containing facial images of diabetic patients and healthy people.
The images in the data set in step S1 are frontal facial images of the subjects captured by a high-definition camera under the same natural environment conditions (illumination, angle, expression).
In step S2, the image samples are preprocessed, the feature points of the face images are located, the key areas are cropped and spliced, and the diabetes diagnosis information is labeled to obtain a labeled sample data set.
In step S2, the Dlib machine learning library is first used to locate the feature points of the face image; 4 rectangular key regions are cropped in sequence according to the feature-point locations, a complete rectangular image is then spliced together in order, and finally the corresponding diagnostic information is labeled for each spliced sample. The 68 points are labeled using the pre-trained model 'shape_predictor_68_face_landmarks.dat' in the Dlib toolkit, and OpenCV is used for image processing to draw the 68 points on the face. Locating and cropping the key areas according to the coordinates of the 68 feature points is conventional in machine vision technology.
In step S3, the labeled sample data set is randomly divided into a training set, a validation set, and a test set. The random division in step S3 means that, after a shuffle operation, the sample data set is divided into K mutually exclusive subsets of similar size; each time, the union of K-1 subsets is used as the training set and the remaining subset as the test set, while 50% of the samples in the test set are set aside as the validation set.
In step S4, a residual attention network is constructed and supervised machine learning is performed on the samples. Step S4 builds a 56-layer residual attention network using the PyTorch machine learning library and classifies the samples by applying the normalized exponential function Softmax to the fully connected layer at the end of the residual attention network.
In step S5, model parameters are adjusted according to the performance on the validation set, the generalization capability of the model is estimated based on the discrimination performance on the test set, and a well-performing noninvasive diabetes risk prediction model is obtained by cross-validation. In step S5, the average of the K test results is taken when evaluating the model.
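A minimal training-loop sketch for steps S4-S5 is given below; it is not the actual training code of the embodiment. The `model`, `train_loader`, and `val_loader` objects, the optimizer choice, the learning rate, and the number of epochs are all assumptions; because the network described here ends in Softmax, its output is converted to log-probabilities for the negative log-likelihood loss.

```python
import torch
import torch.nn as nn

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # hypothetical settings
best_val_acc, best_state = 0.0, None

for epoch in range(50):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        probs = model(images)                       # the network ends in Softmax
        loss = criterion(torch.log(probs + 1e-8), labels)
        loss.backward()
        optimizer.step()

    # parameter adjustment / model selection according to validation performance
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    val_acc = correct / total
    if val_acc > best_val_acc:
        best_val_acc, best_state = val_acc, model.state_dict()
```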
In this embodiment, the data set constructed in step S1 may be the self-built facial image sample data set of diabetic patients at the applicant's unit, named TMU-DFD (Tianjin Medical University - Diabetes Face Dataset). The 384 recruited diabetic subjects are outpatients and inpatients of the Zhu Xianyi Memorial Hospital of Tianjin Medical University; the 137 healthy subjects are employees, family members, graduate students, and social volunteers of the college. In this embodiment, 2 to 3 clear frontal face images of each subject are collected by the high-definition camera under the same natural environment conditions (the same illumination and angle); the face image samples collected in step S1 comprise 966 samples of diabetic patients and 411 samples of healthy people. Most of the subjects in this embodiment are Chinese, and the sample data set is used to train the noninvasive diabetes risk prediction model, making the method more targeted and well suited to the Chinese population.
In this embodiment, in order to improve subject recruitment efficiency and sample quality, inclusion and exclusion criteria are strictly set. Specifically, all subjects must be between 40 and 90 years of age, have no obvious scars on the facial skin, and wear no makeup on the day of facial image collection. Diabetic subjects must have received a definite diabetes diagnosis at a secondary or higher-level medical institution. Healthy subjects must have a fasting whole-blood glucose of 3.9-6.1 mmol/L, a 1-hour postprandial blood glucose of 6.7-9.4 mmol/L, a 2-hour postprandial blood glucose of no more than 7.8 mmol/L, or a glycosylated hemoglobin (HbA1c) below 6.5% in a physical examination report within the past three months, with no history of diabetes. All subjects show no statistically significant differences in age, gender, or other characteristics.
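Purely as an illustration of the inclusion criteria above, they could be encoded as a screening predicate such as the following; the function and field names are hypothetical, and only the thresholds come from the text.

```python
def healthy_subject_eligible(age, has_facial_scar, wears_makeup, has_diabetes_history,
                             fasting_glucose, glucose_1h, glucose_2h, hba1c):
    """Return True if a candidate meets the healthy-subject inclusion criteria."""
    if not 40 <= age <= 90 or has_facial_scar or wears_makeup or has_diabetes_history:
        return False
    glucose_ok = (3.9 <= fasting_glucose <= 6.1
                  and 6.7 <= glucose_1h <= 9.4
                  and glucose_2h <= 7.8)          # all values in mmol/L
    return glucose_ok or hba1c < 6.5              # HbA1c in percent, within three months
```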
Specifically, a schematic diagram of sample image acquisition is shown in fig. 2. In a well-lit room, the subject sits at one end of a table with the head fixed on a forehead rest bracket. A high-definition camera is placed at the other end of the table, and the height of the forehead rest bracket is adjusted to ensure that the camera can clearly capture the frontal facial image of the subject. Throughout sample collection, the photographing angle, expression, and external illumination conditions of the subject are kept as consistent as possible.
Due to elevated blood glucose and capillary lesions, diabetic patients commonly show facial symptoms such as redness and swelling, skin infection, pruritus, dryness, and pigmentation, where the intensity of facial redness depends on the degree of congestion of the superficial venous plexus. The present invention therefore focuses on the facial skin. Meanwhile, to avoid interference from facial organs such as the eyebrows, eyes, nose, and mouth, 4 key regions of the facial image are extracted for the experiment: a forehead region (A), a left cheek region (B), a right cheek region (C), and a lower jaw region (D). The specific operation is shown in fig. 3. First, the face feature point detection (face landmark detection) method in the machine learning toolkit Dlib is used to label the 68 key point locations on the face sample, and the coordinate of each point is denoted P_i(x, y). The horizontal axis through P_9(x, y) is taken as the abscissa axis and denoted the x-axis, and the vertical axis through P_1(x, y) is taken as the ordinate axis and denoted the y-axis. At the same time, 4 key regions of the same size (64 × 64 pixels) are defined according to the coordinate relationships between the feature points and are denoted A, B, C, and D respectively. To accurately locate the position of each key region in the face image, the coordinates of its center point must be determined first; the center-point coordinates of key regions A, B, C, and D are denoted P_A(x, y), P_B(x, y), P_C(x, y), and P_D(x, y).
Key area A is near the forehead region above the central axis of the face. The abscissa of P_A(x, y) is the abscissa of the nose-tip feature point 34, denoted P_34(x); the ordinate is the ordinate of the feature point at the highest point of the eyebrows (generally P_20(y) or P_25(y)), denoted P_max-high(y), plus half of the 64-pixel side length of the key area, denoted h. The formula for P_A(x, y) is:
P_A(x, y) = (P_34(x), P_max-high(y) + h)   (1)
Key areas B and C are near the left and right cheeks of the face, respectively. The formulas for P_B(x, y) and P_C(x, y) are:
P_B(x, y) = (P_42(x), P_32(y))   (2)
P_C(x, y) = (P_47(x), P_36(y))   (3)
The abscissa of P_B(x, y) is the abscissa of feature point 42 at the lowest position of the left eye, denoted P_42(x), and its ordinate is the ordinate of feature point 32 at the leftmost side of the nose, denoted P_32(y). The abscissa of P_C(x, y) is the abscissa of feature point 47 at the lowest position of the right eye, denoted P_47(x), and its ordinate is the ordinate of feature point 36 at the rightmost side of the nose, denoted P_36(y).
Key area D is near the central axis below the mouth. The abscissa of P_D(x, y) is the abscissa of feature point 58 at the lowermost end of the mouth, denoted P_58(x); the ordinate lies halfway between feature points 58 and 9, i.e., it is offset from P_58(y) by half of the vertical distance between feature points 58 and 9. The formula for P_D(x, y) is:
P_D(x, y) = (P_58(x), (P_58(y) + P_9(y)) / 2)   (4)
After the coordinates of the center points of the four key areas are determined, the specific coordinates of the four vertices of each square key area can be calculated from the center-point coordinates, using the following formulas:
P_n,upper-left(x, y) = (P_n(x) - h, P_n(y) + h)   (5)
P_n,lower-left(x, y) = (P_n(x) - h, P_n(y) - h)   (6)
P_n,upper-right(x, y) = (P_n(x) + h, P_n(y) + h)   (7)
P_n,lower-right(x, y) = (P_n(x) + h, P_n(y) - h)   (8)
where n takes the values A, B, C, and D, and h is half of the 64-pixel side length of the key area. The cropped key areas of each sample are spliced into a face composite image (128 × 128 pixels) in the order A, B, C, D; the splicing order is kept the same for all samples in the data set, and the four positions of the selected key areas vary with different face shapes.
The invention adopts a supervised machine learning algorithm for noninvasive diabetes detection, so the data samples need to be diagnosed and labeled. Specifically, the diagnostic information of in-hospital subjects is associated with the image samples by querying medical systems such as HIS, EMR, LIS, and physical examination systems. Samples of subjects recruited outside the hospital are labeled according to the diagnostic materials (physical examination reports, medical records, etc.) from medical institutions provided by the subjects.
In order to obtain a noninvasive diabetes risk prediction model with good classification performance and generalization capability, the sample data set labeled in step S3 is randomly divided into a training set, a validation set, and a test set. Specifically, random division means that, after a shuffle operation, the data set is divided into 5 mutually exclusive subsets of similar size; each time, the union of 4 subsets is used as the training set and the remaining 1 subset as the test set. In the test set, 50% of the samples are randomly set aside as the validation set, and the test set, validation set, and training set all contain the same proportion of diabetic and healthy samples.
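A minimal sketch of this splitting scheme is shown below, assuming scikit-learn is available and that `samples` and `labels` are NumPy arrays (both placeholders); stratified folds are used so that each subset keeps the same proportion of diabetic and healthy samples, as required above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

K = 5
kfold = StratifiedKFold(n_splits=K, shuffle=True, random_state=0)

for train_idx, heldout_idx in kfold.split(samples, labels):
    # half of the held-out fold becomes the validation set, half the test set
    val_idx, test_idx = train_test_split(heldout_idx, test_size=0.5,
                                         stratify=labels[heldout_idx],
                                         random_state=0)
    train_set, train_labels = samples[train_idx], labels[train_idx]
    val_set, val_labels = samples[val_idx], labels[val_idx]
    test_set, test_labels = samples[test_idx], labels[test_idx]
    # ... train on train_set, tune on val_set, evaluate on test_set ...
```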
Preferably, this embodiment uses the PyTorch machine learning library to build a Residual Attention Network with a depth of 56, denoted Attention-56. The network is constructed by stacking multiple attention modules, combining an end-to-end training mode with a feed-forward network architecture.
These attention modules provide attention-aware functionality. The visual attention mechanism is a brain signal-processing mechanism unique to human vision: human vision obtains the target area requiring attention by rapidly scanning the global image, then acquires the target information of interest in a focused manner and suppresses other useless information. The attention mechanism greatly improves the efficiency and accuracy of machine-vision information processing. Compared with a traditional residual network model, the residual attention mechanism can achieve fine-grained feature matching while strengthening the influence of the target area and suppressing the influence of non-target areas, which helps improve the fitting speed and generalization capability of the model.
The structural block diagram of the attention module is shown in fig. 4. The stacked structure is a basic application of the hybrid attention mechanism, which combines spatial information in the spatial domain with channel information in the channel domain. Each attention module can be divided into two branches. One is called the trunk branch and is the basic structure of the residual network. The other is the soft mask branch, whose main part is the residual attention learning mechanism. The principle of the soft mask is that key features in the image data are identified through an additional layer of new weights; through training, the deep neural network learns the regions that need attention in each new image and thus attends to them. In essence, the soft mask branch aims to learn a set of weights that can be applied to the feature map.
The parameter p in fig. 4 denotes the number of preprocessing residual units before the trunk branch and the soft mask branch, and t denotes the number of residual units in the trunk branch. The relationship between the t and p parameters is generally set as shown in formula (1):
t = 2 * p   (1)
Specifically, in this embodiment, p = 1 and t = 2, and the feature map is denoted x. The feature map is preprocessed by 1 residual unit and then enters the trunk branch and the soft mask branch separately. The residual unit adopts a bottleneck structure to reduce the number of parameters: the first convolution kernel in the bottleneck structure is 1 × 1 with 64 channels; the second is 3 × 3 with 64 channels; the third is 1 × 1 with 256 channels, and the activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 112 × 112.
The trunk branch mainly comprises 2 residual units connected in series, whose structure and parameters are the same as those of the preprocessing residual unit; its output is denoted T(x). The soft mask branch comprises a fast feed-forward sweep step and a top-down feedback step. The receptive field of the feature map is enlarged by two down-sampling operations; after the lowest resolution is reached, the same number of up-sampling operations restore the feature map to the size of the original input feature map, forming an attention feature map. Two 1 × 1 convolutional layers follow, and finally a sigmoid activation function yields the mixed-domain attention (combining spatial information in the spatial domain with channel information in the channel domain), where the sigmoid function is shown in formula (2):
sigmoid(z) = 1 / (1 + e^(-z))   (2)
In addition, skip connections are added between the down-sampling and up-sampling paths to fuse feature information from feature maps of different scales. The output of the soft mask branch is denoted M(x). M(x) is first multiplied element-wise with the output T(x) of the trunk branch (Element-wise Product), the result is added element-wise to T(x) (Element-wise Sum) as shown in formula (3), and the output of the attention module is obtained after p residual units:
H(x) = (1 + M(x)) * T(x)   (3)
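The attention module with p = 1 and t = 2 might be sketched in PyTorch roughly as follows. This is only an interpretation of the description above, not the exact network of the embodiment: it reuses the hypothetical BottleneckUnit class from the earlier sketch, and it assumes max pooling for down-sampling and bilinear interpolation for up-sampling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    """Trunk branch (t = 2) plus soft mask branch, combined as (1 + M(x)) * T(x)."""

    def __init__(self, channels):
        super().__init__()
        mid = channels // 4
        self.pre = BottleneckUnit(channels, mid, channels)      # p = 1 preprocessing unit
        self.trunk = nn.Sequential(                             # t = 2 trunk units
            BottleneckUnit(channels, mid, channels),
            BottleneckUnit(channels, mid, channels),
        )
        self.down1 = BottleneckUnit(channels, mid, channels)
        self.down2 = BottleneckUnit(channels, mid, channels)
        self.up = BottleneckUnit(channels, mid, channels)
        self.mask_out = nn.Sequential(                          # two 1x1 convs + sigmoid
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.post = BottleneckUnit(channels, mid, channels)     # p = 1 output unit

    def forward(self, x):
        x = self.pre(x)
        t = self.trunk(x)                                       # T(x)
        # soft mask branch: two down-samplings, then up-sample back
        d1 = self.down1(F.max_pool2d(x, kernel_size=3, stride=2, padding=1))
        d2 = self.down2(F.max_pool2d(d1, kernel_size=3, stride=2, padding=1))
        u1 = self.up(F.interpolate(d2, size=d1.shape[-2:], mode="bilinear",
                                   align_corners=False) + d1)   # skip connection
        u2 = F.interpolate(u1, size=t.shape[-2:], mode="bilinear",
                           align_corners=False)
        m = self.mask_out(u2)                                   # M(x)
        return self.post((1 + m) * t)                           # H(x) = (1 + M(x)) * T(x)
```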
Preferably, a block diagram of a specific embodiment of the residual attention network is shown in fig. 5. The sample image is input into the residual attention network and passes through 1 convolution and 1 max-pooling operation, then through 3 residual units and 3 attention modules arranged alternately, followed by a fourth residual unit. After an average pooling operation, the features reach a fully connected layer; finally, the normalized exponential function Softmax is applied to the fully connected layer at the end of the residual attention network to perform diabetes risk prediction and output the prediction result.
Specifically, the first convolutional layer contains 7 × 7 convolution kernels with a stride of 2 × 2 and 64 channels; the padding mode is set to valid, and the output of this convolutional layer is 112 × 112.
The pooling window of the max-pooling layer is 3 × 3 with a stride of 2 × 2, and the feature map output after max pooling is 56 × 56.
The first residual unit adopts a bottleneck structure to reduce the number of parameters. In the bottleneck structure, the first convolution kernel is 1 × 1 with 64 channels; the second is 3 × 3 with 64 channels; the third is 1 × 1 with 256 channels; the activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 56 × 56.
The first attention module follows the first residual unit, and its output size is 56 × 56.
The second residual unit follows the first attention module and also adopts a bottleneck structure with 3 convolutional layers to reduce the number of parameters: the first convolution kernel is 1 × 1 with 128 channels; the second is 3 × 3 with 128 channels; the third is 1 × 1 with 512 channels. The activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 28 × 28.
The second attention module follows the second residual unit, and its output size is 28 × 28.
The third residual unit follows the second attention module and also adopts a bottleneck structure with 3 convolutional layers to reduce the number of parameters: the first convolution kernel is 1 × 1 with 256 channels; the second is 3 × 3 with 256 channels; the third is 1 × 1 with 1024 channels. The activation function between the convolutional layers is ReLU. The output of the bottleneck structure is the sum of the output of the third convolutional layer and the output of the identity block, with a size of 14 × 14.
The third attention module follows the third residual unit, and its output size is 14 × 14.
The fourth residual unit follows the third attention module and adopts 3 serially connected bottleneck structures, each with 3 convolutional layers to reduce the number of parameters. In each bottleneck, the first convolution kernel is 1 × 1 with 512 channels; the second is 3 × 3 with 512 channels; the third is 1 × 1 with 2048 channels; the activation function between the convolutional layers is ReLU. The output size of the fourth residual unit is 7 × 7.
The feature map output by the fourth residual unit is subjected to an average pooling operation with a 7 × 7 pooling window and a stride of 1 × 1; the feature map after average pooling is 1 × 1.
Finally, this embodiment applies the normalized exponential function Softmax to the fully connected layer at the end of the residual attention network to predict the diabetes risk.
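Putting the pieces together, an Attention-56-style network of the kind described in this embodiment might be assembled roughly as follows, reusing the hypothetical BottleneckUnit and AttentionModule classes sketched earlier. The stride placement and the adaptive average pooling are assumptions made so the sketch runs at any input resolution, while the channel widths (64/256, 128/512, 256/1024, 512/2048) follow the text.

```python
import torch
import torch.nn as nn

class ResidualAttentionNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.stem = nn.Sequential(                       # first convolution + max pooling
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        self.stage1 = nn.Sequential(BottleneckUnit(64, 64, 256), AttentionModule(256))
        self.stage2 = nn.Sequential(BottleneckUnit(256, 128, 512, stride=2), AttentionModule(512))
        self.stage3 = nn.Sequential(BottleneckUnit(512, 256, 1024, stride=2), AttentionModule(1024))
        self.stage4 = nn.Sequential(                     # fourth residual unit: 3 bottlenecks
            BottleneckUnit(1024, 512, 2048, stride=2),
            BottleneckUnit(2048, 512, 2048),
            BottleneckUnit(2048, 512, 2048),
        )
        self.head = nn.Sequential(                       # average pooling + FC + Softmax
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(2048, num_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        x = self.stem(x)
        for stage in (self.stage1, self.stage2, self.stage3, self.stage4):
            x = stage(x)
        return self.head(x)

model = ResidualAttentionNet()
risk = model(torch.randn(1, 3, 128, 128))        # e.g. one 128 x 128 composite face image
```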
Specifically, in step S5, a 5-fold cross-validation method is adopted when evaluating the model: a different fold is selected as the test set each time and the other 4 folds as the training set, and the experiment is repeated 5 times. Model parameters are adjusted according to the performance on the validation set, the generalization capability of the model is estimated based on the discrimination performance on the test set, and the average of the 5 test results is taken as the evaluation index of the final model, yielding a well-performing risk prediction model.
The invention adopts a noninvasive diabetes risk prediction model based on a residual attention network; the residual attention network is formed by stacking multiple attention modules, combining an end-to-end training mode with a state-of-the-art feed-forward network architecture. Compared with a traditional residual network model, the residual attention network can achieve fine-grained feature matching while strengthening the influence of the target area and suppressing the influence of non-target areas, which helps improve the fitting speed and generalization capability of the model. In addition, the invention adopts a noninvasive detection method: the frontal facial image of the subject is analyzed by the noninvasive diabetes risk prediction model to predict the subject's future risk of developing diabetes. The method is fast and low-cost, supports large-scale screening and remote diagnosis and treatment, helps to quickly screen diabetic patients in high-incidence populations, and reminds them to control their blood glucose as early as possible to avoid diabetic complications.
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.
Matters not described in detail in this specification belong to the prior art known to those skilled in the art.