CN112734003A - Human skin temperature detection method based on deep convolutional network


Info

Publication number
CN112734003A
CN112734003A
Authority
CN
China
Prior art keywords
residual error
feature matrix
global pooling
block
network
Prior art date
Legal status
Pending
Application number
CN202011601981.7A
Other languages
Chinese (zh)
Inventor
王庆
成孝刚
宋丽敏
耿鑫
陈梦伟
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202011601981.7A
Publication of CN112734003A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J5/00 Radiation pyrometry, e.g. infrared or optical thermometry
    • G01J5/0022 Radiation pyrometry, e.g. infrared or optical thermometry for sensing the radiation of moving bodies
    • G01J5/0025 Living bodies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30088 Skin; Dermal

Abstract

The invention discloses a human skin temperature detection method based on a deep convolutional network. A video of a subject's skin changing over time is collected together with ground-truth skin temperatures from a temperature sensor; the video is sampled and its saturation extracted to obtain a feature matrix, the ground-truth temperatures are interpolated to obtain labels, and the two are combined into a training set. The training set is input to an improved ResNet50V2 network, which is trained to obtain a trained improved ResNet50V2 network. The improved network taps the feature information output by the first convolutional layer and the first, second, and third residual blocks, applies global pooling to each of these four extraction paths, and then concatenates the results with the high-level feature matrix produced by the original network path. The invention can predict human skin temperature in real time with good accuracy.

Description

Human skin temperature detection method based on deep convolutional network
Technical Field
The invention relates to the field of deep learning, and in particular to a method for detecting human skin temperature.
Background
Real-time, non-invasive detection of human thermal comfort plays an important role in the energy-saving control of intelligent buildings and in providing a comfortable environment. The aim is to acquire video of the human body in real time through ordinary visual sensing terminals such as mobile-phone or computer cameras, infer the body's thermal comfort algorithmically, and provide the result as a feedback signal to the central heating, ventilation and air conditioning (HVAC) system. On this basis, an intelligent building system can achieve global energy optimization and energy-saving control while still satisfying human thermal comfort. Physiological studies have shown that the human body performs best in thinking, observation, and skilled operation in a thermally comfortable environment. Recognizing individual differences and supplying and optimizing energy according to each person's thermal comfort needs is therefore of great significance for realizing people-oriented intelligent buildings.
Commercial and residential buildings account for 21% of total energy consumption worldwide, and in countries and regions whose populations are migrating to cities, building energy consumption will grow at a rate of 32% per year. Of building energy consumption, 50% is associated with the central HVAC system, which is responsible for the thermal comfort of the entire building. At present, industry provides buildings with a constant environment according to international standards (ASHRAE Standard 55, ASHRAE Standard 62.1), using parameters including temperature, humidity, and air flow. For example, the room temperature of a Swedish house is controlled at about 25 °C all year round; taking room-temperature data monitored by the applicant as an example, room 3-4635 at Lindstedtsvägen, KTH Royal Institute of Technology, Sweden is always kept at 24.1-25.7 °C. China's indoor air quality standard stipulates a winter heating room temperature of 16-24 °C, but in actual operation some areas run far above this range, reaching 27 °C or even 30 °C. Such constant-temperature cooling and heating ignores both the individual variability and the time variability of building users. Moreover, studies have shown that even a slight room-temperature adjustment (e.g., 1 °C) has a large effect on the energy consumption of the whole building. Building on Fanger's thermal comfort theory, the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and the ISO standard (ISO 7730) define a thermally comfortable environment as follows: within a given indoor space, at least 80% of the building's users are within a physiologically satisfactory thermal-environment temperature range.
Human thermal comfort is a subjective experience, and this information can only be obtained by monitoring each person specifically. There are accordingly three classes of methods for assessing thermal comfort, as follows:
1.2.1 questionnaire survey method
The questionnaire survey method learns the thermal preferences of building users through questionnaires and uses those preferences as the basis for environmental regulation. Paper questionnaires were used early on; after the development of the Internet, surveys moved to network interfaces. For example, Q. C. Zhao et al. established a data-driven thermal comfort prediction model based on continuous voting by building users: four evaluation parameters are determined through online voting, and the user's thermal comfort is then calculated by minimizing an error function over the coefficients. The questionnaire method belongs to the category of subjective evaluation and, by definition, reflects the psychological state and thermal comfort of the user. However, because it depends on continuous and frequent participation by building users, it is poorly operable and inefficient.
1.2.2 environmental monitoring method
The environmental monitoring method uses sensors to detect a series of parameters such as indoor temperature, humidity, and air flow rate, and uses a supervised model to relate these parameters to human thermal comfort, thereby judging the comfort of the environment. Liu et al., for example, classified the room-temperature environment into three categories (comfortable, uncomfortably warm, and uncomfortably cool) and conducted subjective experiments: subjects were invited to vote, and their perception statistics were collected. On this basis, an Artificial Neural Network (ANN) with 5 hidden layers was built and trained on data for 4 parameters: air temperature, radiant temperature, air flow, and air humidity. The limitation of this study is that the time-varying nature of human thermal comfort is not considered. In operation, the environmental monitoring method relies on a thermal comfort model assumed in advance; it lacks the specific participation of building users and cannot obtain users' real-time thermal comfort.
1.2.3 physiological assays
The physiological detection method captures human thermal comfort through various physiological measurement sensors; relevant parameters include skin temperature, pulse, and so on. Physiological assays are classified as invasive, semi-invasive, and non-invasive, with invasive and non-invasive detection being the main methods used here. Invasive detection includes conventional contact thermometers, while non-invasive detection includes temperature sensors based on infrared sensing. The data they collect can serve as ground truth for calibration. The specific approaches are as follows:
(1) invasive human body thermal comfort detection method
Wang et al. found that finger skin temperature and the temperature gradient at the fingertips correlate strongly with human thermal comfort, with correlation coefficients of 0.78 and 0.8 respectively. Yao et al. constructed an adaptive thermal comfort detection model based on PMV (Predicted Mean Vote), taking the PMV value as prior knowledge and computing an adaptability coefficient with the aid of parameters such as weather. Yao et al. also studied heart rate variability (HRV) and brain waves (EEG) as indicators of personal thermal comfort; the results showed a close relationship among the three, with HRV more strongly related to the thermal comfort data obtained by voting, although the study did not discuss how to detect EEG. Nakayama et al. [18] constructed the relationship between local skin temperature and human thermal comfort, with a mean squared error below 1. Simone studied a thermal comfort estimation method based on energy detection, correlating individual energy expenditure rates with thermal sensation: accounting for convective and radiant heat exchange between the body and its surroundings, the human energy expenditure rate was found to increase as indoor temperature rises above 24 °C or falls below 22 °C, and a second-order polynomial model relating human thermal comfort to energy expenditure rate was established. Liu et al. investigated describing human thermal comfort by average skin temperature: various mathematical combinations of skin temperature over 26 sample points on the body were tested, and the average of 10 points was then selected as the most accurate measure.
Bermejo et al. learn individual thermal sensation online from individual behavior toward thermostats and the environment, and then construct a thermal comfort estimation algorithm based on adaptive fuzzy logic. Kingma proposed a mathematical model based on the neurophysiology of heat sensing, using data from 12 subjects as a training set and 8 subjects as a test set, with parameters including skin temperature and core temperature; the results show a mean prediction error of 0.89 and a least-squares error of 0.38. Takada took the average skin temperature and its time difference as parameters to predict transient human thermal sensation with a multivariate regression model; with a correlation coefficient of 0.839, the predicted sensation was considered strongly correlated. Chaudhuri proposed a data-driven approach to predict three thermal sensations: uncomfortably cool, comfortable, and uncomfortably warm. A model built from environmental parameters and human thermal sensation as inputs was compared with several algorithms including Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Linear Discriminant Analysis (LDA), reaching a prediction accuracy of 73.14%-81.2%. S. Y. Sim et al. detected human skin temperature with a bracelet, inviting 8 subjects to tests under different thermal conditions, and on that basis built a thermal comfort model from parameters such as average skin temperature, temperature gradient, and temperature time difference. C. Z. Dai et al. proposed an SVM-based thermal sensation prediction and HVAC energy control method based on detected skin temperature; combining skin temperatures from different parts of the body improved the model, reaching a prediction accuracy of 90%.
(2) Semi-invasive human body thermal comfort detection method
Ghahramani et al. proposed a skin temperature detection method based on infrared thermal imaging, on which human thermal sensation is predicted: sensors mounted on eyeglasses monitor three regions of the face. The paper defines two ways of describing the thermally neutral zone and estimates individual thermal comfort at a 95% confidence level. The following year, building on these experimental data, a learning method based on Hidden Markov Models (HMMs) was proposed to capture facial thermal comfort, with three states: uncomfortably warm, comfortable, and uncomfortably cool. Tested on 10 subjects, the method reached an accuracy of 82.8%.
(3) Non-invasive human body thermal comfort detection method
Cheng expresses human thermal comfort through skin temperature: a video of skin changes on the back of a human hand is acquired with an ordinary mobile-phone camera, the texture of the back of the hand is amplified and analyzed with video magnification, and the relationship between skin saturation and skin temperature is then constructed by machine learning. On this basis, two ST (saturation-temperature) models are proposed, a personalized ST model and a semi-personalized one; the personalized ST model reaches an absolute detection error of 1.25 °C. Combining computer vision and building physics, this method realizes truly non-invasive human thermal comfort detection for the first time and offers a new way of sensing the thermal comfort of building users.
Based on the current state of research on human thermal comfort detection at home and abroad, the advantages and disadvantages of the questionnaire survey, environmental monitoring, and physiological detection methods are summarized as follows:
1) Questionnaire survey method: reflects the psychological state of building users well and embodies the people-oriented idea, but requires continuous and frequent user feedback and is therefore weakly operable;
2) Environmental monitoring method: has good practicality and can effectively regulate the indoor environment by monitoring parameters such as room temperature and humidity, but lacks the specific participation of building users and struggles to satisfy individual thermal comfort;
3) Physiological detection method: directly captures the physiological parameters of building users through sensors to evaluate individual thermal sensation, and reflects that sensation well. However, invasive and semi-invasive measurements require sensors mounted on the human body; they meet the needs of laboratory studies but have poor practicality. The non-invasive physiological detection method senses a user's individual thermal sensation remotely, without mounting sensors on the body, and is the development direction for computer vision (e.g., video magnification) and machine learning (e.g., deep learning) techniques.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, the invention provides a human skin temperature detection method based on a deep convolutional network, which solves the problem of detecting human skin temperature in real time using image-capture equipment.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme.
a human skin temperature detection method based on a deep convolutional network comprises the following steps:
step 1, collecting a video of the skin of a subject changing along with time and a skin temperature true value obtained by a temperature sensor, sampling the video and extracting saturation to obtain a feature matrix, carrying out interpolation processing on the skin temperature true value to obtain a label, and processing to obtain a training set.
Step 2: input the training set into the improved ResNet50V2 network and train it to obtain a trained improved ResNet50V2 network. The improved ResNet50V2 network comprises convolutional layer Conv1; residual blocks Block A, Block B, Block C, and Block D; a splicing (concatenation) module; a fully connected layer FC; and global pooling layers one through five. Conv1, Block A, Block B, Block C, Block D, global pooling layer five, the splicing module, and the fully connected layer FC are connected in sequence; Conv1, global pooling layer one, and the splicing module are connected in sequence; Block A, global pooling layer two, and the splicing module are connected in sequence; Block B, global pooling layer three, and the splicing module are connected in sequence; and Block C, global pooling layer four, and the splicing module are connected in sequence.
The feature matrix passes through the convolution operation of Conv1 to give the convolved feature matrix, which is fed both to residual block Block A and to global pooling layer one: Block A produces the first residual-processed feature matrix, and global pooling layer one produces the first globally pooled feature matrix. The first residual-processed feature matrix is fed both to Block B and to global pooling layer two, giving the second residual-processed and second globally pooled feature matrices. The second residual-processed feature matrix is fed both to Block C and to global pooling layer three, giving the third residual-processed and third globally pooled feature matrices. The third residual-processed feature matrix is fed both to Block D and to global pooling layer four, giving the fourth residual-processed and fourth globally pooled feature matrices. Finally, the fourth residual-processed feature matrix is fed to global pooling layer five, giving the fifth globally pooled feature matrix.
The first through fifth globally pooled feature matrices are input into the splicing module and concatenated to give the spliced feature matrix, which is input to the fully connected layer FC; under the action of the FC layer, the predicted value is output.
During training, the feature matrix is input to the improved ResNet50V2 network and propagated forward to output the current predicted skin temperature; this is compared with the ground-truth skin temperature to compute the loss of the current iteration, after which backpropagation updates the network parameters. After repeated iterations, training terminates when the iteration count reaches a set value or the accuracy reaches a set threshold, yielding the trained improved ResNet50V2 network.
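The loop described above (forward propagation, MSE loss, backpropagation, parameter update, stop on iteration budget or loss target) can be sketched as follows. This is a toy illustration only: a linear model stands in for the improved ResNet50V2, and all names, learning rates, and thresholds are invented, not from the patent.

```python
import numpy as np

def train(X, y, lr=0.1, max_iters=500, loss_threshold=1e-4):
    """Toy gradient-descent loop mirroring the described training procedure:
    forward pass -> MSE loss -> backward pass -> parameter update, stopping
    when the iteration budget or the loss target is reached."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    loss = np.inf
    for it in range(max_iters):
        y_pred = X @ w + b                     # forward propagation
        err = y_pred - y
        loss = np.mean(err ** 2)               # loss of the current iteration
        if loss < loss_threshold:              # accuracy reached the set threshold
            break
        w -= lr * 2 * X.T @ err / len(y)       # backward propagation and
        b -= lr * 2 * err.mean()               #   parameter update
    return w, b, loss

# Fit y = 2*x0 - 1*x1 + 0.5 on noiseless synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
y = 2 * X[:, 0] - X[:, 1] + 0.5
w, b, final_loss = train(X, y)
print(final_loss)  # below the 1e-4 threshold
```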
Step 3: at detection time, collect a video of the test subject's skin changing over time, sample the video and extract its saturation to obtain a feature matrix, process it into a test set, and input the test set into the trained improved ResNet50V2 network to obtain the predicted skin temperature.
Preferably: the loss function is the mean squared error (MSE) function:

MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²

where y_i denotes the true value, ŷ_i denotes the predicted value, and N denotes the size of each batch.
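As an illustration, the MSE above is a few lines of NumPy; the batch values below are invented skin temperatures in degrees Celsius, not data from the patent.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over a batch of N values:
    MSE = (1/N) * sum((y_i - y_hat_i)**2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Example batch of true vs. predicted skin temperatures (°C).
print(mse([33.0, 33.5, 34.0], [33.2, 33.4, 34.3]))  # 0.04666...
```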
Preferably: the temperature sensor is a contact temperature sensor.
Preferably: the method for sampling the video and extracting saturation to obtain the feature matrix is as follows: down-sample the video according to its frame rate to obtain an RGB-channel feature matrix of the skin in the hand region of interest, convert the RGB feature matrix to the HSV color space, and extract the saturation (S) channel as the feature matrix.
Compared with the prior art, the invention has the following beneficial effects:
1. The improved ResNet50V2 network greatly shortens training time and improves the accuracy of human skin temperature prediction under weak stimulation.
2. Real-time prediction with a deep neural network model allows human skin temperature to be detected in real time.
Drawings
Fig. 1 is a schematic diagram of the improved ResNet50V2 network of the present invention.
FIG. 2 is a schematic diagram of an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following description in conjunction with the accompanying drawings and the specific embodiments, it is to be understood that these examples are given solely for the purpose of illustration and are not intended as a definition of the limits of the invention, since various equivalent modifications will occur to those skilled in the art upon reading the present invention and fall within the limits of the appended claims.
A method for detecting human skin temperature based on deep convolutional network, as shown in fig. 1 and 2, comprising the following steps:
step 1, collecting a video of the skin of a subject changing along with time and a skin temperature true value obtained by a temperature sensor, sampling the video and extracting saturation to obtain a feature matrix, carrying out interpolation processing on the skin temperature true value to obtain a label, and processing to obtain a training set. The temperature sensor is a contact temperature sensor iButton, or a non-contact temperature sensor.
The method for sampling the video and extracting saturation to obtain the feature matrix is as follows: down-sample the video according to its frame rate to obtain an RGB-channel feature matrix of the skin in the hand region of interest, convert the RGB feature matrix to the HSV color space, and extract the saturation (S) channel as the feature matrix.
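A minimal sketch of this preprocessing, assuming RGB frames with float values in [0, 1]: down-sampling reduces to slicing the frame sequence, and the HSV saturation channel can be computed directly as S = (max − min) / max without a full HSV conversion. The function name, shapes, and sampling step are illustrative, not from the patent.

```python
import numpy as np

def saturation_channel(rgb):
    """Extract the HSV saturation (S) channel from an (H, W, 3) RGB frame
    with float values in [0, 1]: S = (cmax - cmin) / cmax, and 0 where
    cmax == 0. Equivalent to converting to HSV and keeping only S."""
    rgb = np.asarray(rgb, dtype=float)
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    return np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-12), 0.0)

# Down-sampling by frame rate is just slicing, e.g. frames[::step].
# A pure-red pixel is fully saturated; a gray pixel has zero saturation.
frame = np.array([[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]])
print(saturation_channel(frame))  # [[1. 0.]]
```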
Step 1.1: under single-variable control, use an image capture device to collect images of the skin in the subject's hand region of interest changing over time under weak stimulation, and extract the corresponding image frames from the video at a certain frame rate and sampling rate to form a picture data set.
Step 1.2: obtain the temperature label corresponding to each picture sample. While the skin video is being captured, an activated iButton is placed at the corresponding position on the hand and captured synchronously with the image capture device; the temperatures recorded by the iButton (one value per minute, per its specification) are then interpolated according to the sampling rate of the samples to obtain a series of corresponding ground-truth labels. This yields a complete data set.
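The interpolation of per-minute iButton readings onto the frame sampling instants can be sketched with `np.interp`; the timestamps, temperatures, and 15-second sampling interval below are invented for illustration.

```python
import numpy as np

# The iButton logs one temperature per minute, while frames are sampled
# more densely; linear interpolation assigns each sampled frame a label.
sensor_times = np.array([0.0, 60.0, 120.0])   # one reading per minute (s)
sensor_temps = np.array([33.0, 33.6, 33.2])   # degrees Celsius
frame_times = np.arange(0.0, 121.0, 15.0)     # e.g. one sampled frame per 15 s

labels = np.interp(frame_times, sensor_times, sensor_temps)
print(labels)  # [33.   33.15 33.3  33.45 33.6  33.5  33.4  33.3  33.2]
```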
Step 2: improve the ResNet50V2 network, input the obtained training set into the improved ResNet50V2 network, and train it to obtain a trained improved ResNet50V2 network.
Step 2.1, training phase
Firstly, a traditional deep convolutional network may suffer from vanishing gradients due to its deep structure, which slows convergence, makes the network difficult to train, and tends to saturate or even rapidly degrade its performance; the improved ResNet50V2 network is therefore adopted as the base network to solve these problems. Secondly, given that human skin changes with temperature, actual temperature detection should emphasize the low-level detail information of the picture, which is indispensable to temperature detection. After a feature matrix has undergone repeated feature extraction, high-level semantic information can be extracted effectively, but low-level detail information may be lost. Therefore, the feature information output by Conv1, Block A, Block B, and Block C of ResNet50V2 is globally pooled (Global Average Pooling, GAP) and then spliced with the original high-level feature information, improving the feature extraction path of the ResNet50V2 network and strengthening the extraction of low-level detail. In the original ResNet50V2 network, the feature matrix is passed down layer by layer through convolutions; the input image matrix must pass linearly through Conv1, the residual blocks, and the fully connected layer before an output value is obtained.
As a result, the deep convolutional-layer parameters depend on the shallow ones. As the feature matrix of the original picture passes through more layers and features are extracted repeatedly, the semantic information becomes more and more abstract: the neural network learns higher-level associations and effectively extracts the high-level semantics of the original picture, but the low-level detail information of the picture contributes too little to the final result and to the model. It is therefore desirable to increase the contribution of low-level features to the model, so the feature extraction path of the ResNet50V2 network is improved. The feature information output by Conv1, Block A, Block B, and Block C of ResNet50V2 is extracted and each is globally average pooled (GAP), yielding four feature maps that have passed through, respectively, only Conv1, the first residual block, the first two residual blocks, and the first three residual blocks of ResNet50V2; these constitute four additional extraction paths. The features from these four extra paths are spliced with the high-level feature matrix from the original network, strengthening the extraction of low-level detail information.
As shown in fig. 1, the improved ResNet50V2 network comprises convolutional layer Conv1; residual blocks Block A, Block B, Block C, and Block D; a splicing module; a fully connected layer FC; and global pooling layers one through five. Conv1, Block A, Block B, Block C, Block D, global pooling layer five, the splicing module, and the fully connected layer FC are connected in sequence; Conv1, global pooling layer one, and the splicing module are connected in sequence; Block A, global pooling layer two, and the splicing module are connected in sequence; Block B, global pooling layer three, and the splicing module are connected in sequence; and Block C, global pooling layer four, and the splicing module are connected in sequence.
The feature matrix undergoes the convolution operation of convolution layer one Conv1 to obtain the feature matrix after convolution. The feature matrix after convolution is input into residual module one BlockA and into global pooling layer one, yielding the feature matrix after the first residual processing and the feature matrix after the first global pooling, respectively. The feature matrix after the first residual processing is input into residual module two BlockB and into global pooling layer two, yielding the feature matrix after the second residual processing and the feature matrix after the second global pooling, respectively. The feature matrix after the second residual processing is input into residual module three BlockC and into global pooling layer three, yielding the feature matrix after the third residual processing and the feature matrix after the third global pooling, respectively. The feature matrix after the third residual processing is input into residual module four BlockD and into global pooling layer four, yielding the feature matrix after the fourth residual processing and the feature matrix after the fourth global pooling, respectively. Finally, the feature matrix after the fourth residual processing is input into global pooling layer five to obtain the feature matrix after the fifth global pooling.
The feature matrices of the first, second, third, fourth and fifth global pooling are input into the splicing module and spliced to obtain the spliced feature matrix. The spliced feature matrix is input into the full connection layer FC, which outputs the predicted value.
Global Average Pooling (GAP) is performed on the detail feature information extracted from convolution layer one Conv1, residual module one BlockA, residual module two BlockB and residual module three BlockC of the original network. GAP has several advantages. First, it makes the feature maps of a convolutional layer correspond more directly to intermediate categories (each channel corresponds to one intermediate category, so each feature map can be regarded as a confidence map for that category). Second, it reduces the number of parameters: GAP itself has no parameters, so to some extent it prevents overfitting at this level. Finally, it integrates global spatial information, making the model more robust to spatial transformations of the input picture. The output feature matrices extracted from convolution layer one Conv1, residual module one BlockA, residual module two BlockB and residual module three BlockC have dimensions 112 × 112 × 64, 56 × 56 × 256, 28 × 28 × 512 and 14 × 14 × 1024, respectively. GAP computes the average value of each channel of these four feature matrices, giving feature matrices of spatial size 1 × 1 with the original depth, i.e. 1 × 1 × 64, 1 × 1 × 256, 1 × 1 × 512 and 1 × 1 × 1024, respectively. These four feature matrices all have spatial size 1 × 1, as does the 1 × 1 × 2048 feature matrix output by the original ResNet50V2; only their depths differ, so they can be spliced along the depth direction, giving a feature matrix of size 1 × 1 × (64 + 256 + 512 + 1024 + 2048). The 1 × 1 spatial size can be dropped during propagation (corresponding to the flatten operation in TensorFlow), yielding a one-dimensional feature vector of length 3904, and finally the predicted value is output through a fully connected (FC) layer of width 1024.
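The dimension bookkeeping above can be verified with a short sketch. This is an illustrative NumPy stand-in for the patent's TensorFlow model, not the implementation itself; the 7 × 7 spatial size assumed for the BlockD output is the standard ResNet50 value and is not stated in the text.

```python
import numpy as np

def global_average_pool(feat):
    """Average each channel over its spatial dimensions: (H, W, C) -> (C,)."""
    return feat.mean(axis=(0, 1))

# Feature-map shapes tapped from the network, as listed in the text:
# Conv1, BlockA, BlockB, BlockC, plus the BlockD output pooled by the
# original head (7 x 7 spatial size assumed).
shapes = [(112, 112, 64), (56, 56, 256), (28, 28, 512),
          (14, 14, 1024), (7, 7, 2048)]

pooled = [global_average_pool(np.random.rand(*s)) for s in shapes]

# Splicing the 1 x 1 x C results along the depth axis is equivalent to
# concatenating plain vectors, which already "flattens" the result.
feature_vector = np.concatenate(pooled)
print(feature_vector.shape)  # (3904,)
```

The concatenated length 64 + 256 + 512 + 1024 + 2048 = 3904 matches the flattened feature vector described in the text.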
The improved ResNet50V2 network thus performs Global Average Pooling (GAP) on the detail features extracted by the shallow layers of the network and splices them with the high-level information to form the final feature vector.
The improved ResNet50V2 network structure is built in TensorFlow. During network training, the feature matrix is input into the improved ResNet50V2 network and propagated forward to output the current skin temperature predicted value, which is compared with the skin temperature true value to compute the loss of the current iteration; back propagation is then performed and the network parameters are updated. Training is terminated, after repeated iterations, when the iteration count reaches the set value of 100 or the accuracy reaches the set threshold, yielding a well-trained improved ResNet50V2 network.
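The training procedure just described (forward pass, per-iteration MSE loss, back propagation, stop at a fixed iteration count or an accuracy threshold) can be sketched in miniature. The toy linear model, learning rate and array sizes below are illustrative assumptions, not the patent's network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the network: a single linear layer mapping an
# 8-dimensional feature vector to a temperature (illustrative only).
X = rng.normal(size=(32, 8))             # one batch of feature vectors
w_true = rng.normal(size=8)
y_true = X @ w_true + 33.0               # synthetic "skin temperature" labels

w, b = np.zeros(8), 0.0
lr, max_iters, loss_threshold = 0.01, 100, 1e-4
losses = []

for _ in range(max_iters):
    y_pred = X @ w + b                   # forward propagation
    err = y_pred - y_true
    losses.append(float(np.mean(err ** 2)))  # MSE loss of the current iteration
    if losses[-1] < loss_threshold:      # stop once accuracy threshold is reached
        break
    # back propagation: gradients of the MSE with respect to w and b
    w -= lr * (2.0 / len(err)) * (X.T @ err)
    b -= lr * 2.0 * err.mean()

print(f"stopped after {len(losses)} iterations, final loss {losses[-1]:.4f}")
```

The loop mirrors the two stopping conditions in the text: a hard cap on iterations and an early exit when the loss falls below a threshold.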
The loss function is the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ denotes the true value, $\hat{y}_i$ denotes the predicted value, and $N$ denotes the size of each batch.
Step 2.2, test phase
The same subject is tested under the same conditions on different dates to obtain the temperature predictions of the neural network, while an iButton sensor is worn to record the true temperature; the prediction results are then analyzed.
In the network testing stage, the constructed test set is input into the stored network model to obtain the predicted skin temperature, which is compared with the corresponding true values to compute the mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where $y_i$ denotes the true value, $\hat{y}_i$ denotes the predicted value, and $N$ denotes the size of the prediction data set.
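For concreteness, the MAPE defined above can be computed as follows; the temperature readings are made-up illustrative numbers, not experimental data:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Illustrative skin-temperature readings (degrees Celsius):
truth = [33.0, 33.5, 34.0, 33.8]
preds = [33.2, 33.4, 34.3, 33.6]
print(round(mape(truth, preds), 3))  # 0.595
```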
Taking the MAPE computed in this experiment as the evaluation criterion, the improved network achieves a clearly better MAPE than a VGG network or an unmodified ResNet50V2 network trained on the same data.
Step 3, during detection, acquiring a video of the subject's skin changing over time, sampling the video and extracting the saturation to obtain a feature matrix, processing it to obtain a test set, and inputting the test set into the trained improved ResNet50V2 network to obtain the predicted skin temperature.
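The preprocessing in this step (down-sample the video, convert to HSV, keep the saturation channel) can be sketched without a video library. The pure-NumPy saturation formula below, S = (max − min)/max, is the standard RGB-to-HSV conversion; the frame count and patch sizes are illustrative assumptions:

```python
import numpy as np

def saturation_channel(rgb_frame):
    """Extract the HSV saturation channel from an RGB frame in [0, 1]."""
    cmax = rgb_frame.max(axis=-1)
    cmin = rgb_frame.min(axis=-1)
    # S = (max - min) / max, with S = 0 where the pixel is black.
    return np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-12), 0.0)

def sample_and_extract(frames, step):
    """Down-sample a frame sequence by `step` and stack the saturation
    maps into a feature matrix of shape (n_samples, H, W)."""
    return np.stack([saturation_channel(f) for f in frames[::step]])

# A made-up 10-frame "video" of 4 x 4 skin patches:
video = np.random.rand(10, 4, 4, 3)
features = sample_and_extract(video, step=2)
print(features.shape)  # (5, 4, 4)
```

A fully saturated pure-red pixel yields S = 1, and a gray pixel yields S = 0, matching the HSV definition.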
The invention performs human skin temperature detection based on the improved ResNet50V2; by training the improved ResNet50V2 network, the human skin temperature can be predicted in real time with good accuracy.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (4)

1. A human skin temperature detection method based on a deep convolutional network is characterized by comprising the following steps:
step 1, collecting a video of the subject's skin changing over time and skin temperature true values obtained by a temperature sensor, sampling the video and extracting the saturation to obtain a feature matrix, interpolating the skin temperature true values to obtain labels, and processing these to obtain a training set;
step 2, inputting the obtained training set into an improved ResNet50V2 network to train it, so as to obtain a trained improved ResNet50V2 network; the improved ResNet50V2 network comprises convolution layer one Conv1, residual module one BlockA, residual module two BlockB, residual module three BlockC, residual module four BlockD, a splicing module, a full connection layer FC, and global pooling layers one through five, wherein convolution layer one Conv1, residual module one BlockA, residual module two BlockB, residual module three BlockC, residual module four BlockD, global pooling layer five, the splicing module and the full connection layer FC are connected in sequence; convolution layer one Conv1, global pooling layer one and the splicing module are connected in sequence; residual module one BlockA, global pooling layer two and the splicing module are connected in sequence; residual module two BlockB, global pooling layer three and the splicing module are connected in sequence; and residual module three BlockC, global pooling layer four and the splicing module are connected in sequence;
the feature matrix is subjected to convolution operation of a convolution layer Conv1 to obtain a feature matrix after convolution operation, the feature matrix after convolution operation is respectively input into a residual error module I Block A and a global pooling layer I, the feature matrix after first residual error processing is obtained through the residual error module I Block A, and the feature matrix after first global pooling is obtained through the global pooling layer I; the feature matrix after the first residual error processing is respectively input into a second residual error module Block B and a second global pooling layer, the feature matrix after the second residual error processing is obtained through the second residual error module Block B, and the feature matrix after the second global pooling is obtained through the second global pooling layer; inputting the feature matrix subjected to the second residual error processing into a residual error module III Block C and a global pooling layer III respectively, obtaining a feature matrix subjected to third residual error processing through the residual error module III Block C, and obtaining a feature matrix subjected to third global pooling through the global pooling layer III; inputting the feature matrix subjected to the third residual error processing into a residual error module four Block D and a global pooling layer four respectively, obtaining a feature matrix subjected to the fourth residual error processing through the residual error module four Block D, and obtaining a feature matrix subjected to the fourth global pooling through the global pooling layer four; inputting the feature matrix subjected to the fourth residual error processing into a fifth global pooling layer, and obtaining a fifth global pooling feature matrix through the fifth global pooling layer; inputting the feature matrix of the first global pooling, the feature matrix of the second global pooling, the feature 
matrix of the third global pooling, the feature matrix of the fourth global pooling and the feature matrix of the fifth global pooling into a splicing module for splicing to obtain a spliced feature matrix; inputting the spliced characteristic matrix into a full connection layer FC, and outputting a pre-measurement under the action of the full connection layer FC;
when the network is trained, inputting the feature matrix into the improved ResNet50V2 network, performing forward propagation, and outputting the current skin temperature predicted value; comparing the predicted value with the skin temperature true value, calculating the loss of the current iteration, performing back propagation, and updating the network parameters; and, through repeated iterations, terminating the training when the iteration count reaches a set value or the accuracy reaches a set threshold, thereby obtaining a well-trained improved ResNet50V2 network;
and 3, during detection, acquiring a video of the skin of a detector changing along with time, sampling the video, extracting saturation to obtain a characteristic matrix, processing to obtain a test set, and inputting the test set into a trained improved ResNet50V2 network to obtain the predicted skin temperature.
2. The method for detecting human skin temperature based on a deep convolutional network as claimed in claim 1, characterized in that the loss function is the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ denotes the true value, $\hat{y}_i$ denotes the predicted value, and $N$ denotes the size of each batch.
3. The method for detecting human skin temperature based on a deep convolutional network as claimed in claim 2, characterized in that the temperature sensor is a contact temperature sensor.
4. The method for detecting human skin temperature based on a deep convolutional network as claimed in claim 3, characterized in that the method for sampling the video and extracting the saturation to obtain the feature matrix is as follows: the video is down-sampled according to the frame rate to obtain the RGB-channel feature matrix of the skin in the hand region of interest, the RGB-channel feature matrix is converted into a picture in the HSV color space, and the saturation (S) channel is then extracted as the feature matrix.
CN202011601981.7A 2020-12-30 2020-12-30 Human skin temperature detection method based on deep convolutional network Pending CN112734003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011601981.7A CN112734003A (en) 2020-12-30 2020-12-30 Human skin temperature detection method based on deep convolutional network


Publications (1)

Publication Number Publication Date
CN112734003A true CN112734003A (en) 2021-04-30

Family

ID=75611660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011601981.7A Pending CN112734003A (en) 2020-12-30 2020-12-30 Human skin temperature detection method based on deep convolutional network

Country Status (1)

Country Link
CN (1) CN112734003A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113598789A (en) * 2021-06-21 2021-11-05 天津大学 Cross-individual thermal comfort discrimination method based on electroencephalogram signals
CN114627539A (en) * 2022-02-15 2022-06-14 华侨大学 Thermal comfort degree prediction method and system and air conditioner adjusting method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846418A (en) * 2018-05-24 2018-11-20 广东电网有限责任公司 A kind of positioning of cable machinery temperature anomaly and recognition methods
CN109344787A (en) * 2018-10-15 2019-02-15 浙江工业大学 A kind of specific objective tracking identified again based on recognition of face and pedestrian


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ISACK FARADY等: "Mask Classification and Head Temperature Detection Combined with Deep Learning Networks", 《2020 2ND INTERNATIONAL CONFERENCE ON BROADBAND COMMUNICATIONS,WIRELESS SENSORS AND POWERING》 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210430