CN117218154A

CN117218154A - Patient attention degree judging method based on information entropy in meta-diagnosis room scene

Info

Publication number: CN117218154A
Application number: CN202311067313.4A
Authority: CN
Inventors: 杨文君; 文建全; 黄刊迪; 彭炜; 龙海; 文舸扬
Original assignee: Hunan Trasen Technology Co ltd
Current assignee: Hunan Trasen Technology Co ltd
Priority date: 2023-08-23
Filing date: 2023-08-23
Publication date: 2023-12-12

Abstract

The application provides a patient attention degree judging method based on information entropy in a meta-diagnosis room scene, comprising the following steps: detecting a human face with a face detection module and outputting the face position and the coordinates of both eyes; cropping images of the two eyes using those coordinates, predicting the pupil positions with a trained residual network model, and calculating the vertical direction of eyeball gaze from the position coordinates, where the distance between that vertical direction and the central axis is the eye movement state value; calculating the Euler angles of the patient's head pose, and the standard deviation and change rate of each Euler angle component; substituting the eye movement state value and the standard deviation and change rate of the head pose into an information entropy formula to obtain an entropy value, and inverting the entropy value to obtain the concentration; and comparing the concentration with a set threshold to judge the patient's concentration level and give corresponding prompts or suggestions. The application can objectively evaluate the mental state and attention level of the patient, providing a more effective diagnosis and treatment reference for doctors. It is suitable for patients receiving remote medical services in the meta-diagnosis room scene.

Description

Patient attention degree judging method based on information entropy in meta-diagnosis room scene

Technical Field

The application relates to the technical field of remote intelligent diagnosis, in particular to a patient attention degree judging method based on information entropy in a meta-diagnosis room scene.

Background

Mental disorders have become one of the focuses of social attention in modern medicine. As the population ages and the pace of life accelerates, stress and emotional problems have become daily challenges. Mental state assessment is an important problem in the medical field and can assist in diagnosing mental diseases such as depression and anxiety; concentration judgment is an important reference for assessing a patient's mental state. Research on patient concentration judging technology therefore helps to improve medical quality, promote emotional communication between doctors and patients, and assist psychiatrists in diagnosing patients.

The meta-diagnosis room is a novel medical scene that uses high-tech equipment and information technology to create a virtual doctor-patient interactive diagnosis and treatment environment, concentrating medical resources in one room to provide one-stop medical service. A meta-diagnosis room typically includes several functional areas, such as a doctor workstation, a patient bed, diagnosis and treatment equipment, and information technology equipment. At the workstation, a doctor can review the patient's medical records, monitor the patient's physiological signals, and carry out diagnosis and treatment; on the bed, the patient can receive diagnosis and treatment and have physiological signals monitored; the diagnosis and treatment equipment comprises medical devices such as electrocardiographs, sphygmomanometers and thermometers; the information technology equipment comprises video monitoring equipment, remote medical equipment, intelligent diagnosis equipment and the like, and supports remote medical treatment and information sharing between doctors and patients. The meta-diagnosis room can also provide more personalized medical services and closer interaction between patient and physician. Its design concept is to improve medical efficiency, reduce medical cost and improve the medical experience, while providing a new approach for the development of medical technology. A patient concentration judging method based on information entropy in the meta-diagnosis room scene can therefore improve the accuracy and efficiency of mental state assessment and provide better medical services for patients.

In the meta-diagnosis room scene, during doctor-patient interaction, whether the patient is concentrating can be judged from the patient's head pose and eye movement, so that the doctor can be reminded to take corresponding stimulation measures; this ultimately assists in diagnosing whether the patient suffers from psychological diseases such as depression or anxiety and gives the doctor a more accurate basis for evaluation. Patient concentration refers to the patient's reaction to a specific event during interaction with a psychiatrist, expressed through the pose of the face and head and the rotation of the eyeballs. A specific event is, for example, a medical scale question asked during the interaction, or the doctor guiding the patient to perform a certain action. Because the patient's face and head pose and eyeballs change dynamically over a period of time, the concept of information entropy can be introduced to describe the changes in head pose and eye state over that period.

In existing studies, researchers have calculated concentration by various technical means, for example from the head pose or from the eye-closure duration. However, the factors considered by existing methods reflect only part of the concentration information. For example, methods based on head pose consider only the pose of the head in the current image and do not relate it to previous states, so the calculated concentration hardly reflects the patient's actual concentration over a period of time; with methods based on eye closure, when the head is still and the eyes are open, it is difficult to judge where the patient is looking. Therefore, further research on patient concentration judging technology in the doctor-patient interaction process, improving its accuracy and real-time performance, helps to improve medical quality and the doctor-patient relationship and to promote the development of the medical and health industry.

Disclosure of Invention

To address the above problems, the application provides a patient attention degree judging method based on information entropy in a meta-diagnosis room scene. The method is used for assessing the mental state of special populations and addresses the following problems: during an inquiry, it is difficult for the doctor to judge whether the patient is concentrating; how to quantify the patient's concentration; and how to characterize the patient's concentration through head pose estimation and eye movement state.

In order to achieve the above purpose, the application adopts the following technical scheme:

a patient attention degree judging method based on information entropy in a meta-diagnosis room scene comprises the following steps:

1) Detecting a human face in the video with a face detection module, and outputting the face position and the coordinates of the face key points (eyes);

2) Cropping images of the two eyes using the coordinates, predicting the pupil positions with a trained residual network model, and calculating the vertical direction of eyeball gaze from the position coordinates, where the distance between that vertical direction and the central axis is the eye movement state value;

3) Calculating the Euler angles of the patient's head pose, and calculating the standard deviation and change rate of each Euler angle component;

4) Substituting the eye movement state value and the standard deviation and change rate of the head pose into an information entropy formula to obtain an entropy value, and inverting the entropy value to obtain the final concentration.

Compared with the prior art, the application has the beneficial effects that:

1. Unlike the traditional doctor-patient diagnosis mode, in the meta-diagnosis room scene a patient with mental problems does not need to travel a long distance to a hospital and queue for diagnosis; the patient can see a doctor from an ordinary computer with a camera or at a locally built diagnosis and treatment site.

2. Based on face detection, head pose estimation and eye movement tracking, the proposal quantifies the concentration of a patient by describing the head pose and eye change of the patient.

3. Introducing information entropy to describe the head posture change and eye change of a patient, and calculating the concentration of the patient through an entropy value formula.

As a further improvement of the scheme, the face detection module uses the RetinaFace network model architecture; the network model adopts a pyramid structure, generates a plurality of feature layers through convolution and up-sampling operations, and then carries out classification, regression and face key point prediction on each feature layer.

The technical purpose of the improvement is to detect the face in the video accurately and output the face position and key points. Face detection uses the RetinaFace network model architecture, which offers high precision, high efficiency and multitask output. The pyramid structure generates feature layers of different scales, and classification, regression and face key point prediction are then carried out on each feature layer. The classification task outputs the classification category of the face pixels; the regression task outputs the position of the face in the image; the face key point task outputs a plurality of values representing the coordinates of the eyes, the nose tip, the corners of the mouth and the like.

As a further improvement of the above solution, the RetinaFace network model architecture uses ResNet-50 as the base network, where the base network includes five convolution blocks, each convolution block includes a number of residual units, each residual unit includes two convolution layers and one skip connection layer, and the skip connection layer adds the input feature map to the output feature map; the face detection module comprises a linear rectification function (ReLU), which performs pixel correction processing on the image frames in a first image set by replacing all negative values in the feature maps of the first image set with 0; the face detection module further comprises the following steps:

a 1×1 convolution with a stride of 1 is applied to the feature layer Conv5_x to form the feature layer M5;

the feature layer M5 is up-sampled once and fused with the feature layer obtained by applying a 1×1 convolution with a stride of 1 to the feature layer Conv4_x, and the fused feature layer is up-sampled once to form the feature layer M4;

the feature layer M4 is up-sampled once and fused with the feature layer obtained by applying a 1×1 convolution with a stride of 1 to the feature layer Conv3_x, and the fused feature layer is up-sampled once to form the feature layer M3;

the feature layer M3 is up-sampled once and fused with the feature layer obtained by applying a 1×1 convolution with a stride of 1 to the feature layer Conv2_x, and the fused feature layer is up-sampled once to form the feature layer M2;

a 3×3 convolution with a stride of 2 is then applied to the feature layers M2, M3, M4, M5 and Conv5_x to form the effective feature layers P2, P3, P4, P5 and P6;

finally, classification, regression and face key point prediction are carried out on each of the effective feature layers P2, P3, P4, P5 and P6.

The technical purpose of the improvement is to improve the performance and precision of the face detection model. ResNet-50 is used as the base network to extract features from the images in the video, and a pyramid structure generates feature layers at several scales, on each of which classification, regression and face key point prediction are carried out. ResNet-50 includes five convolution blocks, each convolution block including a number of residual units, each residual unit including two convolution layers and a skip connection layer that adds the input feature map to the output feature map, thereby avoiding the vanishing gradient and overfitting problems. The pyramid structure generates feature layers of different scales through convolution and up-sampling operations, so that it adapts to faces of different sizes. The linear rectification function (ReLU) replaces all negative values with 0, eliminating the influence of negative values on the face detection result.
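The top-down fusion of feature layers can be sketched with NumPy. This is a minimal sketch of a standard feature-pyramid merge (up-sample the coarser layer by 2, then add the 1×1-convolved lateral layer); it does not reproduce the additional up-sampling of the fused layer described in the steps above, and the helper names `conv1x1`, `upsample2x` and `top_down`, the 16 output channels and the spatial sizes are illustrative assumptions.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution with stride 1: a per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour up-sampling by a factor of 2 along H and W."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down(backbone_maps, weights):
    """backbone_maps: [Conv2_x, Conv3_x, Conv4_x, Conv5_x], fine to coarse.
    Returns the merged layers [M2, M3, M4, M5]."""
    merged = [conv1x1(backbone_maps[-1], weights[-1])]  # M5
    # Walk from Conv4_x down to Conv2_x, fusing with the layer above.
    for feat, w in zip(backbone_maps[-2::-1], weights[-2::-1]):
        merged.append(upsample2x(merged[-1]) + conv1x1(feat, w))
    return merged[::-1]

rng = np.random.default_rng(0)
in_channels = [256, 512, 1024, 2048]  # ResNet-50 stage output widths
sizes = [64, 32, 16, 8]               # hypothetical spatial sizes
maps = [rng.standard_normal((c, s, s)) for c, s in zip(in_channels, sizes)]
weights = [rng.standard_normal((16, c)) * 0.01 for c in in_channels]
m2, m3, m4, m5 = top_down(maps, weights)
```

Each merged layer keeps the spatial size of its backbone layer, which is what lets the later prediction heads see faces at several scales.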

As a further improvement of the above solution, the residual network model is a deep convolutional neural network comprising a plurality of residual units, each residual unit comprising two convolution layers and a skip connection layer, the skip connection layer adding the input feature map to the output feature map.

The technical purpose of the improvement is to detect the pupil positions accurately in the images of the two eyes. The residual network model used for pupil detection combines deep learning, residual learning and skip connections. A deep convolutional neural network extracts features from the images of the two eyes, and the residual units then enhance those features; each residual unit adds the input feature map to the output feature map through a skip connection layer, thereby avoiding the vanishing gradient and overfitting problems.
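The structure of a residual unit can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: dense matrices `w1` and `w2` stand in for the two convolution layers, and the function names are hypothetical.

```python
import numpy as np

def relu(x):
    """Linear rectification: negative values become 0."""
    return np.maximum(x, 0.0)

def residual_unit(x, w1, w2):
    """Two weight layers plus a skip connection: out = relu(F(x) + x).
    Dense matrices stand in for the convolution layers (an assumption)."""
    hidden = relu(x @ w1)
    return relu(hidden @ w2 + x)  # the skip connection adds the input back

x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
out = residual_unit(x, zero, zero)  # F(x) = 0, so the unit returns relu(x)
```

With zero weights the unit reduces to the rectified input, which shows that the skip connection carries the signal through even when the learned branch contributes nothing.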

As a further improvement of the above, the euler angle component includes a roll angle, a pitch angle, and a yaw angle; wherein the rolling angle represents the angle through which the symmetry plane of the head turns around the central axis of the head; the pitch angle represents an included angle between a central axis of the head and a horizontal plane; the yaw angle represents the angle between the projection of the medial axis of the head on the horizontal plane and the ground axis.

The technical purpose of the improvement is as follows: the method is simple and easy to understand, and can effectively represent the rotation motion of the head; three Euler angle components are calculated by utilizing the coordinate system of the rigid body of the head, and the rolling angle, the pitch angle and the yaw angle of the head are respectively represented. The rolling angle represents the angle through which the symmetry plane of the head rotates around the central axis of the head; the pitch angle represents an included angle between a central axis of the head and a horizontal plane; the yaw angle represents the angle between the projection of the medial axis of the head on the horizontal plane and the ground axis.

As a further improvement of the above-described scheme, the standard deviation represents the degree of dispersion of the data; the rate of change represents an average of the amount of change in data between adjacent acquisition points.

The technical purpose of the improvement is to measure mathematically the degree and frequency of head pose change. The standard deviation and the change rate are simple and effective measures that reflect the fluctuation of the data. Using the head pose Euler angle data acquired over a period of time, the standard deviation and change rate of each Euler angle component over the acquisition time are calculated. The standard deviation represents the degree of dispersion of the data: the larger the standard deviation, the larger the head pose change. The change rate represents the average amount of change in the data between adjacent acquisition points: the larger the value, the faster the head pose changes.
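The two statistics can be sketched as follows. The sketch assumes the population standard deviation, absolute differences between adjacent acquisition points, and a plain average over the three Euler angle components; the function name `head_pose_stats` is hypothetical.

```python
import numpy as np

def head_pose_stats(angles):
    """angles: shape (T, 3) with columns roll, pitch, yaw in degrees,
    one row per acquisition point.
    Returns (mean standard deviation, mean change rate)."""
    angles = np.asarray(angles, dtype=float)
    std_per_component = angles.std(axis=0)       # dispersion of each angle
    step = np.abs(np.diff(angles, axis=0))       # change between adjacent points
    rate_per_component = step.mean(axis=0)       # average change per step
    return std_per_component.mean(), rate_per_component.mean()

samples = [[0, 0, 0], [2, 0, 4], [0, 0, 0], [2, 0, 4]]
mean_std, mean_rate = head_pose_stats(samples)
```

For these samples the roll angle alternates between 0 and 2, the pitch angle is constant and the yaw angle alternates between 0 and 4, so both statistics grow with the amplitude of the head movement.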

As a further improvement of the above scheme, the information entropy formula is:

$$\text{attention} = \operatorname{abs}\left(-c_1 \sum p \log p - c_2 m\right)$$

where $c_1$ and $c_2$ are adjustment coefficients; $p$ represents the probability values assigned to the degree and frequency of head pose variation; $m$ represents the eye movement state value; abs() is the absolute-value function.

The technical purpose of the improvement is to evaluate the patient's concentration mathematically. The information entropy formula considers both the head pose and the eye movement state, so it reflects the patient's concentration level more accurately. Information entropy describes the uncertainty of the data: a low entropy value indicates that the patient's attention is highly concentrated, and a high entropy value indicates that the patient is distracted. In the formula, $c_1$ and $c_2$ are adjustment coefficients used to adjust the range and weight of the concentration score according to the actual situation; $p$ represents the probability values assigned to the degree and frequency of head pose variation, and can be determined from the distribution of the data; $m$ represents the eye movement state value, calculated from the distance between the vertical direction of eyeball gaze and the central axis; abs() takes the absolute value, converting the result into a non-negative number.
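A direct transcription of the formula into Python might look like this. The logarithm base is an assumption (the formula writes only `log`), so the sketch exposes it as a parameter; the function name and the input check are also illustrative.

```python
import math

def attention(p, m, c1=1.0, c2=1.0, base=2.0):
    """attention = abs(-c1 * sum(p_i * log(p_i)) - c2 * m).
    p: probabilities summing to 1; m: eye movement state value.
    The logarithm base is an assumption, not taken from the source."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 contribute nothing to the entropy.
    entropy = -sum(pi * math.log(pi, base) for pi in p if pi > 0)
    return abs(c1 * entropy - c2 * m)

uniform_two = attention([0.5, 0.5], m=0.0)  # one bit of entropy, no eye term
certain = attention([1.0], m=0.25)          # zero entropy, eye term only
```

Because the result depends on the chosen base and on the coefficients, any threshold applied to it has to be set with the same choices.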

As a further improvement of the above-described scheme, the eye movement state value m is calculated according to the following formula:

$$m = \frac{1}{2}\left(\frac{d_1}{l} + \frac{d_2}{l}\right)$$

where $d_1$ and $d_2$ respectively represent the distance between the vertical direction of gaze of the left eye and of the right eye and the central axis; $l$ represents the horizontal distance between the eyes.

The technical purpose of the improvement is to calculate mathematically the distance between the vertical direction of eyeball gaze and the central axis. The eye movement state value m calculated with the formula simply and effectively reflects the degree of eyeball deflection, which in turn affects the calculated concentration. The face detection module provides the coordinates of the two eyes, the residual network model predicts the pupil positions, and $d_1$, $d_2$ and $l$ are calculated from the position coordinates and substituted into the formula to obtain m. The larger m is, the larger the distance between the vertical direction of eyeball gaze and the central axis; the smaller m is, the smaller that distance.
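A minimal sketch of the eye movement state value follows. It assumes that m is the average of the two distances after each is divided by l, as the verbal description in the detailed description suggests; the function name and the sample values are hypothetical.

```python
def eye_movement_state(d1, d2, l):
    """Average of the two normalised distances: m = (d1 / l + d2 / l) / 2.
    d1, d2: distance of the left and right gaze line from the central axis;
    l: normalising length. The averaging form is an assumption."""
    if l <= 0:
        raise ValueError("l must be positive")
    return (abs(d1) / l + abs(d2) / l) / 2

m = eye_movement_state(d1=3.0, d2=5.0, l=20.0)  # distances in pixels
```

Dividing by l makes the value independent of how large the face appears in the frame.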

As a further improvement of the scheme, the concentration has a value range of $[0, +\infty)$; the higher the concentration, the lower the entropy value and the higher the absolute value; the lower the concentration, the higher the entropy value and the lower the absolute value.

The technical purpose of the improvement is to represent the patient's concentration level numerically. The concentration value reflects the patient's concentration level intuitively, which makes diagnosis and judgment convenient for the doctor. An entropy value is calculated with the information entropy formula and then inverted to obtain the final concentration. Because information entropy is non-negative, the concentration has a value range of $[0, +\infty)$. The higher the concentration, the lower the entropy value and the higher the absolute value; the lower the concentration, the higher the entropy value and the lower the absolute value.

As a further improvement of the above, the threshold of concentration is determined according to different meta-office scenarios and patient types; when the concentration is lower than the threshold value, prompting the patient that the attention is not concentrated and the state needs to be adjusted; when the concentration is higher than the threshold, the patient is considered to be concentrated, and diagnosis and treatment can be normally performed.

The technical purpose of the improvement is as follows: reasonable concentration degree discrimination criteria are set according to different meta-diagnosis room scenes and patient types; the threshold value of the concentration degree is used for setting the concentration degree judgment standard, and the method can be flexibly adjusted according to different meta-diagnosis room scenes and patient types, so that the rationality and the effectiveness of concentration degree judgment are improved; the threshold of concentration is used to compare with the concentration of the patient to determine the patient's concentration level. When the concentration is lower than the threshold value, prompting the patient that the attention is not concentrated and the state needs to be adjusted; when the concentration is higher than the threshold, the patient is considered to be concentrated, and diagnosis and treatment can be normally performed. The threshold level of concentration is determined based on different meta-office scenarios and patient types, for example taking into account personal characteristics of the patient's age, sex, education level, mental state, etc.
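The threshold comparison can be sketched as a small lookup. The patient types and threshold values in `THRESHOLDS` are placeholders for illustration, and treating a value equal to the threshold as concentrated is an assumption, since the text only covers the lower and higher cases.

```python
# Hypothetical thresholds per patient type; the values are placeholders.
THRESHOLDS = {"adult": 0.8, "adolescent": 0.6}

def judge_concentration(attention_value, patient_type, default_threshold=0.8):
    """Compare a concentration value with the threshold for the patient type.
    A value equal to the threshold is treated as concentrated (an assumption)."""
    threshold = THRESHOLDS.get(patient_type, default_threshold)
    if attention_value < threshold:
        return "not concentrated: prompt the patient to adjust"
    return "concentrated: continue diagnosis and treatment"

low = judge_concentration(0.5, "adult")
ok = judge_concentration(0.7, "adolescent")
```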

As a further improvement of the above scheme, the meta-office scenario refers to a virtual office environment that implements remote medical services through internet technology; the patient type refers to personal characteristics such as age, sex, education level, psychological state and the like of the patient.

The technical purpose of the improvement is as follows: the virtual consulting room environment for realizing the remote medical service by utilizing the internet technology is called meta-consulting room scene. In the meta-office scenario, doctors and patients can communicate by video, audio, etc. For patients with mental problems, they are classified into different patient types according to their personal characteristics of age, sex, education level, mental state, etc. Different patient types may have different concentration levels and discriminant criteria.

Drawings

Fig. 1 is an overall frame of the present solution.

Fig. 2 is a RetinaFace face detection network architecture.

Fig. 3 is a schematic view of the head pose euler angles.

Fig. 4 is a schematic diagram of calculation of eye movement state values.

Detailed Description

In order that those skilled in the art will better understand the technical solutions, the following detailed description of the technical solutions is provided with reference to examples, which are exemplary and explanatory only and should not be construed as limiting the scope of the application in any way.

As shown in fig. 1-4, a patient attention degree judging method based on information entropy in a meta-diagnosis room scene has the following technical scheme:

the equipment requirements are as follows: the meta-office client requires basic equipment such as a terminal, a camera, audio and the like which can be connected with the cloud server.

The scheme integral framework comprises a face detection module, a head posture estimation module, an eye movement tracking module and a concentration degree discrimination module.

Face detection: the proposal adopts a face detection model with the RetinaFace network model architecture, which uses a pyramid structure; the face detection model comprises a linear rectification function (ReLU), which performs pixel correction processing on the image frames in a first image set by replacing all negative values in the feature maps of the first image set with 0;

the feature maps Conv1_x, Conv2_x, Conv3_x, Conv4_x and Conv5_x are formed through a bottom-up convolution operation;

a 3×3 convolution with a stride of 2 is then applied to the feature layers M2, M3, M4, M5 and Conv5_x to form the effective feature layers P2, P3, P4, P5 and P6, and finally classification, regression and face key point prediction are carried out on each of the effective feature layers P2, P3, P4, P5 and P6;

wherein the classification task outputs the classification category (cls) of the face pixels; the regression task outputs the position (box) of the face in the image, comprising four vertex coordinates; the application uses the PFLD face key point recognition algorithm to predict the face key points (landmark) and output ten values, which are the coordinates of the two eyes, the nose tip and the two mouth corners.
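The ten landmark values can be unpacked into five named points. The sketch below assumes the values arrive as interleaved (x, y) pairs and assumes the point order given in `LANDMARK_NAMES`; neither is stated in the text.

```python
import numpy as np

# Assumed point order, for illustration only.
LANDMARK_NAMES = ("left_eye", "right_eye", "nose_tip", "mouth_left", "mouth_right")

def split_landmarks(values):
    """Reshape ten landmark values (x1, y1, ..., x5, y5) into named (x, y) points."""
    points = np.asarray(values, dtype=float).reshape(5, 2)
    return {name: tuple(pt.tolist()) for name, pt in zip(LANDMARK_NAMES, points)}

landmarks = split_landmarks([30, 40, 70, 40, 50, 60, 35, 80, 65, 80])
```

Only the two eye points are needed by the eye movement tracking step; the remaining points are available to other modules.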

Head pose estimation (euler angle): firstly, establishing a coordinate system for a head rigid body of a person in an image, and respectively calculating three Euler angle components (rolling angle, pitch angle and yaw angle) of the head rigid body, wherein the rolling angle (Roll) represents the angle of a head symmetrical plane rotating around a head center axis, and the right Roll is positive; the Pitch angle (Pitch) represents the included angle between the center axis of the head and the horizontal plane, and the head lifting is positive; the Yaw angle (Yaw) represents the angle between the projection of the head center axis on the horizontal plane and the ground axis, and the head right deviation is positive.

Eye movement tracking: as shown in fig. 4, the coordinates of the key points (eyes) of the patient's face are obtained through the face detection module, and separate images of the left eye and the right eye are cropped from the original image using those coordinates. A trained residual network model then detects the pupils in the two eye images to obtain their positions, from which the distance between each pupil and the center point of its eye is calculated. Each distance is divided by the total length of the eye to normalize it, and the two normalized values are summed and divided by 2 to obtain the final eye movement state value.
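Cropping the eye images from a frame can be sketched with array slicing. The patch size (`half_w`, `half_h`), the frame size and the function name are illustrative assumptions; the text does not give the crop dimensions.

```python
import numpy as np

def crop_eye(image, eye_xy, half_w=16, half_h=10):
    """Cut a patch of up to (2*half_h) x (2*half_w) pixels centred on an eye
    key point, clipped to the image borders. image: (H, W) or (H, W, C)."""
    h, w = image.shape[:2]
    x, y = int(round(eye_xy[0])), int(round(eye_xy[1]))
    top, bottom = max(y - half_h, 0), min(y + half_h, h)
    left, right = max(x - half_w, 0), min(x + half_w, w)
    return image[top:bottom, left:right]

frame = np.zeros((120, 160, 3), dtype=np.uint8)  # stand-in for a video frame
left_eye = crop_eye(frame, (60, 50))
corner_eye = crop_eye(frame, (5, 5))             # clipped at the image border
```

Clipping keeps the crop valid when the face sits near the edge of the frame, at the cost of a smaller patch.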

The eye movement state value m is calculated according to the following formula:

$$m = \frac{1}{2}\left(\frac{d_1}{l} + \frac{d_2}{l}\right)$$

Concentration degree discrimination: in the meta-office scenario, the proposal adopts information entropy to judge the concentration degree of the patient. When the entropy value is relatively low, it indicates that the patient is highly focused, whereas it indicates that the patient is distracted.

Specifically, the present proposal assigns entropy probability values to the 3 behaviors, respectively, which are the degree of change, frequency, and eye movement state of the head posture of the patient.

The standard deviation calculation method of the head posture comprises the following steps: first, head pose euler angle data is collected over a period of time, for example, once per second. Second, for each euler angle component, the standard deviation of that component over the acquisition time is calculated. The standard deviation indicates the degree of dispersion of the data, and the larger the standard deviation is, the larger the head posture change is. Finally, the standard deviations of the three euler angle components can be averaged to obtain the average standard deviation of the head gesture as an index for measuring the variation degree.

The average change rate calculation method of the head posture comprises the following steps: first, head pose euler angle data is collected over a period of time. And secondly, calculating the difference value between adjacent acquisition points for each Euler angle component to obtain the variation of each component. And finally, the variation of each component is averaged to obtain the average variation rate of the head gesture as an index for measuring the variation frequency.

Eye movement state calculation method: firstly, detecting eyeballs in an image to obtain coordinates of the eyeballs, and secondly, calculating the vertical direction of the staring eyeballs according to the coordinates, wherein the distance difference between the vertical direction and the center axis is the eye movement state value.

After the degree of change and the frequency of the patient's head pose and the eye movement state data are collected, min-max normalization is applied to them to ensure that the values lie between 0 and 1. Finally, the patient concentration is calculated using the following formula:

$$\text{attention} = \operatorname{abs}\left(-c_1 \sum p \log p - c_2 m\right)$$

The parameters $c_1$ and $c_2$ are adjustment coefficients used to adjust the range and weight of the concentration score according to actual conditions. $p$ represents the probability values assigned to the degree and frequency of the patient's head pose variation. $m$ represents the patient's eye movement state value. abs() is the absolute-value function. The higher the concentration, the lower the entropy and the higher the absolute value.
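The text does not spell out how the probability values p are derived from the normalized data. One possible reading, sketched below, is to min-max normalize the samples, split the range [0, 1] into three equal-width grades (low, medium, high) and use the share of samples in each grade; the equal-width grades and both function names are assumptions.

```python
import numpy as np

def min_max_normalise(values):
    """Scale values linearly to [0, 1]; constant input maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def grade_probabilities(values, n_grades=3):
    """Share of samples in each equal-width grade of [0, 1]
    (low, medium, high when n_grades is 3)."""
    scaled = min_max_normalise(values)
    counts, _ = np.histogram(scaled, bins=n_grades, range=(0.0, 1.0))
    return counts / counts.sum()

p = grade_probabilities([0, 1, 2, 3, 4, 5, 6, 7, 8, 10])
```

The resulting shares sum to 1, so they can be used directly as the p values in the entropy term.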

The specific implementation mode is as follows:

First, the patient clicks the AI auxiliary diagnosis column on the client and agrees to start the camera; the cloud server then starts the AI auxiliary diagnosis algorithm, which includes the patient concentration judging method.

Second, the face detection module detects the face in the video and outputs the face position and the coordinates of the face key points (eyes).

Third, images of the two eyes are cropped using the coordinates, the pupil positions are predicted with the trained residual network, and the vertical direction of eyeball gaze is calculated from the position coordinates; the distance between that vertical direction and the central axis is the eye movement state value.

Fourth, the Euler angles of the patient's head pose are calculated, together with the standard deviation and change rate of each Euler angle component.

Fifth, the eye movement state value and the standard deviation and change rate of the head pose are substituted into the information entropy formula to obtain an entropy value, and the entropy value is inverted to obtain the final concentration.

Assume that a patient has been in video communication in the meta-office scene for a period of time, with head pose and eye movement as shown in Fig. 2. The probability values assigned to the degree and frequency of head pose change are p₁ = 0.2, p₂ = 0.3 and p₃ = 0.5, corresponding to the low, medium and high grades respectively; the eye movement state value is m = 0.4, indicating that the distance between the vertical direction of the eye gaze and the central axis is small; taking the adjustment coefficients c₁ = c₂ = 1, the patient's concentration is:

attention = abs(-1×(0.2×log 0.2 + 0.3×log 0.3 + 0.5×log 0.5) - 1×0.4)

attention ≈ 1.37

Assuming a concentration threshold of 0.8, the patient's concentration is above the threshold; the patient is considered attentive, and diagnosis and treatment can proceed normally.

Example 2: suppose another patient has been in video communication in the meta-office scene for a period of time, with head pose and eye movement as shown in Fig. 3. The probability values assigned to the degree and frequency of head pose change are p₁ = 0.5, p₂ = 0.3 and p₃ = 0.2, corresponding to the low, medium and high grades respectively; the eye movement state value is m = 0.8, indicating that the distance between the vertical direction of the eye gaze and the central axis is large; taking the adjustment coefficients c₁ = c₂ = 1, the patient's concentration is:

attention = abs(-1×(0.5×log 0.5 + 0.3×log 0.3 + 0.2×log 0.2) - 1×0.8)

attention ≈ 0.49

Assuming a concentration threshold of 0.8, the patient's concentration is below the threshold, indicating that the patient is not attentive and needs to adjust his or her state.
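As a non-authoritative check of the two worked examples, the formula can be sketched in Python. The source does not state the base of the logarithm; base 2 is assumed here, and the function name `attention` and its `base` parameter are illustrative only. Under this assumption the sketch yields about 1.09 for the first patient and about 0.69 for the second, which differ from the figures quoted above but lead to the same conclusions against the 0.8 threshold.

```python
import math
from typing import Sequence


def attention(p: Sequence[float], m: float,
              c1: float = 1.0, c2: float = 1.0, base: float = 2.0) -> float:
    """attention = abs(-c1 * sum(p * log(p)) - c2 * m).

    Assumption: logarithm base 2; zero probabilities are skipped because
    p * log(p) tends to 0 as p tends to 0.
    """
    entropy = -sum(pi * math.log(pi, base) for pi in p if pi > 0)
    return abs(c1 * entropy - c2 * m)


THRESHOLD = 0.8  # threshold used in both examples

a1 = attention([0.2, 0.3, 0.5], m=0.4)
a2 = attention([0.5, 0.3, 0.2], m=0.8)
for name, value in (("patient 1", a1), ("patient 2", a2)):
    state = "attentive" if value > THRESHOLD else "not attentive"
    print(f"{name}: attention = {value:.2f} ({state})")
```

Because both patients have the same set of probabilities in a different order, their entropy terms are identical and the two results differ only through the eye movement state value m.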

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. The principles and embodiments of the present application are described herein using specific examples, and the above examples are only intended to help in understanding the method and core idea of the present application. The foregoing is merely illustrative of the preferred embodiments of this application. It should be noted that the specific structures are objectively not limited to those disclosed herein, since those skilled in the art can make numerous modifications, adaptations and variations without departing from the principles of the application, and the above-described features can be combined in any suitable manner; such modifications, variations and combinations, or the direct application of the concepts and aspects of the application to other situations without modification, are intended to be within the scope of the present application.

Claims

1. A patient attention degree judging method based on information entropy in a meta-diagnosis room scene, characterized by comprising the following steps:

1) Detecting a face in the video through a face detection module, and outputting the position of the face and the coordinates of the face key points (both eyes);

2. The method of claim 1, wherein the face detection module uses a RetinaFace network model architecture that uses a pyramid structure to generate a plurality of feature layers through convolution operations and upsampling operations, and then performs classification, regression, and face key point prediction on each feature layer.

3. The method according to claim 2, characterized in that the RetinaFace network model architecture uses ResNet-50 as a base network comprising five convolution blocks, each convolution block comprising several residual units, each residual unit comprising two convolution layers and one skip connection layer, the skip connection layer adding the input feature map to the output feature map; the face detection module comprises a linear rectification function (ReLU), which is used for performing pixel correction on a plurality of image frames in a first image set by replacing all negative values in the feature maps of the first image set with 0, namely replacing all negative values in the plurality of image frames in the first image set with 0; the face detection module performs the following steps:

1) A 1×1 convolution with a stride of 1 is applied to the feature layer Conv5_x to form the feature layer M5;

2) The feature layer M5 is up-sampled once and fused with the feature layer obtained by applying a 1×1 convolution with a stride of 1 to the feature layer Conv4_x, and the fused feature layer is up-sampled once more to form the feature layer M4;

3) The feature layer M4 is up-sampled once and fused with the feature layer obtained by applying a 1×1 convolution with a stride of 1 to the feature layer Conv3_x, and the fused feature layer is up-sampled once more to form the feature layer M3;

4) The feature layer M3 is up-sampled once and fused with the feature layer obtained by applying a 1×1 convolution with a stride of 1 to the feature layer Conv2_x, and the fused feature layer is up-sampled once more to form the feature layer M2;

5) A 3×3 convolution with a stride of 2 is then applied to the feature layers M2, M3, M4, M5 and Conv5_x to form the effective feature layers P2, P3, P4, P5 and P6;

6) Finally, classification, regression and face key point prediction are performed on each of the effective feature layers P2, P3, P4, P5 and P6.
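The up-sample-and-fuse operation that recurs in steps 2) to 4) can be illustrated with a small single-channel sketch. The claim does not specify the interpolation method or the fusion operator, so nearest-neighbour 2× up-sampling and element-wise addition are assumptions, as are the function names `upsample2x` and `fuse`. The 1×1 convolution is omitted, and the lateral feature map is assumed to have been computed already.

```python
from typing import List

Matrix = List[List[float]]  # a single-channel feature map


def upsample2x(fm: Matrix) -> Matrix:
    """Nearest-neighbour up-sampling: each value becomes a 2x2 block."""
    out: Matrix = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(top: Matrix, lateral: Matrix) -> Matrix:
    """Up-sample the coarser map and add it element-wise to the lateral map."""
    up = upsample2x(top)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly twice the size of the top map")
    return [[a + b for a, b in zip(row_up, row_lat)]
            for row_up, row_lat in zip(up, lateral)]


m5 = [[1.0, 2.0]]                      # coarse map, 1 x 2
c4_lateral = [[10.0] * 4, [20.0] * 4]  # finer map after the 1x1 convolution, 2 x 4
fused = fuse(m5, c4_lateral)
print(fused)  # [[11.0, 11.0, 12.0, 12.0], [21.0, 21.0, 22.0, 22.0]]
```

In a real network each feature layer has many channels and the operations would run on tensors in a deep learning framework; the sketch only shows how spatial sizes are matched before fusion.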

4. The method of claim 1, wherein the residual network model is a deep convolutional neural network comprising a plurality of residual units, each residual unit comprising two convolutional layers and one skip connection layer, the skip connection layer adding the input feature map to the output feature map.
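The skip connection described in this claim can be sketched on a single-channel feature map. This is an illustration only: the two convolutional layers are replaced by a placeholder `transform` callable, and the names `relu` and `residual_unit` are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]  # a single-channel feature map


def relu(fm: Matrix) -> Matrix:
    """Linear rectification: replace every negative value with 0."""
    return [[max(0.0, v) for v in row] for row in fm]


def residual_unit(x: Matrix, transform: Callable[[Matrix], Matrix]) -> Matrix:
    """Add the input feature map to the output of the inner layers.

    `transform` stands in for the two convolutional layers and must
    return a map of the same shape as its input.
    """
    y = transform(x)
    return [[a + b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def toy_layers(fm: Matrix) -> Matrix:
    # Placeholder for two convolutional layers: scale by 2, then rectify.
    return relu([[2.0 * v for v in row] for row in fm])


out = residual_unit([[1.0, -2.0], [3.0, 4.0]], toy_layers)
print(out)  # [[3.0, -2.0], [9.0, 12.0]]
```

The negative input value passes through unchanged because the rectified branch contributes 0 at that position, which shows how the skip connection preserves the input signal.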

5. The method of claim 1, wherein the Euler angle components comprise a roll angle, a pitch angle, and a yaw angle; wherein the roll angle represents the angle through which the symmetry plane of the head turns around the central axis of the head; the pitch angle represents the included angle between the central axis of the head and the horizontal plane; and the yaw angle represents the angle between the projection of the central axis of the head on the horizontal plane and the ground axis.

6. The method of claim 5, wherein the standard deviation represents the degree of dispersion of the data; and the change rate represents the average of the amount of change in the data between adjacent acquisition points.
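The standard deviation of each Euler angle component can be computed with Python's standard library, as in the sketch below. The claim does not say whether the population or the sample standard deviation is meant; the population form (`statistics.pstdev`) is an assumption, as is the function name `component_std`.

```python
import statistics
from typing import List, Sequence


def component_std(samples: Sequence[Sequence[float]]) -> List[float]:
    """Population standard deviation of each (roll, pitch, yaw) component."""
    return [statistics.pstdev(column) for column in zip(*samples)]


# Hypothetical head pose samples in degrees: only the roll angle varies
poses = [(0.0, 10.0, -5.0), (2.0, 10.0, -5.0), (4.0, 10.0, -5.0)]
stds = component_std(poses)
print(stds)
```

A component that stays constant over the acquisition window has a standard deviation of 0, while a widely varying component gives a larger value.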

7. The method of claim 1, wherein the information entropy formula is:

attention = abs(-c₁·∑(p·log(p)) - c₂·m)

8. The method of claim 7, wherein the eye movement state value m is calculated according to the following formula:

9. The method of claim 1, wherein the value range of the concentration is [0, +∞); the higher the concentration, the lower the entropy value and the higher the absolute value; the lower the concentration, the higher the entropy value and the lower the absolute value.

10. The method of claim 9, wherein the threshold of concentration is determined from different meta-office scenarios and patient types; when the concentration is lower than the threshold value, prompting the patient that the attention is not concentrated and the state needs to be adjusted; when the concentration is higher than the threshold, the patient is considered to be concentrated, and diagnosis and treatment can be normally performed.