CN117975526A - Deep learning-based anesthesia reviving facial feature prediction method and device - Google Patents

Deep learning-based anesthesia reviving facial feature prediction method and device Download PDF

Info

Publication number
CN117975526A
Authority
CN
China
Prior art keywords
model
video data
facial
sub
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410017042.XA
Other languages
Chinese (zh)
Inventor
李俏敏
王涵
刘德昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fifth Affiliated Hospital of Sun Yat Sen University
Original Assignee
Fifth Affiliated Hospital of Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fifth Affiliated Hospital of Sun Yat Sen University filed Critical Fifth Affiliated Hospital of Sun Yat Sen University
Priority to CN202410017042.XA priority Critical patent/CN117975526A/en
Publication of CN117975526A publication Critical patent/CN117975526A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a deep learning-based anesthesia reviving facial feature prediction method and device. The method comprises the following steps: preprocessing sample face video data; training and testing on the sample face video data; inputting face video data to be detected into a facial marker detection model, carrying out key target detection through a YOLO8 sub-model, carrying out feature importance analysis through a first SHAP sub-model, carrying out parameter adjustment on the YOLO8 sub-model through a first Meta Q-Learning sub-model based on the result of the feature importance analysis, and carrying out target detection on the face video data through the parameter-adjusted YOLO8 sub-model to obtain target key mark information of an object to be detected; inputting the target key mark information into an image feature extraction model to obtain target features; and carrying out multi-dimensional panel time sequence prediction on the target features through a time sequence prediction model to obtain a prediction category. Facial features can thereby be predicted accurately.

Description

Deep learning-based anesthesia reviving facial feature prediction method and device
Technical Field
The invention relates to the technical field of anesthesia reviving facial feature prediction based on deep learning, and in particular to an anesthesia reviving facial feature prediction method and device based on deep learning.
Background
Patients often develop specific complications during the postoperative anesthesia recovery stage, and at present such situations can only be handled in time by having staff manually observe the patients' facial expressions. This is very labor-intensive, and observations made by less experienced staff may be inaccurate.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a deep learning-based anesthesia reviving facial feature prediction method and device, which can accurately predict facial features.
In a first aspect, an embodiment of the present invention provides a method for predicting a facial feature of anesthesia recovery based on deep learning, including:
acquiring sample face video data and preprocessing the sample face video data;
Dividing the preprocessed sample face video data to obtain a first training set and a first testing set, inputting the first training set into a preset facial marker detection model for training, and inputting the first testing set into the trained facial marker detection model for testing to obtain a fully trained facial marker detection model, wherein the facial marker detection model comprises a YOLO8 sub-model, a first SHAP sub-model and a first Meta Q-Learning sub-model;
Acquiring face video data to be detected of an object to be detected;
Inputting the face video data to be detected into the facial marker detection model, carrying out key target detection on the face video data to be detected through the YOLO8 sub-model, inputting the face video data subjected to key target detection into the first SHAP sub-model for feature importance analysis, carrying out parameter adjustment on the YOLO8 sub-model through the first Meta Q-Learning sub-model based on the result of the feature importance analysis, and carrying out target detection on the face video data through the parameter-adjusted YOLO8 sub-model to obtain target key mark information of the object to be detected;
inputting the target key mark information into a preset image feature extraction model to obtain target features;
And carrying out multi-dimensional panel time sequence prediction on the target characteristics through a preset time sequence prediction model to obtain a prediction category.
In some embodiments of the present invention, the acquiring sample face video data and preprocessing the sample face video data includes:
Acquiring the sample face video data, and extracting a plurality of key frames from the sample face video data to obtain a plurality of sample face picture data;
And selecting the same object in each sample face picture as a reference point, and performing alignment processing on the plurality of sample face picture data according to the reference point, wherein the reference point is an object whose position does not change.
In some embodiments of the present invention, the inputting the face video data to be detected into the facial marker detection model, performing key target detection on the face video data to be detected through the YOLO8 sub-model, inputting the face video data after key target detection into the first SHAP sub-model to perform feature importance analysis, performing parameter tuning on the YOLO8 sub-model through the first Meta Q-Learning sub-model based on the result of the feature importance analysis, and performing target detection on the face video data through the parameter-tuned YOLO8 sub-model to obtain target key mark information of the object to be detected includes:
Performing key target detection on the face video data to be detected through the YOLO8 sub-model to obtain a boundary box of an object of interest;
Performing feature importance analysis on the video frame of the face video data to be detected and the object of interest in the boundary box through the first SHAP sub-model to obtain importance indexes of different index features in the video frame;
Acquiring the index features whose importance indexes are higher than a preset threshold in the video frame, and judging whether the index features exist in the boundary box; if not, determining the index features as missing information, and if so, determining the index features as detection information;
inputting the detection information and the missing information into the first Meta Q-Learning sub-model, and carrying out parameter adjustment on the YOLO8 sub-model according to the detection information and the missing information;
And carrying out key target detection on the face video data to be detected through the YOLO8 submodel after parameter adjustment to obtain the target key mark information.
In some embodiments of the present invention, the inputting the target key mark information into a preset image feature extraction model to obtain a target feature further includes:
Acquiring sample key mark information;
Dividing the sample key mark information to obtain a second training set and a second testing set, inputting the second training set into a preset image feature extraction model for training, inputting the second testing set into the trained image feature extraction model for testing to obtain a trained image feature extraction model, wherein the image feature extraction model comprises a CNN submodel, a second SHAP submodel and a second Meta Q-Learning submodel;
inputting the target key mark information into the image feature extraction model, carrying out image recognition on the target key mark information through the CNN submodel, inputting the facial video data after image recognition into the second SHAP submodel for feature importance analysis, carrying out parameter tuning on the CNN submodel based on a result of feature importance analysis through the second Meta Q-Learning submodel, and carrying out image recognition on the target key mark information through the CNN submodel after parameter tuning to obtain the target feature of the object to be detected.
In some embodiments of the present invention, after dividing the preprocessed sample face video data to obtain a first training set and a first test set, inputting the first training set into a preset facial marker detection model for training, and inputting the first test set into the trained facial marker detection model for testing to obtain the trained facial marker detection model, the method includes:
cross-verifying the trained facial marker detection model;
judging whether the facial marker detection model is effective according to the accuracy, recall and F1 score after cross-validation.
In some embodiments of the present invention, after the multi-dimensional panel time series prediction is performed on the target feature by using a preset time series prediction model, the method includes:
And sending an alarm when the predicted category is one category in a preset early warning category set.
In some embodiments of the present invention, the time series prediction model includes an RNN sub-model, a TimeSformer sub-model, and a third Meta Q-Learning sub-model, and the performing multi-dimensional panel time series prediction on the target feature by using a preset time series prediction model, to obtain a prediction class includes:
Performing time sequence prediction on the target features through the RNN submodel, and inputting a time sequence prediction result into the TimeSformer submodel;
correlating spatio-temporal data in the time series prediction result through the TimeSformer submodel;
And carrying out parameter adjustment on the RNN submodel based on a result of space-time data association through the third Meta Q-Learning submodel, and carrying out time sequence prediction on the long-term dependent sequence data in the target key mark information through the RNN submodel after parameter adjustment to obtain the prediction category.
In a second aspect, an embodiment of the present invention provides a deep learning-based anesthesia wakeup facial feature prediction device, including at least one control processor and a memory communicatively coupled to the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the deep learning based method of anesthesia wakeup facial feature prediction as described in the first aspect above.
In a third aspect, an embodiment of the present invention provides an electronic device, including the anesthesia reviving facial feature prediction apparatus based on deep learning according to the second aspect.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing computer-executable instructions for performing the deep learning-based anesthesia wake facial feature prediction method according to the first aspect.
The deep learning-based anesthesia reviving facial feature prediction method provided by the embodiment of the invention has at least the following beneficial effects: acquiring sample face video data and preprocessing the sample face video data; dividing the preprocessed sample face video data to obtain a first training set and a first testing set, inputting the first training set into a preset facial marker detection model for training, and inputting the first testing set into the trained facial marker detection model for testing to obtain a fully trained facial marker detection model, wherein the facial marker detection model comprises a YOLO8 sub-model, a first SHAP sub-model and a first Meta Q-Learning sub-model; acquiring face video data to be detected of an object to be detected; inputting the face video data to be detected into the facial marker detection model, carrying out key target detection on the face video data to be detected through the YOLO8 sub-model, inputting the face video data subjected to key target detection into the first SHAP sub-model for feature importance analysis, carrying out parameter adjustment on the YOLO8 sub-model through the first Meta Q-Learning sub-model based on the result of the feature importance analysis, and carrying out target detection on the face video data through the parameter-adjusted YOLO8 sub-model to obtain target key mark information of the object to be detected; inputting the target key mark information into a preset image feature extraction model to obtain target features; and carrying out multi-dimensional panel time sequence prediction on the target features through a preset time sequence prediction model to obtain a prediction category. Because the facial marker detection model identifies different parts of the face, that is, performs key target detection, facial feature expressions can be judged comprehensively. In addition, by adopting meta-learning-based dynamic parameter tuning with YOLO8+SHAP+Meta Q-Learning, dynamic tuning that takes both interpretability and model accuracy into account can be realized, improving model accuracy.
Drawings
FIG. 1 is a flow chart of a deep learning based method for predicting facial features of anesthesia wakeup according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a specific application scenario of an anesthesia wakeup facial feature prediction method based on deep learning according to another embodiment of the present invention;
Fig. 3 is a block diagram of an apparatus for predicting a facial feature of anesthesia recovery based on deep learning according to another embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that references to orientation descriptions such as upper, lower, front, rear, left, right, etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more, "a plurality of" means two or more, and terms such as greater than, less than and exceeding are understood as excluding the stated number, while above, below, within and the like are understood as including the stated number. The terms "first" and "second" are used only to distinguish technical features and should not be construed as indicating or implying relative importance, implicitly indicating the number of technical features indicated, or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be reasonably determined by a person skilled in the art in combination with the specific contents of the technical scheme.
The method according to the embodiments of the present invention is further described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a flowchart of a deep learning-based anesthesia wakeup facial feature prediction method according to an embodiment of the present invention, where the deep learning-based anesthesia wakeup facial feature prediction method includes, but is not limited to, the following steps:
Step S100, acquiring sample face video data and preprocessing the sample face video data;
step S200, dividing the preprocessed sample face video data to obtain a first training set and a first testing set, inputting the first training set into a preset facial marker detection model for training, and inputting the first testing set into the trained facial marker detection model for testing to obtain a trained facial marker detection model, wherein the facial marker detection model comprises a YOLO8 sub-model, a first SHAP sub-model and a first Meta Q-Learning sub-model;
step S300, obtaining face video data to be detected of an object to be detected;
step S400, inputting the face video data to be detected into a facial marker detection model, carrying out key target detection on the face video data to be detected through a YOLO8 sub-model, inputting the face video data subjected to key target detection into a first SHAP sub-model for feature importance analysis, carrying out parameter adjustment on the YOLO8 sub-model through a first Meta Q-Learning sub-model based on the result of the feature importance analysis, and carrying out target detection on the face video data through the parameter-adjusted YOLO8 sub-model to obtain target key mark information of an object to be detected;
Step S500, inputting target key mark information into a preset image feature extraction model to obtain target features;
And S600, carrying out multi-dimensional panel time sequence prediction on the target characteristics through a preset time sequence prediction model to obtain a prediction category.
It should be noted that the sample facial video data mainly refers to facial video data recorded during recovery from postoperative anesthesia.
In the training process of the facial marker detection model with the sample facial video data, the facial video data is analyzed frame by frame with a visual model, and the facial features in the sample video data are identified and detected, wherein the visual model may be a neural network model such as a CNN model and is not limited herein. The phases of anesthesia recovery are marked by time segment with labels in [0,4], respectively: 0: not yet emerged from anesthesia; 1: gradual recovery of sensory and motor function as the depth of anesthesia decreases; 2: spontaneous breathing resumes; 3: airway reflexes recover; 4: awake. The specific complications during anesthesia recovery are marked by time segment with labels in [5,8], respectively: 5: cardiovascular event; 6: regurgitation and aspiration; 7: airway obstruction; 8: agitation. The facial marker detection model is trained on the time-segment-labeled sample facial video data.
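For illustration only, the time-segment labels described above could be encoded as integer classes as in the following Python sketch; the dictionary names and the grouping of labels 5-8 into a warning set are assumptions made for this example rather than part of the claimed method.

    # Hypothetical encoding of the time-segment labels (0-4: recovery phases, 5-8: complications).
    ANESTHESIA_LABELS = {
        0: "not yet emerged from anesthesia",
        1: "gradual recovery of sensation and movement as anesthetic depth decreases",
        2: "spontaneous breathing resumes",
        3: "airway reflexes recover",
        4: "awake",
        5: "cardiovascular event",
        6: "regurgitation and aspiration",
        7: "airway obstruction",
        8: "agitation",
    }
    WARNING_LABELS = {5, 6, 7, 8}  # classes that should trigger an alarm (see the alarm step below)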
The facial marker detection model and the image feature extraction model are used for extracting features of the facial video data to be detected. Relevant features are extracted from the frames taken from the video over the marked period, including features of the picture itself (color histogram, etc.), changes in eyebrow movements, eye strabismus, lip twitch, or subtle facial muscle contractions.
It should be noted that the facial marker detection model (YOLO8+SHAP+Meta Q-Learning) is used to identify key mark information of key targets such as the eyes, nose, mouth and facial contour, which helps to track facial expressions and movements.
It should be noted that facial expression analysis is performed on the target key mark information with the image feature extraction model to obtain the target features, and statistics are then computed on the target features, for example: (1) head movement: statistics (mean, standard deviation) of head movement, including yaw, pitch and roll angles, are calculated; (2) blink rate: the number of blinks per minute is counted; (3) mouth movement: features related to the opening and closing of the mouth, such as the aspect ratio of the mouth, are measured; (4) gaze direction: the gaze direction (e.g., left, right, up, down) is estimated from the positions of the eye markers; (5) pupil dilation: the pupil size is tracked over time. The statistics are compared with preset thresholds, and out-of-range features are treated as indicators of alertness that deserve particular attention. Further, for the sample facial video data, the extracted features are aggregated, and statistics including the mean, standard deviation or histogram of the feature values are calculated. During training on the sample facial video data, a machine learning classifier is trained using the aggregated facial features as input and the annotated state labels as target variables.
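As a non-limiting sketch of how such per-frame statistics might be computed, the following Python code derives an eye aspect ratio, a mouth aspect ratio, a blink rate and mean/standard-deviation aggregates from landmark coordinates; the landmark indexing and threshold values are assumptions for illustration and are not prescribed by the method above.

    import numpy as np

    def eye_aspect_ratio(eye: np.ndarray) -> float:
        """Eye aspect ratio from six landmark points; small values indicate a closed eye."""
        v1 = np.linalg.norm(eye[1] - eye[5])
        v2 = np.linalg.norm(eye[2] - eye[4])
        h = np.linalg.norm(eye[0] - eye[3])
        return (v1 + v2) / (2.0 * h)

    def mouth_aspect_ratio(mouth: np.ndarray) -> float:
        """Height-to-width ratio of the mouth region (open vs. closed); indices are illustrative."""
        height = np.linalg.norm(mouth[2] - mouth[6])
        width = np.linalg.norm(mouth[0] - mouth[4])
        return height / width

    def blink_rate(ear_series, fps: float, threshold: float = 0.2) -> float:
        """Blinks per minute, counted as downward crossings of an EAR threshold."""
        ear = np.asarray(ear_series)
        crossings = np.sum((ear[:-1] >= threshold) & (ear[1:] < threshold))
        return float(crossings) / (len(ear) / fps) * 60.0

    def summarize(values) -> dict:
        """Mean / standard deviation aggregation used as classifier input."""
        v = np.asarray(values, dtype=float)
        return {"mean": float(v.mean()), "std": float(v.std())}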
Regarding YOLO8+SHAP+Meta Q-Learning on the face video data to be detected: YOLO (You Only Look Once) is adopted for target detection, SHAP (SHapley Additive exPlanations) is adopted for feature importance analysis, and Meta Q-Learning is adopted for dynamic parameter tuning of the model and for improving robustness.
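The division of labor between the three stages can be summarized by the following Python skeleton; the detector, feature_extractor and ts_predictor objects and their detect/extract/predict interfaces are hypothetical wrappers introduced only to show the data flow, not an interface defined by the invention.

    # Illustrative pipeline skeleton: detection -> feature extraction -> panel time series prediction.
    def predict_recovery_state(video_frames, detector, feature_extractor, ts_predictor):
        """Run the full facial feature prediction pipeline on preprocessed frames."""
        key_markers = [detector.detect(frame) for frame in video_frames]             # YOLO8+SHAP+Meta Q-Learning stage
        features = [feature_extractor.extract(markers) for markers in key_markers]   # CNN+SHAP+Meta Q-Learning stage
        return ts_predictor.predict(features)                                        # RNN+TimeSformer+Meta Q-Learning stage, label in 0-8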
In another embodiment, acquiring and preprocessing sample face video data includes:
Acquiring sample face video data, extracting a plurality of key frames from the sample face video data, and obtaining a plurality of sample face picture data;
the same object is obtained from the sample face picture as a reference point, and the alignment processing is performed on the plurality of sample face picture data according to the reference point, wherein the reference point is an object which does not change.
It should be noted that, because the patient may continually change position during the video and the position of the video capture device may also change, the video needs to be preprocessed in order to obtain more accurate facial expressions. It can be understood that both the sample face video data and the face video data to be detected need to be preprocessed. Single frames or key frames are periodically extracted from the video, and the extracted frames are aligned and cropped based on a reference object in the video so as to focus on the patient's face. It will be appreciated that the reference cannot be an object whose position changes.
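A minimal preprocessing sketch, assuming OpenCV is available and that the stationary reference object has already been located in each frame, might look as follows; the sampling interval and the translation-only alignment are simplifying assumptions.

    import cv2
    import numpy as np

    def extract_key_frames(video_path: str, every_n: int = 30):
        """Periodically sample frames from a facial video (sampling interval is assumed)."""
        cap = cv2.VideoCapture(video_path)
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % every_n == 0:
                frames.append(frame)
            idx += 1
        cap.release()
        return frames

    def align_to_reference(frame, ref_point, point_in_frame):
        """Translate a frame so the stationary reference object keeps the same pixel position."""
        dx = ref_point[0] - point_in_frame[0]
        dy = ref_point[1] - point_in_frame[1]
        m = np.float32([[1, 0, dx], [0, 1, dy]])
        return cv2.warpAffine(frame, m, (frame.shape[1], frame.shape[0]))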
In another embodiment, inputting the face video data to be detected into the facial marker detection model, performing key target detection on the face video data to be detected through the YOLO8 sub-model, inputting the face video data after key target detection into the first SHAP sub-model for feature importance analysis, performing parameter adjustment on the YOLO8 sub-model through the first Meta Q-Learning sub-model based on the result of the feature importance analysis, and performing target detection on the face video data through the parameter-adjusted YOLO8 sub-model to obtain target key mark information of the object to be detected includes:
Performing key target detection on the face video data to be detected through the YOLO8 sub-model to obtain a boundary box of the object of interest;
Carrying out feature importance analysis on the video frame of the face video data to be detected and the object of interest in the boundary box through the first SHAP sub-model to obtain importance indexes of different index features in the video frame;
Acquiring the index features whose importance indexes are higher than a preset threshold in the video frame, and judging whether the index features exist in the boundary box; if not, determining the index features as missing information, and if so, determining the index features as detection information;
inputting the detection information and the missing information into the first Meta Q-Learning sub-model, and carrying out parameter adjustment on the YOLO8 sub-model according to the detection information and the missing information;
And carrying out key target detection on the face video data to be detected through the YOLO8 submodel after parameter adjustment to obtain target key mark information.
It should be noted that the YOLO8 sub-model annotates the face video data to be detected with bounding boxes around the objects of interest to be monitored (such as medical devices), indicating whether a specific object or device is present in the face video data to be detected. It will be appreciated that during training with the sample face video data, a pre-trained YOLO8 sub-model is fine-tuned on the annotated dataset for object detection on the sample face video data. The YOLO8 sub-model ultimately outputs the bounding boxes of the detected objects.
It should be noted that, regarding the SHAP sub-model for feature importance analysis: the SHAP sub-model is integrated into the object detection pipeline. For each video frame of the face video data to be detected and each object of interest in a bounding box, SHAP is applied to explain why certain objects are detected or missed, which involves analyzing the importance of particular image regions or features. The importance index is the metric used for this model evaluation.
It should be noted that, regarding Meta Q-Learning for model adjustment: a Meta Q-Learning framework is set up in which the agent learns to adjust the parameters of the YOLO8 sub-model (anchor box sizes, model architecture, detection thresholds) based on the SHAP sub-model interpretations and feedback on object detection performance.
Feedback loop: during video analysis, the object detection results and the SHAP sub-model interpretations are continuously analyzed, and the Meta Q-Learning sub-model is used to iteratively update the YOLO8 sub-model and adjust its hyper-parameters to improve object detection accuracy.
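One way such a feedback loop could be sketched is shown below, assuming the ultralytics YOLOv8 implementation for the YOLO8 sub-model and reducing the adjustable parameter to a single detection confidence threshold; the shap_reward helper is a hypothetical stub standing in for the SHAP-based detected/missing comparison described above.

    import random
    from ultralytics import YOLO  # assumed YOLOv8 implementation of the "YOLO8" sub-model

    model = YOLO("yolov8n.pt")             # pretrained weights, assumed fine-tuned on annotated frames
    THRESHOLDS = [0.15, 0.25, 0.35, 0.5]   # discrete action space: detection confidence threshold
    q_table = {t: 0.0 for t in THRESHOLDS}
    EPSILON, ALPHA = 0.1, 0.2              # exploration and learning rates (assumed values)

    def shap_reward(frame, boxes):
        """Hypothetical stub: compare SHAP-important index features with the detected boxes
        and return (#detection information - #missing information) as a scalar reward."""
        raise NotImplementedError

    def detect_with_tuning(frames):
        threshold = max(q_table, key=q_table.get)
        detections = []
        for frame in frames:
            if random.random() < EPSILON:                  # epsilon-greedy parameter adjustment
                threshold = random.choice(THRESHOLDS)
            boxes = model.predict(frame, conf=threshold, verbose=False)[0].boxes
            reward = shap_reward(frame, boxes)             # SHAP-based feedback signal
            q_table[threshold] += ALPHA * (reward - q_table[threshold])  # one-step Q update
            threshold = max(q_table, key=q_table.get)      # exploit the best-known threshold
            detections.append(boxes)
        return detections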
In another embodiment, the target key mark information is input into a preset image feature extraction model to obtain a target feature, and the method further includes:
Acquiring sample key mark information;
Dividing sample key mark information to obtain a second training set and a second testing set, inputting the second training set into a preset image feature extraction model for training, inputting the second testing set into the trained image feature extraction model for testing to obtain a trained image feature extraction model, wherein the image feature extraction model comprises a CNN submodel, a second SHAP submodel and a second Meta Q-Learning submodel;
Inputting target key mark information into an image feature extraction model, carrying out image recognition on the target key mark information through a CNN sub-model, inputting face video data after image recognition into a second SHAP sub-model for feature importance analysis, carrying out parameter adjustment on the CNN sub-model based on a feature importance analysis result through a second Meta Q-Learning sub-model, and carrying out image recognition on the target key mark information through the CNN sub-model after parameter adjustment to obtain target features of an object to be detected.
CNN+SHAP+second Meta Q-Learning combines a Convolutional Neural Network (CNN), SHAP (SHapley Additive exPlanations) and meta learning (Meta Q-Learning). The images in the target key mark information are processed through the CNN sub-model. The output of the model is interpreted by calculating the contribution of each feature to the prediction result through the SHAP sub-model, which determines how much attention the CNN sub-model pays to the image features in the target key mark information. The second Meta Q-Learning sub-model is then used to tune the parameters of the CNN sub-model, so that a more accurate model is obtained.
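A compact sketch of the CNN feature extractor together with a SHAP explanation call is given below, assuming PyTorch and the shap library's DeepExplainer; the network architecture, patch size and feature dimension are arbitrary choices for illustration.

    import torch
    import torch.nn as nn
    import shap  # SHapley Additive exPlanations library (assumed available)

    class MarkerFeatureCNN(nn.Module):
        """Minimal CNN feature extractor for image patches cropped around key markers."""
        def __init__(self, n_features: int = 16):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            )
            self.head = nn.Linear(32 * 4 * 4, n_features)

        def forward(self, x):
            return self.head(self.backbone(x).flatten(1))

    model = MarkerFeatureCNN()
    background = torch.randn(16, 3, 64, 64)       # reference samples for the explainer
    patches = torch.randn(4, 3, 64, 64)           # patches cropped around detected key markers
    explainer = shap.DeepExplainer(model, background)
    shap_values = explainer.shap_values(patches)  # per-pixel contributions to each output feature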
In another embodiment, after dividing the preprocessed sample face video data to obtain a first training set and a first test set, inputting the first training set into a preset facial marker detection model for training, and inputting the first test set into the trained facial marker detection model for testing to obtain a trained facial marker detection model, the method includes:
Cross-verifying the trained facial marker detection model;
Judging whether the facial marker detection model is effective according to the accuracy, recall rate and F1 score after cross verification.
It should be noted that after the training of the facial marker detection model is completed, evaluation and verification are also required. Cross-validation: cross-validation is performed to evaluate the performance of the model on unseen face video data to be detected, and the validity of the model is assessed with appropriate metrics such as accuracy, recall and F1 score. Deployment: the fine-tuned YOLO8 model, combined with SHAP and Meta Q-Learning, is deployed for real-time analysis of facial videos during anesthesia recovery. It can be appreciated that evaluation and verification can also be performed during the training of the image feature extraction model.
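Assuming scikit-learn is available, the per-fold metrics could be computed as in the following sketch; y_true and y_pred stand for held-out labels and model predictions collected during cross-validation and are placeholders here.

    from sklearn.metrics import accuracy_score, recall_score, f1_score

    def evaluate_fold(y_true, y_pred):
        """Validity metrics for one cross-validation fold: accuracy, recall and F1 score."""
        return {
            "accuracy": accuracy_score(y_true, y_pred),
            "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
            "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        }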
In another embodiment, after performing multi-dimensional panel time series prediction on the target key mark information through a preset time series prediction model to obtain a prediction category, the method includes:
and sending an alarm when the predicted category is one category in a preset early warning category set.
It should be noted that, regarding monitoring and alarms: the face video data to be detected is continuously monitored and analyzed during the recovery process. If a specific object is detected near the face, or the predicted category is one of cardiovascular event, regurgitation and aspiration, airway obstruction or agitation, an alarm is sent.
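A minimal alarm check consistent with the label scheme above might read as follows; the warning set and the device_near_face flag are illustrative assumptions.

    WARNING_CLASSES = {5, 6, 7, 8}  # cardiovascular event, regurgitation/aspiration, airway obstruction, agitation

    def check_alarm(predicted_class: int, device_near_face: bool = False) -> bool:
        """Return True when an alarm should be sent for the current prediction."""
        return device_near_face or predicted_class in WARNING_CLASSES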
In another embodiment, the time series prediction model includes an RNN sub-model, a TimeSformer sub-model, and a third Meta Q-Learning sub-model, and the performing multi-dimensional panel time series prediction on the target key mark information through the preset time series prediction model, to obtain a prediction category includes:
performing time sequence prediction on the long-term dependent sequence data in the target key mark information through the RNN submodel, and inputting a time sequence prediction result into the TimeSformer submodel;
Correlating the spatio-temporal data in the time sequence prediction result through the TimeSformer submodel;
And carrying out parameter adjustment on the RNN submodel based on a result of space-time data association through a third Meta Q-Learning submodel, and carrying out time sequence prediction on the long-term dependent sequence data in the target key mark information through the RNN submodel after parameter adjustment to obtain a prediction type.
After the features are extracted, a prediction model needs to be constructed for the panel-data time series of postoperative anesthesia recovery and specific complications: based on the target features obtained by the facial marker detection model and the image feature extraction model, multi-dimensional panel-data time sequence prediction is realized. Deep learning-based panel time series prediction involves predicting future values of variables of a plurality of entities (panels) observed over time, which can be accomplished using deep learning models that capture complex time dependencies and patterns in the data. The steps of the time sequence prediction model are summarized as follows: (1) Data preparation: the panel time series data is organized into a suitable format, namely 3D tensors whose dimensions represent entities, time steps and features (variables); the data is split into a training set, a validation set and a test set, and since time series data has time dependency, the split must be made along time to avoid data leakage. (2) Feature engineering: the features to be used for prediction are decided; these may include lagged values of the target variable, exogenous variables and other related time series features. (3) Data normalization: the data is normalized to ensure that each feature has a similar scale; common normalization techniques include min-max scaling and z-score standardization. (4) Sequence generation: sequences (input-output pairs) are created for training the deep learning model, for example fixed-length sequences are created from the time series data using a sliding-window method. (5) Model construction (RNN + TimeSformer + Meta Q-Learning): an appropriate deep learning architecture is selected for time sequence prediction: a recurrent neural network (RNN), where LSTM or GRU networks are commonly used to process sequence data with long-term dependencies; a Transformer-based model, namely the TimeSformer submodel, for processing spatio-temporal data correlations; and the third Meta Q-Learning submodel for model convergence and automatic parameter adjustment.
(6) Model training: the deep learning model is trained on the generated sequences using the training data, and is optimized using an appropriate loss function (e.g., mean square error for a regression task) and an optimization algorithm such as Adam or RMSprop. (7) Model validation: the validation set is used to adjust hyper-parameters, such as the number of hidden units in the model or the learning rate, to avoid overfitting. (8) Model evaluation: the performance of the model is evaluated using the test set; common evaluation metrics for time series prediction include the mean absolute error (MAE) and the root mean square error (RMSE). (9) Model deployment: once the model is trained and evaluated, it can be deployed to predict new panel time series data. The deployed time sequence prediction model is then put into use: time sequence prediction is performed on the long-term dependent sequence data in the target key mark information through the RNN submodel, and the time sequence prediction result is input into the TimeSformer submodel; the spatio-temporal data in the time sequence prediction result is correlated through the TimeSformer submodel; and the RNN submodel is tuned through the third Meta Q-Learning submodel based on the result of the spatio-temporal data correlation, and time sequence prediction is performed on the long-term dependent sequence data in the target key mark information through the tuned RNN submodel to obtain the prediction category. (10) Monitoring and updating: the performance of the model is continuously monitored and updated as new data becomes available. It should be noted that deep learning models require large amounts of data to perform well, especially when dealing with complex time series patterns or small panels; furthermore, feature engineering, data preprocessing and hyper-parameter tuning are critical to achieving accurate predictions.
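The sliding-window sequence generation and the RNN part of steps (4)-(6) could be sketched in PyTorch as follows; the window length, feature dimension and network size are assumed, and the TimeSformer and Meta Q-Learning sub-models are deliberately omitted from this fragment.

    import numpy as np
    import torch
    import torch.nn as nn

    def make_windows(series: np.ndarray, window: int = 16) -> torch.Tensor:
        """Sliding-window sequence generation; `series` has shape (time_steps, n_features)."""
        xs = [series[i:i + window] for i in range(len(series) - window)]
        return torch.tensor(np.stack(xs), dtype=torch.float32)

    class RecoveryRNN(nn.Module):
        """LSTM sub-model handling the long-term-dependency part of the predictor."""
        def __init__(self, n_features: int, n_classes: int = 9, hidden: int = 64):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):                  # x: (batch, window, n_features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])       # classify from the last time step

    feats = np.random.rand(200, 32).astype("float32")  # placeholder normalized target features
    model = RecoveryRNN(n_features=32)
    logits = model(make_windows(feats))                # shape: (n_windows, 9)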
One of the embodiments is provided below, with reference to fig. 2:
1.1 training data collection: a large number of labeled facial images or video datasets are collected, including macroscopic facial features (visible facial features) and microscopic facial features. The data set should cover a wide range of moods and individuals.
1.2 Preprocessing: the collected data is preprocessed by normalizing the images, aligning the faces, and reducing noise or artifacts.
1.3 Facial marker detection: facial key points, including eyes, eyebrows, nose, and mouth, are identified using facial marker detection algorithms. These landmarks serve as reference points for subsequent analysis.
1.4 Region of interest extraction: specific regions of interest (ROIs) within the face that are richest in facial feature information are defined. The ROIs are a face+neck ROI, a face ROI, an eye ROI, an eyebrow ROI, a nose ROI, a mouth ROI, and a cannula and other medical instrument ROI, respectively.
1.5 Feature extraction: relevant features such as texture descriptors, optical flow and intensity gradients are extracted from the defined ROI. These features capture subtle facial changes associated with facial features.
1.6 Machine learning training: facial feature recognition is realized with a YOLO8+SHAP+Meta Q-Learning network, and the extracted features are fused with the up-sampled feature values of the last layer to realize classification prediction. The probability (P) of each class is calculated for every ROI, the average value of P over all the ROIs is obtained, and the emotion whose average probability is greater than 75% is taken as the represented emotion (a fusion sketch is given after this list).
1.7 Fine-tuning and optimization: the trained model is fine-tuned using cross-validation, hyper-parameter optimization, data augmentation and other techniques to improve its performance and generalization ability.
1.8 Real-time detection: the trained model is applied to real-time or streaming video data for facial feature detection. This may involve processing a frame or video sequence using a trained model to identify and classify facial features as they occur.
1.9 Evaluation and verification: the performance of the algorithm is evaluated using appropriate metrics such as accuracy, precision, recall or F1 score, and the algorithm is validated on unseen data to evaluate its generalization ability.
1.10 Model optimization: user feedback is collected, and the algorithm is continually updated and improved based on real-world usage scenarios. Feedback from psychological or behavioral analysis specialists is incorporated to improve the accuracy and reliability of the algorithm.
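The ROI probability fusion referenced in step 1.6 can be written compactly as follows; the 75% threshold comes from the text, while the function name and the None return value for a low-confidence result are assumptions.

    import numpy as np

    def fuse_roi_probabilities(roi_probs, threshold: float = 0.75):
        """Average the per-ROI class probability vectors and keep the class only if it exceeds the threshold."""
        mean_p = np.mean(np.stack(roi_probs), axis=0)      # average P over all ROIs
        best = int(np.argmax(mean_p))
        return best if mean_p[best] > threshold else None  # None: no sufficiently confident class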
As shown in fig. 3, fig. 3 is a block diagram of a deep learning-based anesthesia reviving facial feature prediction device according to an embodiment of the present invention. The invention also provides a deep learning-based anesthesia reviving facial feature prediction device, which comprises:
The processor 310 may be implemented by a general purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, etc. for executing related programs to implement the technical solution provided by the embodiments of the present application;
The Memory 320 may be implemented in the form of a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 320 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present disclosure are implemented by software or firmware, the relevant program codes are stored in the memory 320, and the processor 310 invokes them to perform the deep learning-based anesthesia reviving facial feature prediction method of the embodiments of the present disclosure;
an input/output interface 330 for implementing information input and output;
The communication interface 340 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g., USB, network cable, etc.), or may implement communication in a wireless manner (e.g., mobile network, WIFI, bluetooth, etc.);
Bus 350 transmits information between the various components of the device (e.g., processor 310, memory 320, input/output interface 330, and communication interface 340);
wherein the processor 310, the memory 320, the input/output interface 330 and the communication interface 340 are communicatively coupled to each other within the device via a bus 350.
The embodiment of the application also provides electronic equipment, which comprises the anesthesia awakening facial feature prediction device based on deep learning.
The embodiment of the application also provides a storage medium, which is a computer readable storage medium, and the storage medium stores a computer program which is executed by a processor to realize the anesthesia wakeup facial feature prediction method based on deep learning.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The apparatus embodiments described above are merely illustrative, in which the elements illustrated as separate components may or may not be physically separate, implemented to reside in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically include computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit and scope of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. A deep learning-based method for predicting facial features of anesthesia and resuscitation, comprising:
acquiring sample face video data and preprocessing the sample face video data;
Dividing the preprocessed sample face video data to obtain a first training set and a first testing set, inputting the first training set into a preset facial marker detection model for training, and inputting the first testing set into the trained facial marker detection model for testing to obtain a fully trained facial marker detection model, wherein the facial marker detection model comprises a YOLO8 sub-model, a first SHAP sub-model and a first Meta Q-Learning sub-model;
Acquiring face video data to be detected of an object to be detected;
Inputting the face video data to be detected into the facial marker detection model, carrying out key target detection on the face video data to be detected through the YOLO8 sub-model, inputting the face video data subjected to key target detection into the first SHAP sub-model for feature importance analysis, carrying out parameter adjustment on the YOLO8 sub-model through the first Meta Q-Learning sub-model based on the result of the feature importance analysis, and carrying out target detection on the face video data through the parameter-adjusted YOLO8 sub-model to obtain target key mark information of the object to be detected;
inputting the target key mark information into a preset image feature extraction model to obtain target features;
And carrying out multi-dimensional panel time sequence prediction on the target characteristics through a preset time sequence prediction model to obtain a prediction category.
2. The deep learning-based anesthesia wakeup facial feature prediction method according to claim 1, wherein the acquiring sample face video data and preprocessing the sample face video data includes:
Acquiring the sample face video data, and extracting a plurality of key frames from the sample face video data to obtain a plurality of sample face picture data;
And selecting the same object in each sample face picture as a reference point, and performing alignment processing on the plurality of sample face picture data according to the reference point, wherein the reference point is an object whose position does not change.
3. The deep learning-based anesthesia wakeup facial feature prediction method according to claim 1, wherein the inputting the face video data to be detected into the facial marker detection model, performing key target detection on the face video data to be detected through the YOLO8 sub-model, inputting the face video data after key target detection into the first SHAP sub-model to perform feature importance analysis, performing parameter tuning on the YOLO8 sub-model through the first Meta Q-Learning sub-model based on the result of the feature importance analysis, and performing target detection on the face video data through the parameter-tuned YOLO8 sub-model to obtain target key mark information of the object to be detected includes:
Performing key target detection on the face video data to be detected through the YOLO8 sub-model to obtain a boundary box of an object of interest;
Performing feature importance analysis on the video frame of the face video data to be detected and the object of interest in the boundary box through the first SHAP sub-model to obtain importance indexes of different index features in the video frame;
Acquiring the index features whose importance indexes are higher than a preset threshold in the video frame, and judging whether the index features exist in the boundary box; if not, determining the index features as missing information, and if so, determining the index features as detection information;
inputting the detection information and the missing information into the first Meta Q-Learning sub-model, and carrying out parameter adjustment on the YOLO8 sub-model according to the detection information and the missing information;
And carrying out key target detection on the face video data to be detected through the YOLO8 submodel after parameter adjustment to obtain the target key mark information.
4. The method for predicting the anesthesia wakeup facial feature based on deep learning according to claim 1, wherein the step of inputting the target key mark information into a preset image feature extraction model to obtain a target feature further comprises the steps of:
Acquiring sample key mark information;
Dividing the sample key mark information to obtain a second training set and a second testing set, inputting the second training set into a preset image feature extraction model for training, inputting the second testing set into the trained image feature extraction model for testing to obtain a trained image feature extraction model, wherein the image feature extraction model comprises a CNN submodel, a second SHAP submodel and a second Meta Q-Learning submodel;
inputting the target key mark information into the image feature extraction model, carrying out image recognition on the target key mark information through the CNN submodel, inputting the facial video data after image recognition into the second SHAP submodel for feature importance analysis, carrying out parameter tuning on the CNN submodel based on a result of feature importance analysis through the second Meta Q-Learning submodel, and carrying out image recognition on the target key mark information through the CNN submodel after parameter tuning to obtain the target feature of the object to be detected.
5. The method for predicting the deep learning-based anesthesia wakeup facial feature according to claim 1, wherein after dividing the preprocessed sample facial video data to obtain a first training set and a first test set, inputting the first training set into a preset facial marker detection model for training, inputting the first test set into the trained facial marker detection model for testing, obtaining the trained facial marker detection model, comprising:
cross-verifying the trained facial marker detection model;
judging whether the facial marker detection model is effective according to the accuracy, recall and F1 score after cross-validation.
6. The deep learning-based anesthesia wakeup facial feature prediction method according to claim 1, wherein after the multi-dimensional panel time series prediction is performed on the target feature by a preset time series prediction model, the method includes:
And sending an alarm when the predicted category is one category in a preset early warning category set.
7. The deep Learning-based anesthesia wakeup facial feature prediction method according to claim 1, wherein the time series prediction model includes an RNN submodel, a TimeSformer submodel, and a third Meta Q-Learning submodel, and the performing multi-dimensional panel time series prediction on the target feature through a preset time series prediction model, to obtain a prediction category includes:
Performing time sequence prediction on the target features through the RNN submodel, and inputting a time sequence prediction result into the TimeSformer submodel;
correlating spatio-temporal data in the time series prediction result through the TimeSformer submodel;
And carrying out parameter adjustment on the RNN submodel based on a result of space-time data association through the third Meta Q-Learning submodel, and carrying out time sequence prediction on the long-term dependent sequence data in the target key mark information through the RNN submodel after parameter adjustment to obtain the prediction category.
8. An anesthesia wakeup facial feature prediction device based on deep learning, comprising at least one control processor and a memory for communication connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the deep learning based anesthesia wakeup facial feature prediction method according to any one of claims 1 to 7.
9. An electronic device comprising the deep learning-based anesthesia reviving facial feature prediction apparatus according to claim 8.
10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the deep learning-based anesthesia wakeup facial feature prediction method according to any one of claims 1 to 7.
CN202410017042.XA 2024-01-03 2024-01-03 Deep learning-based anesthesia reviving facial feature prediction method and device Pending CN117975526A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410017042.XA CN117975526A (en) 2024-01-03 2024-01-03 Deep learning-based anesthesia reviving facial feature prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410017042.XA CN117975526A (en) 2024-01-03 2024-01-03 Deep learning-based anesthesia reviving facial feature prediction method and device

Publications (1)

Publication Number Publication Date
CN117975526A true CN117975526A (en) 2024-05-03

Family

ID=90862428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410017042.XA Pending CN117975526A (en) 2024-01-03 2024-01-03 Deep learning-based anesthesia reviving facial feature prediction method and device

Country Status (1)

Country Link
CN (1) CN117975526A (en)

Similar Documents

Publication Publication Date Title
Fridman et al. Cognitive load estimation in the wild
Soukupova et al. Eye blink detection using facial landmarks
Valstar et al. Fully automatic recognition of the temporal phases of facial actions
US9530048B2 (en) Automated facial action coding system
Lucey et al. Automatically detecting pain in video through facial action units
Pantic et al. Detecting facial actions and their temporal segments in nearly frontal-view face image sequences
US9443144B2 (en) Methods and systems for measuring group behavior
CN110634116B (en) Facial image scoring method and camera
Koelstra et al. Non-rigid registration using free-form deformations for recognition of facial actions and their temporal dynamics
JP2017215963A (en) Attention range estimation device, learning unit, and method and program thereof
Du et al. A multimodal fusion fatigue driving detection method based on heart rate and PERCLOS
JP2010231254A (en) Image analyzing device, method of analyzing image, and program
Luo et al. The driver fatigue monitoring system based on face recognition technology
Adireddi et al. Detection of eye blink using svm classifier
CN117975526A (en) Deep learning-based anesthesia reviving facial feature prediction method and device
Youwei Real-time eye blink detection using general cameras: a facial landmarks approach
CN113180594A (en) Method for evaluating postoperative pain of newborn through multidimensional space-time deep learning
Xie et al. Revolutionizing Road Safety: YOLOv8-Powered Driver Fatigue Detection
Pantic et al. Learning spatio-temporal models of facial expressions
Zhao et al. Face quality assessment via semi-supervised learning
US20220313132A1 (en) Alertness estimation apparatus, alertness estimation method, and computer-readable recording medium
Zhang et al. Automatic construction and extraction of sports moment feature variables using artificial intelligence
Tryhub et al. Detection of driver’s inattention: a real-time deep learning approach
CN117408564B (en) Online academic counseling system
Robin et al. A novel approach to detect & track iris for a different and adverse dataset

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination