CN116798630B

CN116798630B - Myopia physiotherapy compliance prediction method, device and medium based on machine learning

Info

Publication number: CN116798630B
Application number: CN202310820057.5A
Authority: CN
Inventors: 吴栩平
Original assignee: Guangzhou Shijing Medical Software Co ltd
Current assignee: Guangzhou Shijing Medical Software Co ltd
Priority date: 2023-07-05
Filing date: 2023-07-05
Publication date: 2024-03-08
Anticipated expiration: 2043-07-05
Also published as: CN116798630A

Abstract

The invention discloses a machine-learning-based method, device and medium for predicting compliance with myopia physiotherapy. The method comprises the following steps: collecting patient data, the patient data including physiotherapy data and personal data; extracting data features from the patient data using a multi-branch feature extraction network, the multi-branch feature extraction network comprising a self-encoder, a time series analysis model and a graph-based feature extraction model; performing feature fusion on the data features using an attention mechanism; training a self-adaptive random forest model with the fused features to obtain a classification model; and inputting patient data to be predicted into the trained classification model to obtain a prediction result. In the invention, the self-encoder, the time series analysis model and the graph-based feature extraction model of the multi-branch feature extraction network extract data features from the patient data from three different angles, so that the accuracy and reliability of prediction are improved.

Description

Myopia physiotherapy compliance prediction method, device and medium based on machine learning

Technical Field

The invention belongs to the technical field of medicine, and particularly relates to a myopia physiotherapy compliance prediction method, equipment and medium based on machine learning.

Background

Myopia has become a common eye health problem worldwide and affects people's quality of life. There are many methods for correcting myopia, and red light irradiation has attracted attention as a non-drug treatment method. Red light irradiation can relieve visual fatigue, promote blood circulation in the eyes and has a certain myopia control effect. Patient compliance is critical to the effectiveness of the treatment, but there is currently a lack of effective methods to assess patient compliance with myopia treatment.

Furthermore, existing compliance assessment techniques suffer from the following drawbacks:

Insufficient data processing capacity: the prior art may not adequately handle the properties of complex medical data, such as time dependencies, nonlinearities, inherent correlations and complex structures. If such data cannot be processed efficiently, the accuracy and reliability of the prediction may decrease.

Insufficient feature extraction and fusion: traditional methods may employ only one or a limited number of feature extraction and fusion approaches, which may ignore the diversity of the data and reduce the generalization ability of the model.

Classifier optimization problem: existing classifiers, such as conventional random forests, generally assume that the weights of all trees are the same, which may not be optimal in the face of different problems and data.

In recent years, machine learning has been increasingly used in the medical field. Machine learning is an artificial intelligence technique that automatically predicts and makes decisions by learning rules and patterns from data. In compliance assessment, machine learning can identify features and patterns related to patient compliance by analyzing a large number of data samples, thereby providing more accurate assessment and prediction.

Disclosure of Invention

In order to overcome the technical defects, the invention provides a myopia physiotherapy compliance prediction method based on machine learning, which is used for extracting data features from patient data through three different angles by a self-encoder, a time sequence analysis model and a graph-based feature extraction model in a multi-branch feature extraction network, so that the accuracy and reliability of prediction are improved.

In order to solve the problems, the first aspect of the invention discloses a myopia physiotherapy compliance prediction method based on machine learning, which comprises the following steps:

collecting patient data, the patient data including physiotherapy data and personal data;

extracting data features from the patient data using a multi-branch feature extraction network, the multi-branch feature extraction network comprising a self-encoder, a time series analysis model and a graph-based feature extraction model;

adopting an attention mechanism to perform feature fusion on the data features;

training the self-adaptive random forest model by using the fusion characteristics to obtain a classification model;

and inputting patient data to be predicted into the trained classification model to obtain a prediction result.

Further, the physiotherapy data comprises physiotherapy duration, physiotherapy frequency, physiotherapy intensity, eye movement track, pupil size change and/or blink frequency; the personal data includes age and/or gender.

Further, before the step of extracting the data features from the patient data using the multi-branch feature extraction network, the method comprises the steps of:

the patient data is preprocessed, and preprocessing content comprises data cleaning, missing value processing and/or outlier processing.

Further, the step of extracting data features from the patient data using the multi-branch feature extraction network comprises the following steps:

processing the patient data using the self-encoder to extract a first data feature;

processing the patient data using the time series analysis model to extract a second data feature;

processing the patient data using the graph-based feature extraction model to extract a third data feature.

Further, the step of performing feature fusion on the data features by adopting an attention mechanism comprises the following steps:

inputting data features into a feature embedding layer:

$$h_i = E_i(x_i)$$

wherein $h_i$ is the embedded data feature and $x_i$ is the original data of type i;

the attention score of each data feature is calculated using a fully connected layer:

$$a_i = \sigma(W_a h_i + b_a)$$

wherein $W_a$ and $b_a$ are the parameters of the fully connected layer, $\sigma(\cdot)$ is the Sigmoid activation function, and $a_i$ is the attention score of data feature i;

and carrying out weighted summation according to the attention scores, and calculating fusion characteristics.

Further, before the step of inputting patient data to be predicted into the trained classification model to obtain a prediction result, the method comprises the following step:

and evaluating the trained model by using a K-fold cross validation method.

Compared with the prior art, the invention has the following beneficial effects:

the invention discloses a myopia physiotherapy compliance prediction method based on machine learning, which is characterized in that data features are extracted from patient data through three different angles by a self-encoder, a time sequence analysis model and a graph-based feature extraction model in a multi-branch feature extraction network, so that the accuracy and reliability of prediction are improved; in addition, the invention also improves the accuracy of prediction by adopting an attention mechanism to perform feature fusion and adopting a self-adaptive random forest classifier.

In a second aspect the invention discloses a computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.

A third aspect of the present invention discloses a computer readable storage medium having stored thereon a computer program which when executed by a processor realizes the steps of the above method.

Drawings

The invention is described in further detail below with reference to the attached drawing figures, wherein:

FIG. 1 is a flowchart of the compliance prediction method described in Example 1;

FIG. 2 is a schematic diagram of the computer device described in Example 2.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

For a better understanding of the embodiments of the present application, the related art of the embodiments of the present application will be described first.

Random forest is an ensemble learning algorithm consisting of a number of decision trees, with the final prediction determined by voting or averaging. Each tree is built on a randomly sampled part of the training data and on randomly selected feature subsets, which reduces over-fitting and improves the generalization capability of the model. Specifically, each decision tree randomly samples the data set, and at each split the best split point is selected from a random subset of the features. Each tree is therefore largely independent of the others, which mitigates over-fitting. During prediction, the results of all trees in the random forest are combined by voting, averaging or the like to obtain the final prediction. Random forests generally perform better and are more accurate than a single decision tree. They are suitable for classification and regression tasks and can process data sets with a large number of features and samples.
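The three ingredients named above (random sampling of the data set, a random feature subset at each split, and voting over the trees) can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names `bootstrap_sample`, `random_feature_subset` and `majority_vote` are hypothetical, and the decision trees themselves are omitted.

```python
import random
from collections import Counter

def bootstrap_sample(n_samples, rng):
    """Row indices drawn with replacement: the random sample one tree is built on."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, k, rng):
    """Indices of k distinct features considered as candidates at one split."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(tree_predictions):
    """Combine the class labels predicted by the individual trees by voting."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)                       # fixed seed so the sketch is repeatable
rows = bootstrap_sample(10, rng)             # sample for one tree
candidates = random_feature_subset(8, 3, rng)  # features tried at one split
label = majority_vote(["compliant", "non-compliant", "compliant"])
```

Because each tree sees a different sample and different candidate features, the trees make partly independent errors, which is what the vote averages out.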

Example 1

As shown in fig. 1, the embodiment discloses a myopia physiotherapy compliance prediction method based on machine learning, which comprises the following steps:

S1, collecting patient data, wherein the patient data comprise physiotherapy data and personal data, and the physiotherapy data comprise physiotherapy duration, physiotherapy frequency, physiotherapy intensity, eye movement track, pupil size change and/or blink frequency; the personal data includes age and/or gender.

In the above embodiment, the step S2 is preceded by the following steps:

and preprocessing the patient data, wherein preprocessing contents comprise data cleaning, missing value processing and/or abnormal value processing, so that the quality and consistency of the data are ensured.
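The text does not fix concrete preprocessing methods, so the following is only a sketch under stated assumptions: records are dictionaries of numeric fields, cleaning means dropping exact duplicates, missing values are filled with the median, and outliers are clipped to 1.5 times the interquartile range. The function name `preprocess` and the field names in the usage note are hypothetical.

```python
from statistics import median, quantiles

def preprocess(records, fields):
    """Return cleaned copies of the records; the input list is not modified."""
    # Data cleaning (assumed rule): drop exact duplicate records, keep order.
    seen, cleaned = set(), []
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            cleaned.append(dict(rec))

    for f in fields:
        observed = [r[f] for r in cleaned if r.get(f) is not None]
        if not observed:
            continue
        fill = median(observed)  # assumed imputation value for missing entries
        if len(observed) >= 4:
            # Assumed outlier rule: clip to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
            q1, _, q3 = quantiles(observed, n=4)
            low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        else:
            low, high = min(observed), max(observed)
        for r in cleaned:
            value = fill if r.get(f) is None else r[f]  # missing value processing
            r[f] = min(max(value, low), high)           # outlier processing
    return cleaned
```

For example, `preprocess(records, ["duration_min", "age"])` would return one cleaned dictionary per distinct input record, with no `None` entries left in those two fields.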

S2, extracting data features from the patient data using a multi-branch feature extraction network, so that features are extracted from different types of data. The multi-branch feature extraction network comprises a self-encoder, a time series analysis model and a graph-based feature extraction model.

Specifically, step S2 includes the following steps:

processing patient data using a self-encoder to extract a first data feature:

the objective function of the self-encoder is:

$$\min_{\theta,\phi}\ \|x - g_\phi(f_\theta(x))\|^2 + \lambda \sum_j \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$$

wherein $\|x - g_\phi(f_\theta(x))\|^2$ is the reconstruction error; $\sum_j \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the sparsity regularization term; $\lambda$ is a regularization parameter for controlling the strength of the sparsity constraint; $\rho$ is a sparsity parameter representing the expected average activation value of the hidden layer neurons; and $\hat{\rho}_j$ is the actual average activation value.

The patient data is input into the objective function of the self-encoder to obtain a hidden representation $h = f_\theta(x)$, and h is taken as the first data feature.
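To make the roles of the reconstruction error, λ, ρ and the average activation concrete, the objective can be evaluated on a small batch as sketched below. Several choices here are assumptions rather than details from the text: the encoder is a single sigmoid layer, the decoder is a single linear layer, the sparsity term is the Kullback-Leibler divergence commonly used for sparse autoencoders, and the reconstruction error is averaged over the batch. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W, b):
    """Hidden representation h = f_theta(x): one sigmoid layer (assumed form)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def decode(h, V, c):
    """Reconstruction g_phi(h): one linear layer (assumed form)."""
    return [sum(v * hj for v, hj in zip(row, h)) + ci for row, ci in zip(V, c)]

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparse_ae_objective(X, W, b, V, c, lam=0.1, rho=0.05):
    """Return (objective value, hidden representations) for a batch X."""
    H = [encode(x, W, b) for x in X]
    # Squared reconstruction error, averaged over the batch.
    recon = sum(sum((xi - ri) ** 2 for xi, ri in zip(x, decode(h, V, c)))
                for x, h in zip(X, H)) / len(X)
    # rho_hat[j]: actual average activation of hidden unit j over the batch.
    rho_hat = [sum(h[j] for h in H) / len(H) for j in range(len(b))]
    sparsity = sum(kl_bernoulli(rho, r) for r in rho_hat)
    return recon + lam * sparsity, H
```

After training, only the rows of `H` would be kept as the first data feature; the decoder is needed solely to define the reconstruction error.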

Processing the patient data by adopting a time sequence analysis model, and extracting second data characteristics:

the time-dependent features in the physiotherapy data are processed by a time series analysis model. For a time series $\{x_t\}$ in the physiotherapy data, an autoregressive model is expressed as:

$$x_t = c + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t$$

wherein c is a constant, $\phi_i$ are the model parameters, $\varepsilon_t$ is the error term, and p is the time series length. The time series in the physiotherapy data is input into the autoregressive model to obtain the model parameters $\phi_i$, which are taken as the second data feature.
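One way to obtain the autoregressive parameters from a series is ordinary least squares over the lagged values, sketched below with the standard library only. The estimation method is an assumption (the text does not name one), the function names are hypothetical, and p is treated here as the number of lagged terms, which is usually called the order of the model.

```python
def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z

def fit_ar(series, p):
    """Least-squares estimate of c and phi_1..phi_p in
    x_t = c + sum_i phi_i * x_{t-i} + eps_t."""
    # One regression row per time step t >= p: [1, x_{t-1}, ..., x_{t-p}].
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = series[p:]
    k = p + 1
    # Normal equations (R^T R) z = R^T y.
    gram = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    rhs = [sum(r[a] * y for r, y in zip(rows, targets)) for a in range(k)]
    c, *phi = solve_linear(gram, rhs)
    return c, phi
```

Calling `fit_ar(daily_minutes, p=2)` on a hypothetical list of daily physiotherapy durations would return the constant and two coefficients; the coefficient list is what would serve as the second data feature.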

Processing the patient data based on the feature extraction method of the graph, and extracting a third data feature:

in complex medical data, the inherent relevance of the data and the complex structure can generally be better expressed by the graph structure. Patient data is converted into a graph structure in which nodes represent patients and edges represent similarities between patients. The node characteristics include physiotherapy data and personal data.

Extracting complex patterns of the graph on node characteristics and graph structures through a graph convolution network, wherein the basic operation of the graph convolution network is as follows:

$$H^{(l+1)} = \sigma\left(D^{-1} A H^{(l)} W^{(l)}\right)$$

wherein A is the adjacency matrix of the graph, D is the node degree matrix, $H^{(l)}$ is the node feature matrix of the l-th layer, $W^{(l)}$ is the weight matrix of the l-th layer, and $\sigma$ is a nonlinear activation function.

The patient data is input to obtain new features for each node, and these features are aggregated to obtain the feature $H^{\text{graph}}$ of the whole graph, which is taken as the third data feature.
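A single propagation step of the graph convolution and the aggregation into a graph-level feature can be sketched as below. The choices of ReLU as the activation, mean pooling as the aggregation, and self-loops in the adjacency matrix (so that no node has degree zero) are assumptions for illustration; the function names are hypothetical.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def gcn_layer(A, H, W):
    """One step H' = relu(D^{-1} A H W), with D the diagonal degree matrix of A."""
    degrees = [sum(row) for row in A]
    A_norm = [[a / d for a in row] for row, d in zip(A, degrees)]  # D^{-1} A
    Z = matmul(matmul(A_norm, H), W)
    return [[max(0.0, z) for z in row] for row in Z]  # ReLU (assumed activation)

def graph_feature(H):
    """Aggregate node features into one vector by mean pooling (assumed)."""
    n = len(H)
    return [sum(col) / n for col in zip(*H)]

# Three patients; edges link similar patients, self-loops included (assumed).
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # node features
W0 = [[1.0, 0.0], [0.0, 1.0]]               # identity weights for readability
H1 = gcn_layer(A, H0, W0)
pooled = graph_feature(H1)
```

With identity weights, each row of `H1` is simply the average of the features of that patient and its neighbours, which shows how the degree normalization spreads information along the edges.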

And S3, carrying out feature fusion on each data feature by adopting an attention mechanism.

Specifically, step S3 includes the following steps:

inputting each data feature obtained through step S2 to the feature embedding layer:

$$h_i = E_i(x_i)$$

wherein $h_i$ is the embedded data feature and $x_i$ is the original data of data feature type i. Each data feature is mapped to a shared hidden space through the feature embedding layer; for each data feature type i there is a separate embedding network $E_i$, implemented with a fully connected layer or a small neural network.

Feature fusion is carried out according to the importance of each data feature. First, the attention score of each data feature is calculated using a fully connected layer:

$$a_i = \sigma(W_a h_i + b_a)$$

wherein $W_a$ and $b_a$ are the parameters of the fully connected layer, $\sigma(\cdot)$ is the Sigmoid activation function, and $a_i$ is the attention score of data feature i.

And carrying out weighted summation on each data characteristic according to the attention score, and calculating a fusion characteristic.
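The embedding, scoring and weighted summation can be sketched together as follows. The sketch assumes that each embedding network is a single linear layer, that the attention score is one scalar per data feature, and that the scores are used directly as weights without further normalization; none of these details is fixed by the text, and the function names are hypothetical.

```python
import math

def linear(W, b, x):
    """Fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def attention_fuse(features, embed_params, w_a, b_a):
    """Fuse feature vectors of different sizes into one vector.

    features:     list of raw feature vectors x_i
    embed_params: list of (W, b) pairs, one embedding network E_i per feature
    w_a, b_a:     parameters of the scoring layer shared by all features
    """
    # h_i = E_i(x_i): map every feature into the shared hidden space.
    embedded = [linear(W, b, x) for (W, b), x in zip(embed_params, features)]
    # a_i = sigmoid(w_a . h_i + b_a): one scalar score per feature.
    scores = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_a, h_i)) + b_a)))
              for h_i in embedded]
    # Fused feature: sum_i a_i * h_i.
    fused = [sum(a * h_i[k] for a, h_i in zip(scores, embedded))
             for k in range(len(w_a))]
    return fused, scores
```

In use, `features` would hold the first, second and third data features from step S2, and `fused` would be passed to the classifier in step S4.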

The feature fusion can increase the diversity and information richness of features, improve the expression capacity of the model and enhance the modeling capacity of complex data relations. By fusing a plurality of feature sources, more comprehensive and accurate feature representation can be obtained, so that the performance and prediction capability of the machine learning model are improved.

And S4, training the self-adaptive random forest model by using the fusion characteristics to obtain a classification model.

Specifically, step S4 includes the following steps:

each tree is initially assigned the same weight:

$$w_i = \frac{1}{n}$$

where n is the total number of trees in the random forest.

After each training period is finished, the accuracy $acc_i$ of each tree is calculated, and the weight of each tree is then updated with the following formula:

$$w_i = w_i \cdot acc_i$$

To ensure that the weights of all trees sum to 1, the weights are normalized:

$$w_i = \frac{w_i}{\sum_{j=1}^{n} w_j}$$

In prediction, the voting weight of each tree is determined by its weight.

Specifically, let $c_j$ represent class j and $p_{ij}$ represent the probability of class j predicted by the i-th tree; the final predicted probability is then:

$$P(c_j) = \sum_{i=1}^{n} w_i \, p_{ij}$$

Finally, the class with the highest probability is selected as the prediction result.
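The weighting scheme of step S4 (equal initial weights, multiplication by each tree's accuracy, normalization, and weighted voting) can be sketched as follows. The trees themselves are not modelled; their accuracies and per-class probabilities are passed in as plain lists, and the function names are hypothetical.

```python
def init_weights(n):
    """Every tree starts with the same weight 1/n."""
    return [1.0 / n] * n

def update_weights(weights, accuracies):
    """Scale each weight by the tree's accuracy, then renormalize to sum to 1.
    Assumes at least one tree has non-zero accuracy."""
    scaled = [w * acc for w, acc in zip(weights, accuracies)]
    total = sum(scaled)
    return [s / total for s in scaled]

def weighted_vote(weights, tree_probs):
    """tree_probs[i][j] is the probability of class j predicted by tree i.
    Returns (index of the most probable class, final class probabilities)."""
    n_classes = len(tree_probs[0])
    final = [sum(w * probs[j] for w, probs in zip(weights, tree_probs))
             for j in range(n_classes)]
    return max(range(n_classes), key=final.__getitem__), final

weights = update_weights(init_weights(3), accuracies=[0.9, 0.6, 0.5])
label, probs = weighted_vote(weights, [[0.2, 0.8], [0.7, 0.3], [0.6, 0.4]])
```

In this toy case the three trees are evenly split under equal weights, while the accuracy-based weights (about 0.45, 0.30 and 0.25) tip the decision toward the class favoured by the most accurate tree.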

In the above embodiment, step S5 is preceded by the following step:

and evaluating the trained model by using a K-fold cross validation method. Patient data is divided into a plurality of folds, training and validation is performed on each fold, and the model is evaluated by an evaluation index. The evaluation index comprises accuracy, precision, recall, F1 score and the like.
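The fold splitting and the listed evaluation indices can be sketched as below. The sketch assumes contiguous, unshuffled folds and a binary label with 1 as the positive class; the function names are hypothetical, and model training inside each fold is left to the caller.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

A typical loop would train on `train` and score on `val` for each pair from `k_fold_indices`, then average the metric dictionaries over the k folds. Shuffling the patient order beforehand is a common addition when the records are sorted.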

S5, inputting patient data to be predicted into the trained classification model, and obtaining a prediction result about patient compliance. The predicted outcome may be a label, such as "compliance" or "non-compliance"; or probability values, indicating the likelihood that the patient belongs to a certain category. The medical professional can assess patient compliance based on the prediction and take appropriate intervention and therapeutic measures.

In the above embodiment, the method further includes the steps of:

and acquiring input feedback data, and iterating and optimizing the model. Wherein the feedback data comprises actual compliance of the patient and treatment effect, and the optimization content comprises adjustment of feature selection, adjustment of model parameters, improvement of data collection and the like.

And the model is optimized according to the prediction result and the actual condition, so that the accuracy and the practicability of the model are improved.

The following is further described by the specific implementation procedure:

patient data is acquired, wherein the patient data comprises physiotherapy data and personal data, and the physiotherapy data comprises physiotherapy duration, physiotherapy frequency, physiotherapy intensity, eye movement track, pupil size change and/or blink frequency; the personal data includes age and/or gender.

And performing data cleaning, missing value processing and abnormal value processing on the patient data to ensure the quality and consistency of the data.

The multi-branch feature extraction network is adopted to extract data features from patient data through three different angles, such as extracting certain data patterns which are difficult to observe from the data by a self-encoder, finding a certain trend between physiotherapy duration and physiotherapy frequency by a time sequence analysis model, and finding the relationship between physiotherapy intensity and other factors by a feature extraction model based on a graph. The data features from different sources are fused by using an attention mechanism, so that the model can better understand and utilize the data features, and the prediction accuracy is improved.

The self-adaptive random forest model is used for training, and after each training period is finished, the weight of each tree is self-adaptively adjusted according to the prediction performance, so that the model can be used for various conditions, and the prediction accuracy is improved.

Patient A is myopic and needs to regularly receive red light irradiation as part of myopia physiotherapy. After model training is finished, Patient A's new data is input into the model when a physiotherapy session ends to obtain a compliance evaluation result, for example a predicted 80% probability that Patient A will follow the treatment prescribed by the doctor. Based on the compliance evaluation result, the medical professional can assess the compliance of Patient A and take appropriate intervention and therapeutic measures.

In addition, the actual compliance and treatment effect of Patient A are obtained as input feedback. If Patient A's actual compliance differs from the predicted compliance, the model parameters can be adjusted, the feature selection can be optimized, and so on, so as to improve the accuracy and practicability of the model.

According to the invention, the quality and consistency of the data are ensured by preprocessing the acquired data, so that the accuracy and reliability of the model can be increased; the data characteristics are extracted from the patient data through three different angles through a self-encoder, a time sequence analysis model and a graph-based characteristic extraction model in a multi-branch characteristic extraction network, so that the accuracy and the reliability of prediction are improved, and the expressive power and the generalization capability of the model are improved; the feature fusion is carried out through the attention mechanism, so that the model can be helped to better understand and utilize different types of data, and the prediction accuracy is improved; in addition, the invention also adopts the self-adaptive random forest classifier to improve the prediction accuracy.

Example 2

As shown in fig. 2, the present embodiment discloses a computer device, including a memory and a processor, the memory storing a computer program, the processor being configured to execute the computer program, the processor implementing the following steps when executing the computer program:

adopting an attention mechanism to perform feature fusion on the data features;

Specifically, the processor when executing the computer program further implements the following steps:

inputting data features into a feature embedding layer:

$$h_i = E_i(x_i)$$

$$a_i = \sigma(W_a h_i + b_a)$$

and evaluating the trained model by using a K-fold cross validation method.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.

Example 3

The present embodiment discloses a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method described in Example 1.

The present invention is not limited to the preferred embodiments, and any modifications, equivalent variations and refinements made to the above embodiments according to the technical principles of the present invention fall within the scope of the technical solution of the present invention.

Claims

1. A myopia physiotherapy compliance prediction method based on machine learning is characterized by comprising the following steps:

adopting an attention mechanism to perform feature fusion on the data features;

inputting patient data to be predicted into a trained classification model to obtain a prediction result;

the step of extracting data features from patient data using a multi-branch feature extraction network includes the steps of:

processing patient data by using a self-encoder to extract first data features;

processing the patient data by adopting a time sequence analysis model, and extracting second data characteristics;

processing the patient data based on the feature extraction method of the graph, and extracting a third data feature;

the steps of processing patient data using a self-encoder to extract a first data feature, comprising:

the objective function of the self-encoder is:

$$\min_{\theta,\phi}\ \|x - g_\phi(f_\theta(x))\|^2 + \lambda \sum_j \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$$

wherein $\|x - g_\phi(f_\theta(x))\|^2$ is the reconstruction error; $\sum_j \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the sparsity regularization term; $\lambda$ is a regularization parameter for controlling the strength of the sparsity constraint; $\rho$ is a sparsity parameter representing the expected average activation value of the hidden layer neurons; and $\hat{\rho}_j$ is the actual average activation value;

inputting patient data into the objective function of the self-encoder to obtain a hidden representation $h = f_\theta(x)$, and taking h as the first data feature;

processing the patient data by using a time series analysis model, extracting second data features, including:

processing time-dependent features in the physiotherapy data by a time series analysis model, wherein for a time series $\{x_t\}$ in the physiotherapy data, an autoregressive model is expressed as:

$$x_t = c + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t$$

wherein c is a constant, $\phi_i$ are model parameters, $\varepsilon_t$ is an error term, and p is the time series length; the time series in the physiotherapy data is input into the autoregressive model to obtain the model parameters $\phi_i$ as the second data feature;

the step of processing the patient data based on the feature extraction of the graph, extracting a third data feature, comprising:

$$H^{(l+1)} = \sigma\left(D^{-1} A H^{(l)} W^{(l)}\right)$$

wherein A is the adjacency matrix of the graph, D is the node degree matrix, $H^{(l)}$ is the node feature matrix of the l-th layer, $W^{(l)}$ is the weight matrix of the l-th layer, and $\sigma$ is a nonlinear activation function;

2. The compliance prediction method of claim 1, wherein the therapy data comprises therapy duration, therapy frequency, therapy intensity, eye movement trajectory, pupil size variation, and/or blink frequency; the personal data includes age and/or gender.

3. The compliance prediction method according to claim 1, wherein before the step of extracting data features from patient data using the multi-branch feature extraction network, comprising the steps of:

4. The compliance prediction method according to claim 1, wherein the step of feature fusing the data features using an attention mechanism comprises the steps of:

inputting data features into a feature embedding layer:

$$h_i = E_i(x_i)$$

$$a_i = \sigma(W_a h_i + b_a)$$

5. The compliance prediction method according to claim 1, wherein the step of inputting patient data to be predicted into a trained classification model, before obtaining a prediction result, comprises the steps of:

and evaluating the trained model by using a K-fold cross validation method.

6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.

7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.