WO2020211592A1 - A diabetes risk early warning system - Google Patents

A diabetes risk early warning system

Info

Publication number
WO2020211592A1
WO2020211592A1 PCT/CN2020/080251 CN2020080251W WO2020211592A1 WO 2020211592 A1 WO2020211592 A1 WO 2020211592A1 CN 2020080251 W CN2020080251 W CN 2020080251W WO 2020211592 A1 WO2020211592 A1 WO 2020211592A1
Authority
WO
WIPO (PCT)
Prior art keywords
diabetes
exercise
cluster
early warning
point
Prior art date
Application number
PCT/CN2020/080251
Other languages
English (en)
French (fr)
Inventor
高秀娥
陈波
陈世峰
桑海涛
Original Assignee
岭南师范学院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910314236.5A (external priority, patent CN110085322A)
Priority claimed from CN201910340600.5A (external priority, patent CN110060781A)
Application filed by 岭南师范学院
Priority to US16/967,620 (patent US20220301708A1)
Publication of WO2020211592A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/60 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
    • G16H40/63 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/30 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising

Definitions

  • The invention relates to the technical field of medical information technology, and in particular to a diabetes risk early warning system.
  • Patent document CN107403072A, published on 2017-11-28, provides a machine-learning-based prediction and early warning method for type 2 diabetes.
  • This method uses the K-means algorithm and the Logistic Regression algorithm to build a two-layer diabetes predictive analysis model that clusters first and then classifies.
  • This method uses the K-means algorithm to perform unlabeled cluster analysis on the data set.
  • This method introduces a hierarchical algorithm, namely the next-level Logistic Regression algorithm. Finding stable initial cluster centers greatly increases the additional computation of the algorithm, and setting the threshold according to problem-solving experience destroys the convergence of the algorithm, so that stable clustering results are still difficult to achieve.
  • As diabetes prediction models take on more data features and larger data dimensions, they bring in more non-key and redundant information, and the prediction models become increasingly complex.
  • Such prediction methods are difficult to apply directly to the prediction of diabetes.
  • In the existing literature, for example, Ke Zhenglin of Beijing Jiaotong University and others, in "Application of Lasso and related methods in multiple linear regression models", apply the ideas of the Lasso method and related methods to variable selection in multiple linear regression models.
  • The traditional LARS algorithm for variable selection in multiple linear regression models is proposed, and the specific implementation of the variable selection method is demonstrated on diabetes statistical data and on a set of simulation-generated multivariate statistical data.
  • The purpose of the present invention is to provide a diabetes risk early warning system. Aiming at the problem that the k-means clustering algorithm selects the initial cluster centers at random and therefore produces unstable clustering results, the invention proposes an improved k-means algorithm that optimizes the initial cluster centers and, combined with the diabetes piecewise function, an improved k-means clustering diabetes early warning model.
  • The present invention fully considers the impact of different diabetes features on the prediction results, provides an improved method for calculating the correlation between the feature independent variables and the dependent variable, simplifies the diabetes prediction model, and proposes a LARS diabetes prediction method based on feature weights.
  • the diabetes risk early warning system proposed by the present invention includes at least one processor, system memory and at least one computer-readable storage medium.
  • the at least one computer-readable storage medium is loaded with computer-executable instructions for enabling the processor to implement various aspects of the present invention.
  • At least one processor is used to execute the computer-executable instructions. The flowcharts and block diagrams in the accompanying drawings show the possible implementation architecture, functions, and operation of the system, method, and computer program product according to multiple embodiments of the present invention.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, which contains one or more executable instructions for realizing the specified logic function.
  • Each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • various aspects of the present invention are described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams and combinations of blocks in the flowcharts and/or block diagrams can be implemented by computer-readable program instructions.
  • the above-mentioned computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • The processor is a functional unit that interprets and executes instructions, also known as the central processing unit (CPU). It is the computing and control core of the computer system and the final execution unit for information processing and program operation.
  • the storage device can be a read only memory (ROM), a random access memory (RAM), an external memory such as a hard disk, a floppy disk, an optical disk, a U disk, etc., or a storage server.
  • A diabetes risk early warning system comprising: a storage device; and a first processor coupled to the storage device and configured to obtain stable cluster centers based on the selected first cluster center point and substitute them into the diabetes piecewise function to obtain a diabetes early warning model, wherein the data set is selected, the number of clusters k and the field radius ε are defined, and the sample point x_i with the largest sum of distances to the samples is selected as the first cluster center point, so that the first cluster center point falls on the center of each cluster.
  • a device for obtaining regression coefficients of a diabetes prediction model based on a LARS algorithm based on feature weights comprising a plurality of modules respectively configured to execute at least one step in the LARS diabetes prediction method based on feature weights.
  • A diabetes risk early warning system comprising: a storage device; and a second processor coupled to the storage device and configured to: calculate an independent variable feature weight vector and an original correlation vector; and, based on the independent variable feature weight vector and the original correlation vector, output the regression coefficient β of the LARS diabetes model based on feature weights.
  • a device for constructing a diabetes prediction model comprising a plurality of modules respectively configured to execute at least one step in the method for constructing the diabetes prediction model.
  • A diabetes risk early warning system comprising: a storage device; and at least one processor coupled to the storage device and configured to: obtain stable cluster centers based on the selected first cluster center point and substitute them into the diabetes piecewise function to obtain a diabetes prediction model, wherein the data set is selected, the number of clusters k and the field radius ε are defined, and the sample point x_i with the largest sum of distances to the samples is selected as the first cluster center point, so that the first cluster center point falls on the center of each cluster; calculate the independent variable feature weight vector and the original correlation vector; and, based on the independent variable feature weight vector and the original correlation vector, output the regression coefficient β of the diabetes prediction model.
  • FIG. 1 is a flowchart of a LARS diabetes prediction method based on feature weights in an embodiment of the present invention;
  • FIG. 2 is a solution diagram of the standard LARS algorithm;
  • FIG. 3 is a solution diagram of the LARS algorithm with feature weights added in an embodiment of the present invention;
  • FIG. 4 is a diagram of the change path of the regression variable β of the standard LARS algorithm;
  • FIG. 5 is a diagram of the change path of the regression variable β of the LARS algorithm based on feature weights;
  • FIG. 6 is a graph showing how the ACC of the standard LARS algorithm and of the feature weight-based LARS algorithm of the present invention changes with the number of iterations;
  • FIG. 7 is a graph showing how the ROC of the standard LARS algorithm and of the feature weight-based LARS algorithm of the present invention changes with the number of iterations;
  • FIG. 8 is an algorithm flowchart of an improved k-means clustering diabetes early warning model in an embodiment of the present invention;
  • FIG. 9 is a line graph comparing the average convergence speeds of different algorithms on the new diabetes data set in an embodiment of the present invention;
  • FIG. 10 is a line graph comparing the average ARI of multiple clustering results of different algorithms on the new diabetes data set in an embodiment of the present invention;
  • FIG. 11 is a schematic diagram of the simplified module connection relationship of the preferred diabetes risk early warning system of the present invention.
  • Reference numerals: 1, the first processor; 2, the second processor.
  • The present invention incorporates the feature weights obtained by the PCA algorithm into the solving step of the LARS algorithm.
  • Because of the different weights, the probability that each feature independent variable is selected changes, which speeds up the algorithm's approach to the key features and thereby improves its solving speed and accuracy; and because the PCA algorithm is used to constrain the changes of the respective variables, the robustness of the model is increased.
  • The system provided by the present invention is used for constructing a diabetes prediction model.
  • The system at least includes a storage device and a second processor 2.
  • The second processor 2 is coupled to the storage device and is configured to execute at least one step of the LARS diabetes prediction method based on feature weights.
  • The LARS diabetes prediction method based on feature weights includes at least one of the following steps:
  • the present invention provides a LARS diabetes prediction method based on feature weights.
  • the specific steps are as follows:
  • Step 1: Normalize the diabetes data set matrix X so that the value ranges of its different features are mapped to the same fixed range of 0 to 1.
  • The fitted value is the fit of each iteration to the true result and is initialized to 0.
  • The residual is the difference between the true result and the fitted value.
  • Calculation formula 1 is: $r = y - \hat{\mu}$,
  • where $y$ is the true result label vector
  • and $\hat{\mu}$ is the current fitted value.
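  • Step 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the patent's implementation: the function names `min_max_normalize` and `residual` and the toy glucose/BMI values are assumptions made only for the example.

```python
def min_max_normalize(rows):
    """Scale each column of a list-of-rows matrix to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


def residual(y, fitted):
    """Residual = true label minus current fitted value, element-wise."""
    return [yi - fi for yi, fi in zip(y, fitted)]


# Hypothetical toy data: two features (glucose, BMI) for three samples.
X = [[85.0, 22.0], [140.0, 31.0], [195.0, 40.0]]
y = [0.0, 1.0, 1.0]

X_scaled = min_max_normalize(X)
fitted = [0.0] * len(y)      # the fitted value is initialized to 0
r = residual(y, fitted)      # so the first residual equals y
```

Because the fitted value starts at 0, the first residual is simply the label vector itself.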
  • Step 2 Calculate the initial weight of each characteristic independent variable.
  • the calculation method 3 is:
  • ⁇ i is the mean value of the i-th feature.
  • the Lasso model can get sparse solutions, and the obtained diabetes prediction model has better generalization ability.
  • the LARS algorithm can solve the Lasso model, but it has the problems of slow approximation speed and low accuracy when applied to the diabetes data set.
  • the present invention improves the LARS algorithm through PCA principal component analysis, obtains the weights of different features, and distinguishes the importance of each feature independent variable, and therefore provides a LARS diabetes prediction method based on the feature weights.
  • Step 3: Take from X the column vectors that are in the same direction as y and denote them $X_A$; that is, $X_A$ consists of the columns of X indexed by the set A.
  • $1_A$ is a column vector of dimension k whose elements are all 1, where k is the number of elements in A.
  • The regression coefficient, the current fitted value, and the current residual can be updated from the calculation formula in step 4; the regression coefficient vector is updated by calculation formula 9:
  • The plus sign above min means that the minimum is taken only over the positive values in the set.
  • $C_i$ and $a_i$ are the i-th elements of C and a, respectively, and i is the index at which the minimum is attained.
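  • For orientation, the quantities named above play the following roles in the standard (unweighted) LARS algorithm as commonly presented in the literature. This is the textbook formulation, given here as an assumption about what the formulas omitted from this text resemble; the feature-weighted variant of the invention may differ. The symbols $\hat{\mu}$ (current fitted value), $C_{\max}$, $G_A$, $\alpha_A$, $u_A$ and $\hat{\gamma}$ are notation chosen for this sketch, and $X_A$ holds the active columns, each multiplied by the sign of its correlation.

```latex
\begin{aligned}
C &= X^{\top}\,(y - \hat{\mu}), \qquad
C_{\max} = \max_i |C_i|, \qquad
A = \{\, i : |C_i| = C_{\max} \,\} \\
G_A &= X_A^{\top} X_A, \qquad
\alpha_A = \left( 1_A^{\top} G_A^{-1} 1_A \right)^{-1/2} \\
u_A &= X_A \,\alpha_A\, G_A^{-1} 1_A, \qquad
a = X^{\top} u_A \\
\hat{\gamma} &= \min_{i \notin A}{}^{+}
\left\{ \frac{C_{\max} - C_i}{\alpha_A - a_i},\;
        \frac{C_{\max} + C_i}{\alpha_A + a_i} \right\} \\
\hat{\mu} &\leftarrow \hat{\mu} + \hat{\gamma}\, u_A
\end{aligned}
```

Here $u_A$ is the unit vector making equal angles with the active columns, and $\hat{\gamma}$ is the smallest positive step at which a new variable becomes as correlated with the residual as the active ones.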
  • Step 5: Determine whether the L2 norm of the residual from step 4 is less than a given tolerance. If so, the algorithm ends and the regression coefficient is output; otherwise, steps 3 to 5 are repeated.
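  • The iterate-until-the-residual-is-small loop of steps 3 to 5 can be illustrated with incremental forward stagewise regression, a simpler relative of LARS that is swapped in here for brevity. It is not the feature-weighted LARS of the invention; the function name, step size, tolerance, and toy data are all assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_norm(v):
    return math.sqrt(dot(v, v))


def forward_stagewise(columns, y, step=0.01, tolerance=0.05, max_iter=10_000):
    """Incremental forward stagewise regression on a list of feature columns.

    Stops when the L2 norm of the residual drops below `tolerance`
    or after `max_iter` small steps.
    """
    beta = [0.0] * len(columns)
    resid = list(y)                      # the fitted value starts at 0
    for _ in range(max_iter):
        if l2_norm(resid) < tolerance:   # termination test as in Step 5
            break
        # Pick the feature most correlated with the current residual.
        corr = [dot(col, resid) for col in columns]
        j = max(range(len(columns)), key=lambda i: abs(corr[i]))
        if corr[j] == 0.0:
            break
        delta = step if corr[j] > 0 else -step
        beta[j] += delta
        resid = [r - delta * x for r, x in zip(resid, columns[j])]
    return beta, resid


# Hypothetical toy data: two orthonormal feature columns, y = 2 * x1.
x1 = [0.5, -0.5, 0.5, -0.5]
x2 = [0.5, 0.5, -0.5, -0.5]
y = [1.0, -1.0, 1.0, -1.0]
beta, resid = forward_stagewise([x1, x2], y)
```

On this toy data only the first coefficient moves, approaching 2 in steps of 0.01 until the residual norm falls below the tolerance; the second coefficient stays at 0, which is the kind of sparse solution the text attributes to the Lasso model.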
  • regression coefficient in the regression equation represents the parameter of the influence of the independent variable x on the dependent variable y.
  • the slope b is called the regression coefficient, which means that every time X changes by one unit, on average, Y will change by b units.
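  • As a small worked example of the slope interpretation, the following sketch fits a one-variable least-squares line. The function name and the data are hypothetical and chosen so that the true relationship is y = 1 + 2x.

```python
def simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: the regression coefficient
    a = mean_y - b * mean_x    # intercept
    return a, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # generated from y = 1 + 2x
a, b = simple_linear_regression(xs, ys)
```

Here b comes out as 2: each one-unit increase in x raises y by two units on average.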
  • the Lasso model can be used for feature selection to obtain the key features of diabetes.
  • The key feature variables of diabetes are screened, which simplifies the diabetes prediction model, improves its accuracy, and provides more accurate measures for the early prevention and treatment of diabetes.
  • the diabetes prediction method proposed by the present invention is compared with the standard LARS method, and the regression coefficient path, the prediction accuracy (ACC) curve, and the receiver operating characteristic (ROC) curve of the model are used as evaluation indicators.
  • the regression coefficient path can intuitively see the changes in the coefficients of each characteristic independent variable
  • the ACC curve can intuitively compare the approximation speed and accuracy of the algorithm
  • The ROC curve is a tool for measuring imbalance problems: the larger the area under the curve, the better the model.
  • The standard LARS algorithm first finds the independent variable $x_k$ that has the largest correlation with y and uses it to approximate y until another variable $x_l$ appears whose correlation with y equals that of $x_k$; from that point it approaches y along the angular bisector of $x_k$ and $x_l$.
  • When a third variable $x_p$ has a sufficiently large correlation with the dependent variable, it is also added to the approximation queue, and the approach continues along the common angular bisector of the three vectors (here "angular bisector" refers to the bisector of the higher-dimensional vectors), and so on, until the residual is small enough or all independent variables have been selected, at which point the algorithm ends.
  • The standard LARS algorithm retains the complexity of the forward selection algorithm and requires at most m steps, where m is the number of independent variables, while ensuring the optimal result in the independent variable subspace.
  • the standard LARS algorithm and the LARS algorithm with feature weights are respectively solved. Taking two feature independent variables as an example, the correlation calculation will be performed before each approximation. It can be seen from Figure 3 that after the feature weights are added, the probability of the two feature independent variables being selected due to the different weights has changed, the approach direction has also changed, and finally the regression coefficients have changed.
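  • The effect of feature weights on which variable is selected can be illustrated with two feature columns. The sketch below assumes, purely for illustration, that each weight multiplies the absolute correlation of its feature with the residual; the exact weighting formula of the invention is not reproduced in this text, and the weights and data are hypothetical.

```python
def correlations(columns, residual):
    """Inner product of every feature column with the current residual."""
    return [sum(x * r for x, r in zip(col, residual)) for col in columns]


def pick_feature(columns, residual, weights=None):
    """Index of the feature with the largest (optionally weighted) |correlation|."""
    corr = correlations(columns, residual)
    if weights is None:
        weights = [1.0] * len(columns)
    # Assumption: the weight simply scales the absolute correlation.
    scores = [abs(c) * w for c, w in zip(corr, weights)]
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical toy data: two unit-length feature columns and a residual.
columns = [[1.0, 0.0], [0.6, 0.8]]
residual = [1.0, 0.4]

plain_choice = pick_feature(columns, residual)                 # correlations 1.0 vs 0.92
weighted_choice = pick_feature(columns, residual, [0.4, 0.6])  # scores 0.4 vs about 0.552
```

Without weights the first column wins; with the hypothetical weights the second column is chosen instead, so the approach direction, and eventually the regression coefficients, change.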
  • The 2-hour plasma glucose concentration is used to understand the function of pancreatic β-cells and the body's ability to regulate blood sugar, and is widely used in clinical practice;
  • The normal diastolic blood pressure of adults is < 90 mmHg (12 kPa);
  • Triceps skinfold thickness: an adult triceps skinfold thickness greater than 10.4 mm in men or 17.5 mm in women is considered obese;
  • Two-hour serum insulin: insulin is the only hormone in the body that lowers blood sugar, and the only hormone that simultaneously promotes the synthesis of glycogen, fat, and protein.
  • Table 1 and Table 2 give the compression coefficient values and regression coefficient β values corresponding to the different iteration counts n of the standard LARS and the feature weight-based LARS. It can be seen from the two tables that three of the regression coefficients β are 0 at the fifth iteration, which shows that the regression coefficients of three independent variables in the final model are 0. At this point the model is simplified and the ACC reaches its highest value.
  • Table 1: Values of the compression coefficient and regression coefficients corresponding to different iteration counts n of standard LARS
  • Figure 7 shows the 100 false positive rates and true positive rates calculated from the threshold t from 0 to 1 in steps of 0.01, and then the ROC curves of the two LARS algorithms are obtained.
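  • The threshold sweep described for FIG. 7 can be sketched as follows. The labels and scores are hypothetical toy values, not the patent's data, and the function names are assumptions; the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(labels, scores, step=0.01):
    """(FPR, TPR) pairs for thresholds t = 0, step, 2*step, ..., 1."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = {(0.0, 0.0)}                 # curve end point: nothing predicted positive
    n_steps = round(1 / step)
    for k in range(n_steps + 1):
        t = k * step
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.add((fp / negatives, tp / positives))
    return sorted(points)


def auc(points):
    """Area under a sorted list of (FPR, TPR) points, trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy predictions: 1 = diabetic, 0 = non-diabetic.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

A classifier that guesses at random traces the diagonal, whose area is 0.5, so values closer to 1 indicate a curve nearer the upper left corner.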
  • The red dashed line is the ROC curve of random guessing. It can be seen that the ROC curve of LARS based on feature weights is closer to the upper left corner.
  • The AUC of LARS based on feature weights is 0.8953.
  • The AUC of standard LARS is 0.8664.
  • The AUC of the LARS algorithm based on feature weights is therefore the higher of the two.
  • the ACC of the diabetes prediction model calculated by the LARS algorithm based on feature weights is higher than that of the standard LARS, and can be approximated to the optimal model faster.
  • the AUC value of the ROC curve of the LARS algorithm based on feature weights is higher than the standard LARS. Therefore, the LARS algorithm based on feature weights is better than the standard LARS algorithm in solving the diabetes prediction model.
  • This embodiment makes further improvements on the foregoing embodiment; content already described is not repeated.
  • this embodiment proposes a diabetes risk early warning system, which includes at least the second processor 2 as described in the foregoing embodiment 1, a storage device coupled to the processor, and an interface 5 between them.
  • the diabetes risk early warning system is suitable for risk management of rehabilitation exercises for high-risk diabetic patients.
  • the diabetes risk early warning system at least includes a sensor module, a second processor 2 and an exercise program adjustment module.
  • The sensor module is configured to collect initial data of the diabetes risk early warning system, system application parameters, and user data about the current user.
  • The second processor 2 is configured to use the data set collected by the sensor module to identify the user's exercise diabetes risk, according to the salient features selected by machine learning and the extracted relationship between exercise monitoring data and autonomous behavior ability.
  • The exercise program adjustment module is configured to dynamically adjust the configuration parameters in the exercise program by identifying the exercise diabetes risk, based on the relationship between exercise monitoring data and autonomous behavior ability determined through analysis by the second processor 2.
  • the sensor module is used to collect the initial data of the diabetes risk early warning system, the parameters of the system application, and user data about the current user.
  • the collected data is collected in the database, and the integrated data set is transmitted to the second processor 2.
  • the sensor module is mainly established in the following two forms: for example, a smart watch worn on the wrist of the current user is used to collect or monitor the physiological monitoring parameters and exercise monitoring data of the current user during the exercise process.
  • The smart electronic device 4 may be the small, portable self-mixing coherent laser radar non-invasive blood glucose measurement device provided by patent document CN202051710U, published on November 30, 2011, which combines laser radar frequency-modulated continuous wave with self-mixing coherent technology to measure the user's blood glucose non-invasively.
  • Exercise monitoring data includes sedentary time, exercise intensity, exercise time, exercise duration, exercise frequency, exercise type, etc.
  • the mechanical equipment is provided with a pinch force sensor, a grip force sensor, a torsion force sensor, an electromyography collector, at least one digital quantity transmitter, and a joint mobility sensor, which are respectively used to collect the movement ability data of the upper limbs or the lower limbs.
  • the analog voltage signal collected by the pinch force sensor/grip force sensor/torque sensor is processed by a digital transmitter and then becomes a digital voltage signal.
  • the electromyography collector includes electrodes, a low-pass filter circuit module, a band-pass amplifier circuit module, and an analog-to-digital conversion circuit module.
  • the biological myoelectric signal on the human body surface is collected by the electromyography collector and converted into a digital voltage signal.
  • the joint mobility sensor may be an inclination sensor or an angle sensor, for example.
  • The second processor 2 is configured to use the large number of data sets collected by the sensor module to identify the user's exercise diabetes risk according to the extracted relationship between exercise monitoring data and autonomous behavior ability. It analyzes the collected data sets sent by the sensor module to establish a diabetes prediction model that identifies the user's diabetes risk during exercise.
  • the “exercise monitoring data” refers to the real-time monitored data when the current user implements the exercise program.
  • the “autonomous behavior ability” refers to the independently completed behavior ability of the current user at the current stage output by the second processor 2 after evaluating a large amount of data about the current user.
  • Autonomous behavior ability is represented by at least one exercise ability evaluation data.
  • the exercise ability evaluation data is used to describe the degree of the current user’s ability to perform independently at the current stage.
  • The exercise ability evaluation data is generated from at least one item of the user's historical exercise monitoring data, such as action duration, movement amplitude, or action frequency.
  • Autonomous behavior ability is used to provide a comparison basis for identifying the user's exercise diabetes risk.
  • Exercise monitoring data is used to describe the real-time data of the current user's exercise ability when implementing the exercise program. Based on the relationship between exercise monitoring data and autonomous behavior ability, the load data is calculated, i.e. the amount by which the real-time exercise ability data exceeds the historical exercise ability data. The calculated load data is then compared with the preset load data threshold, so that the exercise diabetes risk can be predicted from the load data.
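  • The load comparison can be sketched as a simple threshold check. Everything concrete here is an assumption made for illustration: the historical baseline is taken as the mean of past readings, the monitored quantity is a heart rate, and the threshold value is arbitrary.

```python
def exercise_load(realtime, baseline):
    """Amount by which the real-time value exceeds the historical baseline."""
    return max(0.0, realtime - baseline)


def exercise_risk_flag(realtime, history, load_threshold):
    """Return (load, flagged), where flagged is True if the load exceeds the threshold."""
    # Assumption: the historical baseline is the mean of past readings.
    baseline = sum(history) / len(history)
    load = exercise_load(realtime, baseline)
    return load, load > load_threshold


# Hypothetical heart-rate readings in beats per minute.
history = [100.0, 110.0, 120.0]
load, flagged = exercise_risk_flag(135.0, history, load_threshold=20.0)
```

With these toy numbers the baseline is 110, the load is 25, and the reading is flagged because it exceeds the threshold of 20.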
  • When the exercise program is not being implemented, the user's ability to perform independently at the current stage is evaluated, which provides a comparison basis for identifying the user's exercise diabetes risk.
  • While the exercise program is being implemented, real-time data on the user's exercise ability is obtained by monitoring and compared with the evaluation data to determine whether the currently implemented exercise program exceeds the preset control conditions.
  • At least one configuration parameter of the exercise program, such as the grip strength requirement, strength requirement, joint mobility requirement, or other exercise ability data, can then be dynamically adjusted accordingly.
  • The “diabetes prediction model” is used to describe the relationship between exercise monitoring data and autonomous behavior ability. It presets a load data threshold, which is adjusted according to user operations or automatically based on big data analysis by the second processor 2.
  • the relationship between the motion monitoring data to be extracted and the autonomous behavior ability is determined according to the salient features selected by machine learning.
  • the irrelevant features among multiple features are eliminated to determine the significant features that have a higher correlation with diabetes risk.
  • the second processor 2 is further explained: the second processor 2 first obtains a large number of diabetes diagnosis case samples through information interaction with other intelligent electronic devices 4, and screens out the exercise plan and diabetes risk according to machine learning.
  • the salient features selected according to machine learning refer to the output set generated by inputting the training set into the LARS diabetes model based on feature weights as in Example 1.
  • the training set refers to a large number of diabetic diagnosis case samples, each case sample includes at least its exercise plan data such as sedentary time, exercise intensity, exercise time, exercise duration, exercise frequency, exercise type, and diabetes risk trend.
  • the diabetes risk change trend is determined according to the changes of diabetes risk indicators before and after the exercise program is implemented.
  • the diabetes risk indicators are, for example, blood sugar peak, heart rate peak, blood pressure peak, etc.
  • The Lasso regression model takes some of the exercise program data as the features to be screened (the feature matrix of the diabetes data set) and the diabetes risk change trend as the screening target (the true result label). The LARS algorithm based on feature weights is used to solve the model, which outputs the selected salient features and their corresponding regression coefficient values.
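  • Reading the salient features off a solved model amounts to keeping the features whose regression coefficients are non-zero. The sketch below uses the exercise plan feature names listed in the text; the coefficient values are invented for illustration and the function name is an assumption.

```python
def salient_features(names, coefficients, eps=1e-12):
    """Keep only the features whose regression coefficient is non-zero."""
    return {name: coef for name, coef in zip(names, coefficients)
            if abs(coef) > eps}


names = ["sedentary time", "exercise intensity", "exercise time",
         "exercise duration", "exercise frequency", "exercise type"]
# Hypothetical coefficients, as a sparse Lasso solution might produce.
coefficients = [0.42, -0.31, 0.0, -0.18, 0.0, 0.0]

selected = salient_features(names, coefficients)
```

Features with a zero coefficient drop out of the model entirely, which is how the sparse solution simplifies the prediction model.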
  • The selected salient features effectively eliminate items in the data that are unrelated to exercise performance data and/or diabetes risk, which reduces the complexity and feedback time of the diabetes prediction model when evaluating data in real time.
  • the second processor 2 uses the data set collected by the sensor module according to the relationship between the exercise monitoring data extracted by machine learning and the autonomous behavior ability to identify the user's exercise diabetes risk.
  • the exercise program adjustment module is configured to dynamically adjust the configuration parameters in the exercise program based on the relationship between the exercise monitoring data and the autonomous behavior ability.
  • the Pima diabetes dataset is adopted first. Because the existing k-means algorithm selects its initial cluster centers at random, its clustering results tend to be unstable. The selection of the initial cluster centers therefore needs to be improved so that they fall, as far as possible, in the center of each cluster.
  • the Pima diabetes data set refers to the Pima Indian Diabetes data set in the widely used University of California, Irvine (UCI) machine learning database.
  • the system provided by the present invention is used to construct a diabetes prediction model.
  • the system at least includes a storage device and a first processor 1.
  • the first processor 1 is coupled to the storage device and is configured to perform at least one step of the diabetes early warning method based on improved k-means clustering.
  • the diabetes early warning method based on improved k-means clustering includes at least one of the following steps:
  • First cluster center selection: select the data set, define the number of clusters k and the neighborhood radius ε, and select the point with the largest sum of distances between the sample point x_i and the samples as the first cluster center point;
  • Cluster marking: calculate the distance between each sample and the cluster centers, determine the cluster label of the sample from the closest center, and assign the sample to the corresponding cluster;
  • Diabetes early warning model: obtain stable cluster centers and substitute them into the diabetes piecewise function to obtain an early warning model of diabetes.
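The selection and marking steps above can be sketched as follows. This is a minimal illustration, not the patented procedure: only the choice of the first center (largest summed distance to all samples) and the nearest-center marking come from the text. The max-min rule used for the remaining centers and the function names are assumptions, and the neighborhood radius ε and the diabetes piecewise function are not modeled because this section does not define them.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def first_center(points):
    """Index of the point whose summed distance to all samples is largest."""
    sums = [sum(dist(p, q) for q in points) for p in points]
    return max(range(len(points)), key=sums.__getitem__)

def init_centers(points, k):
    """First center as in the text; the rest by a max-min rule (assumption)."""
    chosen = [first_center(points)]
    while len(chosen) < k:
        # Pick the point farthest from its nearest already-chosen center.
        nxt = max(range(len(points)),
                  key=lambda i: min(dist(points[i], points[c]) for c in chosen))
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]

def assign(points, centers):
    """Cluster marking: label each sample with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Deterministic k-means: no random initialization is involved."""
    centers = init_centers(points, k)
    labels = assign(points, centers)
    for _ in range(max_iter):
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centers)
        if new_labels == labels:  # labels stable, so centers are stable
            break
        labels = new_labels
    return centers, labels
```

Because the initialization is deterministic, repeated runs on the same data return the same labels, which is the stability property the text attributes to an improved initial-center choice.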
  • the improved k-means clustering algorithm effectively overcomes the problem of unstable clustering results.
  • an improved k-means clustering diabetes early warning model is established to improve the early warning capability for diabetes and to provide a basis for the diagnosis and treatment of the different stages of diabetes.
  • diabetes early warning segmentation function is:
  • the improved k-means clustering diabetes early warning method is compared with standard k-means clustering and with the methods of non-patent literature [1] and non-patent literature [2] mentioned in the background; homogeneity, completeness, FMI, mean ARI, CHI, average convergence speed, average number of iterations to convergence, and algorithm time serve as evaluation indicators, and the comparison is carried out through these indicators and the corresponding curves.
  • ARI Adjusted Rand Index
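For readers unfamiliar with the index, the Adjusted Rand Index can be computed from the contingency counts of two labelings. The sketch below uses the standard pair-counting definition of ARI; the function name is illustrative, and nothing here is taken from the patent beyond the name of the metric.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label lists must have the same length")
    if n < 2:
        return 1.0  # a single sample admits only one partition
    # Pairs of samples that share a cluster in both labelings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a cluster in each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)   # expected agreement under chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate case, e.g. one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

The index is invariant to renaming the clusters, so a clustering that matches the true labels up to a permutation of cluster ids still scores 1.0.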
  • the PCATDKM algorithm is proposed to add PCA, TD and the maximum and minimum distance algorithm to the traditional Kmeans algorithm.
  • the PCA algorithm can reduce the dimensionality of the data object collection and accelerate the clustering process.
  • the initial clustering center can be selected dynamically according to the actual distribution of the data object, so that the initial k clustering centers obtained by the clustering algorithm correspond to the actual clusters.
  • Literature [2] refers to: Yuan Q L, Shi H B, Zhou X F.
  • An optimized initialization center K-means clustering algorithm based on density[C]//IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Shenyang, IEEE, 2015: 790-794.
  • a method for optimizing the initial center point of K-means is proposed.
  • the algorithm uses a density-sensitive similarity measure to calculate the density of objects. Candidate points are selected by calculating the minimum distance between a point and the other points of higher density. Outliers are then screened out using the average density, and finally the initial centers of the K-means algorithm are selected.
  • Experimental results show that the initial center point obtained by the algorithm has high accuracy and can effectively filter out abnormalities.
  • Table 3 shows the mean ARI obtained by running standard k-means, improved k-means, literature [1], literature [2], and Agglomerative on the diabetes data set 300 times. The table shows that the models from the improved k-means algorithm, the literature [1] algorithm, and the literature [2] algorithm have a significantly higher mean ARI than the model from the standard k-means algorithm; of these, the improved k-means and literature [2] algorithms, which incorporate density, yield better ARI values than the literature [1] algorithm. However, neither the standard k-means algorithm nor the improved k-means algorithm yields a model that performs as well as the density-based clustering algorithm Agglomerative.
  • Table 4 shows the homogeneity, completeness, FMI, ARI mean, CHI, etc. of 5 algorithms, including standard k-means, improved k-means, literature [1], literature [2], and Agglomerative on the new diabetes dataset.
  • Table 4 The mean values of the models of different algorithms on the 5 indicators on the new diabetes dataset
  • the ARI of a cluster is taken as the ordinate
  • the number of iterations of the algorithm in a cluster is the ordinate
  • the average number of iterations and the average ARI of a cluster are obtained by running 300 times.
  • the algorithm of the present invention, the algorithm of document [1], and the algorithm of document [2] have higher ARI values at the beginning of the iteration: because of the improved initial center selection, the initial cluster centers obtained are more accurate.
  • the algorithm of the present invention, document [1], and document [2] need significantly fewer iterations per clustering run, and the algorithm of the present invention needs the fewest.
  • Table 5 gives the average number of iterations to convergence and the algorithm time of the models obtained by four algorithms: standard k-means, improved k-means, literature [1], and literature [2]. The standard k-means algorithm needs roughly twice as many iterations as the improved algorithms. The average algorithm time shows, however, that the standard k-means algorithm is not the slowest: the algorithms of literature [1] and [2] take longer. Their additional mathematical computation reduces the number of iterations but lengthens each clustering run. Although the algorithm of the present invention also adds a density calculation, it performs it only once and relies more on the idea of probability, so the entire data set matrix does not need to be computed repeatedly.
  • the ordinate ARI value is the mean value of each result ARI of the 5 data sets, and the abscissa is the number of clustering. It can be seen from Figure 10 that due to the characteristics of the algorithm itself, the clustering results of the Agglomerative algorithm are the same each time, so it is a straight line; the model results obtained by the standard k-means algorithm fluctuate sharply.
  • the models obtained by the algorithm of the present invention and the literature [2] algorithm perform well. The variance of each curve is 3.19×10^-5 for the algorithm of the present invention, 6.68×10^-5 for the algorithm of document [2], 2.94×10^-4 for the algorithm of document [1], and 2.78×10^-3 for the standard k-means algorithm. The model obtained by the algorithm of the present invention is therefore the most stable, followed by the algorithm of literature [2], and the model obtained by the standard k-means algorithm is the least stable.
  • the model indexes obtained by the algorithm of the present invention, the algorithm of literature [1], the algorithm of literature [2], and the Agglomerative algorithm are better than those of the standard k-means algorithm; the algorithm of the present invention has the best convergence and algorithm time, whereas literature [1] and literature [2] converge better than the standard k-means algorithm but take longer; the models obtained by the algorithm of the present invention, the literature [1] algorithm, and the literature [2] algorithm are more stable than that of the standard k-means algorithm.
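The stability comparison relies on the variance of the ARI obtained over repeated clustering runs. A minimal sketch of that summary is given below; whether the source used the population or the sample variance is not stated, so the population variance here is an assumption, as is the function name.

```python
from statistics import mean, pvariance

def stability(ari_per_run):
    """Return (mean ARI, population variance of ARI) over repeated runs.

    A smaller variance means the clustering result changes less from run
    to run. The choice of population variance is an assumption.
    """
    if len(ari_per_run) < 2:
        raise ValueError("need at least two runs to measure stability")
    return mean(ari_per_run), pvariance(ari_per_run)
```

A deterministic algorithm returns the same ARI on every run and therefore has zero variance, which matches the straight line reported for the Agglomerative algorithm.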
  • the model obtained by the algorithm of the present invention is the most stable.
  • the present invention combines the improved k-means clustering algorithm with the diabetes piecewise function to provide an improved k-means clustering diabetes early warning model, which overcomes the unstable clustering results of the k-means algorithm and improves the accuracy and stability of the early warning model.
  • this embodiment makes further improvements to it, and the repeated content will not be repeated.
  • this embodiment proposes a diabetes risk early warning system.
  • the system at least includes the first processor 1 and the second processor 2 described in the foregoing embodiment 3, the storage devices respectively coupled to the first processor 1 and the second processor 2, and the interface 5 between them.
  • the diabetes risk early warning system monitors whether the user’s behavior is potentially risky before the user’s physiological information becomes abnormal. Starting from user behavior, which is closely tied to individual differences, it controls diabetes risk hierarchically through exercise intervention to improve the treatment effect, avoiding the problems of severe monitoring delay and high data sensitivity.
  • the diabetes risk early warning system is used to control the diabetes risk of the current user, especially the risk of diabetes caused by the exercise process.
  • current users include early diabetes patients and/or diabetic patients.
  • Early-stage diabetes patients refer to individuals who have a predisposition to develop diabetes.
  • Diabetes risk includes the risk of developing diabetes from the early stage of diabetes and/or the risk of causing diabetes.
  • the diabetes risk early warning system may be a wearable smart device, a smart mobile terminal, etc.
  • the diabetes risk early warning system includes a first processor 1 configured to use a diabetes early warning model to predict whether the current user is suffering from diabetes and the stage of diabetes.
  • the prediction result includes one of health, level I warning, and level II warning.
  • the diabetes risk early warning system also includes an exercise program generation module.
  • the exercise plan generation module is used to obtain exercise monitoring data about the current user and execute a first-level risk warning based on the user data to determine the exercise risk model about the current user.
  • “Exercise monitoring data” includes at least the length of sitting, exercise intensity, exercise time, exercise duration, exercise frequency, exercise type, etc.
  • the motion monitoring data is obtained through information interaction between the motion plan generation module and other intelligent electronic devices 4.
  • “User data” includes at least the current user's diabetes stage, diet monitoring data, drug monitoring data, medical history data, geographic location information, physiological monitoring data, physical fitness evaluation data, etc.
  • the user data may be obtained by information interaction with other smart electronic devices 4 through the exercise program generation module.
  • the “diet monitoring data” can be obtained by analyzing and processing pictures of meals taken by the current user, or from the meal times and the food types and components recorded by the current user on the smart mobile terminal.
  • drug monitoring data can be obtained based on the user's drug treatment plan and the medication time recorded by the current user.
  • the “medical treatment history data” includes the complications of the user, the exercise treatment plan recommended by the doctor, and the medication treatment plan.
  • the aforementioned exercise program generation module/smart electronic device 4 may be a wearable smart device such as a smart bracelet, a smart mobile terminal such as a smart phone, and so on.
  • “Physical fitness assessment data” can be body mass index or BMI, which is defined as body weight (in kilograms) divided by height (in meters) squared (in kg/m2).
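The BMI definition above translates directly into code. The sketch below only restates that formula; the function name and the input checks are illustrative additions.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2
```

For example, a person weighing 70 kg with a height of 1.75 m has a BMI of about 22.9 kg/m².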
  • the first-level risk early warning is executed when the exercise plan generation module analyzes and determines that the current user's exercise monitoring data exceeds the preset risk range and the physiological monitoring data does not exceed the preset risk range.
  • the exercise plan generation module continuously monitors the current user’s exercise status and analyzes and processes the acquired exercise monitoring data and physiological monitoring data.
  • the monitoring of exercise behavior has priority over the monitoring of abnormal physiological information on the preventive level.
  • the preset risk range includes a preset threshold range corresponding to meal time, sedentary time, exercise volume, and exercise amplitude respectively.
  • the preset risk range may be a dynamically changing value set based on the individual differences of different users.
  • the preset risk range is a limiting condition, relative to the current user, beyond which diabetes risk may arise in the future. A warning is triggered only when the preset risk range is exceeded, thereby meeting the dual requirements of diabetes risk control for monitoring timeliness and data sensitivity.
  • after hypoglycemic drugs are taken or insulin is injected, the hypoglycemic effects of the drugs and of exercise are superimposed, which is most likely to cause hypoglycemia. In particular, exercising within half an hour after an insulin injection or after taking hypoglycemic drugs accelerates absorption of the drug and makes hypoglycemia more likely.
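The trigger condition for the first-level risk early warning (exercise monitoring data outside its preset range while physiological monitoring data is still inside its range) can be sketched as a simple predicate. The field names and numeric ranges below are hypothetical; the source states only that the ranges are per-user thresholds for quantities such as meal time, sedentary time, exercise volume, and exercise amplitude.

```python
def out_of_range(values, ranges):
    """True if any monitored value lies outside its (low, high) range."""
    return any(not (ranges[name][0] <= value <= ranges[name][1])
               for name, value in values.items() if name in ranges)

def first_level_warning(exercise, physio, exercise_ranges, physio_ranges):
    """Warn when exercise data is out of range but physiology is still normal."""
    return (out_of_range(exercise, exercise_ranges)
            and not out_of_range(physio, physio_ranges))

# Hypothetical per-user thresholds, for illustration only.
EXERCISE_RANGES = {"sedentary_min": (0, 90)}
PHYSIO_RANGES = {"glucose_mmol_l": (3.9, 10.0)}
```

Keeping the ranges in per-user dictionaries reflects the statement that the preset risk range may be a dynamically changing value set from individual differences.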
  • diabetic patients often have diseases such as hypertension and dyslipidemia.
  • if diabetes and its comorbidities are not controlled for a long time, a variety of other complications will occur, such as nephropathy, neuropathy, cardiovascular disease, retinopathy, and musculoskeletal disease.
  • different exercise risk models/exercise monitoring schemes are needed according to personal conditions. For example, for patients with complications of retinopathy: mild retinopathy can choose moderate and low-intensity aerobic exercise, and weight lifting and other apnea activities should be avoided; moderate and low-intensity aerobic exercise can also be chosen for moderate retinopathy. In addition, it is necessary to avoid moving the head downwards. For severe retinopathy and the risk of fundus bleeding, exercise should be strictly restricted, and only some low-intensity exercises are recommended.
  • the early warning system proposed by the present invention gives priority to the first-level risk early warning, in which the current exercise data of the current user is compared against first-level risk early warning conditions determined from user data, especially medical history data.
  • the first-level risk early warning conditions are determined based on the medical history data associated with the exercise data in the user data.
  • the medical history data includes the history of complications (such as the type, severity, and number of complications), the drug treatment plan (such as the time and dosage of hypoglycemic drugs and the time and dose of insulin injections), and the preliminary diabetes prediction result determined by the first processor 1.
  • the storage device of the diabetes risk early warning system pre-stores first-level risk early warning conditions containing at least one attribute, and at least one pre-stored sports risk model can be accessed based on the attributes of the first-level risk early warning conditions. Several attributes correspond to at least one feature.
  • Features refer to different types of user data, such as one or more of the current user's diabetes stage, diet monitoring data, drug monitoring data, medical history data, geographic location information, physiological monitoring data, and physical fitness evaluation data.
  • the attribute refers to the restriction condition of at least one motion monitoring data corresponding to each feature. Attributes include athletic ability level and exercise program level.
  • the feature of the preliminary diabetes prediction result determined by the first processor 1 which corresponds to the attribute of the exercise program level.
  • the preliminary diabetes prediction result determined by the first processor 1 is a level II early warning, and the level attribute of the exercise plan is level A (or numerical).
  • the feature of the type of complication corresponds to the attribute of the level of the exercise plan.
  • the attribute of the level of the exercise plan is B (or numerical).
  • the level attribute of the exercise plan is B (or numerical)
  • the characteristics of the drug treatment plan when the user's antidiabetic drug taking time does not exceed the preset hypoglycemic
  • the level attribute of the exercise program is Grade A (or numeric)
  • Grade A is finally adopted as the exercise program level of the user.
  • physical fitness evaluation data when the user's physical fitness evaluation data is not up to standard, its athletic ability level is C level.
  • the exercise risk model of regular exercise intensity includes several exercise programs, such as standing, walking, and doing housework. Each exercise program specifies sedentary time, exercise intensity, exercise time, exercise duration, and exercise frequency, together with their respective appropriate control ranges.
  • the exercise program generation module has obtained several exercise programs that meet the current user's situation.
  • different users adapt differently to different exercise programs depending on their physical condition. The present invention therefore applies a secondary risk early warning analysis to the current exercise program, fully considering individual differences among users, so that controlling diabetes risk through exercise intervention gives users a safer and more effective exercise treatment program.
  • the exercise plan generation module is also configured to perform a secondary risk early warning analysis when the current user's current exercise data does not meet the primary risk early warning condition. Specifically, the module: determines the association relationship between the current user’s current exercise monitoring data and physiological information from the statistical change trend curve between the user’s historical physiological information and historical exercise monitoring data; performs a trend analysis of the current physiological information provided by the smart electronic device 4 against that association relationship to predict the value of the physiological information if the exercise continues; uses the predicted physiological information to determine at least one exercise risk warning and/or exercise guidance program generated by the exercise risk model, after excluding the restricted exercise program determined from the predicted physiological information; and issues the exercise risk warnings and/or exercise guidance suggestions to the current user through a prompt module or other smart electronic device 4.
  • the exercise risk model refers to a statistical change trend curve between historical physiological information and historical exercise monitoring data determined according to user data.
  • the statistical change trend curve can intuitively reflect the change trend of physiological information when the user performs various exercises, and provide a basis for analysis and prediction for the user's next exercise.
  • the exercise plan generation module generates exercise intervention information about the current user based on the exercise risk model.
  • the exercise intervention information includes exercise risk warnings and/or exercise guidance recommendations.
  • the sports intervention information provides intervention and prevention suggestions for the current user's sports behavior from two directions.
  • the sports risk warning lets the current user know which sports or actions carry potential risks, so that the user can avoid such dangerous actions not only in the next exercise session but, more importantly, in daily life afterwards, which helps improve the user's treatment effect in both the short and the long term.
  • Exercise risk warnings and/or exercise guidance recommendations are generated on the basis of excluding the current user’s restricted exercise program.
  • the restricted exercise program is determined based on the user's predicted physiological information. While analyzing and determining the restricted exercise program, the exercise therapy program that is not suitable for the current user or the two attribute values of the exercise ability level and the exercise program level are obtained. In this way, after excluding the part of the obtained exercise risk model related to the current user that satisfies the restricted exercise plan, the exercise risk model is updated based on it and the exercise plan, and then it is fed back to the current user for viewing or prompting.
  • the predicted physiological information is predicted by trend analysis between the current physiological information and the associated relationship of the current user. Among them, the predicted physiological information refers to the predicted value of the physiological information when the exercise is continued.
  • the association relationship is determined based on the statistical change trend curve between the current user's historical physiological information and historical exercise monitoring data. The association relationship refers to the prediction of the change trend between the current user's current exercise monitoring data and physiological information.
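One simple way to picture the trend analysis is a straight-line fit of a physiological reading against exercise duration, extrapolated to the duration the user would reach by continuing. The source does not specify the form of the statistical change trend curve, so the linear model, the function names, and the numbers in the usage note are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("exercise values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def predict_physiology(history_exercise, history_physio, future_exercise):
    """Predicted physiological value if the exercise continues (linear trend)."""
    slope, intercept = fit_line(history_exercise, history_physio)
    return slope * future_exercise + intercept

def is_restricted(predicted, low, high):
    """True if the predicted value leaves the allowed range [low, high]."""
    return not (low <= predicted <= high)
```

With hypothetical readings of 8.0, 7.5, 7.0, and 6.5 mmol/L after 0, 10, 20, and 30 minutes, the fitted trend predicts about 5.0 mmol/L at 60 minutes; a program that pushes the prediction outside the allowed range would be treated as restricted.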
  • the early warning system in this embodiment further includes the sensor module, the second processor 2 and the exercise program adjustment module as described in Embodiment 2.
  • the sensor module is configured to collect initial data of the diabetes risk early warning system, parameters applied by the system, and user data about the current user.
  • the second processor 2 is configured to: after detecting that the current user is performing the exercise treatment plan generated by the above exercise plan generation module and that the prediction result generated by the first processor 1 for the current user is level II, use the data set collected by the sensor module to identify the user’s exercise diabetes risk according to the salient features screened by machine learning and the extracted relationship between the exercise monitoring data and the autonomous behavior ability. The exercise program adjustment module is configured to dynamically adjust the configuration parameters in the exercise plan based on the relationship, analyzed and determined by the second processor 2, between the exercise monitoring data and the autonomous behavior ability.
  • the system provided by the present invention includes: an intelligent electronic device 4 operated or worn by the current user.
  • the smart electronic device 4 is provided with a first processor 1, a second processor 2, a motion plan generation module, a sensor module, a motion plan adjustment module, and the like.
  • Several processors arranged on the smart electronic device operated or worn by the current user communicate with the smart electronic device operated or worn by the current user/caregiver through a computer network.
  • the first processor 1, the second processor 2, the motion plan generation module, the motion plan adjustment module, and other processors may be interconnected by a communication bus (solid line) such as a motherboard.
  • the first processor 1 and at least one intelligent electronic device send the collected and processed data to the exercise plan generation module and the second processor 2.
  • the exercise plan generation module caches the data and sends the processed data to at least one smart electronic device (for example, a smart phone operated by the current user) or prompt module, and the exercise plan adjustment module.
  • the exercise plan generation module transmits the generated exercise plan to the exercise plan adjustment module, so that the exercise plan adjustment module dynamically adjusts its configuration parameters based on the exercise plan it generates to optimize the exercise plan of the current user.
  • the exercise program adjustment module processes the data it receives and sends the processed data to at least one smart electronic device (for example, a smart phone operated by the current user) or a prompt module, and an exercise program generation module.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

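A diabetes risk early warning system, comprising: a storage device; and a first processor (1) based on improved k-means clustering, coupled to the storage device and configured to: select a first cluster center point; and, based on the first cluster center point, obtain stable cluster centers and substitute them into a diabetes piecewise function to obtain a diabetes early warning model, wherein a data set is selected, the number of clusters k and the neighborhood radius ε are defined, and the point with the largest sum of distances between a sample point Xi and the samples is selected as the first cluster center point, so that the first cluster center point falls in the central part of a cluster. The system improves the cluster center selection method, establishes a diabetes piecewise-function early warning model, improves diabetes early warning capability, and provides a basis for diagnosing and treating the different stages of diabetes. In addition, starting from the features of the diabetes data set, it screens out the key feature variables of diabetes, which simplifies the diabetes prediction model and improves its accuracy, thereby helping to provide accurate diabetes prevention and treatment measures.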

Description

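Diabetes risk early warning system
Technical Field
The present invention relates to the technical field of medical informatization, and in particular to a diabetes risk early warning system.
Background
Extensive research on all aspects of diabetes (diagnosis, pathophysiology, treatment process, etc.) has generated a large amount of related data. For example, patent document CN107403072A, published on 2017-11-28, provides a machine-learning-based method for predicting and warning of type 2 diabetes. The method uses the K-means algorithm and the Logistic Regression algorithm to build a two-layer diabetes prediction and analysis model that clusters first and then classifies. The K-means algorithm allows unlabeled cluster analysis of the data set. To select the initial cluster centers, the method introduces a layered algorithm, namely the next-level Logistic Regression algorithm, to search for stable initial cluster centers. This greatly increases the extra computation, and setting thresholds from problem-solving experience undermines the convergence of the algorithm, so stable clustering results remain hard to achieve.
On the other hand, as diabetes prediction models acquire more data features and higher data dimensionality, they carry more non-critical and redundant information and become increasingly complex, so traditional prediction methods are difficult to apply directly to diabetes prediction. To address this, existing literature such as the article "Application of Lasso and related methods in multiple linear regression models" by Ke Zhenglin et al. of Beijing Jiaotong University applies the ideas of the Lasso method and related methods to variable selection in multiple linear regression models, proposes the traditional LARS algorithm for such variable selection, and gives a concrete implementation of the variable selection method using diabetes statistics and a simulated multivariate data set. However, the traditional LARS algorithm approaches the solution slowly and with limited accuracy when solving for the Lasso regression coefficients, and because its iteration direction is determined by the residual of the target, it is extremely sensitive to noise in the samples. The LARS algorithm is therefore difficult to apply directly to diabetes prediction with ever more data features and ever higher data dimensionality.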
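Summary of the Invention
The object of the present invention is to provide a diabetes risk early warning system. To address the unstable clustering results caused by the random selection of initial cluster centers in the k-means clustering algorithm, it proposes an improved k-means algorithm with optimized initial cluster centers and, in combination with a diabetes piecewise function, an improved k-means clustering diabetes early warning model. In addition, based on PCA (principal component analysis), the present invention fully considers the influence of different diabetes features on the prediction result, gives an improved method for calculating the correlation between the feature independent variables and the dependent variable, simplifies the diabetes prediction model, and proposes a LARS diabetes prediction method based on feature weights.
The diabetes risk early warning system proposed by the present invention includes at least one processor, a system memory, and at least one computer-readable storage medium. The at least one computer-readable storage medium carries computer-executable instructions for causing the processor to implement the various aspects of the present invention. The at least one processor is used to execute the computer-executable instructions, as in the flowcharts and block diagrams in the accompanying drawings, which show the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present invention. Each block in a flowchart or block diagram may represent a module, a program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for implementing the specified logical function. Each block in the block diagrams and/or flowcharts, and each combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. Various aspects of the present invention are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and each combination of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions. The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. It may be, for example but without limitation, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of these. The processor is the functional unit that interprets and executes instructions, also called the central processing unit or CPU; as the computing and control core of a computer system, it is the final execution unit for information processing and program execution. The storage device may be a read-only memory (ROM), a random access memory (RAM), an external memory such as a hard disk, floppy disk, optical disc, or USB flash drive, or a storage server.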
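To achieve the above object, the present invention adopts the following technical solutions:
A diabetes risk early warning system, comprising: a storage device; and a first processor coupled to the storage device and configured to: based on the selected first cluster center point, obtain stable cluster centers and substitute them into a diabetes piecewise function to obtain a diabetes early warning model, wherein a data set is selected, the number of clusters k and the neighborhood radius ε are defined, and the point with the largest sum of distances between a sample point X_i and the samples is selected as the first cluster center point, so that the first cluster center point falls in the central part of a cluster.
An apparatus for obtaining the regression coefficients of a diabetes prediction model by the LARS algorithm based on feature weights, the apparatus comprising a plurality of modules each configured to perform at least one step of the LARS diabetes prediction method based on feature weights.
A diabetes risk early warning system, comprising: a storage device; and a second processor coupled to the storage device and configured to: calculate an independent-variable feature weight vector and an original correlation vector; and, based on the independent-variable feature weight vector and the original correlation vector, output the regression coefficients ω of the LARS diabetes model based on feature weights.
An apparatus for constructing a diabetes prediction model, the apparatus comprising a plurality of modules each configured to perform at least one step of the method for constructing the diabetes prediction model.
A diabetes risk early warning system, comprising: a storage device; and at least one processor coupled to the storage device and configured to: based on the selected first cluster center point, obtain stable cluster centers and substitute them into a diabetes piecewise function to obtain a diabetes prediction model, wherein a data set is selected, the number of clusters k and the neighborhood radius ε are defined, and the point with the largest sum of distances between a sample point X_i and the samples is selected as the first cluster center point, so that the first cluster center point falls in the central part of a cluster; calculate an independent-variable feature weight vector and an original correlation vector; and, based on the independent-variable feature weight vector and the original correlation vector, output the regression coefficients ω of the diabetes prediction model.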
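Brief Description of the Drawings
Figure 1 is a flowchart of the LARS diabetes prediction method based on feature weights in an embodiment of the present invention;
Figure 2 is a solution diagram of the standard LARS algorithm;
Figure 3 is a solution diagram of the LARS algorithm with feature weights added in an embodiment of the present invention;
Figure 4 shows the path of the regression variable ω for the standard LARS algorithm;
Figure 5 shows the path of the regression variable ω for the LARS algorithm based on feature weights;
Figure 6 shows how ACC changes with the number of iterations for the standard LARS algorithm and the LARS algorithm of the present invention based on feature weights;
Figure 7 shows how ROC changes with the number of iterations for the standard LARS algorithm and the LARS algorithm of the present invention based on feature weights;
Figure 8 is an algorithm flowchart of the improved k-means clustering diabetes early warning model in an embodiment of the present invention;
Figure 9 is a line chart comparing the average convergence speed of different algorithms on the new diabetes data set in an embodiment of the present invention;
Figure 10 is a line chart comparing the average ARI of repeated clustering results of different algorithms on the new diabetes data set in an embodiment of the present invention; and
Figure 11 is a simplified schematic diagram of the module connections of a preferred diabetes risk early warning system of the present invention.
List of Reference Numerals
1: first processor                           2: second processor
4: intelligent electronic device             5: interface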
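Detailed Description
Specific implementations of the present invention are described in further detail below with reference to the drawings and embodiments. The following embodiments illustrate the present invention but do not limit its scope.
Embodiment 1
Machine learning and PCA (Principal Component Analysis) theory show that a multidimensional sample usually contains only a few key features or principal components. Likewise, only a few of the many features of diabetes are key features, and studies have found that the LARS algorithm yields a prediction model that is expressed by the key features and generalizes better. Generalization ability is the property of a trained network of giving suitable outputs for inputs outside the sample set. The machine learning algorithm here is implemented with a linear regression algorithm, but linear regression inevitably suffers from overfitting: the more it is trained, the more closely the model matches the training data and the more it loses its "predictiveness" for new data. The traditional LARS algorithm approaches the solution slowly and with limited accuracy when solving for the Lasso regression coefficients, and because its iteration direction is determined by the residual of the target, it is extremely sensitive to noise in the samples.
To solve these problems, the present invention incorporates the feature weights obtained by the PCA algorithm into the solution steps of the LARS algorithm. Because the feature independent variables now differ in weight, their likelihood of being selected changes, which speeds up the approach of the algorithm toward the key features and thus improves both solution speed and accuracy; and because the PCA algorithm constrains the variation of each independent variable, the model becomes more robust. The system provided by the present invention is used to construct a diabetes prediction model. The system at least includes a storage device and a second processor 2, where the second processor 2 is coupled to the storage device and configured to perform at least one step of the LARS diabetes prediction method based on feature weights. The LARS diabetes prediction method based on feature weights includes at least one of the following steps:
First, define the feature matrix of the diabetes data set: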
Figure PCTCN2020080251-appb-000001
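That is, a matrix composed of m records of n-dimensional features, where x_k1, x_k2, …, x_kn are the feature independent variables.
True result labels: y = (y_1, y_2, …, y_m)^T
With reference to Figure 1, the present invention provides a LARS diabetes prediction method based on feature weights, with the following specific steps:
Step 1: normalize the diabetes data set matrix X so that the value ranges of the different features of the diabetes data set are mapped to the same fixed range of 0 to 1. Initialize the current fitted value and the current residual. The fitted value is the fit to the true result at each iteration and is initialized to 0. The residual is the difference between the true result and the fitted value, computed by formula 1:
Figure PCTCN2020080251-appb-000002
where y is the true result label vector and μ is the current fitted value.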
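Step 1 can be sketched in a few lines. The min-max rule below is one common way to map each feature to the range 0 to 1 and is an assumption, since the text does not name the normalization formula; the residual follows the stated definition (true result minus fitted value, with the fitted value initialized to 0). Function names are illustrative.

```python
def min_max_normalize(X):
    """Scale every column of X (a list of rows) into the range [0, 1]."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
             for v, lo, hi in zip(row, lows, highs)]
            for row in X]

def init_fit_and_residual(y):
    """Fitted value starts at 0; residual = true result - fitted value."""
    mu = [0.0] * len(y)
    residual = [yi - mi for yi, mi in zip(y, mu)]
    return mu, residual
```

With the fitted value at 0, the initial residual equals the label vector y itself, so the first correlation computed in the next step is taken against y.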
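Step 2: compute the initial weight of each feature independent variable, by formula 3:
Figure PCTCN2020080251-appb-000003
where
Figure PCTCN2020080251-appb-000004
denotes the eigenvalues of the characteristic equation
Figure PCTCN2020080251-appb-000005
In the characteristic equation, R is the covariance matrix of the diabetes data set matrix X, computed by formula 4:
Figure PCTCN2020080251-appb-000006
where
Figure PCTCN2020080251-appb-000007
and θ_i is the mean of the i-th feature.
Then compute the initial correlation between each feature independent variable and the true result, by formula 2:
c = X^T y           (2)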
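The Lasso model yields sparse solutions, so the resulting diabetes prediction model generalizes better. The LARS algorithm can solve the Lasso model, but on the diabetes data set it approaches the solution slowly and with limited accuracy. The present invention improves the LARS algorithm with PCA (principal component analysis) to obtain the weights of the different features and to distinguish the importance of each feature independent variable, and thus provides the LARS diabetes prediction method based on feature weights.
With reference to Figure 3, Step 3: take from X the column vectors that point in the same direction as y and denote them X_A, where X_A consists of the columns of X indexed by the set A.
Then compute the angle bisector u_A of the vectors in X_A using formulas 8 and 7:
Figure PCTCN2020080251-appb-000008
Figure PCTCN2020080251-appb-000009
u_A = X_A ω_A          (8)
where 1_A is a k-dimensional column vector whose elements are all 1, and k is the number of elements in A.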
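The two equation images in this step are not recoverable from the text, so the sketch below falls back on the standard LARS construction of the equiangular direction (Efron et al.): G_A = X_A^T X_A, A_A = (1_A^T G_A^{-1} 1_A)^{-1/2}, ω_A = A_A G_A^{-1} 1_A, and u_A = X_A ω_A. That these match the patent's formulas is an assumption; only u_A = X_A ω_A and the definition of 1_A are confirmed by the text. The small linear solver keeps the example free of third-party dependencies.

```python
import math

def solve(G, b):
    """Solve G z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(G, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("active columns are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def equiangular_direction(XA):
    """Return (u_A, omega_A, A_A) for the active columns XA (list of rows)."""
    cols = list(zip(*XA))
    k = len(cols)
    # Gram matrix G_A = X_A^T X_A of the active columns.
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    z = solve(G, [1.0] * k)                  # z = G_A^{-1} 1_A
    A = 1.0 / math.sqrt(sum(z))              # A_A = (1_A^T G_A^{-1} 1_A)^(-1/2)
    omega = [A * zi for zi in z]             # omega_A = A_A G_A^{-1} 1_A
    u = [sum(w * x for w, x in zip(omega, row)) for row in XA]  # u_A = X_A omega_A
    return u, omega, A
```

By construction u_A has unit length and the same inner product A_A with every active column, which is what makes it an angle bisector of those columns.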
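Compute the new correlation, by formula 5:
C = c^T β                  (5)
where c = X^T (y − μ_A), μ_A is the fitted value from the previous step, and β is the vector of weight measures obtained by the PCA algorithm. The maximum value in C is then given by formula 6:
C_max = max{|C|}                  (6)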
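Formulas 5 and 6 can be sketched as follows. Since formula 6 takes a maximum over the entries of C, the sketch reads formula 5 as an element-wise product of the correlation vector c with the weight vector β; that reading, the function name, and the numbers in the usage note are assumptions.

```python
def weighted_correlation(X, y, mu, beta):
    """Return (C, C_max, index of the entry attaining C_max).

    c = X^T (y - mu) is the usual LARS correlation with the residual;
    C scales each entry of c by its feature weight (element-wise reading
    of formula 5, an assumption), and C_max = max |C_j| (formula 6).
    """
    residual = [yi - mi for yi, mi in zip(y, mu)]
    c = [sum(x * r for x, r in zip(col, residual)) for col in zip(*X)]
    C = [cj * bj for cj, bj in zip(c, beta)]
    j_max = max(range(len(C)), key=lambda j: abs(C[j]))
    return C, abs(C[j_max]), j_max
```

With X as the 2×2 identity, y = (1, 2), μ = 0, and hypothetical weights β = (0.9, 0.1), the unweighted correlation would favor the second feature, while the weighted one selects the first. This is the change in selection likelihood that the text attributes to the feature weights.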
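Step 4: the following formulas update the regression coefficients, the current fitted value, and the current residual. The regression coefficient vector is updated by formula 9:
ω_A = ω_A + γ ω_A                  (9)
The fitted value vector is updated by formula 10:
μ_A = μ_A + γ u_A                  (10)
The residual vector is computed by formula 11:
Figure PCTCN2020080251-appb-000010
where γ is the step length along the angle bisector u_A. Let a = X^T u_A; γ is computed by formula 12:
Figure PCTCN2020080251-appb-000011
In the formula, the plus sign above min indicates that only the minimum of the positive values in the set is taken; C_i and a_i are the i-th elements of C and a, respectively, and i takes the value for which
Figure PCTCN2020080251-appb-000012
attains its minimum.
Step 5: determine whether the L2 norm of the residual from Step 4 is below a given tolerance. If so, stop and output the regression coefficients; otherwise repeat Steps 3 to 5.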
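Here, a regression coefficient is the parameter in a regression equation that expresses how strongly the independent variable x influences the dependent variable y. The larger the magnitude of the regression coefficient, the greater the influence of x on y; a positive coefficient means that y increases as x increases, and a negative coefficient means that y decreases as x increases. For example, in the regression equation Y = bX + a, the slope b is the regression coefficient: for every unit change in X, Y changes by b units on average.
The Lasso model can perform feature selection to obtain the key features of diabetes. Based on PCA, the key feature variables of diabetes are screened out, which simplifies the diabetes prediction model and improves its accuracy, laying the groundwork for more accurate diabetes prevention and treatment measures.
The diabetes prediction method proposed by the present invention is compared with the standard LARS method, using the regression coefficient path, the prediction accuracy (ACC) curve, and the receiver operating characteristic (ROC) curve of the model as evaluation criteria. The regression coefficient path shows directly how the coefficient of each feature independent variable changes; the ACC curve allows a direct comparison of the approach speed and accuracy of the algorithms; and the ROC curve is a tool for assessing imbalanced problems, where a larger area under the curve indicates a better model.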
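Intuitively, the standard LARS algorithm first finds the independent variable x_k that is most correlated with y and uses it to approach y until another variable x_l appears whose correlation with y equals that of x_k. From then on it approaches y along the angular bisector of x_k and x_l. Likewise, when a third variable x_p becomes sufficiently correlated with the dependent variable, it is added to the approach set and the direction is taken along the common angular bisector of the three vectors (here "angular bisector" means the bisector of vectors in a high-dimensional space). This continues until the residual is small enough or all independent variables have been selected, at which point the algorithm ends. As shown in Figure 2, x_1 initially has the higher correlation with y and is used for the approach until y lies on the angular bisector of x_1 and x_2, after which the dependent variable y is approached along that bisector. The standard LARS algorithm retains the complexity of the forward selection algorithm: it needs at most m steps, where m is the number of independent variables, and it guarantees that the result is optimal within the subspace of the independent variables. Figures 2 and 3 illustrate the solution process of the standard LARS algorithm and of the LARS algorithm with feature weights, respectively, using two feature variables as an example; the correlations are computed before each approach step. Figure 3 shows that once feature weights are added, the likelihood of each of the two feature variables being selected changes because their weights differ, the approach direction changes, and the regression coefficients ultimately change as well.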
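Figures 4 and 5 show the paths of the regression coefficients ω for the standard LARS algorithm and the feature-weight-based LARS algorithm, respectively. The vertical axis is the value of ω and the horizontal axis is the number of iterations. Comparing the two figures shows that the regression coefficient paths of all eight feature variables change under the feature-weight-based LARS algorithm. Because the initial weights obtained by PCA constrain how each variable changes, the regression coefficients do not differ greatly and the robustness of the model increases. In addition, the regression coefficients of variables with larger weights grow faster, which makes the result more reasonable; for example, the diabetes pedigree function and age influence disease status more than the number of pregnancies does. Among the features: the 2-hour plasma glucose concentration is used to assess pancreatic β-cell function and the body's ability to regulate blood glucose, and is widely used in clinical practice. Normal adult diastolic blood pressure is below 90 mmHg (12 kPa). For triceps skinfold thickness, an adult is considered obese if the value exceeds 10.4 mm for men or 17.5 mm for women. For 2-hour serum insulin, serum insulin is the only hormone in the body that lowers blood glucose and the only one that simultaneously promotes the synthesis of glycogen, fat, and protein; the normal adult range is 29 to 172 pmol/L, and 42 to 243 pmol/L for those over 60. For body mass index (BMI), the normal adult value is 21 to 23 kg/m², the healthy range for an individual is 18.5 to 24.9 kg/m², and the high-risk range is 25.0 to 29.9 kg/m², where BMI = weight in kilograms / (height in meters)².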
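The BMI formula and the ranges quoted above can be written as a short Python sketch. The function names are illustrative, and treating the quoted ranges as half-open intervals (so that values such as 24.95 are not left unclassified) is an assumption.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_band(value):
    """Classify a BMI value using the ranges quoted in the text."""
    if 18.5 <= value < 25.0:   # text: 18.5 to 24.9 kg/m^2
        return "healthy"
    if 25.0 <= value < 30.0:   # text: 25.0 to 29.9 kg/m^2
        return "high risk"
    return "outside the quoted ranges"


example = bmi(70.0, 1.75)
```

A person weighing 70 kg at 1.75 m has a BMI of about 22.9 and falls in the healthy band; at 85 kg the same person has a BMI of about 27.8 and falls in the high-risk band.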
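Figure 6 shows how the ACC of the two algorithms changes with the number of iterations. The feature-weight-based LARS is ahead of the standard LARS in both approach speed and accuracy: after 3 iterations it already reaches the best ACC of the standard LARS, and after the 5th iteration both algorithms reach their highest ACC, with the feature-weight-based LARS about 0.8 percentage points higher. Both curves also show ACC falling in the last few iterations. The reason is that the degree to which the regression coefficients are shrunk decreases with each iteration; when the shrinkage coefficient α reaches 0 the regression coefficients can no longer be shrunk, and ACC falls instead. Tables 1 and 2 list, for the standard LARS and the feature-weight-based LARS respectively, the shrinkage coefficient α and the regression coefficients ω at each iteration count n. Both tables show that at the fifth iteration three of the ω values are 0, meaning that three independent variables have regression coefficient 0 in the final model. At this point the model is simplified and ACC is at its highest.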
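Table 1: Shrinkage coefficient α and regression coefficients ω of the standard LARS for different iteration counts n
Figure PCTCN2020080251-appb-000013
Table 2: Shrinkage coefficient α and regression coefficients ω of the feature-weight-based LARS for different iteration counts n
Figure PCTCN2020080251-appb-000014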
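Figure 7 shows the ROC curves of the two LARS algorithms, obtained from the 100 false positive rates and true positive rates computed as the threshold t increases from 0 to 1 in steps of 0.01. The red dashed line is the ROC curve of random guessing. The ROC curve of the feature-weight-based LARS lies closer to the upper left corner. The area under the ROC curve (AUC) is 0.8953 for the feature-weight-based LARS and 0.8664 for the standard LARS, so the feature-weight-based LARS has the higher AUC.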
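The threshold sweep behind the ROC curve can be sketched in plain Python. The scores and labels below are made up for illustration, and the trapezoidal integration is an assumed way of computing the area; the patent does not state how the AUC was calculated.

```python
def roc_points(scores, labels, steps=100):
    """(false positive rate, true positive rate) for thresholds 0, 1/steps, ..., 1."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for k in range(steps + 1):
        t = k / steps
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    ordered = sorted(set(points))
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(ordered, ordered[1:]))


scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # made-up model outputs
labels = [1, 1, 1, 0, 0, 0]               # 1 = diabetic, 0 = healthy
points = roc_points(scores, labels)
area = auc(points)
```

In the demo, eight of the nine positive/negative pairs are ranked correctly, and the computed area is accordingly 8/9.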
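In summary, the diabetes prediction model computed by the feature-weight-based LARS algorithm has a higher ACC than the standard LARS and approaches the optimal model faster. In addition, on this imbalanced diabetes sample the AUC of its ROC curve is higher than that of the standard LARS. The feature-weight-based LARS algorithm is therefore superior to the standard LARS algorithm for solving the diabetes prediction model.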
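Embodiment 2
This embodiment further improves on the feature-weight-based second processor 2 proposed in Embodiment 1; repeated content is not described again. Specifically, this embodiment proposes a diabetes risk early warning system that comprises at least the second processor 2 described in Embodiment 1, a storage device coupled to the processor, and an interface 5 between them.
The diabetes risk early warning system is suitable for managing the risk of rehabilitation exercise for high-risk diabetes patients.
With reference to Figure 8, the diabetes risk early warning system comprises at least a sensor module, the second processor 2, and an exercise plan adjustment module. The sensor module is configured to collect the initial data of the system, the parameters used by the system, and user data about the current user. The second processor 2 is configured to use the data sets collected by the sensor module to identify the user's exercise-related diabetes risk, based on the salient features selected by machine learning and on the extracted relationship between exercise monitoring data and autonomous behavioral capacity. The exercise plan adjustment module is configured to dynamically adjust the configuration parameters of the exercise plan, identifying exercise-related diabetes risk from the relationship between exercise monitoring data and autonomous behavioral capacity determined by the second processor 2.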
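Further details on the sensor module: the sensor module collects the initial data of the diabetes risk early warning system, the parameters used by the system, and user data about the current user. The collected data are gathered in a database, and the integrated data set is sent to the second processor 2. The sensor module is built mainly in two forms. One is a smart electronic device 4, for example a smart watch worn on the current user's wrist, which collects or monitors the user's physiological monitoring parameters and exercise monitoring data during exercise. Preferably, the smart electronic device 4 may be the small, portable self-mixing coherent lidar non-invasive blood glucose measuring device disclosed in patent document CN202051710U (announced on November 30, 2011), which measures the user's blood glucose non-invasively by combining frequency-modulated continuous-wave lidar with self-mixing coherence technology. The exercise monitoring data include sedentary duration, exercise intensity, time of exercise, exercise duration, exercise frequency, exercise type, and so on. The other form is a mechanical apparatus fitted with a pinch force sensor, a grip force sensor, a torque sensor, an electromyography (EMG) collector, at least one digital transmitter, and a joint range-of-motion sensor, which are used to collect motor ability data of the upper or lower limbs. The analog voltage signals collected by the pinch force, grip force, and torque sensors are converted into digital voltage signals by the digital transmitter. The EMG collector comprises electrodes, a low-pass filter circuit module, a band-pass amplifier circuit module, and an analog-to-digital conversion circuit module; bioelectric muscle signals on the body surface are collected by the EMG collector and converted into digital voltage signals. The joint range-of-motion sensor may be, for example, an inclination sensor or an angle sensor.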
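Further details on the second processor 2: the second processor 2 is configured to use the large data sets collected by the sensor module to identify the user's exercise-related diabetes risk from the extracted relationship between exercise monitoring data and autonomous behavioral capacity. It analyzes the large collected data sets sent by the sensor module and builds a diabetes prediction model for identifying that risk.
"Exercise monitoring data" are the data monitored in real time while the current user carries out an exercise plan. "Autonomous behavioral capacity" is the capacity for independently completed actions at the user's current stage, output by the second processor 2 after evaluating a large amount of data about the user. It is expressed by at least one motor ability evaluation value, which describes the degree to which the user can currently act independently and is generated from at least one item of the user's historical exercise monitoring data, such as action duration, action amplitude, or action frequency. Autonomous behavioral capacity provides the baseline for identifying exercise-related diabetes risk, while exercise monitoring data describe the user's motor ability in real time during the exercise plan. Based on the relationship between the two, the system computes load data, that is, the amount by which the real-time motor ability data exceed the historical motor ability data. The computed load data are then compared with a preset load threshold, so that exercise-related diabetes risk can be predicted from the load data.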
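The comparison of load data against a preset threshold can be sketched as follows. The field names, the baseline values, and the thresholds are hypothetical placeholders chosen for illustration; the patent does not specify them.

```python
def load_excess(realtime, baseline):
    """Amount by which each real-time value exceeds its historical baseline."""
    return {name: max(0.0, realtime[name] - baseline[name])
            for name in baseline if name in realtime}


def flags_risk(excess, thresholds):
    """True when any load value is above its preset threshold."""
    return any(value > thresholds[name]
               for name, value in excess.items() if name in thresholds)


# Hypothetical values, for illustration only.
baseline = {"grip_force_n": 250.0, "duration_min": 30.0}
realtime = {"grip_force_n": 265.0, "duration_min": 50.0}
thresholds = {"grip_force_n": 30.0, "duration_min": 15.0}

excess = load_excess(realtime, baseline)
risk = flags_risk(excess, thresholds)
```

Here the grip force exceeds the baseline by 15 N, which is below its threshold, but the duration exceeds the baseline by 20 minutes, which is above its threshold, so a risk is flagged.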
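That is, before the exercise plan is carried out, the degree to which the user can currently act independently is evaluated to provide a baseline for identifying exercise-related diabetes risk. When real-time motor ability data are obtained by monitoring, they are compared with the evaluation data to determine whether the exercise plan being carried out exceeds the preset control conditions. If it does, at least one configuration parameter of the exercise plan, such as the grip force requirement, force requirement, or joint range-of-motion requirement, can be adjusted dynamically.
The "diabetes prediction model" describes the relationship between exercise monitoring data and autonomous behavioral capacity. It has a preset load threshold, which is adjusted either by user operation or automatically based on the second processor 2's analysis of big data.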
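However, when the diabetes prediction model is built, the search space of the large collected data sets sent by the sensor module is very large, which greatly reduces the speed at which the second processor 2 evaluates data. Moreover, many of the collected data are unrelated to motor ability data and/or diabetes risk, so extra time is needed to learn to discard these irrelevant items, which increases the complexity of the evaluation and the feedback time of the second processor 2.
To address these shortcomings of the prior art, the relationship between exercise monitoring data and autonomous behavioral capacity to be extracted is preferably determined from the salient features selected by machine learning. Before that relationship is analyzed, irrelevant features are removed from the feature set to determine the salient features that are most strongly correlated with diabetes risk.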
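Further details on the second processor 2: the second processor 2 first obtains a large number of diabetes diagnosis case samples by exchanging information with other smart electronic devices 4, and uses machine learning to select the salient features of the exercise plan that are most strongly correlated with diabetes risk. Here, the salient features selected by machine learning are the output set generated by feeding the training set into the feature-weight-based LARS diabetes model of Embodiment 1.
The training set consists of a large number of diabetes diagnosis case samples. Each case sample includes at least its exercise plan data, such as sedentary duration, exercise intensity, time of exercise, exercise duration, exercise frequency, exercise type, and the trend of diabetes risk. For example, for the exercise plan that a case followed over a given period, the trend of diabetes risk is determined from the change in diabetes risk indicators before and after the plan; such indicators include peak blood glucose, peak heart rate, and peak blood pressure. A Lasso regression model is defined, the exercise plan data are input as the candidate features (the feature matrix of the diabetes data set), the trend of diabetes risk is set as the selection target (the true result labels), and the model is solved with the feature-weight-based LARS algorithm, which outputs the selected salient features and their regression coefficients.
Because the feature-weight-based LARS algorithm is ahead of the standard LARS in both approach speed and accuracy, the selected salient features effectively exclude the items unrelated to motor ability data and/or diabetes risk, which reduces the complexity and feedback time of the diabetes prediction model when it evaluates data in real time.
Based on the salient features so determined, the second processor 2 uses the data sets collected by the sensor module to identify the user's exercise-related diabetes risk from the relationship between exercise monitoring data and autonomous behavioral capacity extracted by machine learning.
Preferably, the exercise plan adjustment module is configured to dynamically adjust the configuration parameters of the exercise plan based on the relationship between exercise monitoring data and autonomous behavioral capacity.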
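Embodiment 3
With reference to Figure 8, the Pima diabetes data set is used first. Because the existing k-means algorithm selects its initial cluster centers at random, which easily makes the clustering result unstable, the selection of the initial cluster centers needs to be improved so that they fall as close as possible to the center of each cluster. The Pima diabetes data set is the widely used Pima Indian Diabetes data set from the University of California, Irvine (UCI) machine learning repository.
The system provided by the present invention is used to build a diabetes prediction model. It comprises at least a storage device and a first processor 1, which is coupled to the storage device and configured to execute at least one step of a diabetes early warning method based on improved k-means clustering. The method comprises at least one of the following steps: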
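(1) Selection of the first cluster center. Select the data set, define the number of clusters k and the neighborhood radius ε, and choose as the first cluster center the sample point x_i whose sum of distances to the samples is largest;
(2) Selection of a new cluster center. Compute the sum Sum(D(x)) of the distances between every sample point and its nearest cluster center, draw a random value Random within Sum(D(x)), and repeatedly compute Random -= D(x) over the sample points until Random ≤ 0; the point at which this happens is the new cluster center;
(3) Iteration. Repeat the previous step until the required k centers are obtained, denoted {μ_j, j = 1, ..., k};
(4) Cluster labeling. Compute the distance between each sample and the cluster centers, assign each sample the label of its nearest center, and place the sample in the corresponding cluster;
(5) Update. Update all cluster centers;
(6) Diabetes early warning model. Once stable cluster centers are obtained, substitute them into the diabetes piecewise function to obtain the diabetes early warning model.
The improved k-means clustering algorithm effectively overcomes the instability of the clustering result. Combining it with the diabetes piecewise function yields an improved k-means clustering method for building a diabetes early warning model, which improves early warning capability and provides a basis for diagnosing and treating diabetes at different stages.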
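These steps are described in detail one by one below:
First define the number of clusters k and the neighborhood radius ε. Compute the distance dist(x) for each point and accumulate the distances into a per-point sum, sum_i = sum_i + dist_i, consistent with step (1), in which the first center is the sample whose summed distance to the samples is largest.
The point with the largest Sum(dist(x)) is the first cluster center, that is, sum_max = max(sum_i).
To select a new cluster center, compute the distance dist(x) between each point and the first cluster center; points with larger dist(x) are favored as new cluster centers. Sum all dist(x) to obtain Sum(dist(x)), draw a random value Random within Sum(dist(x)), and repeatedly apply the formula Random = Random - dist(x) over the points.
When Random ≤ 0, the current point becomes the next cluster center. This ensures that points with larger dist(x) are selected with higher probability. The required k centers are denoted {μ_j, j = 1, ..., k}.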
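The center initialization described above can be sketched in plain Python. The sketch follows step (2) in using the distance to the nearest already chosen center; the neighborhood radius ε is omitted because the text does not describe how it enters the computation, and the function names, seed handling, and sample points are assumptions.

```python
import math
import random


def first_center(points):
    """The sample whose summed distance to all samples is largest."""
    return max(points, key=lambda p: sum(math.dist(p, q) for q in points))


def next_center(points, centers, rng):
    """Roulette selection: a larger distance gives a larger chance of being picked."""
    d = [min(math.dist(p, c) for c in centers) for p in points]
    r = rng.uniform(0.0, sum(d))        # the random value "Random"
    for p, dist in zip(points, d):
        if dist == 0.0:
            continue                    # the point is already a center
        r -= dist                       # Random = Random - dist(x)
        if r <= 0.0:
            return p
    return points[d.index(max(d))]      # guard against floating-point leftovers


def init_centers(points, k, seed=0):
    rng = random.Random(seed)
    centers = [first_center(points)]
    while len(centers) < k:
        centers.append(next_center(points, centers, rng))
    return centers


points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]  # made-up samples
centers = init_centers(points, k=3)
```

The first center is deterministic, while the later centers depend on the random draw; fixing the seed makes a run repeatable, which is convenient when comparing clustering stability across algorithms.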
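To label the sample clusters, compute the distance dist_od between each sample x_i and the cluster centers {μ_j, j = 1, ..., k}, assign x_i the cluster label λ_i of its nearest center, and place x_i in the corresponding cluster:
$$\lambda_i = \arg\min_{j\in\{1,\ldots,k\}} \mathrm{dist}_{od}(x_i,\mu_j)$$
To update the cluster centers, compute all new cluster centers using the formula:
$$\mu_j' = \frac{1}{|C_j|}\sum_{x\in C_j} x$$
where C_j is the set of samples currently labeled j.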
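The labeling and update steps can be sketched as the usual k-means loop. The sketch assumes Euclidean distance and the mean as the update rule, which is the standard choice but is not confirmed by the recoverable text; leaving an empty cluster's center unchanged is also an assumption.

```python
import math


def assign(points, centers):
    """Label each sample with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def update(points, labels, centers):
    """Move each center to the mean of the samples assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centers.append(old)   # assumption: keep the center of an empty cluster
            continue
        new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Repeat labeling and update until the centers stop changing."""
    for _ in range(max_iter):
        new_centers = update(points, assign(points, centers), centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]          # made-up samples
final_centers, labels = kmeans(points, [(0, 0), (10, 0)])
```

In the demo the two centers settle at the midpoints of the left and right pairs after one update, and the second pass confirms that nothing changes.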
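To build the diabetes early warning model, the stable cluster centers obtained from the steps above are substituted into the diabetes piecewise function, which gives the early warning model. The diabetes early warning piecewise function is:
Figure PCTCN2020080251-appb-000017
where μ_i (i = 1, 2, 3) is the i-th cluster center, 0 denotes healthy, 1 denotes a level I warning, and 2 denotes a level II warning. The early warning model is used to predict whether a person has diabetes and at what stage.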
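To further verify the effectiveness of the proposed model, the improved k-means clustering method for the diabetes early warning model is compared below with standard k-means clustering and with the methods of non-patent documents [1] and [2] cited in the background section. The evaluation criteria are homogeneity, completeness, FMI, mean ARI, CHI, average convergence speed, average number of iterations to convergence, and algorithm time, and the comparison is carried out using these metrics and the corresponding curves.
ARI (Adjusted Rand Index) is one of the metrics for evaluating clustering quality. Its value range is [-1, 1]. Broadly speaking, ARI measures how well two data distributions agree; the larger the value, the better the clustering. Document [1] is: 刘荣凯, 孙忠林. 针对K-means初始聚类中心优化的PCA-TDKM算法[J]. 软件导刊, 2018, 17(09): 85-87. It proposes the PCA-TDKM algorithm, which adds PCA, TD, and a max-min distance algorithm to the traditional k-means algorithm. PCA reduces the dimensionality of the data set and speeds up clustering. TD selects the initial cluster centers dynamically according to the actual distribution of the data objects, so that the k initial centers obtained correspond to the actual clusters. Document [2] is: Yuan Q L, Shi H B, Zhou X F. An optimized initialization center K-means clustering algorithm based on density[C]//IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Shenyang, IEEE, 2015: 790-794. It proposes a method for optimizing the initial centers of k-means. The algorithm uses a density-sensitive similarity measure to compute the density of objects. Candidate points are selected by computing the minimum distance between a point and other points of higher density. Outliers are then screened out using the average density. Finally, the initial centers for k-means are selected. The experimental results show that the initial centers obtained are highly accurate and that anomalies are filtered out effectively.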
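For readers unfamiliar with ARI, the following plain-Python sketch computes it from the pair-counting definition. It is an illustration of the metric only, not the evaluation code used in the experiments, and the label lists are made up.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand Index from the contingency counts of two labelings."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, predicted_labels))
    same_in_both = sum(comb(c, 2) for c in joint.values())
    same_in_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(predicted_labels).values())
    expected = same_in_true * same_in_pred / comb(n, 2)
    maximum = (same_in_true + same_in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every sample in a single cluster
    return (same_in_both - expected) / (maximum - expected)


perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same grouping, renamed
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # groups cut across each other
```

The index ignores how clusters are named, so the relabeled grouping scores 1.0, while the crossed grouping scores below zero because it agrees less than chance would.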
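With reference to Figure 8, which indicates how the standard k-means clustering algorithm is modified, the Pima diabetes data set is used with 240 records as experimental samples, of which 200 form the training set and 40 the test set. The algorithms were programmed in Python and a comparative analysis of the different algorithms was designed.
Table 3 gives the mean ARI obtained by running five algorithms (standard k-means, improved k-means, document [1], document [2], and Agglomerative) 300 times on the diabetes data set. Table 3 shows that the models from the improved k-means algorithm and from the algorithms of documents [1] and [2] all have a clearly higher mean ARI than the model from standard k-means. The improved k-means of this work and the algorithm of document [2] both incorporate the idea of density, and their models have better ARI values than that of document [1]. However, neither standard nor improved k-means performs as well as the density-based clustering algorithm Agglomerative. This is because, once the density-reachable distance parameter is fixed, the initial cluster centers of the density-based Agglomerative algorithm yield very stable clustering results. When handling high-dimensional data, however, the characteristics of that algorithm make it less scalable than k-means.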
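Table 3: Mean ARI of the different algorithms on the new diabetes data set
Figure PCTCN2020080251-appb-000018
Table 4 gives the mean values of five metrics (homogeneity, completeness, FMI, mean ARI, and CHI) for the same five algorithms on the new diabetes data set. Table 4 shows that the model obtained by the present invention performs better on all five metrics than the model from standard k-means, better than the models from document [1] and Agglomerative, and slightly better than the model from document [2]. On ARI and CHI, the models from the other four algorithms are clearly better than the model from standard k-means, whereas on homogeneity, completeness, and FMI the five models differ little. This is because these three metrics mainly measure the accuracy of the clustering result: the model from standard k-means is not particularly inaccurate on the training set, but the instability of the algorithm leads to a poorly distributed model, which implies poor generalization.
Table 4: Mean values of the five metrics for the models of the different algorithms on the new diabetes data set
Figure PCTCN2020080251-appb-000019
Figure PCTCN2020080251-appb-000020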
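In Figure 9, the ARI of one clustering run is the vertical axis and the number of iterations within one run is the horizontal axis; the mean iteration count and mean ARI of a run are obtained over 300 runs. The algorithm of the present invention and the algorithms of documents [1] and [2] have higher ARI values at the start of the iterations. Because the selection of the initial centers is improved, the initial cluster centers are more accurate, and these three algorithms need clearly fewer iterations per run, with the algorithm of the present invention needing the fewest.
Table 5 gives the average number of iterations to convergence and the algorithm time for four algorithms (standard k-means, improved k-means, document [1], and document [2]). Standard k-means needs roughly twice as many iterations as the improved algorithms. The average time per run, however, shows that standard k-means is not the slowest: the algorithms of documents [1] and [2] both take longer. This is because they add a large amount of mathematical computation, so that although the number of iterations falls, one clustering run takes longer. The algorithm of the present invention also adds a density computation, but performs it only once and relies mainly on a probabilistic idea, so it does not need to recompute over the whole data set matrix repeatedly.
Table 5: Average number of iterations to convergence and algorithm time of the different algorithms on the new diabetes data set
Figure PCTCN2020080251-appb-000021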
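In Figure 10, the vertical axis is the mean ARI of each result over the 5 data sets, and the horizontal axis is the number of clustering runs. Because of the nature of the Agglomerative algorithm, every run gives the same result, so its curve is a straight line. The results of standard k-means fluctuate sharply, while those of the present invention and document [2] perform well. The variances of the curves are 3.19×10⁻⁵ for the present invention, 6.68×10⁻⁵ for document [2], 2.94×10⁻⁴ for document [1], and 2.78×10⁻³ for standard k-means. The model obtained by the present invention is therefore the most stable, followed by that of document [2], and the model from standard k-means is the least stable.
In summary, the model metrics obtained by the present invention, document [1], document [2], and Agglomerative are all better than those of standard k-means. The present invention has the best convergence behavior and algorithm time; documents [1] and [2] converge better than standard k-means but take longer. The models from the present invention and from documents [1] and [2] are more stable than that of standard k-means, with the present invention being the most stable.
On this basis, the present invention combines the improved k-means clustering algorithm with the diabetes piecewise function to provide an improved k-means clustering method for the diabetes early warning model, which overcomes the instability of k-means clustering results and improves the accuracy and stability of the early warning model.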
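Embodiment 4
This embodiment further improves on the second processor 2 of Embodiment 1 and the first processor 1 of Embodiment 3; repeated content is not described again. Specifically, with reference to Figure 4, this embodiment proposes a diabetes risk early warning system comprising at least the first processor 1 of Embodiment 3 and the second processor 2, storage devices coupled to the first processor 1 and the second processor 2 respectively, and an interface 5 between them.
Users such as early-stage diabetes patients may, on a physician's advice, be motivated to exercise, but during exercise they very easily exercise too much or at the wrong time, which creates potential danger. In the prior art, both wearable smart devices and the diabetes management systems disclosed in patent documents always monitor the user's physiological information and raise an alarm when it becomes abnormal. Because changes in physiological information are caused by user behavior, such systems suffer from a serious monitoring delay and excessive data sensitivity, and cannot provide timely and reliable diabetes risk control. In contrast, the diabetes risk early warning system of the present invention monitors whether the user's behavior carries potential risk before the physiological information becomes abnormal. Based on user behavior, which is closely tied to individual differences, it controls diabetes risk in graded levels through exercise intervention to improve treatment outcomes, and it eliminates the problems of monitoring delay and excessive data sensitivity.
The diabetes risk early warning system controls the current user's risk of diabetes onset, in particular the risk caused by exercise. The "current user" includes early-stage diabetes patients and/or diabetes patients. An early-stage diabetes patient is an individual with a tendency to develop diabetes. The "risk of diabetes onset" includes the risk of progressing from early-stage diabetes to diabetes and/or the risk of triggering a diabetic episode. The system may be a wearable smart device, a smart mobile terminal, or the like.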
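The system comprises a first processor 1, configured to use the diabetes early warning model to predict whether the current user has diabetes and at what stage. The prediction result is one of healthy, level I warning, or level II warning.
The system further comprises an exercise plan generation module, which obtains exercise monitoring data about the current user and performs a first-level risk warning based on the user data to determine an exercise risk model for the user. "Exercise monitoring data" include at least sedentary duration, exercise intensity, time of exercise, exercise duration, exercise frequency, and exercise type, and are obtained through information exchange between the exercise plan generation module and other smart electronic devices 4. "User data" include at least the user's diabetes stage, diet monitoring data, medication monitoring data, medical history data, geographic location, physiological monitoring data, and physical fitness evaluation data, and may be obtained in the same way. "Diet monitoring data" may be obtained by analyzing photographs of the user's meals, or from the meal times, food types, and portions recorded by the user on a smart mobile terminal. Likewise, "medication monitoring data" may be obtained from the user's drug treatment plan and the medication times recorded by the user. "Medical history data" include the user's complications, the exercise treatment plan recommended by the physician, the drug treatment plan, and so on. The exercise plan generation module / smart electronic device 4 may be a wearable smart device such as a smart band, or a smart mobile terminal such as a smartphone. "Physical fitness evaluation data" may be the body mass index (BMI), defined as weight in kilograms divided by the square of height in meters (kg/m²).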
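The "first-level risk warning" is described in detail below to clarify how the present invention solves the prior-art problems of monitoring delay and excessive data sensitivity:
The first-level risk warning is performed when the exercise plan generation module determines that the current user's exercise monitoring data exceed the preset risk range while the physiological monitoring data do not. In other words, while the physiological monitoring data remain within the preset risk range, that is, while potential risk cannot be judged from physiological information, the exercise plan generation module continuously monitors the user's exercise and analyzes the exercise and physiological monitoring data it obtains. In terms of prevention, monitoring exercise behavior takes priority over monitoring physiological abnormalities.
When the exercise monitoring data exceed the preset risk range, that is, when the user's exercise may carry potential risk, the first-level risk warning is performed to determine an exercise risk model suited to the user. The preset risk range comprises preset threshold ranges for meal time, sedentary duration, the day's exercise volume, and exercise amplitude. It may be a dynamic value set individually for each user. Examples are the 1 to 2 hours after a meal, when abnormal blood glucose is most frequent; prolonged sitting, whose negative effects cannot be offset even by high-intensity exercise; the suitable duration for each type of exercise, or the exercise volume the user has already completed that day; and a preset amplitude and duration of physical activity, so that sustained activity of excessive amplitude is detected promptly. The preset risk range is a limiting condition that may later lead to diabetes risk for the current user; exceeding it does not trigger an alarm, which satisfies the requirements of both timely monitoring and appropriate data sensitivity in diabetes risk control.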
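The trigger condition for the first-level risk warning can be sketched as a pair of range checks. All field names and numeric ranges below are hypothetical placeholders for illustration and are not clinical values or values from the patent.

```python
def outside(values, ranges):
    """Names of the monitored values that fall outside their preset range."""
    return sorted(name for name, value in values.items()
                  if name in ranges
                  and not ranges[name][0] <= value <= ranges[name][1])


def first_level_triggered(exercise, physiological, exercise_ranges, physiological_ranges):
    """Exercise data out of range while physiological data are still in range."""
    return (bool(outside(exercise, exercise_ranges))
            and not outside(physiological, physiological_ranges))


# Hypothetical ranges and readings, for illustration only.
exercise_ranges = {"sedentary_min": (0, 90), "exercise_min_today": (0, 60)}
physiological_ranges = {"glucose_mmol_l": (4.0, 10.0)}
exercise = {"sedentary_min": 120, "exercise_min_today": 20}
physiological = {"glucose_mmol_l": 6.5}

triggered = first_level_triggered(exercise, physiological,
                                  exercise_ranges, physiological_ranges)
```

The check fires on behavior alone: the user has been sitting longer than the preset range allows even though the glucose reading is still within range.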
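Whether aerobic or anaerobic, any exercise by a diabetes patient lowers blood glucose, and with that comes the risk of hypoglycemia. In particular, after taking hypoglycemic drugs or injecting insulin, the glucose-lowering effects of medication and exercise add up, which most easily triggers hypoglycemia. Exercising immediately after an insulin injection or within half an hour of taking hypoglycemic drugs speeds up absorption of the drug and makes hypoglycemia even more likely. In addition to elevated blood glucose, diabetes patients often have hypertension, dyslipidemia, and other conditions; if diabetes and its comorbidities are not controlled over the long term, various other complications can arise, such as nephropathy, neuropathy, cardiovascular disease, retinopathy, and musculoskeletal disorders. Therefore, when the current user is found to have complications, the exercise risk model / exercise monitoring plan must be tailored to the individual. For patients with retinopathy, for example: with mild retinopathy, low to moderate intensity aerobic exercise may be chosen, and breath-holding activities such as weightlifting should be avoided; with moderate retinopathy, low to moderate intensity aerobic exercise may also be chosen, and strenuous activities with the head down should be avoided; with severe retinopathy and a risk of fundus hemorrhage, exercise must be strictly limited and only some low-intensity exercise is recommended.
For this purpose, the early warning system of the present invention gives priority to the first-level risk warning. The first-level risk warning compares and analyzes the current user's current exercise data against a first-level risk warning condition determined from the user data, in particular the medical history data. The first-level risk warning condition is determined from the medical history data in the user data that are related to the exercise data.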
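The medical history data include the history of complications (for example type, severity, and number of episodes), the drug treatment plan (for example the time and dose of hypoglycemic drugs and the time and dose of insulin injections), and the preliminary diabetes prediction result determined by the first processor 1. The storage device of the system stores in advance first-level risk warning conditions comprising at least one attribute; based on the attributes of a first-level risk warning condition, at least one pre-stored exercise risk model can be retrieved. Several attributes correspond to at least one feature.
A feature is a type of user data, such as one or more of the user's diabetes stage, diet monitoring data, medication monitoring data, medical history data, geographic location, physiological monitoring data, and physical fitness evaluation data. An attribute is a limiting condition on at least one item of exercise monitoring data that corresponds to a feature. Attributes include the exercise capacity level and the exercise plan level.
The relationship between features and attributes is illustrated as follows. For the feature "preliminary diabetes prediction result determined by the first processor 1", the corresponding attribute is the exercise plan level: if the preliminary prediction result is a level II warning, the exercise plan level is Grade A (or a numeric value). For the feature "complication type", the corresponding attribute is also the exercise plan level: for moderate retinopathy, the exercise plan level is Grade B (or a numeric value). If the limiting condition of an attribute for one feature is stricter than that of the same attribute for other features, the stricter condition prevails, so that the user's potential diabetes risk or risk of deterioration is considered comprehensively. For example, the feature of moderate retinopathy gives an exercise plan level of Grade B, while the feature of the drug treatment plan gives Grade A when the time since the user took hypoglycemic drugs has not exceeded the preset duration, that is, while the drug's glucose-lowering effect is still active; Grade A is then adopted as the user's exercise plan level. For the feature of physical fitness evaluation data, when the user's evaluation is below standard, the exercise capacity level is Grade C.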
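The rule that the stricter limiting condition prevails can be sketched as a maximum over a strictness ordering. The text only establishes that Grade A prevails over Grade B; placing Grade C below Grade B, and the dictionary keys used here, are assumptions made for illustration.

```python
# Assumed ordering: a larger number means a stricter limiting condition.
STRICTNESS = {"A": 3, "B": 2, "C": 1}


def resolve_plan_level(levels_by_feature):
    """Pick the strictest exercise plan level among all features."""
    return max(levels_by_feature.values(), key=STRICTNESS.__getitem__)


levels = {
    "complication: moderate retinopathy": "B",
    "medication: hypoglycemic drug still active": "A",
}
plan_level = resolve_plan_level(levels)
```

With the two features from the example in the text, the resolved exercise plan level is Grade A.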
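The correspondence between first-level risk monitoring conditions and exercise risk models is illustrated as follows. When the user is found, as above, to match a first-level risk monitoring condition with the two attributes Grade C exercise capacity level and Grade A exercise plan level, the user's current situation is judged to suit the exercise risk model for regular exercise intensity. This model includes several exercise plans, such as standing, walking, and housework, each of which includes sedentary duration, exercise intensity, time of exercise, exercise duration, and exercise frequency together with the suitable control range for each.
At this point the exercise plan generation module has obtained several exercise plans that fit the current user. However, because of individual differences, users adapt differently to different exercise plans. The present invention therefore applies a second-level risk warning analysis to the current exercise plan, taking individual differences fully into account and controlling diabetes risk in graded levels through exercise intervention, to provide the user with a safer and more effective exercise treatment plan.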
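The exercise plan generation module is further configured to perform a second-level risk warning analysis when the current user's current exercise data do not satisfy the first-level risk warning condition. Specifically, the module is configured to: determine the relationship between the user's current exercise monitoring data and physiological information from the statistical trend curve between the user's historical physiological information and historical exercise monitoring data; obtain the user's current physiological information from the smart electronic device 4 and perform a trend analysis with the determined relationship to predict the value of the physiological information if the exercise continues; determine, from the predicted physiological information, at least one exercise risk warning and/or exercise guidance plan that the exercise risk model generates after excluding the restricted exercise plans determined from the predicted physiological information; and issue the exercise risk warning and/or exercise guidance advice to the user through a prompt module or another smart electronic device 4.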
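The trend analysis can be illustrated with a minimal sketch that assumes the statistical trend curve is a straight line fitted by least squares. The linear model, the sample readings, and the lower limit are all assumptions for illustration and are not clinical guidance or the patent's stated method.

```python
def linear_trend(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_if_continued(history_minutes, history_values, current_value, extra_minutes):
    """Extrapolate the current reading along the historical trend."""
    slope, _ = linear_trend(history_minutes, history_values)
    return current_value + slope * extra_minutes


# Hypothetical history: glucose readings (mmol/L) against minutes of walking.
minutes = [0, 10, 20, 30]
glucose = [7.0, 6.6, 6.2, 5.8]
predicted = predict_if_continued(minutes, glucose, current_value=5.8, extra_minutes=60)
LOWER_LIMIT = 3.9  # placeholder limit, for illustration only
warn = predicted < LOWER_LIMIT
```

In the demo the historical trend is a drop of 0.04 mmol/L per minute, so another 60 minutes of the same exercise is predicted to bring the reading to about 3.4, below the placeholder limit, and a warning would be issued before any abnormal reading is actually measured.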
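Preferably, the exercise risk model is the statistical trend curve between the user's historical physiological information and historical exercise monitoring data, determined from the user data. The curve shows directly how the user's physiological information changes during each type of exercise and provides the basis for analyzing and predicting the user's subsequent exercise. The exercise plan generation module generates exercise intervention information for the user based on this model. Exercise intervention information includes exercise risk warnings and/or exercise guidance advice. It offers intervention and prevention advice from two directions: an exercise risk warning tells the user which exercises or movements carry potential risk, so that the user can avoid them not only in the exercise that follows but, more importantly, in daily life afterwards, which benefits treatment in both the short and the long term. Exercise risk warnings and/or exercise guidance advice are generated after excluding the user's restricted exercise plans.
The restricted exercise plans are determined from the user's predicted physiological information. Determining them also yields the exercise treatment plans, or the values of the two attributes exercise capacity level and exercise plan level, that are unsuitable for the user. The parts of the exercise risk model obtained above that fall under the restricted exercise plans are excluded, the exercise risk model is updated based on the remainder and the exercise plan, and the result is fed back to the user for viewing or as a prompt. The predicted physiological information is obtained by a trend analysis between the user's current physiological information and the relationship; it is the predicted value of the physiological information if the exercise continues. The relationship is determined from the statistical trend curve between the user's historical physiological information and historical exercise monitoring data, and is a prediction of how the user's physiological information changes with the current exercise monitoring data.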
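As a preferred implementation, the early warning system of this embodiment further comprises the sensor module, second processor 2, and exercise plan adjustment module described in Embodiment 2. The sensor module is configured to collect the initial data of the system, the parameters used by the system, and user data about the current user. The second processor 2 is configured to: when it detects that the current user is carrying out the exercise treatment plan generated by the exercise plan generation module and the prediction result generated by the first processor 1 for the user is level II, use the data sets collected by the sensor module to identify the user's exercise-related diabetes risk, based on the salient features selected by machine learning and on the extracted relationship between exercise monitoring data and autonomous behavioral capacity. The exercise plan adjustment module is configured to dynamically adjust the configuration parameters of the exercise plan, identifying exercise-related diabetes risk from the relationship determined by the second processor 2.
As a preferred implementation, the system provided by the present invention comprises a smart electronic device 4 operated or worn by the current user. The smart electronic device 4 carries the first processor 1, the second processor 2, the exercise plan generation module, the sensor module, the exercise plan adjustment module, and so on. The processors on this device communicate over a computer network with the smart electronic devices operated or worn by the current user or by caregivers. The first processor 1, second processor 2, exercise plan generation module, exercise plan adjustment module, and other processors may be interconnected by a communication bus such as a motherboard (solid line). Preferably, the first processor 1 and at least one smart electronic device (for example the sensor module) send the data they have collected and processed to the exercise plan generation module and the second processor 2. The exercise plan generation module buffers the data and sends its processed data to at least one smart electronic device (for example a smartphone operated by the current user) or the prompt module, and to the exercise plan adjustment module. The exercise plan generation module sends the generated exercise plan to the exercise plan adjustment module, which dynamically adjusts the plan's configuration parameters to optimize the user's exercise plan. The exercise plan adjustment module processes the data it receives and sends the resulting data to at least one smart electronic device (for example a smartphone operated by the current user) or the prompt module, and to the exercise plan generation module.
The above is only a preferred specific implementation of the present invention, and the scope of protection is not limited to it. Any equivalent replacement or change made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.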

Claims (15)

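  1. A diabetes risk early warning system, characterized in that the system comprises:
    a storage device;
    a first processor (1) based on improved k-means clustering, coupled to the storage device and configured to:
    obtain stable cluster centers based on the selected first cluster center and substitute them into the diabetes piecewise function to obtain a diabetes early warning model,
    wherein a data set is selected, the number of clusters k and the neighborhood radius ε are defined, and the sample point X_i with the largest sum of distances to the samples is chosen as the first cluster center, so that the first cluster center falls in the central part of the clusters.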
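  2. The diabetes risk early warning system according to claim 1, characterized in that the point X_i with the largest sum of distances to the samples is selected by at least one of the following steps:
    computing the distance dist(x) between each point and the first cluster center;
    selecting points with larger dist(x) as new cluster centers;
    summing every dist(x);
    taking the largest Sum(dist(x)) as the first cluster center.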
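  3. The diabetes risk early warning system according to claim 2, characterized in that the first processor (1) is further configured to:
    select a new cluster center,
    wherein a sample point X_i at a larger distance from the first cluster center is chosen as the new cluster center.
  4. The diabetes risk early warning system according to claim 3, characterized in that the sample point X_i at a larger distance from the first cluster center is selected by at least one of the following steps:
    computing the distance dist(x) between each point and the first cluster center;
    selecting points with larger dist(x) as new cluster centers;
    summing every dist(x) to obtain Sum(dist(x));
    drawing a random value Random within Sum(dist(x));
    repeatedly applying the formula Random = Random - dist(x);
    until Random ≤ 0, at which point the current point is the next cluster center.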
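  5. The diabetes risk early warning system according to claim 4, characterized in that the first processor (1) is further configured to:
    iterate, wherein step 2 above is repeated until the required k centers are obtained, denoted {μ_j, j = 1, ..., k}.
  6. The diabetes risk early warning system according to claim 5, characterized in that the first processor (1) is further configured to:
    label the sample clusters,
    wherein the distance dist_od between each sample X_i and the cluster centers {μ_j, j = 1, ..., k} is computed, the cluster label λ_i of sample X_i is determined by its nearest center, and sample X_i is placed in the corresponding cluster:
    Figure PCTCN2020080251-appb-100001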
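  7. The diabetes risk early warning system according to claim 6, characterized in that the first processor (1) is further configured to:
    update,
    wherein all cluster centers are updated, and all new cluster centers are computed by the following formula:
    Figure PCTCN2020080251-appb-100002
  8. The diabetes risk early warning system according to claim 7, characterized in that all cluster centers are updated by at least one of the following steps:
    computing
    Figure PCTCN2020080251-appb-100003
    and checking whether u_i' = u_i holds;
    if it holds, keeping the current center unchanged;
    if it does not hold, updating the current u_i to u_i'.
  9. The diabetes risk early warning system according to claim 8, characterized in that the diabetes early warning piecewise function is:
    Figure PCTCN2020080251-appb-100004
    where μ_i (i = 1, 2, 3) is the i-th cluster center, and y = 0, y = 1, and y = 2 denote healthy, level I warning, and level II warning respectively, so that the early warning model can be used to predict whether a person has diabetes and at what stage.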
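  10. A diabetes risk early warning system, characterized in that the system comprises:
    a storage device;
    a second processor (2) based on feature weights, coupled to the storage device and configured to:
    compute the feature weight vector of the independent variables and the original correlation vector;
    output, based on the feature weight vector and the original correlation vector, the regression coefficients ω of the feature-weight-based LARS diabetes model.
  11. The diabetes risk early warning system according to claim 10, characterized in that the feature weight of a feature variable is computed by the formula:
    Figure PCTCN2020080251-appb-100005
    where
    Figure PCTCN2020080251-appb-100006
    are the eigenvalues of the characteristic equation
    Figure PCTCN2020080251-appb-100007
  12. The diabetes risk early warning system according to claim 11, characterized in that R in the characteristic equation is the covariance matrix of the diabetes data set matrix X, computed by the formula:
    Figure PCTCN2020080251-appb-100008
    where
    Figure PCTCN2020080251-appb-100009
    and θ_i is the mean of the i-th feature.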
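  13. The diabetes risk early warning system according to claim 12, characterized in that the regression coefficients ω are output, based on the feature weight vector and the original correlation vector, by at least one of the following steps:
    computing the angular bisector vector, the regression coefficient vector, the new correlation vector, and the maximum correlation;
    updating the regression coefficient vector, the estimate vector, the residual vector, and the index set;
    checking whether the L2 norm of the residual vector is smaller than the tolerance, stopping if so and otherwise repeating the above steps.
  14. The diabetes risk early warning system according to claim 13, characterized in that the angular bisector u_A of the vectors in the column vectors X_A is obtained by the following formulas:
    Figure PCTCN2020080251-appb-100010
    Figure PCTCN2020080251-appb-100011
    u_A = X_A ω_A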
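  15. A diabetes risk early warning system, characterized in that the system comprises:
    a storage device;
    at least one processor, coupled to the storage device and configured to:
    obtain stable cluster centers based on the selected first cluster center and substitute them into the diabetes piecewise function to obtain a diabetes prediction model, wherein a data set is selected, the number of clusters k and the neighborhood radius ε are defined, and the sample point X_i with the largest sum of distances to the samples is chosen as the first cluster center, so that the first cluster center falls in the central part of the clusters;
    compute the feature weight vector of the independent variables and the original correlation vector;
    output, based on the feature weight vector and the original correlation vector, the regression coefficients ω of the diabetes prediction model.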
PCT/CN2020/080251 2019-04-18 2020-03-19 Diabetes risk early warning system WO2020211592A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/967,620 US20220301708A1 (en) 2019-04-18 2020-03-19 Diabetes risk early warning system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910314236.5 2019-04-18
CN201910314236.5A CN110085322A (zh) 2019-04-18 2019-04-18 Improved method for a k-means clustering diabetes early warning model
CN201910340600.5 2019-04-25
CN201910340600.5A CN110060781A (zh) 2019-04-25 2019-04-25 Feature-weight-based LARS diabetes prediction method

Publications (1)

Publication Number Publication Date
WO2020211592A1 true WO2020211592A1 (zh) 2020-10-22

Family

ID=72836968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080251 WO2020211592A1 (zh) 2019-04-18 2020-03-19 Diabetes risk early warning system

Country Status (2)

Country Link
US (1) US20220301708A1 (zh)
WO (1) WO2020211592A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113921134A (zh) * 2021-09-01 2022-01-11 西安理工大学 一种基于ks模型的糖尿病预测算法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11161011B2 (en) * 2019-04-29 2021-11-02 Kpn Innovations, Llc Methods and systems for an artificial intelligence fitness professional support network for vibrant constitutional guidance
CN116421178B (zh) * 2023-04-19 2024-05-28 河北金盛达医疗用品有限公司 辅助监护的方法、装置、终端设备及可读存储介质
CN117393171B (zh) * 2023-12-11 2024-02-20 四川大学华西医院 直肠癌术后lars发展轨迹预测模型构建方法及系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930163A (zh) * 2012-11-01 2013-02-13 北京理工大学 一种2型糖尿病风险状态判定方法
CN104636631A (zh) * 2015-03-09 2015-05-20 江苏中康软件有限责任公司 一种基于糖尿病系统大数据的糖尿病概率计算方法
CN106384119A (zh) * 2016-08-23 2017-02-08 重庆大学 一种利用方差分析确定k值的k‑均值聚类改进算法
US20170161606A1 (en) * 2015-12-06 2017-06-08 Beijing University Of Technology Clustering method based on iterations of neural networks
CN107403072A (zh) * 2017-08-07 2017-11-28 北京工业大学 一种基于机器学习的2型糖尿病预测预警方法
CN110060781A (zh) * 2019-04-25 2019-07-26 岭南师范学院 一种基于特征权重的lars糖尿病预测方法
CN110085322A (zh) * 2019-04-18 2019-08-02 岭南师范学院 一种k-means聚类糖尿病预警模型的改进方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190100809A1 (en) * 2010-05-11 2019-04-04 Veracyte, Inc. Algorithms for disease diagnostics
WO2016029039A1 (en) * 2014-08-20 2016-02-25 Puretech Management, Inc. Systems and techniques for identifying and exploiting relationships between media consumption and health
US11139081B2 (en) * 2016-05-02 2021-10-05 Bao Tran Blockchain gene system
US10252145B2 (en) * 2016-05-02 2019-04-09 Bao Tran Smart device
US10052026B1 (en) * 2017-03-06 2018-08-21 Bao Tran Smart mirror

Also Published As

Publication number Publication date
US20220301708A1 (en) 2022-09-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20792195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20792195

Country of ref document: EP

Kind code of ref document: A1