CN117332203A

CN117332203A - System and method for data exploration and analysis of a type 2 diabetes disease-specific cohort

Info

Publication number: CN117332203A
Application number: CN202311457263.0A
Authority: CN
Inventors: 王艺璇; 林勇
Original assignee: Shanghai Palan Datarx Technology Co ltd
Current assignee: Shanghai Palan Datarx Technology Co ltd
Priority date: 2023-11-03
Filing date: 2023-11-03
Publication date: 2024-01-02

Abstract

The invention relates to a system and a method for data exploration and analysis of a type 2 diabetes disease-specific cohort. The analysis system is an interactive web application built with Shiny. The data exploration and analysis method comprises the following steps: S1: acquiring an ADaM dataset and a predefined file for type 2 diabetes patients; S2: dividing the type 2 diabetes patient population into different subgroups or cohorts; S3: selecting an appropriate statistical method on the analysis system dashboard according to the data exploration objective; S4: according to the statistical method, selecting the corresponding population and the variables and confounding factors that enter the model; S5: outputting a result chart and adjusting the width and height of the chart. The invention helps users simplify the workflow of multivariable data analysis on a type 2 diabetes disease-specific cohort, makes the presentation of results more intuitive, and provides a basis for promoting standardized treatment and management of diabetes at the primary-care level.

Description

System and method for data exploration and analysis of a type 2 diabetes disease-specific cohort

Technical Field

The invention relates to the technical field of data exploration and analysis, and in particular to a system and a method for data exploration and analysis of a type 2 diabetes disease-specific cohort.

Background

Diabetes is a metabolic disease characterized by hyperglycemia, which is caused by defective insulin secretion, impaired insulin action, or both, and which leads to chronic damage to and dysfunction of various tissues, especially the eyes, kidneys, heart, blood vessels, and nerves.

At present, disease diagnosis, treatment, and management practices differ widely among medical institutions of different regions and levels in China. Research on diabetes is mostly concentrated in tertiary hospitals, and high-quality research focusing on primary-level diagnosis, treatment, and management is lacking. Exploring the glucose-lowering efficacy and safety of different treatment regimens in patients at primary-level hospitals, together with the factors that influence the treatment outcomes and costs of these patients, can provide a basis for promoting standardized treatment and effective management of diabetes at the primary-care level.

The analysis in current studies generally follows this procedure: drawing up a study protocol and a statistical analysis plan; creating the corresponding analysis datasets according to the study objectives; writing analysis code to generate a series of pre-designed TFLs (Tables, Figures, and Listings, i.e. the statistical analysis report outputs); and completing the study summary report.

The above workflow has the following two problems: 1. when a study involves multiple subgroups and cohorts, or the same population is analyzed against different study indicators, a large number of similar reports in the same format are generated, which makes the report lengthy, hard to read, and hard to extract useful information from quickly; 2. when a multivariable model is constructed, confounding factors must be included for adjustment, the selection and combination of confounding factors must be tuned repeatedly, and each combination requires the code to be rewritten.

Disclosure of Invention

The invention addresses two shortcomings encountered when exploring the influence of blood glucose management in type 2 diabetes patients on long-term prognostic outcomes: the traditional TFL presentation is cumbersome and hard to read because the study involves multiple subgroups, cohorts, and study indicators; and constructing a multivariable model requires the confounding factors to be adjusted repeatedly and the code to be rewritten each time. To this end, the invention provides a system and a method for data exploration and analysis of a type 2 diabetes disease-specific cohort.

In order to achieve the above purpose, the invention provides the following technical solution: a system and a method for data exploration and analysis of a type 2 diabetes disease-specific cohort, in which a data exploration and analysis system is constructed by establishing the connection between the server (Server) and the user interface (UI) of a Shiny application; the system comprises interactive elements such as variable and statistical method option boxes and drop-down boxes for selecting type 2 diabetes subgroups or cohorts; when the user interacts with the application through the UI dashboard, the server logic interacts with the R code according to the user's input and operations and responds by updating the state of the application, so that data and charts can be manipulated and viewed in real time;

further, the variables set in the analysis system are specifically: sex, age, region of visit, length of hospital stay, medical insurance, insulin use (basal vs. premix), diabetes medication regimen, whether the blood glucose biochemical indicators during hospitalization reach target, whether a hypoglycemic event occurs, whether the patient is readmitted within 90 days of discharge, change in fasting blood glucose, whether the last HbA1c within 90 days of discharge is < 7%, types of other diseases (coronary heart disease, hypertension, stroke, chronic renal insufficiency), number of other diseases, service costs, treatment costs, and total costs;

further, the setting of the statistical method in the analysis system specifically includes:

1) Hypothesis testing: statistical inference to determine whether a difference between samples, or between a sample and the population, is due to sampling error or to an intrinsic difference. For continuous variables, the number of cases (N), mean (Mean), standard deviation (SD), minimum (Min), maximum (Max), median (Median), and interquartile range (IQR) are described; normally distributed data are compared between groups with the t test, and non-normally distributed data with the Mann-Whitney U test (Wilcoxon rank-sum test). For categorical variables, the frequency (N) and percentage (%) are described, and groups are compared with the chi-square test; when any expected frequency is less than 1, or 20% of the expected frequencies are less than 5, Fisher's exact test is used instead of the chi-square test. Finally, whether the between-group difference is statistically significant is determined from the P value. The result is displayed as a table containing the descriptive statistics and P values, with statistically significant P values shown in bold;

2) Logistic regression: corrects for the influence of confounding factors, which can bias or distort the prediction of the target variable; the confounding factors are adjusted for through logistic regression analysis, and the adjusted model coefficients are then analyzed. The result is displayed as a forest plot, as commonly used in the scientific literature, showing the magnitude, direction, and confidence interval of each coefficient;

3) Correlation analysis: assesses the degree of association between two or more variables in a study, helping to understand the linear relationship between the variables and how they change together; the Pearson correlation test is used for continuous variables, specifically:

$$r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}} \qquad (1)$$

$$t = \frac{r}{\sqrt{1 - r^2}} \sqrt{n - 2} \qquad (2)$$

in equation 1, x and y are two vectors of length n, and m_x and m_y are the means of x and y, respectively; in equation 2, n is the number of observations (the length) of the x and y variables; the corresponding P value is looked up in the t distribution table with n - 2 degrees of freedom, and whether the correlation coefficient differs significantly from zero is finally determined from the P value;

the Spearman correlation test is used for non-continuous variables, specifically:

$$\rho = \frac{\sum (x' - m_{x'})(y' - m_{y'})}{\sqrt{\sum (x' - m_{x'})^2 \sum (y' - m_{y'})^2}} \qquad (3)$$

in equation 3, x' and y' are the ranks of x and y, and m_{x'} and m_{y'} are the means of those ranks;

the correlation analysis is displayed as a heat map, in which the strength of the correlation coefficient is represented by color coding;

4) Multiple linear regression and linear regression assumption testing:

a) The multiple linear regression model is used to explore the relationship between several independent variables and one continuous dependent variable in the study, explaining the variation of the dependent variable through the estimated regression coefficients; the result is displayed as a forest plot, as commonly used in the scientific literature, showing the magnitude, direction, and confidence interval of each coefficient;

b) Linear regression assumption testing is used for regression diagnostics, evaluating the goodness of fit of the regression model in the study; the result is displayed as residual plots, comprising a plot of residuals versus fitted values, a residual Q-Q plot, a residuals-versus-leverage plot, and a scale-location plot.
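By way of illustration only, the rule in item 1) for choosing between the chi-square test and Fisher's exact test can be sketched as follows. The sketch is written in Python rather than in the R code of the Shiny application, the function names are hypothetical, and the 20% condition is read here as "more than 20% of the cells", which is an assumption about the intended threshold.

```python
from typing import List


def expected_frequencies(table: List[List[int]]) -> List[List[float]]:
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def choose_categorical_test(table: List[List[int]]) -> str:
    """Return "fisher" when any expected count is below 1 or more than 20% of
    the expected counts are below 5 (assumed reading); otherwise "chi-square"."""
    expected = [e for row in expected_frequencies(table) for e in row]
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    if min(expected) < 1 or share_below_5 > 0.20:
        return "fisher"
    return "chi-square"


# Hypothetical counts: rows are insulin groups, columns are event / no event.
print(choose_categorical_test([[30, 20], [25, 25]]))
print(choose_categorical_test([[3, 1], [1, 3]]))
```

The decision depends only on the expected counts, so it can be made before either test is run.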

Further, the method by which a user explores data on the type 2 diabetes disease-specific cohort after entering the dashboard of the analysis system comprises the following steps:

S1: acquiring an ADaM dataset and a predefined file for type 2 diabetes patients;

S2: dividing the type 2 diabetes patient population into different subgroups or cohorts according to the characteristics of the ADaM dataset and the data exploration objectives required by the predefined file;

S3: selecting an appropriate statistical method on the analysis system dashboard according to the data exploration objective;

S4: according to the statistical method, selecting on the analysis system dashboard the type 2 diabetes disease-specific cohort population to which the method applies, together with the outcome variables and confounding factors that enter the model;

S5: outputting a result chart, and adjusting the width and height of the chart with the system's slider.

Further, in step S2, the data exploration objectives for the type 2 diabetes disease-specific cohort population are specifically:

1) exploring the current state of treatment of type 2 diabetes patients, as reflected in the comparison of the glucose-lowering efficacy and safety of basal insulin and premixed insulin;

2) exploring the influence of blood glucose management during the hospitalization of type 2 diabetes patients on long-term prognostic outcomes, as reflected in the effects of in-hospital blood glucose changes and treatment on chronic disease and readmission.

Further, the type 2 diabetes patient population is divided into different subgroups or cohorts, specifically:

according to the exploration objective, the subgroups or cohorts are divided on the basis of the department visited, the insulin used during inpatient treatment, and the distribution interval of the last fasting blood glucose test result before discharge.
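As a minimal sketch of this division, the following Python example groups patient records by the three criteria named above. It is not the R code of the analysis system; the record layout, field names, and fasting blood glucose cut points are all hypothetical, since the source does not state the intervals.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Patient:
    # Hypothetical record layout; field names are not taken from the ADaM dataset.
    patient_id: str
    department: str   # department visited
    insulin: str      # insulin used in hospital: "basal" or "premix"
    last_fbg: float   # last fasting blood glucose before discharge (mmol/L)


# Hypothetical cut points in mmol/L; the source does not state the intervals.
FBG_CUTS = (7.0, 10.0)


def fbg_interval(value: float) -> str:
    """Label the distribution interval that a fasting glucose value falls into."""
    low, high = FBG_CUTS
    if value < low:
        return f"<{low}"
    if value < high:
        return f"{low}-{high}"
    return f">={high}"


def split_into_cohorts(patients: List[Patient]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group patient IDs by (department, insulin, fasting glucose interval)."""
    cohorts: Dict[Tuple[str, str, str], List[str]] = {}
    for p in patients:
        key = (p.department, p.insulin, fbg_interval(p.last_fbg))
        cohorts.setdefault(key, []).append(p.patient_id)
    return cohorts


patients = [
    Patient("P01", "endocrinology", "basal", 6.4),
    Patient("P02", "endocrinology", "basal", 6.9),
    Patient("P03", "endocrinology", "premix", 8.2),
    Patient("P04", "internal medicine", "basal", 11.5),
]
for key, ids in split_into_cohorts(patients).items():
    print(key, ids)
```

Keying each cohort by the full tuple keeps the combination of subgroup and cohort explicit, so a cohort can be looked up under a chosen subgroup without re-filtering the data.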

Further, in step S4, the model construction page of the analysis system is used to select the type 2 diabetes disease-specific cohort population to which the model applies, together with the outcome variables and confounding factors that enter the model, specifically:

according to the method or model chosen in the statistical method option box in step S3, the type 2 diabetes disease-specific cohort population to which the method or model applies is selected, and the outcome variables and confounding factors that enter the model are selected.

Further, after the result chart is output by the statistical method in the analysis system, the width and height of the chart are adjusted with the system's slider.
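One way such selections could be turned into a model specification without rewriting code for each combination of confounding factors is sketched below. This is an assumption about how the selections might be assembled, not the patent's implementation; the function and variable names are hypothetical, and the result is only a string in the familiar "outcome ~ terms" notation.

```python
from typing import Iterable, List


def build_formula(outcome: str, exposure: str, confounders: Iterable[str]) -> str:
    """Assemble "outcome ~ exposure + confounder + ..." from dashboard selections."""
    terms: List[str] = [exposure]
    for name in confounders:
        # Skip duplicates and the outcome itself so the model stays well defined.
        if name != outcome and name not in terms:
            terms.append(name)
    return f"{outcome} ~ " + " + ".join(terms)


# Hypothetical variable names standing in for the dashboard's selections.
print(build_formula("readmit_90d", "insulin", ["age", "sex"]))
print(build_formula("readmit_90d", "insulin", ["age", "sex", "n_other_diseases"]))
```

Changing the confounder set changes only the argument list, which is the kind of repeated adjustment the dashboard is meant to make cheap.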

Compared with the prior art, the technical scheme of the application has the following beneficial effects:

with the system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort, the analysis system allows subgroups and cohorts to be selected in combination (different cohorts can be selected under one subgroup) and allows the dependent variable and multiple independent variables to be freely combined in multiple regression analysis, so that the status of the cohort of interest can be obtained and read quickly; this reduces the user's workload of repeatedly screening and visualizing multiple subgroups and their corresponding variables when analyzing data with different statistical methods, allows results to be presented more flexibly and intuitively, and provides a basis for promoting standardized treatment and effective management of diabetes at the primary-care level.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of the selection of study objectives and study population provided in the embodiment;

FIG. 3 is a schematic diagram of the hypothesis testing result table provided in the embodiment;

FIG. 4 is a schematic diagram of the logistic regression result forest plot provided in the embodiment;

FIG. 5 is a schematic diagram of the correlation analysis result provided in the embodiment;

FIG. 6 is a schematic diagram of the multiple regression result forest plot provided in the embodiment;

FIG. 7 is a schematic diagram of the linear regression assumption testing residual plots provided in the embodiment.

Detailed Description

The drawings are for illustrative purposes only and are not to be construed as limiting the present patent.

All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. This embodiment provides a system and a method for data exploration and analysis of a type 2 diabetes disease-specific cohort, in which an interactive web application is built for data analysis of the type 2 diabetes disease-specific cohort; as shown in FIG. 1, the analysis method comprises the following steps:

S1: acquiring an ADaM dataset and a predefined file for type 2 diabetes patients;

S2: dividing the type 2 diabetes patient population into different subgroups or cohorts;

S3: selecting an appropriate statistical method on the analysis system dashboard according to the data exploration objective;

S4: according to the statistical method, selecting the corresponding population and the outcome variables and confounding factors that enter the model;

S5: outputting a result chart, and adjusting the width and height of the chart with the system's slider.

In this embodiment, the ADaM dataset and predefined study plan for type 2 diabetes patients in step S1 are intended for a retrospective analysis using medical record data from county-level hospitals in 8 different provinces; the operations in steps S2 and S3 of selecting the study objective and study population in the analysis system are shown in FIG. 2.

The charts produced in the analysis system by the statistical methods of this embodiment (hypothesis testing, logistic regression, correlation analysis, multiple regression, and linear regression assumption testing) are shown in FIGS. 3 to 7, respectively.

Hypothesis testing is used, for patients visiting different departments, to compare the differences in glucose-lowering efficacy and safety between different insulin regimens; the indicators compared may include: the rate at which in-hospital blood glucose biochemical indicators reach target, the incidence of hypoglycemic events, and the incidence of readmission within 90 days.
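A comparison of an event rate between two insulin groups can be sketched as a Pearson chi-square test on a 2x2 table. The Python example below is illustrative only: the counts are invented, the function names are hypothetical, no continuity correction is applied, and asterisks stand in for the bold display of significant P values.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and P value for a 2x2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def format_p(p: float, alpha: float = 0.05) -> str:
    """Format a P value; significant values are wrapped in ** as a stand-in for bold."""
    text = "<0.001" if p < 0.001 else f"{p:.3f}"
    return f"**{text}**" if p < alpha else text


# Invented counts: rows are basal / premix, columns are event / no event.
chi2, p = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi2 = {chi2:.3f}, P = {format_p(p)}")
```

The closed-form P value holds only for one degree of freedom; larger tables would need a general chi-square distribution function.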

Logistic regression is used, for patients visiting different departments, to compare the differences in glucose-lowering efficacy and safety between different insulin regimens; the variables may include: sex, age, whether the blood glucose biochemical indicators reach target, whether a hypoglycemic event occurs, and whether the patient is readmitted within 90 days of discharge.
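The quantities that a forest plot of a logistic regression shows (size, direction, and confidence interval of each coefficient) can be sketched with a small Newton-Raphson fit. The Python code below is a self-contained illustration, not the R model used by the system. The data are invented and the function names are hypothetical. The 95% interval is a Wald interval (z = 1.96), which is an assumption about the interval type. Confounder columns could be appended to each row in the same way as the exposure column.

```python
import math
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def information(rows: List[List[float]], beta: List[float]):
    """Fitted probabilities and the information matrix X'WX at beta."""
    p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
    k = len(beta)
    info = [[sum(pi * (1 - pi) * r[i] * r[j] for r, pi in zip(rows, p))
             for j in range(k)] for i in range(k)]
    return p, info


def fit_logistic(rows: List[List[float]], y: List[int],
                 iterations: int = 25) -> Tuple[List[float], List[float]]:
    """Newton-Raphson fit; each row is [1.0, x1, ...] with a leading intercept term.
    Returns (coefficients, standard errors)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        p, info = information(rows, beta)
        score = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, p)) for j in range(k)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    _, info = information(rows, beta)
    # Standard error j is the square root of entry (j, j) of the inverse information matrix.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    return beta, se


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval (lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Invented data: exposure 1 = premix, 0 = basal; outcome 1 = hypoglycemic event.
rows = [[1.0, 1.0]] * 30 + [[1.0, 0.0]] * 30
y = [1] * 20 + [0] * 10 + [1] * 10 + [0] * 20
beta, se = fit_logistic(rows, y)
print(odds_ratio_ci(beta[1], se[1]))
```

With a single binary exposure the fitted odds ratio equals the cross-product ratio of the 2x2 table, which gives a simple check on the fit.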

Correlation analysis is used to analyze blood glucose changes and treatment effects in patients across the different distribution intervals of the last fasting blood glucose test result before discharge; the analysis indicators may include: change in fasting blood glucose, whether the last HbA1c within 90 days of discharge is < 7%, types of other diseases (coronary heart disease, hypertension, stroke, chronic renal insufficiency), number of other diseases, readmission within 90 days, service costs, treatment costs, total costs, age, sex, region of visit, length of hospital stay, medical insurance, insulin use (basal vs. premix), and diabetes medication regimen.
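The Pearson and Spearman coefficients that fill such a heat map can be computed directly from their definitions. The sketch below is in Python with invented data and hypothetical function names; it returns the t statistic but leaves the P value to a t distribution table or a statistics library, as the standard library has no t distribution.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(x)
    m_x, m_y = sum(x) / n, sum(y) / n
    sxy = sum((a - m_x) * (b - m_y) for a, b in zip(x, y))
    sxx = sum((a - m_x) ** 2 for a in x)
    syy = sum((b - m_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Invented values for two variables measured on five patients.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(r, t_statistic(r, len(x)), spearman_rho(x, y))
```

Computing Spearman as Pearson on average ranks handles ties without a separate formula.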

Multiple regression and regression assumption testing are used to analyze blood glucose changes and treatment effects in patients across the different distribution intervals of the last fasting blood glucose test result before discharge; the dependent variables may include: length of hospital stay, service costs, treatment costs, total costs, whether the last HbA1c within 90 days of discharge is < 7%, and readmission within 90 days; the independent variables may include: change in fasting blood glucose, age, sex, region of visit, types of other diseases (coronary heart disease, hypertension, stroke, chronic renal insufficiency), number of other diseases, insulin use (basal vs. premix), and diabetes medication regimen.
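A multiple linear regression and the fitted values and residuals on which the diagnostic plots are based can be sketched with ordinary least squares solved through the normal equations. The Python example below is illustrative only: the data are invented, the function names are hypothetical, and leverage and standardized residuals are omitted for brevity.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows: List[List[float]],
            y: List[float]) -> Tuple[List[float], List[float], List[float]]:
    """Least-squares fit via the normal equations (X'X) beta = X'y.
    Each row is [1.0, x1, x2, ...]; returns (coefficients, fitted values, residuals)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, fitted, residuals


# Invented data: intercept, two predictors, and an outcome such as length of stay.
rows = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.2, 2.9, 4.1, 6.0, 7.8]
beta, fitted, residuals = fit_ols(rows, y)
for f, e in zip(fitted, residuals):
    print(f"fitted = {f:.2f}, residual = {e:+.2f}")
```

The (fitted, residual) pairs are the points of a residuals-versus-fitted plot; the same residuals, once sorted, feed a Q-Q plot.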

The beneficial effects of the embodiment are as follows:

the analysis system allows subgroups and cohorts to be selected in combination (different cohorts can be selected under one subgroup) and allows the dependent variable and independent variables to be freely combined in multiple regression analysis, so that the status of the cohort of interest can be obtained and read quickly; this reduces the user's workload of repeatedly screening and visualizing the variables corresponding to the subgroups when analyzing data with different statistical methods, allows results to be presented more flexibly and intuitively, and provides a basis for promoting standardized treatment and effective management of diabetes at the primary-care level.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort, characterized in that a data exploration and analysis system comprising interactive elements such as variable and statistical method option boxes and drop-down boxes for selecting type 2 diabetes subgroups or cohorts is constructed by establishing the connection between the user interface and the server of a Shiny application; when a user interacts with the application through the UI dashboard, the server logic interacts with the R code according to the user's input and operations and responds by updating the state of the application, so that data and charts can be manipulated and viewed in real time;

the method for data exploration in the analysis system comprises the following steps:

2. The system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort according to claim 1, characterized in that the variables set in the analysis system are specifically:

sex, age, region of visit, length of hospital stay, medical insurance, insulin use, diabetes medication regimen, whether the blood glucose biochemical indicators during hospitalization reach target, whether a hypoglycemic event occurs, whether the patient is readmitted within 90 days of discharge, change in fasting blood glucose, whether the last HbA1c within 90 days of discharge is less than 7%, types of other diseases, number of other diseases, service costs, treatment costs, and total costs.

3. The system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort according to claim 1, characterized in that the statistical methods set in the analysis system are specifically as follows:

1) Hypothesis testing: statistical inference to determine whether a difference between samples in the study is due to sampling error or to an intrinsic difference; the result is displayed as a table containing the descriptive statistics and P values, with statistically significant P values shown in bold;

2) Logistic regression: corrects for the influence of confounding factors, avoiding the bias or error that they introduce into the prediction of the target variable; the result is displayed as a forest plot, as commonly used in the scientific literature, showing the magnitude, direction, and confidence interval of each coefficient;

3) Correlation analysis: assesses the degree of association between two or more variables in the study, using the Pearson correlation test for continuous variables and the Spearman correlation test for non-continuous variables; the result is displayed as a heat map, in which the strength of the correlation coefficient is represented by color coding;

4) Multiple linear regression and linear regression assumption testing:

4. The system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort according to claim 1, characterized in that the data exploration objectives for the type 2 diabetes disease-specific cohort in step S2 are specifically:

5. The system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort according to claim 1, characterized in that, in step S2, the type 2 diabetes patient population is divided into different subgroups or cohorts, specifically: according to the exploration objective, the subgroups or cohorts are divided on the basis of the department visited, the insulin used during inpatient treatment, and the distribution interval of the last fasting blood glucose test result before discharge.

6. The system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort according to claim 1, characterized in that, in step S4, the model construction page of the analysis system is used to select the population and variables to which the model applies, specifically:

7. The system and method for data exploration and analysis of a type 2 diabetes disease-specific cohort according to claim 1, further comprising step S5: after the result chart is output by the statistical method in the analysis system, the width and height of the chart are adjusted with the system's slider.