CN111243753A

CN111243753A - Medical data-oriented multi-factor correlation interactive analysis method

Info

Publication number: CN111243753A
Application number: CN202010125946.6A
Authority: CN
Inventors: 钱步月; 刘涛; 郑莹倩; 刘璇; 吕欣; 许靖琴; 侯梦薇; 吴风浪
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2020-06-05
Anticipated expiration: 2040-02-27
Also published as: CN111243753B

Abstract

The invention discloses a medical data-oriented multi-factor correlation interactive analysis method comprising the following steps: processing the acquired medical data and associating the processed data by patient case number to obtain a visit sequence for each patient; mapping the visit sequences onto a two-dimensional plane with the t-SNE algorithm to form distinct feature populations; selecting a feature population as required; setting a disease characterization index; performing feature selection on the selected feature population to determine the feature sequence related to the disease characterization index; and measuring the correlation among the selected features with statistical metrics to obtain statistically significant results, completing the multi-factor correlation interactive analysis. The invention allows high-dimensional medical data to be analyzed interactively and visually displays the key factors influencing disease development.

Description

Medical data-oriented multi-factor correlation interactive analysis method

Technical Field

The invention belongs to the technical field of multi-factor correlation analysis, and particularly relates to a medical data-oriented multi-factor correlation interactive analysis method.

Background

Medical statistics is the science of collecting, organizing, analyzing, presenting and interpreting data in medicine and related fields, mainly by applying the basic principles and methods of statistics. In clinical medical research, multi-factor correlation analysis is carried out on existing clinical data, in combination with existing medical knowledge, by calculating statistics such as the Pearson correlation coefficient, in order to identify the key factors that strongly influence disease development. However, medical data are high-dimensional and complex: the traditional methods require heavy computation, and their results are abstract and hard to interpret, which hinders doctors in diagnosis and research. Moreover, disease development is often related to many factors, whereas the traditional methods can currently only measure the correlation between two factors at a time, which limits the validity of the results.

In summary, a new multi-factor correlation interactive analysis method oriented to high-dimensional medical data is needed.

Disclosure of Invention

The purpose of the present invention is to provide a medical data-oriented multi-factor correlation interactive analysis method that solves one or more of the above problems. The invention allows high-dimensional medical data to be analyzed interactively and visually displays the key factors influencing disease development.

In order to achieve the purpose, the invention adopts the following technical scheme:

The medical data-oriented multi-factor correlation interactive analysis method of the invention comprises the following steps:

step 1, processing the acquired medical data, and associating the processed medical data by patient case number to obtain a visit sequence for each patient; wherein the processing comprises normalization;

step 2, mapping the visit sequences obtained in step 1 onto a two-dimensional plane with the t-SNE algorithm to form distinct feature populations; selecting a feature population from them as required;

step 3, setting a disease characterization index; performing feature selection on the feature population selected in step 2, and determining the feature sequence related to the disease characterization index;

step 4, measuring the correlation among the features selected in step 3 with statistical metrics to obtain statistically significant results, completing the multi-factor correlation interactive analysis.

In a further improvement of the present invention, in step 1, the specific step of processing the acquired medical data includes:

(1.1) eliminating irrelevant features and private data from the medical data; wherein the irrelevant features include patient name and patient serial number, and the private data include patient identification number and patient mobile phone number;

(1.2) eliminating missing values and abnormal values from the medical data; wherein the missing values include null and "-", and the abnormal values include values that violate medical knowledge and values that violate common sense;

(1.3) eliminating completely duplicated data in the medical data;

(1.4) normalizing the numerical data in the medical data, comprising: for each value $x_i$ of the same feature,

$$x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)},$$

where $X$ is the set of all values of a numerical feature, $x_i$ denotes the i-th element of $X$, $i = 1, 2, 3, \ldots, n$, $n$ denotes the total number of elements, $\min(X)$ denotes the minimum value in $X$, and $\max(X)$ denotes the maximum value in $X$;

(1.5) encoding the categorical data in the medical data to obtain an encoding vector $Y$; wherein the coding format is as follows:

$$Y = \{y_1, y_2, \ldots, y_k, \ldots\},$$

wherein $y_k$ denotes the k-th value in the encoded vector, $k = 1, 2, 3, \ldots$
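
As an illustration of steps (1.4) and (1.5), the following minimal Python sketch applies min-max scaling to a numerical feature and fixed-width binary coding to a categorical feature. The function names `min_max_normalize` and `binary_encode`, the handling of a constant column, and the use of binary codes ordered by a caller-supplied list of levels are assumptions made for this example; they are not taken from the patent text.

```python
from math import ceil, log2

def min_max_normalize(values):
    """Scale numeric values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula is undefined, so return zeros (assumption).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

def binary_encode(categories, levels, width=None):
    """Encode each category as a fixed-width list of bits.

    `levels` fixes the order of the categories: the first level gets code 0,
    the second gets code 1, and so on (hypothetical convention).
    """
    if width is None:
        width = max(1, ceil(log2(len(levels))))
    codes = {level: [int(bit) for bit in format(index, f"0{width}b")]
             for index, level in enumerate(levels)}
    return [codes[c] for c in categories]

# Toy values for illustration only.
normalized_ages = min_max_normalize([30, 45, 60])
encoded_sex = binary_encode(["male", "female", "male"], levels=["male", "female"])
encoded_wide = binary_encode(["B"], levels=["A", "B"], width=6)
```

With a one-bit code the two-level feature becomes a single 0/1 value, while `width=6` yields six-digit codes of the kind quoted in the embodiment.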

In a further improvement of the present invention, in step 1, a visit sequence T for each patient is obtained, the expression of which is:

$$T = \{x_a, y_b, z_c, \ldots\},$$

where $x_a$, $y_b$, $z_c$ ($a, b, c = 1, 2, 3, \ldots, l$) denote different types of medical data belonging to the same patient, and $l$ denotes the number of elements of each type of medical data;

in step 2, a feature population G to be studied is selected from the feature populations as required, expressed as:

$$G = \{T_1, T_2, \ldots, T_p, \ldots, T_d\},$$

where $T_p$ denotes the visit sequence of the p-th patient in the feature population to be studied, $p = 1, 2, 3, \ldots, d$.

The invention has the further improvement that the step 3 specifically comprises the following steps:

(3.1) the disease characterization index is specified interactively;

(3.2) performing feature selection on the selected feature population: when determining the feature sequence related to the disease characterization index, features whose variance is smaller than a threshold are removed; the remaining features are then ranked by their relevance to the disease characterization index, and the k features most critical to the disease characterization are determined, completing feature selection and feature ranking.

A further improvement of the invention is that, in step (3.2), ranking the remaining features by their relevance to the disease characterization index and determining the k features most critical to disease development specifically comprises:

(3.2.1) constructing a classifier with decision trees as base learners, denoted F;

(3.2.2) feeding the data of the remaining features into the classifier F to predict the disease characterization index P, obtaining a reference prediction result O:

$$O = F(t_1, t_2, \ldots, t_q, \ldots, t_e),$$

where $t_q$ ($q = 1, 2, \ldots, e$) denotes the data of the q-th feature and $e$ denotes the number of features;

(3.2.3) feeding the data with the r-th feature removed into the classifier for prediction, obtaining a prediction result $O_r$:

$$O_r = F(t_1, t_2, \ldots, t_{r-1}, t_{r+1}, \ldots, t_e);$$

(3.2.4) calculating the difference between the prediction result $O_r$ and the reference prediction result $O$ as the degree of influence $\Delta O_r$ of the r-th feature on disease development:

$$\Delta O_r = |O_r - O|,$$

where $\Delta O_r$ ($r = 1, 2, 3, \ldots, e$) denotes the degree of influence of the r-th feature on disease development; the larger $\Delta O_r$, the greater the influence of the r-th feature on disease development and the more critical the feature;

(3.2.5) repeating steps (3.2.3) and (3.2.4) until the degree of influence $\Delta O$ on disease development has been obtained for every feature;

(3.2.6) sorting the features by this criticality metric to obtain the s most critical features:

$$\{t_1, t_2, \ldots, t_s\} = \mathrm{sort}(\Delta O_1, \Delta O_2, \ldots, \Delta O_e),$$

where sort() denotes a sorting function.
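
The ranking procedure of steps (3.2.2) to (3.2.6) can be sketched in Python as follows. Two assumptions are made here and should be read as such: the prediction result O is interpreted as a scalar score (training accuracy), and a simple nearest-centroid rule stands in for the decision-tree-based classifier F, purely to keep the example self-contained. The names `nearest_centroid_accuracy` and `rank_features_by_ablation` are hypothetical.

```python
def nearest_centroid_accuracy(rows, labels):
    """Stand-in for classifier F: training accuracy of a nearest-centroid rule."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    # One centroid per class: the column-wise mean of its member rows.
    centroids = {label: [sum(col) / len(members) for col in zip(*members)]
                 for label, members in groups.items()}

    def predict(row):
        return min(centroids,
                   key=lambda label: sum((a - b) ** 2
                                         for a, b in zip(row, centroids[label])))

    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)

def rank_features_by_ablation(rows, labels, score=nearest_centroid_accuracy):
    """Return feature indices sorted by |O_r - O| (largest first) and the deltas."""
    baseline = score(rows, labels)                      # reference result O
    influence = []
    for r in range(len(rows[0])):
        reduced = [row[:r] + row[r + 1:] for row in rows]  # drop the r-th feature
        influence.append(abs(score(reduced, labels) - baseline))  # Delta O_r
    order = sorted(range(len(influence)), key=lambda r: influence[r], reverse=True)
    return order, influence

# Toy data: feature 0 separates the classes, feature 1 is constant.
rows = [[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]]
labels = [0, 0, 1, 1]
order, influence = rank_features_by_ablation(rows, labels)
top_s = order[:1]  # the s = 1 most critical feature
```

Any other scoring function with the same `(rows, labels)` signature, for example one wrapping a decision-tree ensemble, could be passed as `score` without changing the ranking logic.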

In a further improvement of the present invention, in step 4, the statistical metrics include: the Pearson correlation coefficient, the u test, the t test, analysis of variance, and simple (univariate) or multiple regression analysis based on the central limit theorem.

In a further improvement, the method of the invention also comprises:

step 5, visualizing the correlation among the s most critical features obtained in step (3.2.6).

In a further improvement of the invention, step 5 specifically comprises:

(5.1) taking each feature obtained by feature selection as a vertical axis and the visit sequence of each patient along the horizontal axis, and drawing a parallel coordinate system across the features to visually display how the different features depend on one another;

(5.2) selecting two features, mapping the data onto a two-dimensional plane with the two features as coordinate axes, and visually displaying the correlation between the two features.

Compared with the prior art, the invention has the following beneficial effects:

The multi-factor correlation interactive analysis method for high-dimensional medical data provided by the invention defines a complete process from raw clinical medical data to the final correlation visualization result, and can directly show how the key features in high-dimensional medical data depend on one another.

The method first processes the acquired raw clinical medical data: invalid information, sensitive information, missing values and abnormal values are removed; numerical data are normalized and categorical data are encoded; and the data are joined by case number to generate the patient visit sequences. The high-dimensional visit sequence data are then mapped onto a two-dimensional plane to generate feature populations, from which the doctor interactively selects the population to be studied. Feature selection is then performed on the data of that population: the criticality metric of each feature with respect to the final prediction result is calculated, the features are sorted, and the top few most critical features are selected; hypothesis tests are carried out on the selected features with statistical methods to verify the statistical significance of the correlations among them. Finally, a parallel coordinate system and a two-dimensional coordinate system are used to visualize, respectively, the dependencies among all the features and between pairs of features, so that the influence of different features on disease development can be analyzed. By visualizing the dependencies among features in high-dimensional medical data, the invention displays the key factors in disease development and provides statistically significant evidence.

The analysis method overcomes the difficulty that high-dimensional medical data are abstract, complex and hard to analyze. In the traditional method, the analysis of high-dimensional medical data relies on complex statistical computation; the amount of computation is large and the principles are hard to understand, which greatly hinders clinical diagnosis and medical research. The invention simplifies the analysis process, brings in the doctor's medical knowledge through interaction, displays the whole analysis process visually, reduces the amount of computation, and makes the dependencies between variables easier to understand.

The invention takes into account the multi-factor dependencies in high-dimensional medical data. In the traditional method, the analysis can only examine pairwise relationships between variables and cannot model multivariate relationships. By using dimension-reduction mapping, feature selection, parallel-coordinate plotting and similar techniques, the invention takes the dependencies among multiple variables into account, making the analysis of high-dimensional medical data more accurate.

The method is applicable to clinical medical data for a variety of diseases and is highly extensible. Under the traditional method, a dedicated analysis algorithm has to be designed for each disease and data type, and can hardly be extended. The method of the invention does not depend on specific data types: all forms of clinical medical data can be analyzed and displayed with it, and it can be adapted to the analysis requirements of different diseases.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 is a schematic block diagram of a flow chart of a medical data-oriented multi-factor correlation interactive analysis method according to an embodiment of the present invention;

FIG. 2 is a schematic block diagram of a flow of a feature selection method in a method according to an embodiment of the invention;

FIG. 3 is a diagram illustrating the visualization result of feature clusters in the method according to the embodiment of the present invention;

FIG. 4 is a diagram illustrating the visualization results of all key features in the method according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of a partial feature correlation visualization result in the method according to the embodiment of the present invention.

Detailed Description

In order to make the purpose, technical effect and technical solution of the embodiments of the present invention clearer, the following clearly and completely describes the technical solution of the embodiments of the present invention with reference to the drawings in the embodiments of the present invention; it is to be understood that the described embodiments are only some of the embodiments of the present invention. Other embodiments, which can be derived by one of ordinary skill in the art from the disclosed embodiments without inventive faculty, are intended to be within the scope of the invention.

The embodiment of the invention provides a medical data-oriented multi-factor correlation interactive analysis method, which comprises the following steps:

step 1, standardizing the collected clinical medical data and associating it by patient case number to obtain a visit sequence for each patient;

step 2, mapping the high-dimensional visit data onto a two-dimensional plane with the t-SNE algorithm to form distinct feature populations; the doctor reviews the distribution of all feature populations and selects the population to be studied;

step 3, the doctor specifies a disease characterization index; feature selection is performed on the features in the visit sequences to determine the feature sequence most relevant to the index;

step 4, measuring the correlation among the features with statistical methods to obtain statistically significant results;

step 5, visualizing the correlation among the features.

Preferably, step 1 specifically comprises:

step 1.1, analyzing the collected clinical medical data and eliminating the irrelevant features and private data in it. Irrelevant features include the patient name, patient number and the like, which have no bearing on the severity of the patient's illness and should therefore be removed; private data include the patient identification number, mobile phone number and the like, which can identify the individual patient and easily give rise to ethical risks, and must therefore also be removed. In particular, features such as place of origin and home address, which are sensitive but do bear on the severity of illness, should be coarsened: only coarse information such as nationality and province is retained, and the specific sensitive details are removed;

step 1.2, eliminating missing values and abnormal values in the collected clinical medical data. Missing values are values that carry no meaning, including null, "-" and the like; they have no definite medical meaning and adversely affect the final result, so they should be handled. Abnormal values are clearly incorrect values, including values that violate medical knowledge and values that violate common sense; they can strongly disturb the final result and should be handled as well. Both kinds of value are handled as follows: if the missing and abnormal values make up less than 1/10 of the total data, the records containing them are deleted; otherwise, each missing or abnormal value is replaced by the mean of its data column;

step 1.3, eliminating duplicate data in the collected clinical medical data. Duplicates are of two types: fully duplicated records, which are deduplicated so that only the last record is retained; and records that differ in part of their data, which can be understood as examination records of the patient at different times and should all be retained;

step 1.4, normalizing the numerical data in the collected clinical medical data, i.e., for each value $x_i$ of the same feature:

$$x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)},$$

where $X$ is the set of all values of a numerical feature, $x_i$ denotes the i-th element of $X$, $i = 1, 2, 3, \ldots, n$, $\min(X)$ denotes the minimum value in $X$, and $\max(X)$ denotes the maximum value in $X$;

step 1.5, encoding the categorical data in the collected clinical medical data, converting it into a format that the algorithm can use. The converted coding format is:

$$Y = \{y_1, y_2, \ldots, y_i, \ldots\},$$

where $Y$ is the converted code vector and $y_i$ denotes the i-th value in the vector, $i = 1, 2, 3, \ldots$

step 1.6, joining the processed data by patient case number to generate the visit sequence T of a single patient:

$$T = \{x_i, y_j, z_k, \ldots\},$$

where $x_i$, $y_j$, $z_k$ ($i, j, k = 1, 2, 3, \ldots, n$) denote different types of clinical medical data that belong to the same patient and share the same case number. T is a high-dimensional vector representing the visit sequence of a single patient.
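
The cleaning and joining rules of steps 1.2, 1.3 and 1.6 can be sketched in Python as below. This is only one possible reading: the 1/10 rule is applied per field, each record is a dict, every table is assumed to hold one row per case number, and the names `clean_column`, `drop_exact_duplicates`, `build_visit_sequences` and the key `case_no` are hypothetical.

```python
MISSING = {None, "", "-"}  # markers treated as missing values (assumption)

def clean_column(records, field, is_valid=lambda value: True):
    """Apply the 1/10 rule to one numeric field of a list of dict records."""
    def is_bad(value):
        return value in MISSING or not is_valid(value)

    bad_count = sum(is_bad(rec[field]) for rec in records)
    if bad_count == 0:
        return list(records)
    if bad_count / len(records) < 0.1:
        # Few bad values: delete the affected records.
        return [rec for rec in records if not is_bad(rec[field])]
    good = [rec[field] for rec in records if not is_bad(rec[field])]
    if not good:
        raise ValueError(f"no valid values left in field {field!r}")
    mean = sum(good) / len(good)
    # Many bad values: replace each one with the column mean.
    return [dict(rec, **{field: mean}) if is_bad(rec[field]) else rec
            for rec in records]

def drop_exact_duplicates(records):
    """Keep only the last copy of fully identical records, preserving order."""
    seen, kept = set(), []
    for rec in reversed(records):
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept[::-1]

def build_visit_sequences(tables, key="case_no"):
    """Merge several cleaned tables into one record per case number."""
    sequences = {}
    for table in tables:
        for rec in table:
            fields = {k: v for k, v in rec.items() if k != key}
            sequences.setdefault(rec[key], {}).update(fields)
    return sequences

# Toy records for illustration only.
visits = [{"case_no": 1, "age": 30}, {"case_no": 2, "age": "-"},
          {"case_no": 3, "age": 60}, {"case_no": 3, "age": 60}]
labs = [{"case_no": 1, "bmd": 0.8}]
cleaned = drop_exact_duplicates(clean_column(visits, "age"))
sequences = build_visit_sequences([cleaned, labs])
```

Here one of four ages is missing (more than 1/10), so it is replaced by the mean of the valid values before the exact duplicate for case 3 is dropped.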

Specifically, step 2 comprises:

step 2.1, for the patient visit sequences T obtained in the previous step, mapping the high-dimensional visit sequence vectors onto a two-dimensional plane with the t-SNE algorithm to generate distinct feature populations.

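In its standard formulation, the t-SNE algorithm can be expressed as follows.

For n high-dimensional data points $x_1, x_2, \ldots, x_n$, the Euclidean distances between points are converted into joint probabilities that characterize similarity:

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},$$

wherein:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}.$$

The objective function is expressed as the Kullback-Leibler divergence between the two distributions:

$$C = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},$$

where P is the joint probability distribution of the points in the high-dimensional space and Q is the joint probability distribution of the points in the low-dimensional space, with $q_{ij} = \left(1 + \lVert y_i - y_j \rVert^2\right)^{-1} / \sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}$ for the low-dimensional points $y_i$.

The gradient used for optimization is:

$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}.$$

The perplexity is defined as $\mathrm{Perp}(P_i) = 2^{H(P_i)}$, with

$$H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i},$$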

the concrete solving steps are as follows:

step 2.2, the doctor interactively selects the population G to be studied from the feature populations generated in the previous step:

$$G = \{T_1, T_2, \ldots, T_i\},$$

where $T_i$ denotes the visit sequence of the i-th patient in the selected feature population, $i = 1, 2, 3, \ldots$
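
To make the role of the perplexity in step 2.1 concrete, the sketch below computes the Gaussian conditional similarities of one point to all others and searches for the bandwidth that gives a requested perplexity. It follows the standard t-SNE formulation rather than any detail stated in this document, covers only this one sub-step (not the low-dimensional embedding), and the names `conditional_probabilities`, `perplexity` and `sigma_for_perplexity`, as well as the search bounds, are assumptions.

```python
from math import exp, log2

def conditional_probabilities(points, i, sigma):
    """Gaussian similarity of every point j to point i, normalized to sum to 1."""
    sq = [sum((a - b) ** 2 for a, b in zip(points[i], p)) for p in points]
    # Subtracting the nearest distance leaves the normalized result unchanged
    # but prevents every weight from underflowing to zero for small sigma.
    nearest = min(d for j, d in enumerate(sq) if j != i)
    weights = [0.0 if j == i else exp(-(d - nearest) / (2.0 * sigma ** 2))
               for j, d in enumerate(sq)]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probabilities):
    """Perp = 2 ** H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * log2(p) for p in probabilities if p > 0.0)
    return 2.0 ** entropy

def sigma_for_perplexity(points, i, target, lo=1e-3, hi=1e3, steps=60):
    """Bisection on sigma: perplexity grows monotonically with the bandwidth."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if perplexity(conditional_probabilities(points, i, mid)) > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Toy points forming two well-separated pairs.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
sigma = sigma_for_perplexity(points, 0, target=2.0)
probs = conditional_probabilities(points, 0, sigma)
```

In practice the full embedding would normally be delegated to an existing t-SNE implementation; this fragment only shows how the perplexity setting translates into a per-point bandwidth.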

Specifically, step 3 comprises:

step 3.1, the doctor specifies a characterization index P of the degree of disease development. The indices used medically to characterize the degree of development differ from disease to disease, so the doctor needs to specify the characterization index interactively for the specific problem in order to measure the severity of the disease;

step 3.2, removing low-variance features. A feature whose variance is below the threshold fluctuates little across all patients, which indicates that it has little effect on the patients' disease development, so it should be removed. In this scheme, the threshold is 0;

step 3.3, ranking all features by their correlation with the degree of disease development and determining the k features most critical to disease development, completing feature selection. The specific steps are as follows:

step 3.3.1, constructing a classifier with decision trees as base learners, denoted F;

step 3.3.2, feeding the data containing all features into the classifier to predict the disease development index P, obtaining a reference prediction result O:

$$O = F(t_1, t_2, \ldots, t_n),$$

where $t_i$ ($i = 1, 2, \ldots, n$) denotes the data of the i-th feature and $n$ denotes the number of features;

step 3.3.3, feeding the data with the i-th feature removed into the classifier for prediction again, obtaining a new prediction result $O_i$:

$$O_i = F(t_1, t_2, \ldots, t_{i-1}, t_{i+1}, \ldots, t_n);$$

step 3.3.4, calculating the difference between the prediction result $O_i$ obtained after removing the i-th feature and the reference prediction result $O$ as the degree of influence $\Delta O_i$ of that feature on disease development:

$$\Delta O_i = |O_i - O|,$$

where $\Delta O_i$ ($i = 1, 2, 3, \ldots, n$) denotes the degree of influence of the i-th feature on disease development; the larger the value, the greater the influence of the feature on disease development, i.e., the more critical the feature;

step 3.3.5, repeating the above process until the criticality metric $\Delta O_i$ has been obtained for all n features;

step 3.3.6, sorting all features by $\Delta O_i$ in descending order to obtain the k most important features, completing feature selection:

$$\{t_1, t_2, \ldots, t_k\} = \mathrm{sort}(\Delta O_1, \Delta O_2, \ldots, \Delta O_n),$$

where sort() denotes the sorting function, of which the top k values are taken.
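
The low-variance filter of step 3.2, which runs before the ranking above, can be sketched in Python as follows. Since a variance can never be strictly below 0, the sketch assumes that a threshold of 0 means "drop features whose variance does not exceed the threshold", i.e. constant features; this reading, the use of the population variance, and the name `drop_low_variance` are assumptions.

```python
from statistics import pvariance

def drop_low_variance(rows, names, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return reduced, [names[j] for j in keep]

# Toy data: column "a" is constant across patients and is removed.
rows = [[1, 0.2, 5], [1, 0.4, 5], [1, 0.9, 6]]
reduced, kept_names = drop_low_variance(rows, ["a", "b", "c"])
```

A constant column carries no information that could distinguish patients, which is why it can be dropped before the classifier-based ranking without affecting the result.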

Specifically, the statistical metrics in step 4 include the Pearson correlation coefficient, the u test, the t test, analysis of variance (one-way analysis of variance, multivariate analysis of variance, etc.), and simple (univariate) or multiple regression analysis based on the central limit theorem. The main purpose of this step is to perform statistical analysis on the k features selected in the previous step, so as to ensure that they satisfy statistical hypothesis tests.
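
As one example of the metrics listed for step 4, the sketch below computes the Pearson correlation coefficient between two features and the associated t statistic, $t = r\sqrt{(n-2)/(1-r^2)}$ with $n - 2$ degrees of freedom. Turning the statistic into a p-value requires the t distribution, which is left to a statistics library; the function names and the toy numbers are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def t_statistic(r, n):
    """Test statistic for H0: no linear correlation (undefined when |r| = 1)."""
    return r * sqrt((n - 2) / (1.0 - r * r))

# Toy samples for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
```

The larger the absolute value of t for a given sample size, the stronger the evidence that the two features are linearly correlated.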

Specifically, step 5 comprises:

step 5.1, taking each feature selected during feature selection as a vertical axis and the visit sequence of each patient along the horizontal axis, and drawing a parallel coordinate system across the features to visually display how the different features depend on one another;

step 5.2, selecting two strongly correlated features, mapping the data containing them onto a two-dimensional plane with the two features as coordinate axes, and visually displaying the correlation between the two features.
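
The data preparation behind the parallel-coordinate view of step 5.1 can be sketched as follows: each patient becomes one polyline with a vertex on every feature axis, and each axis is rescaled to a common range so that the lines are comparable. The per-axis rescaling to [0, 1], the placement of constant features at 0.5, and the name `parallel_coordinate_polylines` are assumptions; the actual drawing would be done with a plotting library.

```python
def parallel_coordinate_polylines(rows):
    """Return, per patient, one (axis_index, scaled_value) vertex per feature."""
    columns = list(zip(*rows))
    bounds = [(min(col), max(col)) for col in columns]

    def scale(value, lo, hi):
        # A constant feature is drawn in the middle of its axis (assumption).
        return 0.5 if hi == lo else (value - lo) / (hi - lo)

    return [[(axis, scale(value, *bounds[axis])) for axis, value in enumerate(row)]
            for row in rows]

# Toy data: two patients, two selected features.
polylines = parallel_coordinate_polylines([[1, 10], [3, 30]])
```

Features that rise and fall together produce roughly parallel line segments between their axes, while inversely related features produce crossing segments.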

In summary, the multi-factor correlation interactive analysis method for high-dimensional medical data provided by the embodiment of the invention designs a complete process from original clinical medical data to a final correlation visualization result, and can directly show the dependence change rule between key features in the high-dimensional medical data.

DETAILED DESCRIPTION OF EMBODIMENT (S) OF INVENTION

In this embodiment, diagnosis and treatment data from the oncology department of the Second Affiliated Hospital of Xi'an Jiaotong University are used to explain how the method is implemented. It should be noted that, as an example, this embodiment lists only part of the data segments and visualization results to illustrate the implementation; the actual clinical medical data far exceed what is listed.

In an embodiment of the invention, the clinical medical data comprises:

First, the collected data are processed: data such as name, date of birth, identification card number and telephone number are removed, and categorical information such as gender is encoded; for example, male is coded as 0, female as 1, Han ethnicity as 000000, Hui ethnicity as 000001, and so on. The data are then normalized, and missing values, outliers and duplicates are removed. The patient data are then associated by case number to obtain the patients' visit sequences T. The visit sequence T in this example is a 3219 × 29 matrix, representing 3219 patients, each with a 29-dimensional visit record. The record for patient 1008 is:

T₁₀₀₈＝{1008,0,0,0,0,0,0,1,...,0.29,1,0.87,...0.90,0,0,1}，

Part of the visit sequences are mapped onto a two-dimensional plane, giving the visualization result of FIG. 3. As can be seen from the figure, these data form two feature populations; combined with the specific data and medical knowledge, they can be identified as two different diseases.

The feature population to be studied is then selected by the doctor. In this embodiment, the doctor selects the feature population on the right. Feature selection is performed on the data of that population, following the flow shown in FIG. 2. By calculating the influence $\Delta O_i$ of each feature on disease development and sorting in descending order, the features with the greatest influence on the disease are obtained: basophil count, discharge diagnosis, red cell distribution width (CV) and plateletcrit. Calculating statistical indicators for these features verifies that they are significantly correlated with the degree of disease development.

The dependencies among these features are then visualized. The result is plotted in a parallel coordinate system, with each feature as a vertical axis and the visit sequence of each patient in the feature population along the horizontal axis, as shown in FIG. 4.

The correlation between pairs of features is then visualized. In this embodiment, the T value and the BMD value are selected, and the data for these two features are mapped onto a two-dimensional plane; the visualization result is shown in FIG. 5. As can be seen from the figure, the BMD value and the T value are highly correlated and show a consistent distribution trend. This is consistent with clinical experience and has a medical explanation.

Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims

1. A medical data-oriented multi-factor correlation interactive analysis method is characterized by comprising the following steps:

2. The medical data-oriented multi-factor correlation interactive analysis method according to claim 1, wherein in step 1, the specific step of processing the acquired medical data comprises:

(1.3) eliminating completely duplicated data in the medical data;

wherein $y_k$ denotes the k-th value in the encoded vector, $k = 1, 2, 3, \ldots$

3. The medical data-oriented multi-factor correlation interactive analysis method as claimed in claim 1, wherein the visit sequence T of each patient obtained in step 1 is expressed as:

T＝{x_a,y_b,z_c,...}，

G＝{T₁,T₂,...,T_p,…,T_d}，

4. The medical data-oriented multi-factor correlation interactive analysis method according to claim 1, wherein the step 3 specifically comprises:

5. The medical data-oriented multi-factor correlation interactive analysis method according to claim 4, wherein in step (3.2), the step of ranking the remaining features according to the correlation with disease characterization indicators and determining the k features most critical to disease progression specifically comprises:

O＝F(t₁,t₂,...，t_q...，t_e)，

$$O_r = F(t_1, t_2, \ldots, t_{r-1}, t_{r+1}, \ldots, t_e);$$

ΔO_r＝|O_r-O|，

where $\Delta O_r$ ($r = 1, 2, 3, \ldots, e$) denotes the degree of influence of the r-th feature on disease development; the larger $\Delta O_r$, the greater the influence of the r-th feature on disease development and the more critical the feature;

{t₁,t₂,...t_s}＝sort(ΔO₁,ΔO₂,...,ΔO_n)，

in the formula, sort () represents a sorting function.

6. The medical data-oriented multi-factor correlation interactive analysis method according to claim 1, wherein in step 4, the statistical metric index comprises: pearson correlation coefficient, u test, t test, analysis of variance, unitary regression based on central limit theorem or multiple regression analysis.

7. The medical data-oriented multi-factor correlation interactive analysis method according to claim 5, further comprising:

8. The medical data-oriented multi-factor correlation interactive analysis method according to claim 7, wherein the step 5 specifically comprises: