CN111243753B

CN111243753B - Multi-factor correlation interactive analysis method for medical data

Info

Publication number: CN111243753B
Application number: CN202010125946.6A
Authority: CN
Inventors: 钱步月; 刘涛; 郑莹倩; 刘璇; 吕欣; 许靖琴; 侯梦薇; 吴风浪
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2024-04-02
Anticipated expiration: 2040-02-27
Also published as: CN111243753A

Abstract

The invention discloses a multi-factor correlation interactive analysis method for medical data, comprising the following steps: processing the acquired medical data, and associating the processed medical data by patient case number to obtain a visit sequence for each patient; mapping the visit sequences onto a two-dimensional plane using the t-SNE algorithm to form different feature groups; selecting a feature group from these as required; setting a disease characterization index; performing feature selection on the features of the selected feature group to determine a feature sequence related to the disease characterization index; and measuring the correlation among the selected features with statistical measures to obtain a statistically significant result, completing the multi-factor correlation interactive analysis. The invention allows high-dimensional medical data to be analyzed interactively and visually displays the key factors influencing disease development.

Description

Multi-factor correlation interactive analysis method for medical data

Technical Field

The invention belongs to the technical field of multi-factor correlation analysis, and particularly relates to a multi-factor correlation interactive analysis method for medical data.

Background

Medical statistics is the science that applies the basic principles and methods of statistics to the collection, organization, analysis, presentation and interpretation of data in medicine and related fields. In clinical medical research, multi-factor correlation analysis is performed on existing clinical medical data, in combination with existing medical knowledge, by calculating statistical quantities such as the Pearson correlation coefficient, in order to determine the key factors with the greatest influence on disease development. However, medical data is high-dimensional and complex; the traditional method requires heavy calculation, and its results are abstract and difficult to understand, which hinders doctors in diagnosis, treatment and scientific research. Moreover, disease development is often related to many factors, whereas the traditional method can only calculate the correlation between two factors at a time, which limits the validity of the results.

In summary, a new multi-factor correlation interactive analysis method for high-dimensional medical data is needed.

Disclosure of Invention

The invention aims to provide a multi-factor correlation interactive analysis method for medical data, which aims to solve one or more technical problems. The invention can interactively analyze the high-dimensional medical data and visually display key factors influencing the disease development.

In order to achieve the above purpose, the invention adopts the following technical scheme:

the invention discloses a multi-factor correlation interactive analysis method for medical data, which comprises the following steps of:

step 1, processing acquired medical data, and associating the processed medical data by patient case number to obtain a visit sequence of each patient; wherein the processing includes normalization processing;

step 2, mapping the visit sequences obtained in step 1 onto a two-dimensional plane by using the t-SNE algorithm to form different feature groups; selecting a feature group from the feature groups as required;

step 3, setting disease characterization indexes; performing feature selection on the features of the feature population selected in the step 2, and determining a feature sequence related to the disease characterization index;

and step 4, measuring the correlation among the features selected in the step 3 by adopting a statistical measurement index to obtain a result with statistical significance, and completing multi-factor correlation interactive analysis.

The invention further improves that in the step 1, the specific steps of processing the acquired medical data comprise:

(1.1) eliminating extraneous features and privacy data in the medical data; wherein the extraneous features include patient name and patient serial number, and the privacy data includes patient identification number and patient mobile phone number;

(1.2) eliminating missing values and outliers in the medical data; wherein the missing values include null and "-", and the outliers include values that violate medical knowledge and values that violate common sense;

(1.3) eliminating completely duplicate data in the medical data;

(1.4) performing normalization processing on numerical data in the medical data, including: for the data $x_i$ of the same feature,

$$x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)},$$

wherein X is the set of all values of a certain numerical feature, $x_i$ represents the i-th element in X, i = 1, 2, 3, ..., n, n represents the total number of elements, min(X) represents the minimum value in set X, and max(X) represents the maximum value in set X;

(1.5) encoding the categorical data in the medical data to obtain an encoding vector Y; wherein the coding format is:

$$Y = (y_1, y_2, \ldots, y_m), \qquad y_k = \begin{cases} 1, & k = j \\ 0, & k \neq j \end{cases}$$

wherein $y_k$ represents the k-th value in the encoded vector, k = 1, 2, 3, ..., m, m represents the number of elements in the encoded vector, and j represents the class number to which the data belongs.
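Steps (1.4) and (1.5) can be illustrated with a minimal Python sketch. The function names `min_max_normalize` and `one_hot` are illustrative and do not come from the patent. Two points are assumptions: a constant feature is mapped to 0 to avoid division by zero, and the categorical encoding is taken to be a one-hot vector with classes numbered from 1.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale all values of one numerical feature into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0 (max - min would be 0).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(class_number: int, num_classes: int) -> List[int]:
    """Encoding vector Y with y_k = 1 when k equals the class number j, else 0.

    Assumption: classes are numbered 1..num_classes.
    """
    if not 1 <= class_number <= num_classes:
        raise ValueError("class_number out of range")
    return [1 if k == class_number else 0 for k in range(1, num_classes + 1)]


normalized = min_max_normalize([2.0, 4.0, 6.0])
encoded = one_hot(2, 4)
```

Here `normalized` holds the scaled values of one feature and `encoded` the vector for class 2 out of 4; both would later be concatenated into the patient's visit sequence.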

The invention further improves that in the step 1, the visit sequence T of each patient is obtained, and the expression is:

$$T = \{x_a, y_b, z_c, \ldots\},$$

wherein $x_a, y_b, z_c$ (a, b, c = 1, 2, 3, ..., l) respectively represent different types of medical data belonging to the same patient; l represents the number of elements of each type of medical data;

in the step 2, a feature group G to be researched is selected from feature groups according to the requirement, and the expression is:

$$G = \{T_1, T_2, \ldots, T_p, \ldots, T_d\},$$

wherein $T_p$ represents the visit sequence of the p-th patient in the feature group to be studied, p = 1, 2, 3, ..., d, and d represents the number of patients in the feature group to be studied.

The invention further improves that the step 3 specifically comprises the following steps:

(3.1) when setting the disease characterization index, interactively specifying;

(3.2) performing feature selection on the features of the selected feature group: when determining the feature sequence related to the disease characterization index, removing the features whose variance is smaller than a threshold to obtain the remaining features; sorting the remaining features according to their correlation with the disease characterization index, determining the k features most critical to disease characterization, and completing feature selection and feature sorting.

A further improvement of the present invention is that, in step (3.2), the step of sorting the remaining features according to the correlation with the disease characterization index, and determining k features most critical for disease progression specifically includes:

(3.2.1) constructing a classifier with a decision tree as the base learner, denoted F;

(3.2.2) feeding the data of the remaining features into the classifier F and predicting the disease characterization index P to obtain a reference prediction result O, with the expression:

$$O = F(t_1, t_2, \ldots, t_q, \ldots, t_e),$$

wherein $t_q$ (q = 1, 2, ..., e) represents the data containing the q-th feature, and e represents the number of features;

(3.2.3) feeding the data with the r-th feature removed into the classifier for prediction to obtain a prediction result $O_r$, with the expression:

$$O_r = F(t_1, t_2, \ldots, t_{r-1}, t_{r+1}, \ldots, t_e);$$

(3.2.4) calculating the difference between the prediction result $O_r$ and the reference prediction result O as the degree of influence $\Delta O_r$ of the r-th feature on disease development, with the expression:

$$\Delta O_r = |O_r - O|,$$

wherein $\Delta O_r$ (r = 1, 2, 3, ..., e) represents the degree of influence of the r-th feature on disease progression; the larger $\Delta O_r$ is, the more critical the r-th feature is to disease progression;

(3.2.5) repeating steps (3.2.3) and (3.2.4) until the degree of influence $\Delta O$ on disease progression has been obtained for all features;

(3.2.6) sorting the features by the size of the key measurement index $\Delta O$ to obtain the first s most critical features, with the expression:

$$\{t_1, t_2, \ldots, t_s\} = \mathrm{sort}(\Delta O_1, \Delta O_2, \ldots, \Delta O_e),$$

wherein sort() represents the sorting function.
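The ranking procedure of steps (3.2.1) to (3.2.6) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: a small nearest-centroid classifier stands in for the decision-tree-based classifier F, the prediction result O is taken to be the classification accuracy on the supplied data, and the names `nearest_centroid_accuracy` and `rank_features` are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def nearest_centroid_accuracy(rows: List[Row], labels: List[int],
                              features: List[int]) -> float:
    """Stand-in for F (assumption): fit class centroids on the given feature
    columns and return the accuracy of predicting the same rows."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(r[f] for r in members) / len(members)
                        for f in features]

    def sq_dist(row: Row, centroid: List[float]) -> float:
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroid))

    correct = 0
    for r, y in zip(rows, labels):
        predicted = min(classes, key=lambda c: sq_dist(r, centroids[c]))
        correct += int(predicted == y)
    return correct / len(rows)


def rank_features(rows: List[Row], labels: List[int],
                  score=nearest_centroid_accuracy) -> Tuple[List[int], List[float]]:
    """Return feature indices sorted by Delta O_r (descending) and the deltas."""
    all_features = list(range(len(rows[0])))
    baseline = score(rows, labels, all_features)            # reference result O
    deltas = []
    for r in all_features:
        kept = [f for f in all_features if f != r]          # drop the r-th feature
        deltas.append(abs(score(rows, labels, kept) - baseline))  # Delta O_r
    order = sorted(all_features, key=lambda f: deltas[f], reverse=True)
    return order, deltas


# Toy data: feature 0 separates the two classes, features 1 and 2 barely do.
rows = [[0.0, 0.5, 0.2], [0.1, 0.4, 0.3], [0.2, 0.6, 0.1],
        [0.9, 0.5, 0.3], [1.0, 0.6, 0.2], [0.8, 0.4, 0.4]]
labels = [0, 0, 0, 1, 1, 1]
order, deltas = rank_features(rows, labels)
top_s = order[:1]   # the s = 1 most critical feature
```

Any predictor with the same call shape could be passed as `score`, for example one built on a decision-tree ensemble as the method describes.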

A further improvement of the present invention is that, in step 4, the statistical measures include: the Pearson correlation coefficient, the u-test, the t-test, analysis of variance, and simple or multiple regression analysis based on the central limit theorem.

A further improvement of the present invention is that it further comprises:

step 5, visualizing the correlation among the s most critical features obtained in step (3.2.6).

The invention is further improved in that the step 5 specifically comprises the following steps:

(5.1) drawing a parallel coordinate system among the features by taking each feature obtained by feature selection as a vertical axis and a visit sequence of each patient as a horizontal axis, and visually displaying the dependence change rule among different features;

and (5.2) selecting two features, and mapping the data onto a two-dimensional plane taking the two features as coordinate axes for visually displaying the correlation relationship between the two features.

Compared with the prior art, the invention has the following beneficial effects:

the multi-factor correlation interactive analysis method for the high-dimensional medical data provided by the invention designs a complete flow from the original clinical medical data to the final correlation visual result, and can directly display the dependency change rule among key features in the high-dimensional medical data.

Firstly, processing acquired original clinical medical data, removing invalid information, sensitive information, missing values and abnormal values in the data, respectively adopting standardized and coding processing methods aiming at numerical data and category data, and splicing according to medical records to generate a patient treatment sequence; mapping the high-dimensional diagnosis sequence data to a two-dimensional plane to generate a characteristic group, and interactively selecting the group to be studied by a doctor; further selecting the characteristics of the data of the group of patients, calculating key measurement indexes of each characteristic for a final prediction result, selecting the first few most key characteristics after sequencing, carrying out hypothesis testing on the selected characteristics by a statistical method, and verifying the statistically significant level of the correlation between the characteristics; and further, a parallel coordinate system and a two-dimensional coordinate system are adopted to respectively and visually display all the characteristics and the dependence change relation between every two characteristics, and the influence of different characteristics on disease development is analyzed. The invention displays key factors for disease development through visualizing the dependency change relation among the features hidden in the high-dimensional medical data, and provides statistically significant evidence.

The analysis method overcomes the difficulty that high-dimensional medical data is abstract, complex, and hard to analyze. In the traditional method, analyzing high-dimensional medical data relies on complex statistical calculation that is computationally heavy and whose principles are difficult to understand, which greatly hinders clinical diagnosis and treatment and medical research. The invention simplifies the whole analysis flow, introduces the doctor's medical knowledge through interaction, visually displays the whole analysis process, reduces the amount of calculation, and makes the dependency relationships among variables easier to understand.

The invention considers the multi-factor dependency relationships within high-dimensional medical data. The traditional method analyzes only pairwise relationships between variables and cannot model multivariate relationships. By adopting dimension-reduction mapping, feature selection, parallel-coordinate plotting and similar techniques, the method takes the dependency relationships among multiple variables into account, making the analysis result for high-dimensional medical data more accurate.

The method is suitable for clinical medical data of various diseases and is highly extensible. Under the traditional method, a dedicated analysis algorithm has to be designed for each disease and data type and can hardly be extended. The method does not depend on specific data types, can analyze and display all forms of clinical medical data, and can adapt to the analysis requirements of different diseases.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description of the embodiments or the drawings used in the description of the prior art will make a brief description; it will be apparent to those of ordinary skill in the art that the drawings in the following description are of some embodiments of the invention and that other drawings may be derived from them without undue effort.

FIG. 1 is a schematic block diagram of a multi-factor correlation interactive analysis method for medical data according to an embodiment of the invention;

FIG. 2 is a schematic block diagram of a feature selection method in a method of an embodiment of the invention;

FIG. 3 is a schematic diagram of a feature population visualization result in a method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the visualization of all key features in the method of an embodiment of the present invention;

FIG. 5 is a schematic diagram of a part of the feature correlation visualization result in the method according to the embodiment of the invention.

Detailed Description

In order to make the purposes, technical effects and technical solutions of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it will be apparent that the described embodiments are some of the embodiments of the present invention. Other embodiments, which may be made by those of ordinary skill in the art based on the disclosed embodiments without undue burden, are within the scope of the present invention.

The multi-factor correlation interactive analysis method for the medical data comprises the following steps:

step 1, normalizing the collected clinical medical data and associating it by patient case number to obtain a visit sequence for each patient;

step 2, using the t-SNE algorithm on the processed clinical medical data to map the high-dimensional visit sequences onto a two-dimensional plane, forming different feature groups; a doctor views the distribution of all feature groups and selects the feature group to be studied;

step 3, the doctor designates a disease characterization index, and feature selection is performed on the features in the visit sequences to determine the feature sequence most strongly correlated with that index;

step 4, measuring the correlation between the features with statistical methods to obtain a statistically significant result;

step 5, visualizing the correlation between the features.

Preferably, step 1 specifically includes:

step 1.1, analyzing the acquired clinical medical data and eliminating irrelevant features and privacy data. Irrelevant features include patient name, patient number and the like, which have no effect on the severity of the patient's illness and should therefore be removed; privacy data includes patient identification numbers, mobile phone numbers and the like, which can identify an individual patient and easily pose ethical risks, and should therefore also be removed. In particular, features such as the patient's native place and home address, which are sensitive but may influence the severity of the disease, should be generalized: only coarse information such as nationality and province is retained, and the specific sensitive information is removed;

step 1.2, eliminating missing values and outliers in the acquired clinical medical data. Missing values are values that carry no meaning, including null, "-" and the like; they have no clear medical meaning and adversely affect the final result, and should therefore be processed. Outliers are clearly incorrect values, including values that violate medical knowledge and values that violate common sense; they greatly disturb the final result and should therefore also be processed. Both kinds of values are handled as follows: if the records containing missing values or outliers account for less than 1/10 of the total data, those records are deleted; otherwise, the missing value or outlier is replaced by the mean of its column;

and step 1.3, eliminating repeated data in the acquired clinical medical data. The duplicate data includes two classes: one is completely repeated data, and only the last piece of data is reserved after the data is de-duplicated; the other is that partial data are different, which can be understood as examination records of patients at different times, and all data should be reserved;

step 1.4, normalizing the numerical data in the acquired clinical medical data, that is, for the data $x_i$ of the same feature:

$$x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)},$$

wherein X is the set of all values of a certain numerical feature, $x_i$ represents the i-th element in X, i = 1, 2, 3, ..., n, min(X) represents the minimum value in X, and max(X) represents the maximum value in X;

step 1.5, encoding the categorical data in the acquired clinical medical data, converting it into a format that the algorithm can use. The converted coding format is:

$$Y = (y_1, y_2, \ldots, y_n), \qquad y_i = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$

wherein Y is the converted encoding vector, $y_i$ is the i-th value in the vector, i = 1, 2, 3, ..., n, and j indicates the class number to which the data belongs.

Step 1.6, splicing the processed data according to the patient case number to generate a single patient treatment sequence T:

$$T = \{x_i, y_j, z_k, \ldots\},$$

wherein $x_i, y_j, z_k$ (i, j, k = 1, 2, 3, ..., n) respectively represent different types of clinical medical data that belong to the same patient, that is, share the same patient case number. T is a high-dimensional vector representing the visit sequence of a single patient.
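The cleaning and splicing in steps 1.1 to 1.6 can be sketched as below. The column names (`case_no`, `name`, `id_number`, `phone`, `sex`, `wbc`) and the helper names are hypothetical, and the reading of the 1/10 rule (delete the affected records when they are fewer than one tenth of all records, otherwise fill with the column mean) is an interpretation of the text above.

```python
from typing import Dict, List

PRIVATE_FIELDS = {"name", "id_number", "phone"}  # hypothetical column names
MISSING = (None, "", "-")


def drop_private(record: dict) -> dict:
    """Step 1.1: remove irrelevant and privacy fields."""
    return {k: v for k, v in record.items() if k not in PRIVATE_FIELDS}


def dedupe_keep_last(records: List[dict]) -> List[dict]:
    """Step 1.3: drop completely identical records, keeping the last one."""
    unique: Dict[tuple, dict] = {}
    for rec in records:
        key = tuple(sorted(rec.items()))
        unique.pop(key, None)   # re-insert so the last occurrence is kept
        unique[key] = rec
    return list(unique.values())


def fill_or_drop_missing(records: List[dict], column: str) -> List[dict]:
    """Step 1.2: delete affected records if they are under 1/10 of the data,
    otherwise replace the missing value with the column mean."""
    bad = [r for r in records if r[column] in MISSING]
    if not bad:
        return records
    if len(bad) / len(records) < 0.1:
        return [r for r in records if r[column] not in MISSING]
    good = [r[column] for r in records if r[column] not in MISSING]
    mean = sum(good) / len(good)
    return [dict(r, **{column: mean}) if r[column] in MISSING else r
            for r in records]


def build_visit_sequences(tables: List[List[dict]],
                          key: str = "case_no") -> Dict[str, list]:
    """Step 1.6: concatenate every table's values per patient case number."""
    sequences: Dict[str, list] = {}
    for table in tables:
        for rec in table:
            seq = sequences.setdefault(rec[key], [])
            seq.extend(v for k, v in rec.items() if k != key)
    return sequences


raw_demographics = [
    {"case_no": "A1", "name": "patient-1", "sex": 0},
    {"case_no": "A2", "name": "patient-2", "sex": 1},
    {"case_no": "A3", "name": "patient-3", "sex": 0},
]
raw_labs = [
    {"case_no": "A1", "wbc": 0.25},
    {"case_no": "A2", "wbc": "-"},
    {"case_no": "A1", "wbc": 0.25},   # exact duplicate of the first record
    {"case_no": "A3", "wbc": 0.75},
]
demographics = [drop_private(r) for r in raw_demographics]
labs = fill_or_drop_missing(dedupe_keep_last(raw_labs), "wbc")
sequences = build_visit_sequences([demographics, labs])
```

Each entry of `sequences` plays the role of one patient's vector T; normalization and categorical encoding would be applied to the columns before this splicing step.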

Specifically, step 2 specifically includes:

step 2.1, using the t-SNE algorithm to map the high-dimensional patient visit sequence vectors T obtained in the previous step onto a two-dimensional plane, generating different feature groups.

Wherein, the t-SNE algorithm can be expressed as:

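for n high-dimensional data points $x_1, x_2, \ldots, x_n$, the Euclidean distances between the data points are converted into joint probabilities that represent similarity, with the formula:

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},$$

wherein:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},$$

$\sigma_i$ being the bandwidth of the Gaussian centered on $x_i$ and $y_1, \ldots, y_n$ the corresponding points in the low-dimensional space;

the objective function is expressed as:

$$C = KL(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},$$

where P is the joint probability distribution of each point in the high-dimensional space and Q is the joint probability distribution of each point in the low-dimensional space.

The gradient used for optimization is:

$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}.$$

The perplexity is defined as $Perp(P_i) = 2^{H(P_i)}$, with:

$$H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i},$$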

the specific solving steps are as follows:

step 2.2, the doctor interactively selects the group G to be studied from the feature groups generated in the previous step:

$$G = \{T_1, T_2, \ldots, T_i, \ldots, T_n\},$$

wherein $T_i$ represents the visit sequence of the i-th patient in the selected feature group, i = 1, 2, 3, ..., n.
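The first stage of t-SNE described in step 2.1, converting Euclidean distances into probabilities under a perplexity target, can be sketched in plain Python. This follows the standard published t-SNE formulation, which is assumed here to be what the method uses; the function names, the bisection bounds and the iteration count are illustrative choices. The later stage (gradient descent on the two-dimensional points) is omitted.

```python
import math
from typing import List, Sequence, Tuple


def squared_distances(points: List[Sequence[float]]) -> List[List[float]]:
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def conditional_row(dists_i: List[float], i: int, sigma: float) -> List[float]:
    """p_{j|i} for one point i with Gaussian bandwidth sigma."""
    d_min = min(d for j, d in enumerate(dists_i) if j != i)
    # Subtracting d_min leaves the ratios unchanged and avoids underflow to 0/0.
    weights = [0.0 if j == i else math.exp(-(d - d_min) / (2.0 * sigma ** 2))
               for j, d in enumerate(dists_i)]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(row: List[float]) -> float:
    entropy = -sum(p * math.log2(p) for p in row if p > 0.0)
    return 2.0 ** entropy


def fit_row(dists_i: List[float], i: int, target: float) -> List[float]:
    """Bisect sigma (in log space) until the row's perplexity hits the target."""
    lo, hi = 1e-3, 1e3
    for _ in range(50):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_row(dists_i, i, mid)) > target:
            hi = mid
        else:
            lo = mid
    return conditional_row(dists_i, i, math.sqrt(lo * hi))


def joint_probabilities(points: List[Sequence[float]], target_perplexity: float
                        ) -> Tuple[List[List[float]], List[List[float]]]:
    n = len(points)
    dists = squared_distances(points)
    cond = [fit_row(dists[i], i, target_perplexity) for i in range(n)]
    joint = [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)]
             for i in range(n)]
    return cond, joint


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
cond, joint = joint_probabilities(points, target_perplexity=2.0)
```

In practice the visit sequences T would take the place of `points`, and a library implementation of t-SNE would normally be used for the full embedding.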

Specifically, the step 3 specifically includes:

step 3.1, the doctor designates a characterization index P of the disease development degree. There are also differences in the indices used in medicine to characterize the extent of disease progression for different diseases. Thus, there is a need for doctors to interactively specify characterization indicators for specific problems to measure the severity of the disease;

step 3.2, removing low-variance features. A feature whose variance is smaller than the threshold fluctuates little across all patients, which indicates that it has little influence on the progression of the patient's disease, so it should be removed. In this scheme, the threshold is set to 0;

step 3.3, sorting all the features according to their correlation with the degree of disease development, determining the k features most critical to disease development, and completing feature selection. The specific steps are as follows:

step 3.3.1, constructing a classifier which takes a decision tree as a base learner, and marking the classifier as F;

step 3.3.2, feeding the data containing all the features into the classifier to predict the disease development degree index P, obtaining a reference prediction result O:

$$O = F(t_1, t_2, \ldots, t_n),$$

wherein $t_i$ (i = 1, 2, ..., n) represents the data containing the i-th feature, and n represents the number of features.

Step 3.3.3, feeding the data with the i-th feature removed into the classifier to predict again, obtaining a new prediction result $O_i$:

$$O_i = F(t_1, t_2, \ldots, t_{i-1}, t_{i+1}, \ldots, t_n),$$

Step 3.3.4, calculating the difference between the prediction result $O_i$ obtained after removing the i-th feature and the reference prediction result O as the degree of influence $\Delta O_i$ of that feature on disease development:

$$\Delta O_i = |O_i - O|,$$

wherein $\Delta O_i$ (i = 1, 2, 3, ..., n) represents the degree of influence of the i-th feature on disease progression; the larger the value, the greater the influence of the feature on disease progression, that is, the more critical the feature.

Step 3.3.5, repeating the above process until the critical metric $\Delta O_i$ has been obtained for all n features;

Step 3.3.6, sorting all the features by $\Delta O_i$ from large to small to obtain the first k most important features, completing feature selection:

$$\{t_1, t_2, \ldots, t_k\} = \mathrm{sort}(\Delta O_1, \Delta O_2, \ldots, \Delta O_n),$$

where sort() represents the ranking function and takes the first k values.
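The low-variance filter of step 3.2 can be sketched with the standard library alone. One deliberate assumption is made: features are dropped when their variance does not exceed the threshold, so that a threshold of 0 removes constant features (a strict "less than 0" test would never remove anything, since variance is non-negative). The function name and the feature names in the example are illustrative.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def remove_low_variance(rows: List[Sequence[float]], names: List[str],
                        threshold: float = 0.0
                        ) -> Tuple[List[List[float]], List[str]]:
    """Keep only the feature columns whose population variance exceeds threshold.

    Assumption: variance <= threshold is treated as "low variance".
    """
    keep = [j for j in range(len(names))
            if pvariance([r[j] for r in rows]) > threshold]
    filtered_rows = [[r[j] for j in keep] for r in rows]
    return filtered_rows, [names[j] for j in keep]


# Toy data: the first and third columns are constant across patients.
rows = [[1.0, 0.2, 5.0],
        [1.0, 0.4, 5.0],
        [1.0, 0.9, 5.0]]
filtered_rows, kept_names = remove_low_variance(rows, ["ward", "wbc", "unit"])
```

The surviving columns are the ones passed on to the ranking in step 3.3.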

Specifically, the statistical measures in step 4 include the Pearson correlation coefficient, the u-test, the t-test, analysis of variance (one-way analysis of variance, multi-factor analysis of variance, etc.), and simple or multiple regression analysis based on the central limit theorem. The main function of this step is to perform statistical analysis on the k features selected in the previous step, ensuring that they satisfy statistical hypothesis tests.
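As one concrete instance of the measures listed for step 4, the sketch below computes the Pearson correlation coefficient between two features and the associated t statistic, $t = r\sqrt{(n-2)/(1-r^2)}$ with n - 2 degrees of freedom. The function names and sample values are illustrative; obtaining a p-value would additionally require the t distribution, for example from a statistics table or library.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r against zero correlation (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(feature_a, feature_b)
t = t_statistic(r, len(feature_a))
```

The resulting t value is compared with the critical value for n - 2 degrees of freedom at the chosen significance level to decide whether the correlation is statistically significant.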

Specifically, step 5 specifically includes:

step 5.1, drawing a parallel coordinate system among the features by taking each feature selected in the feature selection process as a vertical axis and a treatment sequence of each patient as a horizontal axis, and visually displaying the dependence change rule among different features;

and 5.2, selecting two features with stronger correlation, mapping data containing the features onto a two-dimensional plane taking the two features as coordinate axes, and visually displaying the correlation relationship between the two features.
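The data preparation behind the parallel coordinate plot of step 5.1 can be sketched as follows. Each selected feature becomes one vertical axis (placed at x = 0, 1, 2, ...), and each patient becomes one polyline crossing every axis at that patient's scaled value. Scaling each axis to [0, 1] and the function name are assumptions of this sketch; drawing the polylines is left to whatever plotting tool is used.

```python
from typing import List, Sequence, Tuple


def parallel_coordinate_lines(rows: List[Sequence[float]]
                              ) -> List[List[Tuple[int, float]]]:
    """Return one polyline per patient as (axis index, scaled value) points."""
    n_features = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_features)]
    lines = []
    for r in rows:
        line = []
        for j, col in enumerate(columns):
            lo, hi = min(col), max(col)
            # Assumption: each axis is scaled independently to [0, 1].
            y = 0.0 if hi == lo else (r[j] - lo) / (hi - lo)
            line.append((j, y))
        lines.append(line)
    return lines


# Three patients, two selected features.
rows = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
lines = parallel_coordinate_lines(rows)
```

For the scatter plot of step 5.2, the two chosen feature columns can be used directly as the x and y coordinates.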

In summary, the multi-factor correlation interactive analysis method for high-dimensional medical data provided by the embodiment of the invention designs a complete flow from original clinical medical data to final correlation visualization results, and can directly display the dependence change rule among key features in the high-dimensional medical data. Firstly, processing acquired original clinical medical data, removing invalid information, sensitive information, missing values and abnormal values in the data, respectively adopting standardized and coding processing methods aiming at numerical data and category data, and splicing according to medical records to generate a patient treatment sequence; mapping the high-dimensional diagnosis sequence data to a two-dimensional plane to generate a characteristic group, and interactively selecting the group to be studied by a doctor; further selecting the characteristics of the data of the group of patients, calculating key measurement indexes of each characteristic for a final prediction result, selecting the first few most key characteristics after sequencing, carrying out hypothesis testing on the selected characteristics by a statistical method, and verifying the statistically significant level of the correlation between the characteristics; and further, a parallel coordinate system and a two-dimensional coordinate system are adopted to respectively and visually display all the characteristics and the dependence change relation between every two characteristics, and the influence of different characteristics on disease development is analyzed. The invention displays key factors for disease development through visualizing the dependency change relation among the features hidden in the high-dimensional medical data, and provides statistically significant evidence. 
The analysis method overcomes the difficulty that high-dimensional medical data is abstract, complex, and hard to analyze. In the traditional method, analyzing high-dimensional medical data relies on complex statistical calculation that is computationally heavy and whose principles are difficult to understand, which greatly hinders clinical diagnosis and treatment and medical research. The invention simplifies the whole analysis flow, introduces the doctor's medical knowledge through interaction, visually displays the whole analysis process, reduces the amount of calculation, and makes the dependency relationships among variables easier to understand. The invention considers the multi-factor dependency relationships within high-dimensional medical data; the traditional method analyzes only pairwise relationships between variables and cannot model multivariate relationships. By adopting dimension-reduction mapping, feature selection, parallel-coordinate plotting and similar techniques, the method takes the dependency relationships among multiple variables into account, making the analysis result for high-dimensional medical data more accurate. The method is suitable for clinical medical data of various diseases and is highly extensible; under the traditional method, a dedicated analysis algorithm has to be designed for each disease and data type and can hardly be extended. The method of the invention does not depend on specific data types, can analyze and display all forms of clinical medical data, and can adapt to the analysis requirements of different diseases.

Detailed Description of Embodiment(s) of the Invention

In this embodiment, diagnosis and treatment data from the oncology department of the Second Affiliated Hospital of Xi'an Jiaotong University is used to illustrate the implementation of the method. It should be noted that, as an example, this embodiment lists only a portion of the data fields and visualization results; the actual clinical medical data extends far beyond the listed range.

In an embodiment of the present invention, the clinical medical data includes:

First, the collected data is processed: fields such as name, date of birth, ID card number and telephone number are removed, and categorical information such as gender is encoded; for example, male is coded as 0, female as 1, Han ethnicity as 000000, Hui ethnicity as 000001, and so on. Missing values, outliers and duplicate values are then removed and the data is normalized. Next, the patient data is associated by patient record number to obtain the visit sequence T of each patient. In this example, the visit sequences form a 3219 × 29 matrix, representing 3219 patients each with a 29-dimensional visit record. The record of the 1008th patient is:

$$T_{1008} = \{1008, 0, 0, 0, 0, 0, 0, 1, \ldots, 0.29, 1, 0.87, \ldots, 0.90, 0, 0, 1\},$$

Part of the visit sequences is then mapped onto a two-dimensional plane, giving the visualization result of FIG. 3. As can be seen from the figure, the data form two feature groups which, in combination with the specific data and medical knowledge, are known to correspond to two different diseases.

The doctor next selects the feature group to be studied. In this embodiment, the doctor selects the feature group on the right. Feature selection is performed on this data, following the flow shown in FIG. 2. By calculating the degree of influence $\Delta O_i$ of each feature on disease progression and sorting in descending order, the features with the greatest influence on the disease are found to be: basophil count, discharge diagnosis, red cell distribution width (CV), and plateletcrit. Calculating statistical measures for these features verifies that they are significantly correlated with the degree of disease progression.

The dependency between these features is then visualized. The visual result in the parallel coordinate system is plotted as shown in fig. 4, with each feature as the vertical axis and the visit sequence of each patient in the feature group as the horizontal axis.

The correlation between features is then visualized. In this embodiment, two features, the T value and the BMD value, are selected, and the data for these two features is mapped onto a two-dimensional plane; the visualization result is shown in FIG. 5. It can be seen from the figure that the BMD value and the T value are highly correlated and show a consistent distribution trend. This agrees with experience in clinical practice and is medically interpretable.

The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, one skilled in the art may make modifications and equivalents to the specific embodiments of the present invention, and any modifications and equivalents not departing from the spirit and scope of the present invention are within the scope of the claims of the present invention.

Claims

1. The multi-factor correlation interactive analysis method for the medical data is characterized by comprising the following steps of:

step 4, measuring the correlation among the features selected in the step 3 by adopting a statistical measurement index to obtain a result with statistical significance, and completing multi-factor correlation interactive analysis;

in step 1, the specific step of processing the acquired medical data includes:

(1.3) eliminating completely duplicate data in the medical data;

wherein $y_k$ represents the k-th value in the encoded vector, k = 1, 2, 3, ..., m, m represents the number of elements in the encoded vector, and j represents the class number to which the data belongs;

in step 1, the obtained treatment sequence T of each patient has the expression:

$$T = \{x_a, y_b, z_c, \ldots\},$$

$$G = \{T_1, T_2, \ldots, T_p, \ldots, T_d\},$$

wherein $T_p$ represents the visit sequence of the p-th patient in the feature group to be studied, p = 1, 2, 3, ..., d, and d represents the number of patients in the feature group to be studied;

the step 3 specifically comprises the following steps:

(3.2) performing feature selection on the features of the selected feature group: when determining the feature sequence related to the disease characterization index, removing the features whose variance is smaller than a threshold to obtain the remaining features; sorting the remaining features according to their correlation with the disease characterization index, determining the k features most critical to disease characterization, and completing feature selection and feature sorting;

in the step (3.2), the remaining features are ranked according to the correlation with the disease characterization index, and the step of determining k features most critical to disease progression specifically includes:

$$O = F(t_1, t_2, \ldots, t_q, \ldots, t_e),$$

wherein $t_q$ (q = 1, 2, ..., e) represents the data containing the q-th feature, and e represents the number of features;

$$O_r = F(t_1, t_2, \ldots, t_{r-1}, t_{r+1}, \ldots, t_e);$$

$$\Delta O_r = |O_r - O|,$$

$$\{t_1, t_2, \ldots, t_s\} = \mathrm{sort}(\Delta O_1, \Delta O_2, \ldots, \Delta O_e),$$

wherein sort() represents the sorting function.

2. The method of claim 1, wherein in step 4, the statistical measures comprise: the Pearson correlation coefficient, the u-test, the t-test, analysis of variance, and simple or multiple regression analysis based on the central limit theorem.

3. The method of multi-factor correlation interactive analysis for medical data according to claim 1, further comprising:

4. A multi-factor correlation interactive analysis method for medical data according to claim 3, wherein step 5 specifically comprises: