CN113707330A

CN113707330A - Mongolian medicine syndrome differentiation model construction method, system and method

Info

Publication number: CN113707330A
Application number: CN202110872486.8A
Authority: CN
Inventors: 陈永波; 刘勇国; 张云; 朱嘉静; 杨尚明; 李巧勤
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2021-11-26
Anticipated expiration: 2041-07-30
Also published as: CN113707330B

Abstract

The invention discloses a construction method, a system and a method for a Mongolian medicine syndrome differentiation model. The construction method comprises the following steps: S1, medical record data preprocessing: acquiring the distinct symptoms in the medical record set and expressing them as a symptom set F, and acquiring the distinct syndromes in the medical record set and expressing them as a syndrome set Y, wherein each medical record in the medical record set is a sample; S2, constructing a neighborhood feature-label correlation calculation model of the sample; S3, constructing a label correlation calculation model of the sample; S4, constructing a calculation model of the interaction coefficient; and S5, constructing a gravity calculation model between the samples in the medical record set based on the calculation model of the interaction coefficient obtained in step S4 and the heterogeneous overlapping Euclidean metric distance between the samples, calculating positive and negative discrimination scores, and performing label prediction. The invention considers not only the correlation between symptoms and syndromes but also the correlation among syndromes, which improves the accuracy of syndrome differentiation results and provides decision support for doctors during diagnosis and treatment.

Description

Mongolian medicine syndrome differentiation model construction method, system and method

Technical Field

The invention relates to the technical field of medical information management, in particular to a construction method, a system and a method of a Mongolian medicine syndrome differentiation model.

Background

Mongolian medicine is an important component of national medicine, has a history of more than 2000 years, and has a unique theoretical system and a good medical effect. Syndrome differentiation is the basis of Mongolian diagnosis and treatment, and the pathology, the disease location and the disease nature of a disease are analyzed through three diagnoses of inspection, inquiry and palpation, so as to determine the label of the disease.

In the traditional Mongolian medicine syndrome differentiation process, a doctor obtains the patient's symptoms and physical signs through observation and the patient's own account, and then, drawing on personal knowledge and experience, analyzes them together with factors such as the patient's diet, daily life and temperament to reach a syndrome differentiation result. The result is therefore subjective and depends to some extent on the doctor's personal experience and knowledge level.

At present, there are two types of methods for objectively studying Mongolian medicine syndrome differentiation. The first type takes a holistic view and studies syndrome differentiation quantitatively with scale methods, which helps make clinical syndrome differentiation objective and standardized; however, most of these methods depend on expert consensus and remain subjective. The second type takes a local view and analyzes the relationship between indices (also called features) and syndrome types (also called labels) for a specific disease.

Most existing studies of Mongolian medicine syndrome differentiation explore only the relationship between symptoms and syndromes and rarely address the relationship among syndromes. Consider a disease in which two or three syndromes exist simultaneously, such as 'Mengkri' disease, which involves three syndromes: the Heyi-predominant type, the Xila-predominant type and the Badagan-predominant type. If only the relationship between these three syndromes and routine blood indices is considered, and the relationship among the three syndromes themselves is ignored, the accuracy of the syndrome differentiation result is poor.

Disclosure of Invention

The invention aims to provide a construction method and a system of a Mongolian medicine syndrome differentiation model.

The invention is realized by the following technical scheme:

A construction method of a Mongolian medicine syndrome differentiation model comprises the following steps:

S1, medical record data preprocessing:

acquiring different symptoms in a medical record set and expressing the symptoms as a symptom set F, acquiring different syndromes in the medical record set and expressing the syndromes as a syndrome set Y, wherein each medical record in the medical record set is a sample, the symptoms and syndromes in the sample are coded by adopting 0-1, the symptoms in the sample are characteristics, and the syndromes are labels;

S2, constructing a neighborhood feature-label correlation calculation model of the sample:

constructing a calculation model of the correlation between each feature and the label set based on the mutual information correlation between the features in the neighborhood of the sample and the labels and the average precision correlation between the features in the neighborhood of the sample and the labels; summing the correlation of each feature and the label set to obtain a neighborhood feature-label correlation calculation model of the sample;

S3, constructing a label correlation calculation model of the sample:

respectively constructing correlation calculation models for every pair of labels of the sample based on the number of samples in the neighborhood set of the sample that have both labels, and taking the maximum of these pairwise correlations as the label correlation of the sample;

S4, constructing a calculation model of the interaction coefficient based on the neighborhood feature-label correlation calculation model of the sample obtained in step S2, the correlation calculation model of every two labels obtained in step S3, and a balance parameter of the two correlations;

S5, constructing a gravity calculation model between the samples in the medical record set based on the interaction coefficient calculation model obtained in step S4 and the heterogeneous overlapping Euclidean metric distance between the samples; calculating the gravity based on this model and summing it to construct a positive discrimination score calculation model and a negative discrimination score calculation model respectively; and judging whether a sample belongs to a given syndrome in the syndrome set Y by comparing the values calculated by the two models.

The invention objectifies the symptoms and syndromes of the original medical records through 0-1 coding, expressing them in a format convenient for computer processing. It uses mutual information, average precision and probability to measure the feature-label correlation and the label correlation of each sample and to calculate the sample interaction coefficient. It then constructs a gravity formula between samples and calculates and compares the positive and negative discrimination scores to obtain the final syndrome differentiation result.

The construction method of the invention fully considers the correlation between symptoms and syndromes and the correlation between syndromes, can highlight the effect of symptoms which have great influence on the syndrome differentiation result, is beneficial to improving the accuracy of the syndrome differentiation result, has interpretability of the result, and can provide auxiliary decision for traditional Chinese medicine syndrome differentiation.

Further, in step S2, the calculation model of the correlation between each feature and the label set is shown as follows:

in the formula, MIR_h(N_i) represents the mutual-information correlation between feature f_h and the labels in the neighborhood of sample i; APR_h(N_i) represents the average-precision correlation between feature f_h and the labels in the neighborhood of sample i; and N_i represents the neighborhood of sample i.

Further, in step S4, the calculation model of the interaction coefficient is shown as follows:

IC_i＝αM_i+(1-α)R_i

in the formula, α represents the balance parameter of the two correlations; M_i represents the neighborhood feature-label correlation of sample i; and R_i represents the label correlation of sample i.

Further, in step S5,

the gravity calculation model is shown as follows:

in the formula, IC_j is the interaction coefficient between sample j and other samples, and d_F(i, j) is the heterogeneous overlapping Euclidean metric distance between sample i and sample j.

Further, in step S1, the encoding with 0-1 specifically includes:

in each medical record, a symptom from the symptom set F that is present is coded as 1 and a symptom that is absent is coded as 0; a syndrome from the syndrome set Y that is present is coded as 1 and a syndrome that is absent is coded as 0.

A Mongolian medicine syndrome differentiation method comprises the following steps:

S1, medical record data preprocessing:

S2, feature-label correlation analysis:

calculating the correlation between each feature and the label set based on the mutual-information correlation between the features in the neighborhood of the sample and the labels and the average-precision correlation between the features in the neighborhood of the sample and the labels, and summing the correlations between each feature and the label set to obtain the neighborhood feature-label correlation of the sample;

S3, analyzing label correlation of the sample:

respectively calculating the correlation of every pair of labels of the sample based on the number of samples in the neighborhood set of the sample that have both labels, and taking the maximum of these pairwise correlations as the label correlation of the sample;

S4, calculation of interaction coefficient:

calculating an interaction coefficient of the sample based on the neighborhood feature-label correlation of the sample obtained in the step S2, the label correlation of the sample obtained in the step S3, and a balance parameter of the two correlations;

S5, label prediction:

calculating the gravity between the samples in the medical record set based on the interaction coefficient obtained in step S4 and the heterogeneous overlapping Euclidean metric distance between the samples; calculating the gravity for each sample in the nearest neighbor set of the sample that belongs to the label, and summing these values to obtain the positive discrimination score of the sample for the label; calculating the gravity for each sample in the nearest neighbor set that does not belong to the label, and summing these values to obtain the negative discrimination score of the sample for the label; and comparing the positive discrimination score with the negative discrimination score to judge whether the sample belongs to a given syndrome in the syndrome set Y.

Further, the positive discrimination score and the negative discrimination score are compared, if the positive discrimination score is larger than the negative discrimination score, the syndrome exists in the sample, and if the positive discrimination score is smaller than the negative discrimination score, the syndrome does not exist in the sample.

A Mongolian medicine syndrome differentiation system, comprising:

a data preprocessing module: used for acquiring the distinct symptoms in a medical record set and expressing them as a symptom set F, and acquiring the distinct syndromes in the medical record set and expressing them as a syndrome set Y, wherein each medical record in the medical record set is a sample, and the symptoms and syndromes in the sample are coded with 0-1;

a feature-label correlation analysis module: used for calculating the correlation between each feature and the label set based on the mutual-information correlation between the features and the labels in the neighborhood of a sample and the average-precision correlation between the features and the labels in the neighborhood of the sample; summing the correlations between each feature and the label set to obtain the neighborhood feature-label correlation of the sample; and constructing a correlation matrix of the neighborhood features and labels of the samples from the neighborhood feature-label correlation of each sample;

a label correlation analysis module of the sample: used for respectively calculating the correlation of every pair of labels of a sample based on the number of samples in the neighborhood set of the sample that have both labels, taking the maximum of these pairwise correlations as the label correlation of the sample, and constructing a label correlation matrix of the samples from the label correlation of each sample;

an interaction coefficient calculation module: used for calculating the interaction coefficient of the sample from the neighborhood feature-label correlation of the sample calculated by the feature-label correlation analysis module, the label correlation of the sample calculated by the label correlation analysis module, and the balance parameter of the two correlations;

a label prediction module: used for obtaining the interaction coefficient calculated by the interaction coefficient calculation module and calculating the gravity between the samples in the medical record set in combination with the heterogeneous overlapping Euclidean metric distance between the samples; calculating the gravity for each sample in the nearest neighbor set of the sample that belongs to the label, and summing these values to obtain the positive discrimination score of the sample for the label; calculating the gravity for each sample in the nearest neighbor set that does not belong to the label, and summing these values to obtain the negative discrimination score of the sample for the label; and comparing the positive discrimination score with the negative discrimination score to judge whether the sample belongs to a given syndrome in the syndrome set Y.

Compared with the prior art, the invention has the following advantages and beneficial effects:

the syndrome differentiation method of the invention fully considers the correlation between symptoms and syndromes and the correlation between syndromes, can highlight the effect of symptoms which have great influence on the syndrome differentiation result, is beneficial to improving the accuracy of the syndrome differentiation result, has interpretability of the result, and can provide auxiliary decision for traditional Chinese medicine syndrome differentiation.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a block flow diagram of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

Example 1:

This example takes "gastric Xila" disease as its subject.

As shown in FIG. 1, a Mongolian medicine syndrome differentiation method includes the following steps:

S1, medical record data preprocessing:

Mongolian medicine syndrome differentiation mainly involves three syndromes, Badagan, Xila and Heyi, and this method targets diseases in which two or three of these syndromes exist at the same time.

All the distinct symptoms in the medical record data set are expressed as the symptom set F = [f₁, …, f_k, …, f_d], where f_k represents the kth symptom in the symptom set and d represents the number of distinct symptoms; all the distinct syndromes are expressed as the syndrome set Y = {y₁, y₂, y₃}.

All medical records are expressed as the set X = [X₁, …, X_i, …, X_n], where X_i denotes the ith medical record in the set, X_i = {x_i1, …, x_ik, …, x_id}, and x_ik represents the kth feature of the ith medical record; the syndromes corresponding to X_i are expressed by the vector Y_i = [y_i1, y_i2, y_i3]. The symptoms and syndromes in each medical record are coded with 0-1: x_ik = 0 indicates that symptom f_k is absent from the record and x_ik = 1 indicates that symptom f_k is present; if X_i has syndrome y_j then y_ij = 1, otherwise y_ij = 0, where j ∈ {1, 2, 3}.
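The 0-1 coding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `encode_records`, the placeholder symptom names `f1`..`f4` and syndrome names `y1`..`y3`, and the toy records are all hypothetical.

```python
def encode_records(records, symptom_set, syndrome_set):
    """Return (X, Y): one 0-1 feature row and one 0-1 label row per record.

    records      -- list of (symptoms_present, syndromes_present) pairs of sets
    symptom_set  -- ordered list of all distinct symptoms (the set F)
    syndrome_set -- ordered list of all distinct syndromes (the set Y)
    """
    X, Y = [], []
    for symptoms, syndromes in records:
        # x_ik = 1 if symptom f_k is present in record i, else 0
        X.append([1 if f in symptoms else 0 for f in symptom_set])
        # y_ij = 1 if record i has syndrome y_j, else 0
        Y.append([1 if y in syndromes else 0 for y in syndrome_set])
    return X, Y


# Hypothetical toy data: names are placeholders, not real clinical terms.
F = ["f1", "f2", "f3", "f4"]
Y_SET = ["y1", "y2", "y3"]
records = [
    ({"f1", "f3"}, {"y1"}),
    ({"f2", "f3", "f4"}, {"y2", "y3"}),
]
X, Y = encode_records(records, F, Y_SET)
```

Keeping F and Y as ordered lists fixes the column order, so the kth column of X always refers to the same symptom.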

S2, feature-tag correlation analysis:

In the Mongolian medicine syndrome differentiation process, different features influence the syndrome differentiation result to different degrees, i.e. the correlation between features and labels differs from feature to feature. The scheme constructs a feature-label correlation matrix M to measure the correlation between the features and the labels.

In the neighborhood of sample i, the correlation between feature f_h and the label set is calculated as follows:

where MIR_h(N_i) represents the mutual-information correlation between feature f_h and the labels in the neighborhood of sample i, and APR_h(N_i) represents the average-precision correlation between feature f_h and the labels in the neighborhood of sample i. N_i denotes the neighborhood of sample i: the distances between sample i and the other samples in the training set are calculated with the heterogeneous overlapping Euclidean metric and sorted in ascending order, and the first k samples are selected to give the neighborhood N_i = (i₁, i₂, …, i_k), where k is the number of neighbors of sample i; the invention uses the empirical value k = 10. The heterogeneous overlapping Euclidean metric distance between sample i and sample j is defined as:

where F is the feature set and x_if is the f-th feature of sample i.

For discrete feature values:

For continuous feature values:

where |·| denotes the absolute value, and max(f) and min(f) are the maximum and minimum values of feature f, respectively.
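The distance formulas themselves are not reproduced in this text. The sketch below assumes the commonly used form of the heterogeneous Euclidean-overlap metric: a per-feature distance of 0 or 1 for discrete values (equal or not equal), a range-normalized absolute difference for continuous values, and the square root of the sum of squared per-feature distances. The function names and the `is_discrete` and `ranges` arguments are hypothetical.

```python
import math


def heom_distance(a, b, is_discrete, ranges):
    """Assumed HEOM form: sqrt of summed squared per-feature distances."""
    total = 0.0
    for x, y, discrete, (lo, hi) in zip(a, b, is_discrete, ranges):
        if discrete:
            # Overlap distance: 0 if the values match, 1 otherwise.
            d = 0.0 if x == y else 1.0
        else:
            # |x - y| / (max(f) - min(f)); a constant feature contributes 0.
            d = abs(x - y) / (hi - lo) if hi > lo else 0.0
        total += d * d
    return math.sqrt(total)


def neighborhood(i, X, is_discrete, ranges, k=10):
    """Indices of the k samples closest to sample i (sample i excluded)."""
    dists = [(heom_distance(X[i], X[j], is_discrete, ranges), j)
             for j in range(len(X)) if j != i]
    dists.sort()  # ascending distance, ties broken by sample index
    return [j for _, j in dists[:k]]
```

With 0-1 coded symptoms every feature is discrete, so the distance reduces to the square root of the number of symptoms on which two records differ.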

MIR_h(N_i) in formula (1) is calculated as follows:

where g is the number of different labels in the label set (3 in this embodiment), p(y_j) represents the probability that label y_j is present in the neighborhood data set, and NMI(f_h, y_j) represents the normalized mutual information of feature f_h and label y_j, calculated as follows:

where H(f_h) and H(y_j) represent the information entropy of feature f_h and label y_j respectively, and MI(f_h; y_j) represents the mutual information of feature f_h and label y_j. H(f_h), H(y_j) and MI(f_h; y_j) are calculated as follows:

where t is the number of different values of feature f_h (2 in this case), p(f_hq, y_j) represents the probability that value q of feature f_h and label y_j occur together, and p(f_hq | y_j) represents the probability that value q of feature f_h occurs given that label y_j is present.
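Because the formulas for MIR, NMI, entropy and mutual information are missing from this text, the following sketch rests on stated assumptions rather than on the patent's exact definitions: entropy and mutual information use their standard definitions with base-2 logarithms, NMI is assumed to be MI divided by the geometric mean of the two entropies (other normalizations exist), and MIR is assumed to be the sum over labels of p(y_j) times NMI(f_h, y_j). All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Standard mutual information (base 2) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def nmi(xs, ys):
    """Assumed normalization: MI / sqrt(H(x) * H(y)); 0 if either is constant."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(xs, ys) / math.sqrt(hx * hy)


def mir(feature_column, label_columns):
    """Assumed MIR: sum over labels of p(y_j = 1) * NMI(f_h, y_j)."""
    n = len(feature_column)
    return sum((sum(col) / n) * nmi(feature_column, col)
               for col in label_columns)
```

Here `feature_column` holds the values of one feature over the samples in N_i and each entry of `label_columns` holds one label's 0-1 values over the same samples.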

APR_h(N_i) in formula (1) is calculated as follows: with average precision as the evaluation index, feature h of the samples in the neighborhood and the corresponding label set form a new classification data set; five-fold cross-validation is performed with the multi-label K-nearest-neighbor algorithm (ML-KNN), and the result is recorded as APR_h(N_i).
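The sketch below illustrates only the evaluation index, using the usual ranking-based definition of multi-label average precision; it does not reproduce ML-KNN or the five-fold cross-validation. The function name and the choice to skip samples with no relevant label are assumptions made for illustration.

```python
def average_precision(scores, truths):
    """Multi-label average precision over a set of samples.

    scores -- per-sample lists of predicted label scores (higher = more likely)
    truths -- per-sample lists of 0-1 ground-truth labels
    """
    total, counted = 0.0, 0
    for s, t in zip(scores, truths):
        relevant = [j for j, v in enumerate(t) if v == 1]
        if not relevant:
            continue  # assumption: samples with no relevant label are skipped
        # Rank labels by descending score; rank 1 is the top label.
        order = sorted(range(len(s)), key=lambda j: -s[j])
        rank = {label: pos + 1 for pos, label in enumerate(order)}
        # For each relevant label: fraction of labels ranked at or above it
        # that are themselves relevant.
        ap = sum(
            sum(1 for m in relevant if rank[m] <= rank[l]) / rank[l]
            for l in relevant
        ) / len(relevant)
        total += ap
        counted += 1
    return total / counted if counted else 0.0
```

In a full pipeline the scores would come from a classifier trained on the single feature h, evaluated on each held-out fold, with the fold results averaged.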

The neighborhood feature-label correlation for sample i is calculated as follows:

According to formulas (1)-(10), the correlation matrix between neighborhood features and labels for all samples, M = [M₁, M₂, …, M_n], can be obtained, where M_i represents the neighborhood feature-label correlation of sample i.

S3, analyzing label correlation of samples

The labels of a sample are correlated, and exploiting this correlation helps improve the accuracy of the syndrome differentiation result. The present scheme uses the label correlation matrix R to measure the label correlation of each sample.

The correlation r_i1 between label y₁ and label y₂ of sample i is calculated as follows:

where count(y₁ = 1, y₂ = 1, N_i) represents the number of samples in the neighborhood set N_i of sample i that have both label y₁ and label y₂.

The correlation r_i2 between label y₂ and label y₃ of sample i is calculated as follows:

where count(y₂ = 1, y₃ = 1, N_i) represents the number of samples in the neighborhood set N_i of sample i that have both label y₂ and label y₃.

The correlation r_i3 between label y₁ and label y₃ of sample i is calculated as follows:

where count(y₁ = 1, y₃ = 1, N_i) represents the number of samples in the neighborhood set N_i of sample i that have both label y₁ and label y₃.

The label correlation R_i of sample i is calculated as follows:

R_i＝max(r_i1，r_i2，r_i3) (14)

where max(r_i1, r_i2, r_i3) denotes the maximum of r_i1, r_i2 and r_i3.

According to formulas (11)-(14), the label correlation matrix of all samples, R = [R₁, R₂, …, R_n], can be obtained, where R_i represents the label correlation of sample i.
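Formulas (11)-(13) are not reproduced in this text, so the sketch below assumes that each pairwise correlation is the co-occurrence count divided by the neighborhood size; only the final maximum in formula (14) is taken directly from the description. The function name is hypothetical.

```python
from itertools import combinations


def label_correlation(neighbor_labels):
    """R_i: maximum pairwise label co-occurrence rate in the neighborhood.

    neighbor_labels -- 0-1 label rows of the samples in N_i
    """
    k = len(neighbor_labels)
    if k == 0:
        return 0.0
    g = len(neighbor_labels[0])
    best = 0.0
    for a, b in combinations(range(g), 2):
        # count(y_a = 1, y_b = 1, N_i): neighbors carrying both labels
        both = sum(1 for row in neighbor_labels if row[a] == 1 and row[b] == 1)
        # Assumed normalization by |N_i|; the source formula is unavailable.
        best = max(best, both / k)
    return best
```

Iterating over all label pairs generalizes the three explicit pairs of this embodiment to any number of labels.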

S4, calculation of interaction coefficient

The interaction coefficient IC_i of a sample is designed to measure the magnitude of the interaction between sample i and other samples, and is calculated as follows:

IC_i＝αM_i+(1-α)R_i (15)

where α represents the balance parameter of the two correlations, which the present invention sets to 0.5; M_i represents the neighborhood feature-label correlation of sample i, calculated as shown in formula (10); and R_i represents the label correlation of sample i, calculated as shown in formula (14).
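Formula (15) is a convex combination of the two correlations and can be written directly. The function name and the range check on α are illustrative additions, not part of the patent text.

```python
def interaction_coefficient(m_i, r_i, alpha=0.5):
    """IC_i = alpha * M_i + (1 - alpha) * R_i, per formula (15)."""
    if not 0.0 <= alpha <= 1.0:
        # Illustrative guard: alpha is a balance weight between two terms.
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * m_i + (1.0 - alpha) * r_i
```

With α = 0.5, the value used in this embodiment, the feature-label correlation and the label correlation contribute equally.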

S5, label prediction

(1) Calculation of gravitational forces between samples

The invention calculates the gravitation between samples by using the idea of classical universal gravitation, and the formula of the gravitation between a sample j and a sample i is as follows:

(2) Generating the sample label

For each sample in the nearest neighbor set N_i of sample i that belongs to label y, the gravity is calculated, and the values are summed to obtain the positive discrimination score DS of sample i for label y, as follows:

For each sample in the nearest neighbor set N_i of sample i that does not belong to label y, the gravity is calculated, and the values are summed to obtain the negative discrimination score DS' of sample i for label y, as follows:

DS and DS' are then compared to determine the syndrome differentiation result. If DS(i) > DS'(i), sample i belongs to label y, i.e. sample i has syndrome y. If DS(i) ≤ DS'(i), sample i does not belong to label y, i.e. sample i does not have syndrome y.
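The gravity formula and the score formulas are not reproduced in this text. The sketch below assumes an inverse-square form modeled on classical gravitation, gravity = IC_j / d_F(i, j)², with a small constant added to avoid division by zero; both the form and the constant `eps` are assumptions. The decision rule (predict the label only when DS > DS') follows the description.

```python
def gravity(ic_j, distance, eps=1e-9):
    """Assumed inverse-square gravity exerted on sample i by neighbor j."""
    return ic_j / (distance * distance + eps)


def predict_label(neighbors, label_index):
    """Return 1 if sample i is predicted to have the label, else 0.

    neighbors -- list of (IC_j, d_F(i, j), label_row_of_j) for each j in N_i
    """
    # Positive score DS: summed gravity of neighbors that carry the label.
    ds_pos = sum(gravity(ic, d) for ic, d, row in neighbors
                 if row[label_index] == 1)
    # Negative score DS': summed gravity of neighbors that lack the label.
    ds_neg = sum(gravity(ic, d) for ic, d, row in neighbors
                 if row[label_index] == 0)
    return 1 if ds_pos > ds_neg else 0


# Hypothetical neighborhood of three samples.
demo_neighbors = [
    (0.6, 1.0, [1, 0, 0]),
    (0.5, 2.0, [0, 1, 0]),
    (0.4, 1.0, [1, 1, 0]),
]
prediction = [predict_label(demo_neighbors, j) for j in range(3)]
```

Each label is decided independently, so a sample can be assigned two or three syndromes at once, which matches the multi-syndrome setting the method targets.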

Example 2:

A Mongolian medicine syndrome differentiation system, comprising:

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A construction method of a Mongolian medicine syndrome differentiation model is characterized by comprising the following steps:

S1, medical record data preprocessing:

S3, constructing a label correlation calculation model of the sample:

2. The method of claim 1, wherein in step S2, the calculation model of the correlation between each feature and the label set is represented by the following formula:

3. The method of claim 1, wherein in step S4, the interaction coefficient is calculated as follows:

IC_i＝αM_i+(1-α)R_i

in the formula, α represents the balance parameter of the two correlations; M_i represents the neighborhood feature-label correlation of sample i; and R_i represents the label correlation of sample i.

4. The method as claimed in claim 1, wherein in step S5,

the gravity calculation model is shown as follows:

in the formula, IC_j is the interaction coefficient between sample j and other samples, and d_F(i, j) is the heterogeneous overlapping Euclidean metric distance between sample i and sample j.

5. The method for constructing a Mongolian medicine syndrome differentiation model according to any one of claims 1 to 4, wherein in step S1, the encoding with 0-1 is specifically:

6. A Mongolian medicine syndrome differentiation method is characterized by comprising the following steps:

S1, medical record data preprocessing:

S2, feature-label correlation analysis:

S3, analyzing label correlation of the sample:

S4, calculation of interaction coefficient:

S5, label prediction:

7. The Mongolian medicine syndrome differentiation method according to claim 6,

and comparing the positive discrimination score with the negative discrimination score, wherein if the positive discrimination score is greater than the negative discrimination score, the syndrome exists in the sample, and if the positive discrimination score is less than the negative discrimination score, the syndrome does not exist in the sample.

8. A Mongolian medicine syndrome differentiation system, comprising: