CN109087702B

CN109087702B - A four-diagnosis representation information fusion method for TCM health status analysis

Info

Publication number: CN109087702B
Application number: CN201810878380.7A
Authority: CN
Inventors: 代亮; 张佳; 林达真; 曹冬林; 李绍滋; 林旺庆
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2018-08-03
Filing date: 2018-08-03
Publication date: 2021-07-16
Anticipated expiration: 2038-08-03
Also published as: CN109087702A

Abstract

The four-diagnosis representation information fusion method for TCM health state analysis collects inspection, listening and smelling, inquiry, and palpation information from clinical patients. It uses this information to generate a multi-source information representation of each patient and labels the syndrome categories to which the patient belongs. The feature representation and label information of each information source are used separately to analyze the tester's health state, which yields auxiliary decision information for the tester from multiple information sources. An information fusion model is then built to maximize decision consistency and return the optimized health state analysis result. Finally, the performance of the proposed algorithm is evaluated by comparing the tester's actual health state with the corresponding prediction. The method can detect the tester's current health state and the nature of the disease. This helps the tester understand his or her physical condition and provides a reference for designing intervention plans. It provides high-precision health state analysis results as a basis for health care. By fusing the four-diagnosis representation information of clinical patients, it obtains more accurate and reliable state analysis results.

Description

A four-diagnosis representation information fusion method for TCM health status analysis

Technical Field

The invention relates to multi-label learning, and in particular to a four-diagnosis representation information fusion method for TCM health state analysis.

Background Art

State is the logical starting point of TCM health cognition theory. The health state is the overall condition of the human body over a given period, covering its morphological structure, physiological functions, psychological state, and ability to adapt to the external environment. It reflects both the condition and the trend of health. Health state analysis is grounded in TCM theory. It expresses the information collected by inspection, listening and smelling, inquiry, and palpation as data, emphasizes objective evaluation of the body's health state and the nature of disease, and gives a summary judgment of the disease and syndrome (Li Candong. State of Traditional Chinese Medicine [M]. Beijing: China Traditional Chinese Medicine Press, 2016).

Multi-label learning handles real-world objects that carry multiple meanings at once. It has received wide attention and application in automatic image annotation, bioinformatics, information retrieval, and recommender systems. In the clinic, a patient's syndrome pattern usually involves several coexisting states. Multi-label learning is therefore introduced into TCM health state analysis as an artificial intelligence approach to the problem.

According to the TCM principle of comprehensive analysis of the four diagnostic methods, state analysis is based on four-diagnosis information. Different information sources contribute differently to the prediction and are correlated with one another. The holistic information of clinical patients is therefore collected through the four diagnostic methods, and an information fusion model is then constructed to analyze each patient's health state.

TCM health big data is multi-modal and multi-label. This poses severe challenges of validity, accuracy, and computability to traditional data analysis theories, methods, and techniques. Studying a four-diagnosis representation information fusion method for TCM health state analysis therefore helps build more accurate and reliable identification models. It also lets artificial intelligence techniques contribute to interdisciplinary development.

Summary of the Invention

The purpose of the present invention is to provide a four-diagnosis representation information fusion method for TCM health state analysis. It addresses two facts: clinical patients often present multiple coexisting states, and diagnostic information comes from diverse sources.

The present invention comprises the following steps:

1) Collect inspection, listening and smelling, inquiry, and palpation information from clinical patients to generate a multi-source information representation of each patient, and label the syndrome types to which the patient belongs;

2) Use the feature representation and label information of each information source separately to analyze the tester's health state, obtaining auxiliary decision information for the tester from multiple information sources;

3) Build an information fusion model that maximizes decision consistency and returns the optimized health state analysis result;

4) Evaluate the performance of the proposed algorithm by comparing the tester's actual health state with the corresponding prediction.

In step 1), the specific method may be as follows. It covers collecting the inspection, listening and smelling, inquiry, and palpation information of clinical patients, generating the multi-source information representation of the patient, and labeling the syndrome categories to which the patient belongs:

(1) Extract the four-diagnosis representation information of clinical patients from electronic medical records to form information source A. Acquire the patient's tongue image with an inspection instrument and segment the tongue with a U-Net model. Then apply HSV, LAB, and RGB color descriptors to obtain three feature representations of the tongue image, which form information sources B, C, and D respectively;

(2) A physician labels the health state of each clinical patient using the label set $\{l_1, l_2, \ldots, l_q\}$, where $l_j$ ($1 \le j \le q$) is the j-th syndrome type and q is the total number of labels;

(3) Validate the algorithm with ten-fold cross-validation: in each fold, the processed and standardized data are divided into training data and test data at a ratio of 9:1.
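Ten-fold cross-validation produces the 9:1 split in step (3) on each fold. The following stdlib Python sketch shows one way to generate the folds. It assumes plain random shuffling with a fixed seed, because the text does not say whether folds are shuffled or stratified by syndrome type. The function name `ten_fold_splits` is hypothetical.

```python
import random

def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Every sample appears in exactly one test fold, so with n_folds=10 each
    split trains on about 9/10 of the data and tests on the remaining 1/10.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible folds
    # Spread the remainder so fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx

splits = list(ten_fold_splits(729))  # 729 patients, as in the embodiment
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 656 73
```

With 729 patients, nine folds hold 73 test patients and one holds 72, so every patient is tested exactly once.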

In step 2), the feature representation and label information of each information source are used separately to analyze the tester's health state and obtain auxiliary decision information from multiple sources. The specific method may be as follows:

(1) Predict the tester's health state with an SVM; the calculation formula is:

where the two quantities in the formula denote the prediction result of the i-th tester for the j-th syndrome type on data source A and the feature representation of the i-th tester on data source A, respectively;

(2) Use the feature representation together with the corresponding prediction to search the training set for the tester's top-k nearest neighbors. Neighbors are selected by the similarity between the tester and each training sample, calculated as:

where one component is the similarity between the tester and the training sample in the feature space, computed with cosine similarity. The other component is their similarity in the label space, computed with Jaccard similarity. β is a threshold in the range [0, 1];

(3) Use the similarity relation $sim^A$ to model the correlations among syndrome types and reconstruct the tester's label space:

where the reconstructed quantity is the state analysis result of the i-th tester for the j-th syndrome type on data source A, and $Y_{zj}$ is the actual value of the i-th tester's z-th neighbor for the j-th syndrome type;

(4) Repeat steps (1) to (3) to obtain state analysis results based on information sources B to D, respectively.
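Step (1) names an SVM but does not specify its kernel, solver, or multi-label strategy. Below is a minimal pure-Python sketch under two assumptions: a separate linear SVM is trained for each syndrome type (the binary relevance decomposition), and each SVM is fitted with Pegasos-style stochastic subgradient descent. The function names and the defaults `lam=0.1` and `epochs=200` are hypothetical. The real-valued decision value for tester i and syndrome type j plays the role of the per-source prediction on data source A.

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 is appended to every vector so the last weight acts as a bias
    (the bias is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # step on the L2 regularizer
            if margin < 1:  # hinge loss is active: add its subgradient
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def decision_value(w, x):
    """Signed SVM score for one feature vector (bias term included)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def binary_relevance_svm(X_train, Y_train, X_test, **svm_kwargs):
    """Train one SVM per syndrome type j and score every test sample.

    Y_train[i][j] is 1 if training patient i has syndrome type j, else 0.
    Returns scores[i][j]: the per-source prediction for tester i, syndrome type j.
    """
    q = len(Y_train[0])
    models = [train_linear_svm(X_train, [1 if row[j] else -1 for row in Y_train], **svm_kwargs)
              for j in range(q)]
    return [[decision_value(w, x) for w in models] for x in X_test]
```

In practice a library SVM, such as scikit-learn's `SVC` with a non-linear kernel, would likely replace `train_linear_svm`. The hand-written trainer only keeps this sketch dependency-free.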

In step 3), the information fusion model is built to maximize decision consistency and return the optimized health state analysis result. The specific method may be as follows:

(1) Obtain the tester's final result from the multiple state results predicted from the clinical patient's four-diagnosis representation information, by solving the following optimization objective:

where the optimization variable is the optimized result of the i-th tester for the j-th syndrome type, obtained by fusing decision information from multiple sources. $W = \{w_1, w_2, \ldots, w_M\}$ is the weight distribution over the M information sources, with M = 4. In addition, $c_m$ denotes the set of index pairs (i, j) that satisfy the condition given in the formula. α is a threshold in the range [0, 1];

(2) Initialize the weights and set the initial parameters:

(3) Fix W and solve for $Y^*$ by gradient descent; the calculation formula is:

(4) Fix $Y^*$ and solve for W with the Lagrange multiplier method; the calculation formula is:

(5) Repeat steps (3) and (4) until the optimization objective converges, and return the optimized health state result $Y^*$ for the tester.
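The objective function and update formulas are not reproduced in this text. The sketch below therefore substitutes an assumed objective in the style of CRH truth discovery: minimize $\sum_m w_m \sum_{i,j} (Y^*_{ij} - Y^m_{ij})^2$ subject to $\sum_m e^{-w_m} = 1$, where $Y^m_{ij}$ is the state analysis result of source m. Write $L_m$ for the squared disagreement of source m with $Y^*$. Setting the derivative of the Lagrangian $\sum_m w_m L_m + \mu(\sum_m e^{-w_m} - 1)$ to zero gives $e^{-w_m} = L_m / \mu$ with $\mu = \sum_{m'} L_{m'}$, that is, $w_m = -\log(L_m / \sum_{m'} L_{m'})$. Sources that disagree more with the fused result therefore get smaller weights. The sketch does not model the α-dependent index set $c_m$ or the patent's own initialization. `fuse_sources` is a hypothetical name.

```python
import math

def fuse_sources(preds, n_outer=50, n_inner=20, tol=1e-9):
    """Alternating optimization over fused results Y* and source weights W.

    preds[m][i][j]: state analysis score of source m for tester i, syndrome type j.
    Assumed objective: sum_m w_m * sum_ij (Y*_ij - preds[m][i][j])**2
    subject to sum_m exp(-w_m) == 1 (CRH-style constraint, not from the patent).
    """
    M = len(preds)
    assert M >= 2, "weights are only meaningful for two or more sources"
    n, q = len(preds[0]), len(preds[0][0])
    w = [math.log(M)] * M  # uniform start that satisfies sum(exp(-w)) == 1
    Y = [[sum(p[i][j] for p in preds) / M for j in range(q)] for i in range(n)]
    prev = float("inf")
    for _ in range(n_outer):
        # Step (3): fix W, run gradient descent on Y*.
        # The gradient is 2 * sum_m w_m (Y* - Y^m); lr = 0.25 / sum(w) halves
        # the distance to the weighted mean of the sources at every step.
        lr = 0.25 / sum(w)
        for _ in range(n_inner):
            Y = [[Y[i][j] - lr * 2 * sum(w[m] * (Y[i][j] - preds[m][i][j]) for m in range(M))
                  for j in range(q)] for i in range(n)]
        # Step (4): fix Y*, closed-form weights from the Lagrangian.
        losses = [sum((Y[i][j] - preds[m][i][j]) ** 2 for i in range(n) for j in range(q)) + 1e-12
                  for m in range(M)]
        total = sum(losses)
        w = [-math.log(loss / total) for loss in losses]
        obj = sum(wm * loss for wm, loss in zip(w, losses))
        if abs(prev - obj) < tol:  # step (5): stop once the objective settles
            break
        prev = obj
    return Y, w

# Toy run: sources 0 and 1 agree, source 2 disagrees (one tester, two syndrome types).
Y, w = fuse_sources([[[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]]])
print(Y, w)
```

In the toy run, the disagreeing source ends up with the smallest weight, and the fused result moves toward the consensus of the other two sources.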

In step 4), the tester's actual health state is compared with the corresponding prediction to evaluate the proposed algorithm. The specific method may be as follows:

The proposed method predicts the class labels of the testers in the test data, and the following five metrics are used to evaluate its performance:

(1) Hamming loss: the fraction of instance-label pairs that are misclassified, that is, errors on individual labels; smaller is better;

(2) One-error: the fraction of samples whose top-ranked label is not in the relevant label set; smaller is better;

(3) Coverage: how far, on average, one must go down the ranked label list to cover all relevant labels; smaller is better;

(4) Ranking loss: the fraction of relevant/irrelevant label pairs that are ordered incorrectly, where the irrelevant label is ranked at least as high as the relevant one; smaller is better;

(5) Average precision: for each relevant label, the fraction of labels ranked at or above it that are also relevant. This fraction is averaged over the relevant labels and then over samples; larger is better.
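The five metrics above follow the standard multi-label learning definitions. Below is a stdlib Python sketch. It assumes `Y_true` and `Y_pred` are 0/1 label matrices and that `F` holds real-valued scores per label, for example fused results. Samples with no relevant labels are skipped where a metric is undefined for them; for ranking loss, samples with no irrelevant labels are skipped too. The text does not say whether coverage is reported raw or divided by q, so the sketch offers both. All function names are illustrative.

```python
def _ranks(scores):
    """1-based rank of every label; rank 1 is the highest score (ties by index)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    rank = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        rank[j] = r
    return rank

def hamming_loss(Y_true, Y_pred):
    """Fraction of (sample, label) pairs where prediction and truth differ."""
    n, q = len(Y_true), len(Y_true[0])
    diff = sum(1 for yt, yp in zip(Y_true, Y_pred) for a, b in zip(yt, yp) if bool(a) != bool(b))
    return diff / (n * q)

def one_error(Y_true, F):
    """Fraction of samples whose top-scored label is not relevant."""
    errs = 0
    for yt, f in zip(Y_true, F):
        top = max(range(len(f)), key=lambda j: f[j])
        errs += 0 if yt[top] else 1
    return errs / len(Y_true)

def coverage(Y_true, F, normalize=False):
    """Average depth (max rank - 1) needed to reach every relevant label."""
    total = 0
    for yt, f in zip(Y_true, F):
        rank = _ranks(f)
        rel = [rank[j] for j in range(len(f)) if yt[j]]
        total += (max(rel) - 1) if rel else 0
    cov = total / len(Y_true)
    return cov / len(Y_true[0]) if normalize else cov  # optional division by q

def ranking_loss(Y_true, F):
    """Fraction of relevant/irrelevant pairs with f(relevant) <= f(irrelevant)."""
    vals = []
    for yt, f in zip(Y_true, F):
        rel = [j for j in range(len(f)) if yt[j]]
        irr = [k for k in range(len(f)) if not yt[k]]
        if not rel or not irr:
            continue  # undefined when all or none of the labels are relevant
        bad = sum(1 for j in rel for k in irr if f[j] <= f[k])
        vals.append(bad / (len(rel) * len(irr)))
    return sum(vals) / len(vals) if vals else 0.0

def average_precision(Y_true, F):
    """Mean, over relevant labels, of the precision at that label's rank."""
    vals = []
    for yt, f in zip(Y_true, F):
        rank = _ranks(f)
        rel = [j for j in range(len(f)) if yt[j]]
        if not rel:
            continue
        prec = sum(sum(1 for k in rel if rank[k] <= rank[j]) / rank[j] for j in rel)
        vals.append(prec / len(rel))
    return sum(vals) / len(vals) if vals else 0.0
```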

Compared with the prior art, the present invention can detect the tester's current health state and the nature of the disease. This helps the tester understand his or her own physical condition and provides a reference for formulating an intervention plan.

The present invention provides high-precision health state analysis results as a basis for health care.

The present invention fuses the four-diagnosis representation information of clinical patients to obtain more accurate and reliable state analysis results.

Description of Drawings

Figure 1 is a schematic diagram of tongue image segmentation.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing and an embodiment. The specific embodiment described here only explains the invention and does not limit it.

This embodiment comprises the following steps:

1) Collect the inspection, listening and smelling, inquiry, and palpation information of 729 clinical patients to generate a multi-source information representation of each patient. Label the syndrome categories to which each patient belongs, 339 syndrome categories in total;

(1) Extract the four-diagnosis representation information of clinical patients from electronic medical records to form information source A. Acquire the patient's tongue image with an inspection instrument and segment the tongue with a U-Net model, as shown in Figure 1. Then apply HSV, LAB, and RGB color descriptors to obtain three feature representations of the tongue image, which form information sources B, C, and D respectively;

(2) A physician labels the health state of each clinical patient using the label set $\{l_1, l_2, \ldots, l_q\}$ ($1 \le j \le q$), where $l_j$ is the j-th syndrome type and q is the total number of labels (q = 339 in this embodiment);
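The text does not define the HSV, LAB, and RGB descriptors. The minimal sketch below assumes each descriptor is the per-channel mean and standard deviation (color moments) of the pixels inside the segmented tongue region. `color_moments` and `rgb_to_lab` are hypothetical names. The sRGB-to-CIELAB conversion uses the standard D65 formulas, and Python's `colorsys.rgb_to_hsv` handles the HSV case.

```python
import colorsys
import math

def _srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def rgb_to_lab(r, g, b):
    """sRGB in [0, 1] -> CIELAB (D65 reference white)."""
    R, G, B = (_srgb_to_linear(c) for c in (r, g, b))
    X = (0.4124 * R + 0.3576 * G + 0.1805 * B) / 0.95047
    Y = (0.2126 * R + 0.7152 * G + 0.0722 * B) / 1.00000
    Z = (0.0193 * R + 0.1192 * G + 0.9505 * B) / 1.08883

    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = f(X), f(Y), f(Z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

CONVERTERS = {
    "RGB": lambda r, g, b: (r, g, b),
    "HSV": colorsys.rgb_to_hsv,
    "LAB": rgb_to_lab,
}

def color_moments(pixels, space):
    """Mean and standard deviation of each channel over the segmented tongue pixels.

    pixels: iterable of (r, g, b) tuples in 0..255 taken from inside the U-Net mask.
    Note: hue is circular, so a plain mean of H is a simplification.
    """
    conv = CONVERTERS[space]
    values = [conv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    if not values:
        raise ValueError("empty tongue mask")
    n = len(values)
    means = [sum(v[c] for v in values) / n for c in range(3)]
    stds = [math.sqrt(sum((v[c] - means[c]) ** 2 for v in values) / n) for c in range(3)]
    return means + stds
```

Under this assumption, sources B, C, and D each contribute a six-dimensional vector per patient. Color histograms would be an equally plausible reading of "descriptor".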

2) Use the feature representation and label information of each information source separately to analyze the tester's health state, obtaining auxiliary decision information for the tester from multiple information sources;

(1) Predict the tester's health state with an SVM; the calculation formula is:

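where the two quantities in the formula denote the prediction result of the i-th tester for the j-th syndrome type on data source A and the feature representation of the i-th tester on data source A, respectively;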

(2) Use the feature representation together with the corresponding prediction to search the training set for the tester's top-k nearest neighbors. Neighbors are selected by the similarity between the tester and each training sample, calculated as:


(4) Compare the prediction results on information source A with BSVM (M. R. Boutell, J. Luo, X. Shen, C. M. Brown, Learning multi-label scene classification, Pattern Recognition, 2004, 37(9): 1757–1771) and LIFT (M. Zhang, L. Wu, LIFT: Multi-label learning with label-specific features, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(1): 107–120). The experimental results are shown in Table 1, where Algorithm 1 is the proposed algorithm, Algorithm 2 is LIFT, and Algorithm 3 is BSVM. Table 1 shows that, by taking label correlations into account, the proposed algorithm performs best on four of the five evaluation metrics; only on one-error does BSVM score better.

Table 1

| Algorithm | Hamming loss | One-error | Coverage | Ranking loss | Average precision |
|---|---|---|---|---|---|
| 1 | 0.0122±0.0008 | 0.1701±0.0330 | 0.3251±0.0435 | 0.0617±0.0081 | 0.6405±0.0265 |
| 2 | 0.0133±0.0009 | 0.1990±0.0377 | 0.4168±0.0408 | 0.0882±0.0085 | 0.5910±0.0260 |
| 3 | 0.0123±0.0009 | 0.1358±0.0263 | 0.3453±0.0436 | 0.0665±0.0076 | 0.6355±0.0177 |

(5) Repeat steps (1) to (3) to obtain health state analysis results based on information sources B to D, respectively.
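This text does not reproduce the similarity and reconstruction formulas of step 2). The sketch below assumes one plausible reading. First, β mixes the two similarities as $sim = \beta \cos(x_i, x_z) + (1-\beta)\,\mathrm{Jaccard}(\hat{y}_i, y_z)$, where $\hat{y}_i$ is the tester's 0/1 label vector obtained by thresholding the SVM scores. Second, the reconstructed score for syndrome type j is the similarity-weighted average of the neighbors' actual labels $Y_{zj}$. The text calls β a threshold, so treating it as a mixing weight is an assumption. All function names are assumptions as well.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def jaccard(a, b):
    """Jaccard index of two 0/1 label vectors (0.0 when both are empty)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

def top_k_neighbors(x, y_hat, X_train, Y_train, k, beta):
    """Return the k (index, similarity) pairs with the highest combined similarity.

    Assumed combination: beta * cosine(features) + (1 - beta) * jaccard(labels),
    where y_hat is the tester's 0/1 label vector from thresholded SVM scores.
    """
    sims = [beta * cosine(x, xt) + (1 - beta) * jaccard(y_hat, yt)
            for xt, yt in zip(X_train, Y_train)]
    best = sorted(range(len(sims)), key=lambda z: sims[z], reverse=True)[:k]
    return [(z, sims[z]) for z in best]

def reconstruct_labels(neighbors, Y_train):
    """Similarity-weighted average of the neighbors' actual labels Y_zj."""
    q = len(Y_train[0])
    total = sum(s for _, s in neighbors)
    if total <= 0:
        return [0.0] * q
    return [sum(s * Y_train[z][j] for z, s in neighbors) / total for j in range(q)]
```

Each neighbor contributes all of its labels at once. Syndrome types that tend to co-occur in similar patients therefore raise each other's scores, which is one way the reconstruction can reflect label correlations.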

3) Build an information fusion model that maximizes decision consistency and returns the optimized health state analysis result;

(1) Obtain the tester's final result from the multiple state identification results predicted from the clinical patient's four-diagnosis representation information, by solving the following optimization objective:

where the optimization variable is the optimized result of the i-th tester for the j-th syndrome type, obtained by fusing decision information from multiple sources. $W = \{w_1, w_2, \ldots, w_M\}$ is the weight distribution over the M information sources (here M = 4). In addition, $c_m$ denotes the set of index pairs (i, j) that satisfy the condition given in the formula. α is a threshold in the range [0, 1];

(2) Initialize the weights and set the initial parameters:

(5) Repeat steps (3) and (4) until the optimization objective converges, and return the optimized health state result $Y^*$ for the tester.

4) Use the proposed method to analyze the health states of the testers in the test data;

Table 2 compares the proposed algorithm with the prediction results of each individual information source. By fusing information sources A to D, the proposed algorithm achieves the best Hamming loss, one-error, and average precision; information source A alone is slightly better on coverage and ranking loss.

Table 2

| Source | Hamming loss | One-error | Coverage | Ranking loss | Average precision |
|---|---|---|---|---|---|
| Information source A | 0.0122±0.0010 | 0.1701±0.0330 | 0.3251±0.0435 | 0.0617±0.0081 | 0.6405±0.0265 |
| Information source B | 0.0180±0.0014 | 0.5624±0.0657 | 0.4090±0.0443 | 0.0978±0.0071 | 0.3536±0.0203 |
| Information source C | 0.0181±0.0015 | 0.6090±0.0470 | 0.4122±0.0412 | 0.0982±0.0058 | 0.3447±0.0160 |
| Information source D | 0.0181±0.0015 | 0.5968±0.0595 | 0.4075±0.0413 | 0.0968±0.0083 | 0.3516±0.0241 |
| Information fusion | 0.0118±0.0011 | 0.1604±0.0272 | 0.3328±0.0634 | 0.0637±0.0108 | 0.6473±0.0191 |

Table 3 compares the proposed algorithm with other fusion algorithms. Algorithm 1 is the proposed algorithm. Algorithm 2 averages the predictions of all information sources, and Algorithm 3 takes a vote over those predictions. Algorithm 4 concatenates all information sources and classifies the result with an SVM. Table 3 shows that the proposed algorithm achieves the best results on four of the five metrics; only on one-error does Algorithm 4 score better.

Table 3

| Algorithm | Hamming loss | One-error | Coverage | Ranking loss | Average precision |
|---|---|---|---|---|---|
| 1 | 0.0118±0.0011 | 0.1604±0.0272 | 0.3328±0.0634 | 0.0637±0.0108 | 0.6473±0.0191 |
| 2 | 0.0161±0.0019 | 0.2386±0.0350 | 0.3716±0.0578 | 0.0778±0.0102 | 0.5222±0.0190 |
| 3 | 0.0177±0.0020 | 0.4073±0.0621 | 0.3715±0.0578 | 0.0803±0.0101 | 0.4422±0.0242 |
| 4 | 0.0123±0.0007 | 0.1427±0.0409 | 0.3473±0.0427 | 0.0673±0.0083 | 0.6343±0.0158 |
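Baselines 2 and 3 in Table 3 are each described in only one phrase. Below is a minimal sketch of one common reading. It assumes per-source scores in [0, 1] and a 0.5 decision threshold for voting; both assumptions and both function names are hypothetical. Unlike the proposed method, these baselines give every source the same weight.

```python
def average_fusion(preds):
    """Algorithm 2 (assumed form): mean of the per-source scores.

    preds[m][i][j]: score of source m for tester i, syndrome type j.
    """
    M, n, q = len(preds), len(preds[0]), len(preds[0][0])
    return [[sum(p[i][j] for p in preds) / M for j in range(q)] for i in range(n)]

def voting_fusion(preds, threshold=0.5):
    """Algorithm 3 (assumed form): strict-majority vote over thresholded decisions.

    A label is predicted only if more than half of the sources score it at or
    above `threshold`; ties therefore count as negative.
    """
    M, n, q = len(preds), len(preds[0]), len(preds[0][0])
    return [[1 if 2 * sum(p[i][j] >= threshold for p in preds) > M else 0
             for j in range(q)] for i in range(n)]

# Four toy sources (A-D), one tester, two syndrome types.
preds = [[[0.9, 0.2]], [[0.8, 0.6]], [[0.1, 0.7]], [[0.6, 0.1]]]
print(average_fusion(preds))  # approximately [[0.6, 0.4]]
print(voting_fusion(preds))   # [[1, 0]]
```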

The present invention first preprocesses the information captured by the four-diagnosis acquisition instrument. It then analyzes the prediction results of each information source separately to judge the tester's health state. Finally, it fuses the predictions from multiple feature representations to maximize the consistency of state identification, which provides an accurate and reliable reference for formulating the tester's intervention plan.

Claims

1. A four-diagnosis representation information fusion method for TCM health state analysis, characterized in that it comprises the following steps:

1) Collect inspection, listening and smelling, inquiry, and palpation information from clinical patients to generate a multi-source information representation of each patient, and label the syndrome types to which the patient belongs;

2) Use the feature representation and label information of each information source separately to analyze the tester's health state, obtaining auxiliary decision information for the tester from multiple information sources, specifically:

(1) Predict the tester's health state with an SVM; the calculation formula is:

where the two quantities in the formula denote the prediction result of the i-th tester for the j-th syndrome type on information source A and the feature representation of the i-th tester on information source A, respectively;

(2) Use the feature representation together with the corresponding prediction to search the training set for the tester's top-k nearest neighbors. Neighbors are selected by the similarity between the tester and each training sample, calculated as:


(3) Use the similarity relation $sim^A$ to model the correlations among syndrome types and reconstruct the tester's label space:

where the reconstructed quantity is the state analysis result of the i-th tester for the j-th syndrome type on information source A, and $Y_{zj}$ is the actual value of the i-th tester's z-th neighbor for the j-th syndrome type;

(4) Repeat steps (1) to (3) to obtain state analysis results based on information sources B to D, respectively;

3) Build an information fusion model that maximizes decision consistency and returns the optimized health state analysis result, specifically:

(1) Obtain the tester's final result from the multiple state results predicted from the clinical patient's four-diagnosis representation information, by solving the following optimization objective:

where the optimization variable is the optimized result of the i-th tester for the j-th syndrome type, obtained by fusing decision information from multiple sources. The per-source term is the state analysis result of the i-th tester for the j-th syndrome type on information source m. $W = \{w_1, w_2, \ldots, w_M\}$ is the weight distribution over the M information sources, with M = 4. In addition, $c_m$ denotes the set of index pairs (i, j) that satisfy the condition given in the formula. α is a threshold in the range [0, 1];

(2) Initialize the weights and set the initial parameters:

(3) Fix W and solve for $Y^*$ by gradient descent; the calculation formula is:

(4) Fix $Y^*$ and solve for W with the Lagrange multiplier method; the calculation formula is:

(5) Repeat steps (3) and (4) until the optimization objective converges, and return the optimized health state result $Y^*$ for the tester;

4) Evaluate the performance of the proposed algorithm by comparing the tester's actual health state with the corresponding prediction.

2. The four-diagnosis representation information fusion method for TCM health state analysis according to claim 1, characterized in that the specific method in step 1) is as follows. It covers collecting the inspection, listening and smelling, inquiry, and palpation information of clinical patients, generating the patient's multi-source information representation, and labeling the syndrome categories to which the patient belongs:

(1) Extract the four-diagnosis representation information of clinical patients from electronic medical records to form information source A. Acquire the patient's tongue image with an inspection instrument and segment the tongue with a U-Net model. Then apply HSV, LAB, and RGB color descriptors to obtain three feature representations of the tongue image, which form information sources B, C, and D respectively;

(2) A physician labels the health state of each clinical patient using the label set $\{l_1, l_2, \ldots, l_q\}$, where $l_j$ ($1 \le j \le q$) is the j-th syndrome type and q is the total number of labels;

(3) Validate the algorithm with ten-fold cross-validation: in each fold, the processed and standardized data are divided into training data and test data at a ratio of 9:1.

3. The four-diagnosis representation information fusion method for TCM health state analysis according to claim 1, characterized in that the specific method in step 4) is as follows. It evaluates the performance of the proposed algorithm by comparing the tester's actual health state with the corresponding prediction:

The proposed algorithm predicts the class labels of the testers in the test data, and the following five metrics are used to evaluate its performance:

(1) Hamming loss: the fraction of instance-label pairs that are misclassified, that is, errors on individual labels; smaller is better;

(2) One-error: the fraction of samples whose top-ranked label is not in the relevant label set; smaller is better;

(3) Coverage: how far, on average, one must go down the ranked label list to cover all relevant labels; smaller is better;

(4) Ranking loss: the fraction of relevant/irrelevant label pairs that are ordered incorrectly, where the irrelevant label is ranked at least as high as the relevant one; smaller is better;

(5) Average precision: for each relevant label, the fraction of labels ranked at or above it that are also relevant. This fraction is averaged over the relevant labels and then over samples; larger is better.