WO2021169203A1 - Monogenic disease name recommendation method and system based on multi-level structural similarity - Google Patents

Publication number
WO2021169203A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
clinical
clinical feature
standard clinical
feature set
Prior art date
Application number
PCT/CN2020/111130
Other languages
French (fr)
Chinese (zh)
Inventor
马旭
曹宗富
陈翠霞
喻浴飞
蔡瑞琨
李乾
罗敏娜
Original Assignee
国家卫生健康委科学技术研究所

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A monogenic disease name recommendation method and system based on multi-level structural similarity, capable of intelligently and accurately recommending a matching monogenic disease name. The method comprises: constructing a standardized clinical feature phenotype tree of monogenic diseases from a feature relational database of monogenic disease names; marking, on the nodes of the phenotype tree, the clinical features in a feature set I entered by a user; traversing the nth monogenic disease name in the feature relational database and marking, on the nodes of the phenotype tree, the standard clinical features in the feature set A corresponding to that name; matching, from the feature set A, the best standard clinical feature for each clinical feature in the feature set I; calculating a set similarity value between the feature set I and the current feature set A; setting n = n + 1 and traversing the feature relational database again until all monogenic disease names in it have been traversed; collecting and sorting the set similarity values between the feature set I and each feature set A; and outputting the monogenic disease name corresponding to the maximum similarity value.

Description

Monogenic disease name recommendation method and system based on multi-level structural similarity
Technical field
The present invention relates to the field of medical information technology, and in particular to a method and system for recommending monogenic disease names based on multi-level structural similarity.
Background
Monogenic diseases are a common class of disease caused by mutation of a pair of alleles, also known as Mendelian disorders. They have the following characteristics:
1. There are many types of monogenic disease; more than 8,000 have been identified so far.
2. Their phenotypes are complex: a single monogenic disease can show strong phenotypic heterogeneity, and the clinical features of different monogenic diseases overlap.
3. Their inheritance patterns are diverse: the same monogenic disease may show different inheritance patterns, and different monogenic diseases may show the same inheritance pattern.
4. Most monogenic diseases have a very low incidence and are rare.
These factors make it difficult for clinicians to know every monogenic disease phenotype, which greatly complicates clinical diagnosis and treatment. In the prior art, a Chinese-language database of monogenic diseases and clinical features is built, and possible monogenic diseases are recommended from a patient's clinical features. This gives clinicians a convenient auxiliary diagnostic tool and diagnostic clues, improving diagnostic accuracy and reducing the probability of missed diagnosis and misdiagnosis. Specifically, given the case features and standardized phenotypes entered by the user, monogenic disease names are recommended using Elestic similarity and Fisher's exact test enrichment analysis. Elestic similarity measures the similarity of the input text and cannot take the meaning of keywords into account; with "hypohidrosis" and "hyperhidrosis", for example, a disease with the opposite phenotype may be ranked first. The drawback of Fisher's exact test is that the accuracy of its result depends heavily on whether the entered phenotype is accurate. Because monogenic disease phenotypes are complex, a doctor can hardly guarantee that the entered phenotype is the standardized phenotype of the disease, and entering an approximate phenotype may introduce errors into the recommendation.
Summary of the invention
The purpose of the present invention is to provide a monogenic disease name recommendation method and system based on multi-level structural similarity that relaxes the input requirements placed on doctors and intelligently and accurately recommends the matching monogenic disease name.
To achieve the above purpose, one aspect of the present invention provides a monogenic disease name recommendation method based on multi-level structural similarity, including:
constructing a standardized clinical feature phenotype tree of monogenic diseases from the feature relational database of monogenic disease names;
marking, on the nodes of the standardized clinical feature phenotype tree, the clinical features in the feature set I entered by the user;
traversing the n-th monogenic disease name in the feature relational database and marking, on the nodes of the standardized clinical feature phenotype tree, the standard clinical features in its corresponding feature set A, where the initial value of n is 1;
matching, from feature set A and based on the node marks on the standardized clinical feature phenotype tree, the best standard clinical feature for each clinical feature in feature set I;
calculating the set similarity value between feature set I and the current feature set A from the similarity value between each clinical feature and its best standard clinical feature;
setting n = n + 1 and traversing the n-th monogenic disease name in the feature relational database again until all monogenic disease names have been traversed, then collecting and sorting the set similarity values between feature set I and each feature set A, and outputting the monogenic disease name with the highest similarity value.
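The traversal and ranking steps above can be sketched as a simple loop over the database. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout of the database, the disease labels and the placeholder scorer `overlap_count` are all assumptions made for the example, and the scorer stands in for the tree-based set similarity that the method actually describes.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_diseases(
    input_features: Sequence[str],
    disease_db: Dict[str, Sequence[str]],
    set_similarity: Callable[[Sequence[str], Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Score feature set I against the feature set A of every disease and sort."""
    scores: List[Tuple[str, float]] = []
    # One pass per disease name, i.e. n = 1, 2, ... until the database is exhausted.
    for name, standard_features in disease_db.items():
        scores.append((name, set_similarity(input_features, standard_features)))
    # Highest set similarity first; the first entry is the recommended name.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


def overlap_count(set_i: Sequence[str], set_a: Sequence[str]) -> float:
    """Placeholder scorer (assumption): counts exactly shared features only."""
    return float(len(set(set_i) & set(set_a)))


# Hypothetical toy database: disease name -> standard clinical features (set A).
toy_db = {
    "disease_1": ["a", "b"],
    "disease_2": ["a", "c", "d"],
    "disease_3": ["x"],
}
ranked = rank_diseases(["a", "c"], toy_db, overlap_count)
```

In this sketch `ranked[0][0]` is the recommended name; returning the whole sorted list rather than only the top entry makes it easy to show several candidates to a clinician.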
Preferably, the method of building the feature relational database of monogenic disease names includes:
obtaining known monogenic disease names and their corresponding standard clinical features from public databases and literature databases of monogenic diseases;
building, from the known monogenic disease names and their corresponding standard clinical features, a feature relational database linking monogenic disease names to standard clinical features;
calculating, for each monogenic disease name, the contribution $c_i$ of each of its standard clinical features to that disease.
Preferably, the method of constructing the standardized clinical feature phenotype tree of monogenic diseases includes:
obtaining data from the feature relational database and constructing the standardized clinical feature phenotype tree of monogenic diseases based on HPO (Human Phenotype Ontology);
the standardized clinical feature phenotype tree consists of multiple stem nodes and at least one branch node associated with each stem node; each branch node represents one standardized clinical feature, and each stem node represents an index of its associated standardized clinical features.
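One way to picture the stem-and-branch structure is a child-to-parent map from which the path between a feature and its stem node can be read off. The sketch below assumes this representation; the node labels are invented placeholders rather than real HPO terms, and `PARENT`, `path_from_stem` and `stem_of` are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical toy tree (assumption): child -> parent, stem nodes have parent None.
PARENT: Dict[str, Optional[str]] = {
    "skin": None,                        # stem node
    "sweating_abnormality": "skin",      # branch node
    "hypohidrosis": "sweating_abnormality",
    "hyperhidrosis": "sweating_abnormality",
    "eye": None,                         # stem node
    "cataract": "eye",
}


def path_from_stem(node: str) -> List[str]:
    """Return the nodes on the path from the stem node down to `node`, inclusive."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    path.reverse()
    return path


def stem_of(node: str) -> str:
    """Return the stem node that indexes `node`."""
    return path_from_stem(node)[0]
```

With such paths available, checking whether two features share a stem node reduces to comparing the first element of each path.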
Further, the method of matching, from feature set A and based on the node marks on the standardized clinical feature phenotype tree, the best standard clinical feature for each clinical feature in feature set I includes:
the feature set I includes multiple clinical features, and the feature set A includes multiple standard clinical features;
traversing the i-th clinical feature in feature set I and selecting from feature set A the standard clinical feature with the highest similarity to it as the best standard clinical feature for the i-th clinical feature, where the initial value of i is 1;
setting i = i + 1 and traversing the i-th clinical feature in feature set I again until all clinical features in feature set I have been traversed, so that the best standard clinical features corresponding one-to-one to the clinical features in feature set I are selected from the feature set A of the n-th monogenic disease name.
Preferably, the method of selecting from feature set A the standard clinical feature with the highest similarity to the i-th clinical feature includes:
traversing the j-th standard clinical feature in feature set A and determining, based on the established index, whether the j-th standard clinical feature and the i-th clinical feature share the same stem node $B_t$, where the initial value of j is 1;
if they do not, taking the similarity value between the j-th standard clinical feature and the i-th clinical feature to be zero;
if they do, calculating the similarity value between the j-th standard clinical feature and the i-th clinical feature with the multi-level structural similarity algorithm;
setting j = j + 1, traversing the j-th standard clinical feature in feature set A again and repeating the similarity calculation against the i-th clinical feature until all standard clinical features in feature set A have been traversed, which yields one similarity value for each standard clinical feature in feature set A;
selecting, from these similarity values, the standard clinical feature with the maximum value as the best standard clinical feature for the i-th clinical feature.
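The inner loop over feature set A can be sketched as follows. This is an illustrative reading of the steps above under stated assumptions: each feature is represented by its stem-to-node path in a `paths` dictionary, the similarity function is passed in by the caller, and `best_match` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def best_match(
    feature_i: str,
    set_a: Sequence[str],
    paths: Dict[str, List[str]],
    similarity: Callable[[List[str], List[str]], float],
) -> Tuple[Optional[str], float]:
    """Return the feature in set A most similar to feature_i, with its score."""
    best_feature: Optional[str] = None
    best_score = 0.0
    for feature_j in set_a:  # j = 1, 2, ... over feature set A
        if paths[feature_j][0] != paths[feature_i][0]:
            # Different stem nodes: the similarity is defined as zero.
            score = 0.0
        else:
            score = similarity(paths[feature_i], paths[feature_j])
        if best_feature is None or score > best_score:
            best_feature, best_score = feature_j, score
    return best_feature, best_score
```

Checking the stem node first means the more detailed path comparison only runs for features that sit under the same stem, which is the purpose of the index described above. When several features tie, this sketch keeps the first one encountered; the text does not say how ties are resolved.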
Preferably, the method of calculating the similarity value between the j-th standard clinical feature and the i-th clinical feature with the multi-level structural similarity algorithm includes:
based on the node marks on the standardized clinical feature phenotype tree, obtaining the directed set IB of all nodes on the path connecting the i-th clinical feature to the shared stem node $B_t$, and obtaining the directed set AB of all nodes on the path connecting the j-th standard clinical feature to the shared stem node $B_t$; the length of directed set IB is the number of nodes on its path, $L_{IB}$, and the length of directed set AB is the number of nodes on its path, $L_{AB}$;
extracting the intersection IAB of the nodes in directed set IB and directed set AB; the length of the intersection IAB is the number of nodes common to both paths, $L_{IAB}$;
calculating the similarity value between the j-th standard clinical feature and the i-th clinical feature with the formula $S_{I_iA_j} = \beta \cdot SM + (1-\beta) \cdot SI$, where
SM is the similarity value between the j-th standard clinical feature and the i-th clinical feature across levels of the phenotype tree;
SI is the similarity value between the j-th standard clinical feature and the i-th clinical feature within the same level of the phenotype tree, and $\beta$ is a weighting coefficient.
For example, SM is calculated as $SM = L_{IAB} / \max(L_{AB}, L_{IB})$ and SI is calculated as $SI = 1 / (L_{AB} + L_{IB} - 2L_{IAB} + 1)$.
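The two example formulas can be checked with a short calculation. The sketch below assumes that each path is given as a list of node labels running from the stem node to the feature, both ends included, and that β = 0.5; neither choice is fixed by the text above, and `feature_similarity` is a hypothetical name.

```python
from typing import List


def feature_similarity(path_i: List[str], path_a: List[str], beta: float = 0.5) -> float:
    """S = beta * SM + (1 - beta) * SI for two stem-to-feature paths."""
    l_ib = len(path_i)                        # L_IB: nodes on the input feature's path
    l_ab = len(path_a)                        # L_AB: nodes on the standard feature's path
    l_iab = len(set(path_i) & set(path_a))    # L_IAB: nodes common to both paths
    if l_iab == 0:
        # No shared stem node: similarity is taken to be zero.
        return 0.0
    sm = l_iab / max(l_ab, l_ib)                  # multi-level similarity
    si = 1.0 / (l_ab + l_ib - 2 * l_iab + 1)      # same-level similarity
    return beta * sm + (1.0 - beta) * si


# Two sibling features under the same parent (labels are placeholders).
siblings = feature_similarity(["B", "x", "y"], ["B", "x", "z"])
# The same feature compared with itself.
identical = feature_similarity(["B", "x", "y"], ["B", "x", "y"])
```

For the sibling pair, $L_{IB} = L_{AB} = 3$ and $L_{IAB} = 2$, so $SM = 2/3$, $SI = 1/3$ and the similarity is 0.5 when β = 0.5. For identical paths both terms equal 1, so the similarity is 1 regardless of β.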
Preferably, the method of calculating the set similarity value between feature set I and the current feature set A from the similarity value between each clinical feature and its best standard clinical feature includes:
weighting the maximum similarity value of the best standard clinical feature in feature set A that corresponds to the i-th clinical feature by the contribution $c_i$ of the i-th clinical feature;
setting i = i + 1 and weighting, in the same way, the maximum similarity value of the best standard clinical feature in feature set A that corresponds to the i-th clinical feature, until all best standard clinical features selected from feature set A have been weighted; the weighted maximum similarity values of all best standard clinical features in feature set A are then summed to give the set similarity value between feature set I and the current feature set A.
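The weighted accumulation amounts to a sum of contribution-weighted maxima. The sketch below assumes that the caller has already paired each input feature's contribution $c_i$ with the maximum similarity found for it; how the contributions are looked up is left outside the example, and `set_similarity` is a hypothetical name.

```python
from typing import Iterable, Tuple


def set_similarity(weighted_matches: Iterable[Tuple[float, float]]) -> float:
    """Sum c_i * max-similarity over all clinical features in feature set I.

    Each item is (contribution c_i, maximum similarity for the i-th feature).
    """
    total = 0.0
    for contribution, max_similarity in weighted_matches:
        total += contribution * max_similarity
    return total


# Hypothetical values: two input features with contributions 0.5 and 0.25.
example_score = set_similarity([(0.5, 1.0), (0.25, 0.5)])
```

An input feature with no counterpart under the same stem node has a maximum similarity of zero and so adds nothing to the total, whatever its contribution.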
Compared with the prior art, the monogenic disease name recommendation method based on multi-level structural similarity provided by the present invention has the following beneficial effects:
In the method provided by the present invention, a standardized clinical feature phenotype tree of monogenic diseases is first constructed from the feature relational database of monogenic disease names. The clinical features in the feature set I entered by the user are then marked on the nodes of the tree, the n-th monogenic disease name in the feature relational database is traversed, and the standard clinical features in the feature set A corresponding to the current n-th monogenic disease name are marked on the nodes of the tree. Based on these node marks, the best standard clinical feature for each clinical feature in feature set I is matched from feature set A, and the set similarity value between feature set I and the current feature set A is calculated from the similarity value between each clinical feature and its best standard clinical feature. After that, n is set to n + 1 and the n-th monogenic disease name is traversed again until all monogenic disease names in the feature relational database have been traversed. Finally, the set similarity values between feature set I and each feature set A are collected and sorted, and the monogenic disease name with the highest similarity value is output.
The method is therefore convenient and user-friendly. Standardized clinical features can be entered easily through instant search and the phenotype tree, and users are also allowed to enter similar clinical features, which relaxes the input requirements and makes the diagnosis more intelligent. Recommended monogenic disease names are output quickly after the query is submitted, improving both the accuracy and the efficiency of monogenic disease diagnosis.
Another aspect of the present invention provides a monogenic disease name recommendation system based on multi-level structural similarity, including:
a phenotype tree unit, configured to construct a standardized clinical feature phenotype tree of monogenic diseases from the feature relational database of monogenic disease names;
an input unit, configured to mark, on the nodes of the standardized clinical feature phenotype tree, the clinical features in the feature set I entered by the user;
a traversal unit, configured to traverse the n-th monogenic disease name in the feature relational database and mark, on the nodes of the standardized clinical feature phenotype tree, the standard clinical features in its corresponding feature set A, where the initial value of n is 1;
a retrieval unit, configured to match, from feature set A and based on the node marks on the standardized clinical feature phenotype tree, the best standard clinical feature for each clinical feature in feature set I;
a calculation unit, configured to calculate the set similarity value between feature set I and the current feature set A from the similarity value between each clinical feature and its best standard clinical feature;
a judging unit, configured to set n = n + 1 and invoke the traversal unit again until all monogenic disease names in the feature relational database have been traversed;
an output unit, configured to collect and sort the set similarity values between feature set I and each feature set A and output the monogenic disease name with the highest similarity value.
Compared with the prior art, the beneficial effects of the monogenic disease name recommendation system provided by the present invention are the same as those of the monogenic disease name recommendation method provided by the above technical solution, and are not repeated here.
A third aspect of the present invention provides a computer-readable storage medium, for example a non-volatile computer-readable storage medium, on which computer-readable instructions are stored; when run by a processor, the computer-readable instructions execute the steps of the above monogenic disease name recommendation method based on multi-level structural similarity.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the present invention are the same as those of the monogenic disease name recommendation method provided by the above technical solution, and are not repeated here.
Description of the drawings
The drawings described here provide a further understanding of the present invention and form part of it. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of the monogenic disease name recommendation method based on multi-level structural similarity in Embodiment 1;
Fig. 2 is an example diagram of node marks on the standardized clinical feature phenotype tree in Embodiment 1 of the present invention;
Fig. 3 is a structural block diagram of the monogenic disease name recommendation system based on multi-level structural similarity in Embodiment 2;
Fig. 4 is a schematic diagram of the environment architecture in which the monogenic disease name recommendation method based on multi-level structural similarity is applied in Embodiment 4 of the present invention;
Fig. 5 is an example diagram of the environment architecture in which the monogenic disease name recommendation method based on multi-level structural similarity is applied in Embodiment 4 of the present invention.
Detailed description
To make the above objectives, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1
Referring to Fig. 1, this embodiment provides a monogenic disease name recommendation method based on multi-level structural similarity, including:
constructing a standardized clinical feature phenotype tree of monogenic diseases from the feature relational database of monogenic disease names; marking, on the nodes of the tree, the clinical features in the feature set I entered by the user; traversing the n-th monogenic disease name in the feature relational database and marking, on the nodes of the tree, the standard clinical features in its corresponding feature set A, where the initial value of n is 1; matching, from feature set A and based on the node marks on the tree, the best standard clinical feature for each clinical feature in feature set I; calculating the set similarity value between feature set I and the current feature set A from the similarity value between each clinical feature and its best standard clinical feature; setting n = n + 1 and traversing the n-th monogenic disease name again until all monogenic disease names in the feature relational database have been traversed; and collecting and sorting the set similarity values between feature set I and each feature set A and outputting the monogenic disease name with the highest similarity value.
In the method provided by this embodiment, a standardized clinical feature phenotype tree of monogenic diseases is first constructed from the feature relational database of monogenic disease names. The clinical features in the feature set I entered by the user are then marked on the nodes of the tree, the n-th monogenic disease name in the feature relational database is traversed, and the standard clinical features in the feature set A corresponding to the current n-th monogenic disease name are marked on the nodes of the tree. Based on these node marks, the best standard clinical feature for each clinical feature in feature set I is matched from feature set A, and the set similarity value between feature set I and the current feature set A is calculated from the similarity value between each clinical feature and its best standard clinical feature. After that, n is set to n + 1 and the n-th monogenic disease name is traversed again until all monogenic disease names in the feature relational database have been traversed. Finally, the set similarity values between feature set I and each feature set A are collected and sorted, and the monogenic disease name with the highest similarity value is output.
The method provided by this embodiment is therefore convenient and user-friendly. Standardized clinical features can be entered easily through instant search and the phenotype tree, and users are also allowed to enter similar clinical features, which relaxes the input requirements and makes the diagnosis more intelligent. Recommended monogenic disease names are output quickly after the query is submitted, improving both the accuracy and the efficiency of monogenic disease diagnosis.
Specifically, the method for establishing the feature relation database of monogenic disease names in the above embodiment includes:
obtaining known monogenic disease names and their corresponding standard clinical features from public databases and literature databases of monogenic diseases; establishing, based on the known monogenic disease names and their corresponding standard clinical features, a feature relation database relating monogenic disease names to standard clinical features; and calculating, for each monogenic disease name, the contribution c_i of each of its corresponding standard clinical features to that monogenic disease.
Preferably, the foreign-language information in the feature relation database is also translated into Chinese with reference to the Chinese Human Phenotype Ontology Consortium (中文人类表型标准用语联盟), so that Chinese-language medical records can be recognized and matched.
In a specific implementation, the public database is the MedGen database and the literature database is the PubMed database. The feature relation database contains mutually matched monogenic disease names, foreign-language clinical features, the identifiers of those clinical features in the Human Phenotype Ontology database (HPO IDs), and Chinese clinical features. This embodiment can provide clues and theoretical support for the clinical diagnosis and differential diagnosis of monogenic diseases, and data support for further narrowing the scope of genetic testing. The clinical feature relation database established in this embodiment covers more than 8,600 monogenic diseases, more than 11,000 phenotypic clinical features of monogenic diseases, and more than 90,000 phenotype-clinical feature relationship records, drawn from the latest database versions and literature reports in monogenic disease research.
Specifically, the contribution c_i of each standard clinical feature corresponding to a monogenic disease name to that monogenic disease is calculated as follows:
Suppose the feature relation database contains a kinds of standard clinical features, which together occur N times in the database, and that the i-th standard clinical feature occurs a_i times. The frequency f_i with which that standard clinical feature occurs in the feature relation database is then given by:
f_i = a_i / N;
For a given monogenic disease in the feature relation database, suppose it has m corresponding standard clinical features whose frequencies in the feature relation database are f_1, f_2, ..., f_m. The contribution c_i of a given standard clinical feature to that monogenic disease is then calculated as:
Figure PCTCN2020111130-appb-000001
In the above formula, k is a correction factor with k > 1, and the feature relation database is used as the reference database.
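The frequency step above can be sketched as follows. This is a minimal illustration only: the database is assumed to be a plain mapping from disease name to a list of feature labels, and the names `feature_frequencies`, `toy_db`, `disease_x`, and `F1` to `F3` are hypothetical. The contribution formula for c_i is given only in the figure, so the sketch stops at f_i and does not attempt to reproduce it.

```python
from collections import Counter
from typing import Dict, List


def feature_frequencies(db: Dict[str, List[str]]) -> Dict[str, float]:
    """Return f_i = a_i / N for every standard clinical feature in the database.

    a_i is the number of times feature i occurs across all diseases and
    N is the total number of feature occurrences in the database.
    """
    counts = Counter(f for features in db.values() for f in features)  # a_i
    total = sum(counts.values())                                       # N
    return {feature: a_i / total for feature, a_i in counts.items()}


# Toy database (hypothetical labels, not real disease names or HPO IDs).
toy_db = {
    "disease_x": ["F1", "F2"],
    "disease_y": ["F1", "F3"],
}
frequencies = feature_frequencies(toy_db)
```

Here F1 occurs twice out of four occurrences, so its frequency is 0.5, while F2 and F3 each have frequency 0.25.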
Feature set I, that is, the set of clinical feature information, can be entered in standardized form through a visualization tool in two ways. In the first way, the user types keywords, each keyword corresponding to one clinical feature; instant search then offers a drop-down menu of the relevant standardized phenotype information for the user to choose from, so that standardized clinical feature information is entered. In the second way, the user clicks the relevant standardized clinical feature information directly on the phenotype tree.
The method for constructing the standardized clinical feature phenotype tree of monogenic diseases in the above embodiment includes:
obtaining data from the feature relation database and constructing the standardized clinical feature phenotype tree of monogenic diseases based on HPO, wherein the standardized clinical feature phenotype tree consists of multiple stem nodes and at least one branch node associated with each stem node, each branch node represents one standardized clinical feature, and each stem node represents an index of its associated standardized clinical features. HPO here refers to the hp.obo file.
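As a rough sketch of how the parent links needed for such a tree might be read from an OBO file, the following parser collects the `id:`, `name:`, and `is_a:` lines of each `[Term]` stanza. It is an assumption-laden illustration, not the patent's implementation: the function name `parse_obo_parents` is hypothetical, the sample text uses made-up `TOY:` identifiers rather than real HPO terms, and how stem nodes are chosen from the resulting hierarchy is not specified here.

```python
from typing import Dict, List, Tuple


def parse_obo_parents(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Parse [Term] stanzas of OBO text into (names, parents) mappings."""
    names: Dict[str, str] = {}
    parents: Dict[str, List[str]] = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id: "):
            current = line[4:].strip()
            parents.setdefault(current, [])
        elif in_term and current and line.startswith("name: "):
            names[current] = line[6:].strip()
        elif in_term and current and line.startswith("is_a: "):
            # Drop the trailing "! comment" that OBO allows after the parent id.
            parents[current].append(line[6:].split("!")[0].strip())
    return names, parents


# Made-up sample in OBO syntax (identifiers are hypothetical).
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: TOY:0001
name: toy root

[Term]
id: TOY:0002
name: toy stem
is_a: TOY:0001 ! toy root

[Term]
id: TOY:0003
name: toy feature
is_a: TOY:0002 ! toy stem

[Typedef]
id: part_of
"""
names, parents = parse_obo_parents(SAMPLE_OBO)
```

A term may list several `is_a` parents, so the parents are kept as a list; reducing that to a single-parent tree would be a further design decision.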
In the above embodiment, the method for matching, from feature set A, the best standard clinical feature corresponding to each clinical feature in feature set I based on the node marks on the standardized clinical feature phenotype tree includes:
Feature set I includes multiple clinical features, and feature set A includes multiple standard clinical features. The i-th clinical feature in feature set I is traversed, and the standard clinical feature with the highest similarity to the i-th clinical feature is selected from feature set A as the best standard clinical feature corresponding to the i-th clinical feature, the initial value of i being 1. Then i is set to i+1 and the i-th clinical feature in feature set I is traversed again, until all clinical features in feature set I have been traversed. In this way, multiple best standard clinical features in one-to-one correspondence with the clinical features in feature set I are selected from the feature set A corresponding to the n-th monogenic disease name.
Further, the method for selecting, from feature set A, the standard clinical feature with the highest similarity to the i-th clinical feature includes:
The j-th standard clinical feature in feature set A is traversed, and the established index is used to determine whether the j-th standard clinical feature and the i-th clinical feature share the same stem node B_t, the initial value of j being 1. If they do not, the similarity value between the j-th standard clinical feature and the i-th clinical feature is taken to be zero. If they do, the similarity value between the j-th standard clinical feature and the i-th clinical feature is calculated with the multi-level structure similarity algorithm. Then j is set to j+1, the j-th standard clinical feature in feature set A is traversed again, and the similarity calculation between the j-th standard clinical feature and the i-th clinical feature continues until all standard clinical features in feature set A have been traversed, yielding multiple similarity values in one-to-one correspondence with the standard clinical features in feature set A. The standard clinical feature corresponding to the maximum of these similarity values is selected as the best standard clinical feature corresponding to the i-th clinical feature.
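The inner loop over feature set A can be sketched as below. The sketch assumes a precomputed stem index `stem_of` (feature to stem node) and takes the node-level similarity as a caller-supplied function; `best_match`, `stem_of`, and the toy labels are hypothetical names introduced for illustration. When several candidates tie for the maximum, this sketch simply keeps the first one, which is a choice the text does not specify.

```python
from typing import Callable, Dict, List, Tuple


def best_match(
    i_feature: str,
    set_a: List[str],
    stem_of: Dict[str, str],
    similarity: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the feature in set_a most similar to i_feature, with its score."""
    scores: List[float] = []
    for a_feature in set_a:
        if stem_of[a_feature] != stem_of[i_feature]:
            # Different stem nodes: similarity is defined as zero.
            scores.append(0.0)
        else:
            scores.append(similarity(i_feature, a_feature))
    j = max(range(len(scores)), key=scores.__getitem__)
    return set_a[j], scores[j]


# Toy data: A1 sits under another stem node, so it is never scored.
toy_stems = {"I1": "B1", "A1": "B2", "A2": "B1", "A3": "B1"}
toy_scores = {"A2": 0.4, "A3": 0.9}
match = best_match("I1", ["A1", "A2", "A3"], toy_stems,
                   lambda i, a: toy_scores[a])
```

Checking the stem node first means the more expensive path comparison is only run for candidates in the same part of the tree.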
In the above embodiment, the method for calculating the similarity value between the j-th standard clinical feature and the i-th clinical feature based on the multi-level structure similarity algorithm includes:
Based on the node marks on the standardized clinical feature phenotype tree, the directed set IB of all nodes on the path connecting the i-th clinical feature to the shared stem node B_t is obtained, and the directed set AB of all nodes on the path connecting the j-th standard clinical feature to the shared stem node B_t is obtained. The length of the directed set IB is the number of nodes on its path, L_IB, and the length of the directed set AB is the number of nodes on its path, L_AB. The intersection IAB of the nodes in the directed sets IB and AB is extracted, and the length of the intersection IAB is the number of shared nodes on the paths, L_IAB. The similarity value between the j-th standard clinical feature and the i-th clinical feature is then calculated with the formula S_IiAj = β·SM + (1-β)·SI;
where SM denotes the similarity value between the j-th standard clinical feature and the i-th clinical feature across multiple levels of the phenotype tree, SI denotes the similarity value between the j-th standard clinical feature and the i-th clinical feature within the same level of the phenotype tree, and β is a weight coefficient.
In a specific implementation, the feature set A corresponding to a monogenic disease name in the feature relation database consists of n elements A_j, namely A_1, A_2, ..., A_n, that is, A = [A_1, A_2, ..., A_j, ..., A_n]; every monogenic disease name in the feature relation database corresponds to one such set A. Suppose the standardized feature set I entered for a patient with a monogenic disease consists of m clinical features I_i, so that I = [I_1, I_2, ..., I_m]. If I_i and A_j do not share the same stem node, the similarity between I_i and A_j is taken to be 0. If I_i and A_j share the same stem node, denoted B_t as shown in Figure 2, the similarity between I_i and A_j is calculated as follows: all nodes on the path connecting I_i to B_t form the directed set IB, the number of elements in the directed set IB is denoted N_IB, and the length of the directed set IB is defined as the number of nodes on that path, denoted L_IB, with L_IB = N_IB;
all nodes on the path connecting A_j to B_t form the directed set AB, the number of elements in the directed set AB is denoted N_AB, and the length of the directed set AB is defined as the number of nodes on that path, denoted L_AB, with L_AB = N_AB;
the intersection of the directed sets IB and AB is denoted IAB, the number of elements in the intersection IAB is denoted N_IAB, and the length of IAB is defined as the number of nodes on the shared path, denoted L_IAB, with L_IAB = N_IAB. Here SM = L_IAB / max(L_AB, L_IB), SI = 1 / (L_AB + L_IB - 2·L_IAB + 1), β is a weight coefficient with β ∈ (0, 1), and the similarity between I_i and A_j takes values in the range S_IiAj ∈ [0, 1].
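The formulas for SM, SI, and S_IiAj can be sketched directly. The sketch rests on assumptions not stated in the text: the tree is stored as a single-parent map in which stem nodes have parent `None`, each path includes both the feature node and the stem node, and β defaults to 0.5. The names `path_to_stem`, `node_similarity`, and the toy node labels are hypothetical.

```python
from typing import Dict, List, Optional


def path_to_stem(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `node` up to its stem node, both inclusive."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def node_similarity(i_node: str, a_node: str,
                    parent: Dict[str, Optional[str]],
                    beta: float = 0.5) -> float:
    """S_IiAj = beta * SM + (1 - beta) * SI, or 0 if the stem nodes differ."""
    ib = path_to_stem(i_node, parent)   # directed set IB
    ab = path_to_stem(a_node, parent)   # directed set AB
    if ib[-1] != ab[-1]:
        return 0.0                      # no shared stem node B_t
    l_ib, l_ab = len(ib), len(ab)
    l_iab = len(set(ib) & set(ab))      # nodes shared by both paths
    sm = l_iab / max(l_ab, l_ib)
    si = 1.0 / (l_ab + l_ib - 2 * l_iab + 1)
    return beta * sm + (1 - beta) * si


# Toy tree: stem B has child X, which has leaves Y and Z; stem C has leaf W.
toy_parent = {"B": None, "X": "B", "Y": "X", "Z": "X", "C": None, "W": "C"}
same = node_similarity("Y", "Y", toy_parent)
siblings = node_similarity("Y", "Z", toy_parent)
unrelated = node_similarity("Y", "W", toy_parent)
```

For the siblings Y and Z, both paths have three nodes and share two, so SM = 2/3 and SI = 1/3, giving 0.5 at β = 0.5; identical nodes score 1 and nodes under different stems score 0.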
Further, in the above embodiment, the method for calculating the set similarity value between feature set I and the current feature set A according to the similarity value between each clinical feature and its corresponding best standard clinical feature includes:
The contribution c_i of the i-th clinical feature is used to weight the maximum similarity value of its corresponding best standard clinical feature in feature set A. Then i is set to i+1 and the maximum similarity value of the best standard clinical feature in feature set A corresponding to the i-th clinical feature is weighted again, until all best standard clinical features selected from feature set A have been weighted. The weighted maximum similarity values of all best standard clinical features in feature set A are then summed to obtain the set similarity value between feature set I and the current feature set A.
In a specific implementation, for each input clinical feature I_i, a standard clinical feature A_j with the greatest similarity to it can be found in feature set A. In other words, each clinical feature I_i obtains one similarity value with respect to feature set A, and the similarity between feature set I and feature set A is defined as the sum of the similarities between each clinical feature I_i in feature set I and feature set A.
Because clinical features contribute to a monogenic disease to different degrees, the corresponding maximum similarity value must be weighted. The calculation formula is
Figure PCTCN2020111130-appb-000002
where S_IiA denotes the similarity value between clinical feature I_i and feature set A. The similarity value between feature set I and feature set A is defined as the sum of the similarities between each clinical feature I_i in feature set I and feature set A, and its calculation formula is
Figure PCTCN2020111130-appb-000003
where S_IA denotes the similarity value between feature set I and feature set A.
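One possible reading of the weighted sum is sketched below. The exact formulas are given only in the figures, so the sketch makes its assumptions explicit: the weighting is taken to be a simple product of c_i and the maximum similarity, weights are supplied as a mapping keyed by input feature, and a missing weight defaults to 1. The names `set_similarity`, `toy_sim`, and `toy_weights` are hypothetical.

```python
from typing import Callable, Dict, List


def set_similarity(
    set_i: List[str],
    set_a: List[str],
    similarity: Callable[[str, str], float],
    weight: Dict[str, float],
) -> float:
    """Sum over I_i of c_i times the best similarity of I_i against set A."""
    total = 0.0
    for i_feature in set_i:
        # Maximum similarity of I_i against every standard feature in A.
        best = max((similarity(i_feature, a) for a in set_a), default=0.0)
        # Assumed weighting: multiply by the contribution c_i.
        total += weight.get(i_feature, 1.0) * best
    return total


# Toy similarity table and weights (made-up numbers).
toy_sim = {
    ("I1", "A1"): 0.2, ("I1", "A2"): 0.8,
    ("I2", "A1"): 0.5, ("I2", "A2"): 0.1,
}
toy_weights = {"I1": 0.5, "I2": 2.0}
score = set_similarity(["I1", "I2"], ["A1", "A2"],
                       lambda i, a: toy_sim[(i, a)], toy_weights)
```

In this toy case the best matches are 0.8 for I1 and 0.5 for I2, so the weighted sum is 0.5·0.8 + 2.0·0.5 = 1.4.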
The advantages of this embodiment are as follows. 1. A user-friendly client has been developed, in which users can enter standardized clinical features conveniently, either by clicking with the mouse or by typing keywords for instant search. 2. The similarity between clinical feature set I and feature set A is calculated over the multi-level structure. Because the multi-level structure similarity algorithm treats the input phenotypes in a fuzzy manner, it relaxes the input restrictions placed on the doctor and makes the input process friendlier and more intelligent. Using the input information, the custom multi-level structure similarity algorithm calculates the strength of association with each monogenic disease name, indicates from that strength which monogenic disease the patient may have, and recommends monogenic disease names accurately.
Embodiment 2
Referring to Figure 3, this embodiment provides a monogenic disease name recommendation system based on multi-level structural similarity, including:
a phenotype tree unit, configured to construct a standardized clinical feature phenotype tree of monogenic diseases according to the feature relation database of monogenic disease names;
an input unit, configured to mark the clinical features in the user-input feature set I as nodes on the standardized clinical feature phenotype tree;
a traversal unit, configured to traverse the n-th monogenic disease name in the feature relation database and mark the standard clinical features in its corresponding feature set A as nodes on the standardized clinical feature phenotype tree, the initial value of n being 1;
a retrieval unit, configured to match, from feature set A, the best standard clinical feature corresponding to each clinical feature in feature set I based on the node marks on the standardized clinical feature phenotype tree;
a calculation unit, configured to calculate the set similarity value between feature set I and the current feature set A according to the similarity value between each clinical feature and its corresponding best standard clinical feature;
a judgment unit, configured to set n = n+1 and invoke the traversal unit again, until all monogenic disease names in the feature relation database have been traversed;
an output unit, configured to collect and sort the set similarity values between feature set I and each feature set A, and output the monogenic disease name corresponding to the highest similarity value.
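The interplay of the traversal, calculation, judgment, and output units amounts to scoring every disease and sorting, which might be sketched as follows. The set-level scoring function is passed in by the caller; the stand-in used in the example simply counts shared labels and is not the multi-level similarity described in this document. The names `rank_diseases`, `overlap`, and the toy disease labels are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def rank_diseases(
    set_i: List[str],
    db: Dict[str, List[str]],
    set_score: Callable[[List[str], List[str]], float],
) -> List[Tuple[str, float]]:
    """Score feature set I against each disease's set A, highest first."""
    scored = [(name, set_score(set_i, set_a)) for name, set_a in db.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def overlap(set_i: List[str], set_a: List[str]) -> float:
    """Stand-in score for the demo: number of labels the two sets share."""
    return float(len(set(set_i) & set(set_a)))


toy_db = {
    "disease_a": ["f1", "f2"],
    "disease_b": ["f2", "f3", "f4"],
}
ranking = rank_diseases(["f2", "f3"], toy_db, overlap)
top_name = ranking[0][0]
```

Returning the full sorted list rather than only the top entry leaves it to the caller to show one recommendation or several.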
In one embodiment, the above monogenic disease name recommendation system is applied to a computer device that includes a processor and a memory connected through a system bus. The processor of the system provides computing and control capabilities. The memory of the system includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The network interface of the system is used to communicate with external sensors. When executed by the processor, the computer-readable instructions implement the steps of the above monogenic disease name recommendation method based on multi-level structural similarity, for example by means of the phenotype tree unit, input unit, traversal unit, retrieval unit, calculation unit, judgment unit, and output unit described above.
Compared with the prior art, the beneficial effects of the monogenic disease name recommendation system based on multi-level structural similarity provided by this embodiment of the present invention are the same as those of the monogenic disease name recommendation method based on multi-level structural similarity provided in Embodiment 1 above, and are not repeated here.
Embodiment 3
This embodiment provides a computer-readable storage medium, for example a non-volatile computer-readable storage medium, on which computer-readable instructions are stored. When run by a processor, the computer-readable instructions execute the steps of the above monogenic disease name recommendation method based on multi-level structural similarity.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as those of the monogenic disease name recommendation method based on multi-level structural similarity provided by the above technical solution, and are not repeated here.
Embodiment 4
Based on the above embodiments, and referring to Figures 4 and 5, a schematic diagram of the environment architecture of an application scenario is provided.
An application can be developed to implement the monogenic disease name recommendation method based on multi-level structural similarity of the above embodiments. The application can be installed on a user terminal, and the user terminal is connected to a server to communicate with it.
The user terminal may be any smart device such as a computer or a tablet computer; this embodiment uses a computer only as an example.
For example, the user opens the relevant application on the smart device and uses an input module such as a keyboard or mouse to enter the clinical features of feature set I, so that the clinical features are entered into the application in standardized form. The application on the computer sends the clinical features of feature set I to a database retrieval module, such as a server. The database retrieval module uses the multi-level structure similarity algorithm to traverse the feature relation database and calculate the similarity value between feature set I and the feature set A corresponding to each monogenic disease name, then collects and sorts the results to obtain the monogenic disease name corresponding to the highest similarity value. That name is then presented visually to the user through an output module, such as a display.
A person of ordinary skill in the art will understand that all or part of the steps of the above inventive method can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiments. The storage medium may be ROM/RAM, a magnetic disk, an optical disc, a memory card, and so on.
The above are only specific implementations of the present invention, but the protection scope of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims (11)

  1. A monogenic disease name recommendation method based on multi-level structural similarity, comprising:
    constructing a standardized clinical feature phenotype tree of monogenic diseases according to a feature relation database of monogenic disease names;
    marking the clinical features in a user-input feature set I as nodes on the standardized clinical feature phenotype tree;
    traversing the n-th monogenic disease name in the feature relation database, and marking the standard clinical features in its corresponding feature set A as nodes on the standardized clinical feature phenotype tree, the initial value of n being 1;
    matching, from feature set A, the best standard clinical feature corresponding to each clinical feature in feature set I, based on the node marks on the standardized clinical feature phenotype tree;
    calculating the set similarity value between feature set I and the current feature set A according to the similarity value between each clinical feature and its corresponding best standard clinical feature; and
    setting n = n+1 and traversing the n-th monogenic disease name in the feature relation database again, until all monogenic disease names in the feature relation database have been traversed, then collecting and sorting the set similarity values between feature set I and each feature set A, and outputting the monogenic disease name corresponding to the highest similarity value.
  2. The method according to claim 1, wherein the feature relation database of monogenic disease names is obtained by a method comprising:
    obtaining known monogenic disease names and their corresponding standard clinical features from public databases and literature databases of monogenic diseases;
    establishing, based on the known monogenic disease names and their corresponding standard clinical features, a feature relation database relating monogenic disease names to standard clinical features; and
    calculating, for each monogenic disease name, the contribution c_i of each of its corresponding standard clinical features to that monogenic disease.
  3. The method according to claim 1 or 2, wherein the method of constructing the standardized clinical feature phenotype tree of monogenic diseases comprises:
    obtaining data from the feature relation database, and constructing the standardized clinical feature phenotype tree of monogenic diseases based on HPO;
    wherein the standardized clinical feature phenotype tree consists of multiple stem nodes and at least one branch node associated with each stem node, each branch node represents one standardized clinical feature, and each stem node represents an index of its associated standardized clinical features.
  4. The method according to any one of claims 1 to 3, wherein the method of matching, from feature set A, the best standard clinical feature corresponding to each clinical feature in feature set I based on the node marks on the standardized clinical feature phenotype tree comprises:
    the feature set I including multiple clinical features, and the feature set A including multiple standard clinical features;
    traversing the i-th clinical feature in the feature set I, and selecting from the feature set A the standard clinical feature with the highest similarity to the i-th clinical feature as the best standard clinical feature corresponding to the i-th clinical feature, the initial value of i being 1; and
    setting i = i+1 and traversing the i-th clinical feature in the feature set I again, until all clinical features in feature set I have been traversed, so that multiple best standard clinical features in one-to-one correspondence with the clinical features in feature set I are selected from the feature set A corresponding to the n-th monogenic disease name.
  5. The method according to claim 4, wherein the method of selecting, from the feature set A, the standard clinical feature with the highest similarity to the i-th clinical feature comprises:
    traversing the j-th standard clinical feature in the feature set A, and determining, based on the established index, whether the j-th standard clinical feature and the i-th clinical feature share the same stem node B_t, the initial value of j being 1;
    if they do not, taking the similarity value between the j-th standard clinical feature and the i-th clinical feature to be zero;
    if they do, calculating the similarity value between the j-th standard clinical feature and the i-th clinical feature based on a multi-level structure similarity algorithm;
    setting j = j+1, traversing the j-th standard clinical feature in the feature set A again, and continuing the similarity calculation between the j-th standard clinical feature and the i-th clinical feature until all standard clinical features in the feature set A have been traversed, thereby obtaining multiple similarity values in one-to-one correspondence with the standard clinical features in the feature set A; and
    selecting, from the multiple similarity values, the standard clinical feature corresponding to the maximum value as the best standard clinical feature corresponding to the i-th clinical feature.
  6. The method according to claim 5, wherein the method of calculating the similarity value between the j-th standard clinical feature and the i-th clinical feature based on the multi-level structure similarity algorithm comprises:
    obtaining, based on the node marks on the standardized clinical feature phenotype tree, the directed set IB of all nodes on the path connecting the i-th clinical feature to the same stem node B_t, and the directed set AB of all nodes on the path connecting the j-th standard clinical feature to the same stem node B_t, the length of the directed set IB being the number of nodes on its path, L_IB, and the length of the directed set AB being the number of nodes on its path, L_AB;
    extracting the intersection IAB of the nodes in the directed set IB and the directed set AB, the length of the intersection IAB being the number of shared nodes on the paths, L_IAB; and
    calculating the similarity value between the j-th standard clinical feature and the i-th clinical feature using the formula S_IiAj = β·SM + (1-β)·SI; wherein
    the SM denotes the similarity value between the j-th standard clinical feature and the i-th clinical feature across multiple levels of the phenotype tree; and
    the SI denotes the similarity value between the j-th standard clinical feature and the i-th clinical feature within the same level of the phenotype tree, and the β is a weight coefficient.
  7. The method according to claim 6, wherein SM is calculated as $SM = L_{IAB} / \max(L_{AB}, L_{IB})$, and SI is calculated as $SI = 1 / (L_{AB} + L_{IB} - 2L_{IAB} + 1)$.
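As a worked illustration of the two expressions in claim 7, the sketch below computes SM and SI from two node paths. It assumes each path is given as a list of node identifiers starting at the common stem node; the function name and the node labels are hypothetical. The formula that combines SM and SI with the weight β appears only as a figure in the source, so it is deliberately not reproduced here.

```python
from typing import Sequence, Tuple


def path_similarity_components(
    ib: Sequence[str], ab: Sequence[str]
) -> Tuple[float, float]:
    """Return (SM, SI) for the paths IB and AB on the phenotype tree.

    ib: nodes on the path from the stem node to the i-th clinical feature.
    ab: nodes on the path from the stem node to the j-th standard feature.
    Both paths are assumed non-empty (they contain at least the stem node).
    """
    l_ib = len(ib)                   # L_IB: number of nodes on path IB
    l_ab = len(ab)                   # L_AB: number of nodes on path AB
    l_iab = len(set(ib) & set(ab))   # L_IAB: nodes shared by both paths
    sm = l_iab / max(l_ab, l_ib)               # SM = L_IAB / max(L_AB, L_IB)
    si = 1 / (l_ab + l_ib - 2 * l_iab + 1)     # SI = 1 / (L_AB + L_IB - 2 L_IAB + 1)
    return sm, si


if __name__ == "__main__":
    # Hypothetical node labels; the two paths share the first two nodes.
    ib = ["B_t", "n1", "n2", "feature_i"]
    ab = ["B_t", "n1", "n3", "feature_j"]
    # SM = 2/4 = 0.5, SI = 1/(4 + 4 - 4 + 1) = 0.2
    print(path_similarity_components(ib, ab))
```

When the two paths are identical, both values equal 1; as the paths diverge, the shared count $L_{IAB}$ falls and both values decrease.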
  8. The method according to any one of claims 1 to 7, wherein the method of calculating the set similarity value between the feature set I and the current feature set A, according to the similarity value between each clinical feature and its corresponding best standard clinical feature, comprises:
    weighting, by the contribution degree $c_i$ of the i-th clinical feature, the maximum similarity value of the corresponding best standard clinical feature in the feature set A; and
    setting i=i+1 and again weighting the maximum similarity value of the best standard clinical feature in the feature set A corresponding to the i-th clinical feature, until all the best standard clinical features selected from the feature set A have been weighted, and then summing the weighted maximum similarity values of all the best standard clinical features in the feature set A to obtain the set similarity value between the feature set I and the current feature set A.
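The accumulation in claim 8 can be read as a weighted sum over the clinical features in set I. The sketch below assumes the contribution degrees $c_i$ and the per-feature maximum similarity values have already been computed and are passed in as parallel lists; the function name and the sample numbers are illustrative only.

```python
from typing import Sequence


def set_similarity(
    contributions: Sequence[float], max_similarities: Sequence[float]
) -> float:
    """Sum c_i * (maximum similarity of the i-th feature) over all i.

    contributions[i] is the contribution degree c_i of the i-th clinical
    feature; max_similarities[i] is the similarity between that feature and
    its best standard clinical feature in feature set A.
    """
    if len(contributions) != len(max_similarities):
        raise ValueError("one contribution degree is required per feature")
    total = 0.0
    # Equivalent to the loop "weight, then set i = i + 1" in the claim.
    for c_i, s_i in zip(contributions, max_similarities):
        total += c_i * s_i
    return total


if __name__ == "__main__":
    # Illustrative values, not taken from the source.
    print(set_similarity([0.5, 0.3, 0.2], [1.0, 0.5, 0.0]))
```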
  9. A monogenic disease name recommendation system based on multi-level structural similarity, comprising:
    a phenotype tree unit, configured to construct a standardized clinical feature phenotype tree of monogenic diseases according to a feature relationship database of monogenic disease names;
    an input unit, configured to mark, on the standardized clinical feature phenotype tree, the nodes of the clinical features in the feature set I input by the user;
    a traversal unit, configured to traverse to the n-th monogenic disease name in the feature relationship database and mark, on the standardized clinical feature phenotype tree, the nodes of the standard clinical features in its corresponding feature set A, the initial value of n being 1;
    a retrieval unit, configured to match from the feature set A, based on the node labels on the standardized clinical feature phenotype tree, the best standard clinical feature corresponding to each clinical feature in the feature set I;
    a calculation unit, configured to calculate the set similarity value between the feature set I and the current feature set A according to the similarity value between each clinical feature and its corresponding best standard clinical feature;
    a judgment unit, configured to set n=n+1 and invoke the traversal unit again until all monogenic disease names in the feature relationship database have been traversed; and
    an output unit, configured to collect and sort the set similarity values between the feature set I and each feature set A, and output the monogenic disease name corresponding to the highest similarity value.
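The traversal, judgment, and output units together describe a loop over every disease name followed by a sort. The sketch below shows that control flow only. The `score` callable is a hypothetical parameter standing in for the set similarity of the claims, and the Jaccard overlap used in the demo is a plain stand-in, not the multi-level structural similarity described in the source. Disease and feature names are invented for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple


def rank_diseases(
    input_features: Set[str],
    disease_features: Dict[str, Set[str]],
    score: Callable[[Set[str], Set[str]], float],
) -> List[Tuple[str, float]]:
    """Score feature set I against each disease's feature set A and sort.

    `score` is a hypothetical callable supplied by the caller.
    """
    results = []
    # Traversal unit + judgment unit: visit disease n = 1, 2, ... until done.
    for name, feature_set_a in disease_features.items():
        results.append((name, score(input_features, feature_set_a)))
    # Output unit: sort by set similarity, highest first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in score for the demo only (NOT the claimed similarity)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


if __name__ == "__main__":
    database = {
        "disease_A": {"f1", "f2"},
        "disease_B": {"f2", "f3", "f4"},
    }
    ranked = rank_diseases({"f1", "f2"}, database, jaccard)
    print(ranked[0][0])  # name with the highest score in this toy example
```

Returning the full ranked list rather than only the top entry is a design choice of this sketch; the claim itself only requires outputting the name with the highest similarity value.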
  10. A non-volatile computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when run by a processor, perform the steps of the method according to any one of claims 1 to 8.
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 8.
PCT/CN2020/111130 2020-02-27 2020-08-25 Monogenic disease name recommendation method and system based on multi-level structural similarity WO2021169203A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010123773.4 2020-02-27
CN202010123773.4A CN111341458B (en) 2020-02-27 2020-02-27 Single-gene disease name recommendation method and system based on multi-level structure similarity

Publications (1)

Publication Number Publication Date
WO2021169203A1 true WO2021169203A1 (en) 2021-09-02

Family

ID=71185714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111130 WO2021169203A1 (en) 2020-02-27 2020-08-25 Monogenic disease name recommendation method and system based on multi-level structural similarity

Country Status (2)

Country Link
CN (1) CN111341458B (en)
WO (1) WO2021169203A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798733A (en) * 2023-01-09 2023-03-14 神州医疗科技股份有限公司 Intelligent auxiliary reasoning system and method for orphan disease

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111341458B (en) * 2020-02-27 2020-11-03 国家卫生健康委科学技术研究所 Single-gene disease name recommendation method and system based on multi-level structure similarity
CN111883223B (en) * 2020-06-11 2021-05-25 国家卫生健康委科学技术研究所 Report interpretation method and system for structural variation in patient sample data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109119132A (en) * 2018-08-03 2019-01-01 国家卫生计生委科学技术研究所 Method and system based on case history characteristic matching monogenic disease title
CN109215796A (en) * 2018-08-14 2019-01-15 平安医疗健康管理股份有限公司 Searching method, device, computer equipment and storage medium
US20190080051A1 (en) * 2015-11-11 2019-03-14 Northeastern University Methods And Systems For Profiling Personalized Biomarker Expression Perturbations
CN110021364A (en) * 2017-11-24 2019-07-16 上海暖闻信息科技有限公司 Analysis detection system based on patients clinical symptom data and full sequencing of extron group data screening single gene inheritance disease Disease-causing gene
CN111341458A (en) * 2020-02-27 2020-06-26 国家卫生健康委科学技术研究所 Single-gene disease name recommendation method and system based on multi-level structure similarity

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009040B (en) * 2017-12-12 2021-05-04 杭州时趣信息技术有限公司 Method, system and computer readable storage medium for determining fault root cause
CN109524068A (en) * 2018-10-16 2019-03-26 东华大学 A kind of disease symptoms extracting method based on AC automatic machine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI JIANHUA, LI ZHEREN ; KANG YAN ; LI LING: "Review on the Research Progress of Mining of OMIM Data", VACUUM., PERGAMON PRESS., GB, vol. 31, no. 6, 1 December 2014 (2014-12-01), GB, pages 1400 - 1404, XP055840474, ISSN: 0042-207X *

Also Published As

Publication number Publication date
CN111341458B (en) 2020-11-03
CN111341458A (en) 2020-06-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20922276; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20922276; Country of ref document: EP; Kind code of ref document: A1)