WO2021169088A1 - Nearest-neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records - Google Patents
- Publication number
- WO2021169088A1 (international application PCT/CN2020/096484)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nearest neighbor
- granularity
- electronic health
- subpopulation
- super
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Definitions
- The present invention relates to the field of intelligent processing of medical information, and in particular to a nearest neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records.
- Electronic health records are electronic records of personal health history. They are formed during people's medical and health-related activities and are worth preserving for future reference. After years of development, China has accumulated a large amount of medical and health data in the field of electronic health records.
- Using artificial intelligence methods to automatically discover hidden medical patterns in this rich electronic health record data is of great significance and value for disease prevention, control and treatment.
- However, the application of traditional artificial intelligence, machine learning and data mining algorithms to such data is greatly restricted.
- Traditional data mining algorithms generally require that the training samples not contain a large amount of missing information; that is, they require the data to be complete.
- Data containing missing information is mostly deleted outright, and the data types processed are mostly symbolic or numerical; fuzzy data is converted into numerical data for processing.
- By contrast, the data in large-scale electronic health records is often highly incomplete, and a considerable proportion of the data in existing electronic health records is missing.
- In addition, the values of some attribute columns in electronic health record data are written in descriptive language and are therefore highly ambiguous. Converting all such fuzzy data directly into numerical or symbolic data may cause a large loss of electronic health record information. It may even affect subsequent intelligent auxiliary diagnosis and decision-making.
- Multi-granularity computing is one of the strategies that humans usually adopt when solving problems, and it is an important manifestation of human cognitive ability.
- Multi-granularity data modeling performs intelligent analysis of complex data in three ways: it obtains information granule sets and multiple granular structures, extracts usable knowledge from them, and forms effective decision-making schemes. Data modeling that uses only one granular structure is called single-granularity data modeling; data modeling that uses multiple granular structures is called multi-granularity data modeling. Multi-granularity data analysis can examine a problem from multiple angles and levels, and so obtain more reasonable and satisfactory solutions. As one of the important characteristics of human cognition, multi-granularity plays an important role in data mining and knowledge discovery on complex data. Large-scale electronic health records mix incomplete and fuzzy data. In the context of medical big data applications, an effective multi-granularity method for collaborative knowledge reduction of such data is therefore of great significance and value for decision support analysis of large-scale electronic health records.
- The purpose of the present invention is to disclose a nearest neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records. The method reduces execution time, improves the accuracy of collaborative knowledge reduction of large-scale electronic health records, and reduces the complexity cost of that reduction on the Spark cloud computing platform. It thereby lays a good foundation for intelligent services such as electronic health record feature selection, rule mining and clinical decision support.
- The invention discloses a nearest neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records, which includes the following steps:
- Step B comprises the following specific steps:
- The nearest neighbor radius set in layer $d_i$ is represented by shared nearest neighbor vectors as $d_i = \{w_1, w_2, \ldots, w_j, \ldots, w_m\}$, where $w_j = (1 + \log tf(R_j)) \cdot \log(1 + n/df(R_j))$;
- $tf(R_j)$ is the frequency of occurrence of nearest neighbor radius $R_j$ in layer $d_i$;
- $df(R_j)$ is the hierarchical frequency of weight vector $w_j$ in nearest neighbor radius $R_j$;
- $corr(f_i, f_j)$ denotes the inner product of the two feature vectors $f_i$ and $f_j$, which defines the shared weight $C_i(i,j) = corr(f_i, f_j)$ between nearest neighbor radii $R_i$ and $R_j$;
- $Df(R_i R_j)$ is the total number of nearest neighbor vectors containing both nearest neighbor radii $R_i$ and $R_j$. Together with $df(R_j)$, it gives the adaptive profit compensation weight $f_{ij} = Df(R_i R_j)/df(R_j)$;
- $\xi_i$ is the number of super elites Super-Elitist_i in the $i$-th nearest neighbor radius used for knowledge reduction of the $i$-th electronic health record data subset.
- Step C comprises the following specific steps:
- … is the super elite matrix of Granu-Subpopulation_i, and … is the trust degree between nearest neighbor radii $R_i$ and $R_j$ at the $k$-th iteration;
- The present invention has the following advantages:
- It supports parallel collaborative knowledge reduction of large-scale electronic health records on multiple nodes.
- Super elites perform knowledge reduction tasks within their respective multi-granularity subpopulations. This greatly reduces execution time and improves the accuracy of collaborative knowledge reduction of large-scale electronic health records.
- The proposed nearest neighbor multi-granularity profit method partitions large-scale electronic health records across multiple evolutionary subpopulations Granu-Subpopulation_i and stores them there. This reduces the complexity cost of knowledge reduction of large-scale electronic health records on the Spark cloud computing platform. It also lays a good foundation for intelligent services such as electronic health record feature selection, rule mining and clinical decision support.
- The present invention can efficiently obtain the collaborative knowledge reduction set of incomplete and fuzzy data in large-scale electronic health records, which is of great significance and value for decision support analysis of large-scale electronic health records.
- Figure 1 is the overall flow chart of the system
- Figure 2 is a diagram of the dynamic execution process of the nearest neighbor multi-granularity profit model
- The present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
- The above embodiments illustrate the implementation method and device structure of the present invention, but the present invention is not limited to them. That is, the present invention does not have to rely on the above methods and structures to be implemented.
- Any improvement to the present invention falls within the scope of protection and disclosure of the present invention. This includes equivalent replacement of the selected implementation method, addition of steps, and selection of specific methods.
- All ways of achieving the objects of the present invention by adopting structures and methods similar to those of the present invention fall within the protection scope of the present invention.
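The step B quantities listed above can be illustrated with a short Python sketch. It follows the formulas stated in claim 2:

- $w_j = (1 + \log tf(R_j)) \cdot \log(1 + n/df(R_j))$
- $C_i(i,j) = corr(f_i, f_j)$, where $corr$ is an inner product
- $f_{ij} = Df(R_i R_j)/df(R_j)$

It is a sketch under assumptions, not the patented implementation. The natural logarithm, reading $n$ as the total number of nearest neighbor vectors, and counting $df(R_j)$ as the number of vectors that contain $R_j$ are all assumptions, since the text does not pin them down. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def radius_weight(tf: int, df: int, n: int) -> float:
    """Weight w_j of nearest neighbor radius R_j in layer d_i.

    w_j = (1 + log tf(R_j)) * log(1 + n / df(R_j)), using natural logs
    (assumption). n is read as the total number of nearest neighbor
    vectors, which is an assumption because the source does not define n.
    """
    if tf < 1 or df < 1:
        raise ValueError("tf and df must be >= 1")
    return (1.0 + math.log(tf)) * math.log(1.0 + n / df)


def layer_weights(tfs: Sequence[int], dfs: Sequence[int], n: int) -> List[float]:
    """d_i = {w_1, ..., w_m}: one weight per nearest neighbor radius in the layer."""
    return [radius_weight(tf, df, n) for tf, df in zip(tfs, dfs)]


def shared_weight_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """N_i x N_i matrix C_i with C_i(i, j) = corr(f_i, f_j), an inner product."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]


def compensation_weight(vectors: Sequence[Set[str]], r_i: str, r_j: str) -> float:
    """Adaptive profit compensation weight f_ij = Df(R_i R_j) / df(R_j).

    Df counts the vectors containing both radii. df(R_j) is counted here as
    the number of vectors containing R_j (assumption). Returns 0.0 when R_j
    never occurs, to avoid division by zero.
    """
    both = sum(1 for v in vectors if r_i in v and r_j in v)
    df_j = sum(1 for v in vectors if r_j in v)
    return both / df_j if df_j else 0.0
```

Take the vectors `[{"R1", "R2"}, {"R1"}, {"R2", "R3"}]` as an example. Radius R2 occurs in two vectors and co-occurs with R1 in one of them, so `compensation_weight(vectors, "R1", "R2")` returns 0.5. The weight $w_j$ has the same shape as the familiar log-scaled TF-IDF weight, which is why the sketch counts $df$ and $n$ over one collection of vectors.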
Landscapes
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Claims (3)
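- A nearest neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records, characterized by the following specific steps:

  A. On the big data Spark cloud platform, partition the large-scale electronic health record dataset into different multi-granularity evolutionary subpopulations Granu-Subpopulation_i, $i = 1, 2, \ldots, N$, where $N$ is the total number of multi-granularity evolutionary subpopulations. The knowledge reduction task of the large-scale electronic health record dataset is thereby decomposed into collaborative knowledge reduction tasks of multiple parallel multi-granularity evolutionary subpopulations. For each multi-granularity evolutionary subpopulation, compute the candidate equivalence classes of the electronic health record data assigned to it.

  B. Design a nearest neighbor multi-granularity profit model, and use the $i$-th multi-granularity evolutionary subpopulation Granu-Subpopulation_i for knowledge reduction of the $i$-th data subset of the large-scale electronic health records. In Granu-Subpopulation_i, select by fitness the super elite Super-Elitist_i with the largest fitness value and the ordinary elite Ordinary-Elitist_i with the smallest fitness value. Compute the similarity $Sim(m,n)$ of the shared nearest neighbor vectors and the shared nearest neighbor profit vector $\zeta(e)$, and construct the collaborative nearest neighbor vectors in layer $d_i$ of the nearest neighbor radius.

  C. Construct the multi-granularity elite matrix $Gp_i$, and compute the nearest neighbor multi-granularity profit weight of the elite matrix $Gp_i$ in the multi-granularity subpopulation Granu-Subpopulation_i to obtain its corresponding weight profit matrix $\Gamma(e)$. Execute the adaptive dynamic adjustment strategy for the super elite weight profit matrix to obtain each super elite's profit weight within its own multi-granularity subpopulation. Then assign this weight to the super elite Super-Elitist_i in each multi-granularity subpopulation Granu-Subpopulation_i that performs collaborative knowledge reduction of a large-scale electronic health record data subset.

  E. Compare the accuracy $EHR$ of the collaborative knowledge reduction set of the large-scale electronic health records obtained above with the preset accuracy value $\lambda$. If $EHR \ge \lambda$, output the optimal collaborative knowledge reduction set of the large-scale electronic health records. Otherwise, continue executing steps C and D above until the collaborative knowledge reduction accuracy satisfies $EHR \ge \lambda$.

  F. Obtain the collaborative knowledge reduction set of the large-scale electronic health record data and its core attributes, and store the electronic health record knowledge reduction sets on the Spark cloud platform. This provides an important basis for intelligent auxiliary diagnosis in decision support analysis of large-scale electronic health records.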
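- The nearest neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records according to claim 1, characterized in that step B specifically comprises:

  a. Represent the nearest neighbor radius set in layer $d_i$ with shared nearest neighbor vectors as $d_i = \{w_1, w_2, \ldots, w_j, \ldots, w_m\}$, $w_j = (1 + \log tf(R_j)) \cdot \log(1 + n/df(R_j))$. Here $tf(R_j)$ is the frequency of occurrence of nearest neighbor radius $R_j$ in layer $d_i$, and $df(R_j)$ is the hierarchical frequency of weight vector $w_j$ in nearest neighbor radius $R_j$.

  b. Construct an $N_i \times N_i$ matrix $C_i$, where $N_i$ is the number of nearest neighbor radii in layer $d_i$. The shared weight $C_i(i,j)$ between nearest radii $R_i$ and $R_j$ is defined as $C_i(i,j) = corr(f_i, f_j)$. Here $f_i$ and $f_j$ are the feature vectors corresponding to nearest neighbor radii $R_i$ and $R_j$ respectively, and $corr(f_i, f_j)$ denotes the inner product of the two feature vectors $f_i$ and $f_j$.

  f. Compute the shared nearest neighbor profit vector $\zeta(e)$ by the following formula: …

  g. Compute the adaptive profit compensation weight $f_{ij}$ between nearest neighbor radii $R_i$ and $R_j$ as $f_{ij} = Df(R_i R_j)/df(R_j)$. Here $Df(R_i R_j)$ is the total number of nearest neighbor vectors containing both nearest neighbor radii $R_i$ and $R_j$, and $df(R_j)$ is the hierarchical frequency of weight vector $w_j$ in nearest neighbor radius $R_j$.

  h. Construct the collaborative nearest neighbor vectors $f_m$, $f_n$, $f_p$, $f_t$ in layer $d_i$ of the nearest neighbor radius, respectively, as follows: … Here $\xi_i$ is the number of super elites Super-Elitist_i in the $i$-th nearest neighbor radius used for knowledge reduction of the $i$-th electronic health record data subset.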
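- The nearest neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records according to claim 1, characterized in that step C specifically comprises:

  a. In the $i$-th multi-granularity evolutionary subpopulation Granu-Subpopulation_i, represent the nearest neighbor radius matrix as two tensors … and …. Then merge them into the super elite matrix set $Gp_i$ of the multi-granularity subpopulation Granu-Subpopulation_i, where $i = 1, 2, \ldots, N$.

  b. Compute the average shared similarity between adjacent tensors in the super elite matrix by the following formula: …

  c. Compute the nearest neighbor multi-granularity profit weight of the super elite matrix $Gp_i$ in the multi-granularity subpopulation Granu-Subpopulation_i by the following formula: …

  d. Construct the multi-granularity chromosome of subpopulation Granu-Subpopulation_i, which contains $m$ super elites. The corresponding weight profit matrix $\Gamma(e)$ is defined as follows: …

  e. Update the weight of super elite Super-Elitist_i. During collaborative knowledge reduction of the large-scale electronic health record data subsets, the cardinality $\|Gp_i\|$ of the super elite matrix in the multi-granularity subpopulation Granu-Subpopulation_i is compared with …, where $N$ is the total number of multi-granularity evolutionary subpopulations. If it is greater, the super elite weight … is increased accordingly by the following adaptive dynamic adjustment formula: … Here $\|\Gamma(e)\|$ is the cardinality of the weight profit matrix $\Gamma(e)$, and $\eta_i$ is the dynamic weight parameter controlling super elite Super-Elitist_i, defined as follows: …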
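The overall control flow of claim 1 can be sketched in plain Python:

1. Split the records into $N$ subpopulations.
2. Pick the highest-fitness and lowest-fitness elites in each subpopulation.
3. Repeat the refinement steps until $EHR \ge \lambda$.

This is a hypothetical illustration, not the patented Spark implementation. The round-robin partitioning, the `fitness`, `refine` and `accuracy` callables, and the `max_iters` safeguard are all assumed stand-ins, because the text does not give the partitioning rule or the step C update formulas.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")


def partition(records: Sequence[T], n_subpops: int) -> List[List[T]]:
    """Step A stand-in: split records into N subpopulations Granu-Subpopulation_i.

    Round-robin assignment is a hypothetical choice; the source does not say
    how the Spark platform splits the data.
    """
    if n_subpops < 1:
        raise ValueError("n_subpops must be >= 1")
    return [list(records[i::n_subpops]) for i in range(n_subpops)]


def select_elites(candidates: Sequence[T], fitness: Callable[[T], float]) -> Tuple[T, T]:
    """Step B: Super-Elitist_i has the largest fitness, Ordinary-Elitist_i the smallest."""
    if not candidates:
        raise ValueError("subpopulation is empty")
    return max(candidates, key=fitness), min(candidates, key=fitness)


def reduce_until_accurate(
    state: S,
    refine: Callable[[S], S],
    accuracy: Callable[[S], float],
    lam: float,
    max_iters: int = 100,
) -> Tuple[S, int]:
    """Step E: repeat the refinement steps until accuracy(state) >= lam.

    `refine` stands in for the unspecified step C/D updates; `max_iters` is a
    safeguard not present in the source. Returns (final state, passes run).
    """
    passes = 0
    while accuracy(state) < lam and passes < max_iters:
        state = refine(state)
        passes += 1
    return state, passes
```

Consider `reduce_until_accurate(5, lambda k: k + 1, lambda k: k / 10, 0.8)`. It returns `(8, 3)`: three refinement passes raise the accuracy from 0.5 to 0.8, at which point the $EHR \ge \lambda$ test stops the loop.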
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020331559A AU2020331559A1 (en) | 2020-02-25 | 2020-06-17 | Nearest-neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010117158.2 | 2020-02-25 | ||
CN202010117158.2A CN111354427B (zh) | 2020-02-25 | 2020-02-25 | Nearest-neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021169088A1 (zh) | 2021-09-02 |
Family
ID=71195847
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/096484 WO2021169088A1 (zh) | 2020-02-25 | 2020-06-17 | Nearest-neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN111354427B (zh) |
AU (1) | AU2020331559A1 (zh) |
WO (1) | WO2021169088A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
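CN114023063A (zh) * | 2021-11-02 | 2022-02-08 | Dalian University of Technology | Collaborative decision-making method for intelligent transportation systems based on cognitive networks |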
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110178964A1 (en) * | 2010-01-21 | 2011-07-21 | National Cheng Kung University | Recommendation System Using Rough-Set and Multiple Features Mining Integrally and Method Thereof |
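CN103838972A (zh) * | 2014-03-13 | 2014-06-04 | Nantong University | Quantum cooperative game implementation method for attribute reduction of MRI medical records |
CN104915430A (zh) * | 2015-06-15 | 2015-09-16 | Nanjing University of Posts and Telecommunications | MapReduce-based method for acquiring constraint-relation rough set rules |
CN107256342A (zh) * | 2017-06-15 | 2017-10-17 | Nantong University | Multi-population collaborative entropy cascade method for evaluating the efficiency of electronic medical record knowledge reduction |
CN108986872A (zh) * | 2018-06-21 | 2018-12-11 | Nantong University | Multi-granularity attribute weight Spark method for reduction of big data electronic medical records |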
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6263334B1 (en) * | 1998-11-11 | 2001-07-17 | Microsoft Corporation | Density-based indexing method for efficient execution of high dimensional nearest-neighbor queries on large databases |
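CN104933156A (zh) * | 2015-06-25 | 2015-09-23 | Xi'an University of Technology | Collaborative filtering method based on shared nearest neighbor clustering |
CN108447534A (zh) * | 2018-05-18 | 2018-08-24 | 灵玖中科软件(北京)有限公司 | NLP-based electronic medical record data quality management method |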
2020
- 2020-02-25 CN CN202010117158.2A: CN111354427B (zh), Active
- 2020-06-17 WO PCT/CN2020/096484: WO2021169088A1 (zh), Application Filing
- 2020-06-17 AU AU2020331559A: AU2020331559A1 (en), Abandoned
Also Published As
Publication number | Publication date |
---|---|
AU2020331559A1 (en) | 2021-09-09 |
CN111354427B (zh) | 2022-04-29 |
CN111354427A (zh) | 2020-06-30 |
Similar Documents
Publication | Title |
---|---|
Razi et al. | A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models |
Guo et al. | Breaking the curse of space explosion: Towards efficient NAS with curriculum search |
Rahman et al. | Discretization of continuous attributes through low frequency numerical values and attribute interdependency |
CN109902192B (zh) | Remote sensing image retrieval method, system, device and medium based on unsupervised deep regression |
Hu et al. | A niching backtracking search algorithm with adaptive local search for multimodal multiobjective optimization |
CN113693563A (zh) | Brain functional network classification method based on a hypergraph attention network |
Biswas et al. | Hybrid expert system using case based reasoning and neural network for classification |
Bouchachia et al. | Towards incremental fuzzy classifiers |
WO2021169088A1 (zh) | Nearest-neighbor multi-granularity profit method for collaborative knowledge reduction of large-scale electronic health records |
WO2021082444A1 (zh) | Multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation |
Zhang et al. | An enhanced grey wolf optimizer boosted machine learning prediction model for patient-flow prediction |
Hu et al. | Differential evolution based on network structure for feature selection |
Jain | Introduction to data mining techniques |
JP7207128B2 (ja) | Prediction system, prediction method, and prediction program |
CN108446740B (zh) | Multi-layer consistent collaborative method for feature extraction of brain imaging medical records |
Hong et al. | A novel and efficient neuro-fuzzy classifier for medical diagnosis |
Tarle et al. | Improved artificial neural network for dimension reduction in medical data classification |
Eick et al. | Learning Bayesian classification rules through genetic algorithms |
Farhadi et al. | Leveraging Meta-Learning To Improve Unsupervised Domain Adaptation |
Chen et al. | Intelligent Fuzzy Optimization Algorithm for Data Set Information Clustering Patterns Based on Data Mining and IoT |
CN116718198B (zh) | Path planning method and system for UAV swarms based on a temporal knowledge graph |
Mostofi et al. | Data mining and diagnosis of heart diseases: a hybrid approach to the b-mine algorithm and association rules |
Dong et al. | Applications in Various Decision Problems |
Vivek et al. | Novel Machine Learning-based Soil Characteristic Analysis |
Huang et al. | A revised MCDM approach for determining criteria weights: the combination of Bayesian BWM and fuzzy DEMATEL |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| ENP | Entry into the national phase | Ref document number: 2020331559; Country of ref document: AU; Date of ref document: 20200617; Kind code of ref document: A |
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20922369; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 20922369; Country of ref document: EP; Kind code of ref document: A1 |