WO2021082444A1 - Multi-granulation spark-based super-trust fuzzy method for large-scale brain medical record segmentation - Google Patents


Info

Publication number
WO2021082444A1
Authority
WO
WIPO (PCT)
Prior art keywords
granularity
super
population
center
elite
Application number
PCT/CN2020/094104
Other languages
French (fr)
Chinese (zh)
Inventor
丁卫平
丁嘉陆
王杰华
胡彬
陈森博
万杰
赵理莉
孙颖
冯志豪
李铭
任龙杰
丁帅荣
Original Assignee
南通大学 (Nantong University)
Application filed by 南通大学 (Nantong University)
Priority to AU2020286320A (AU2020286320B2)
Publication of WO2021082444A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Definitions

  • the big data project for medical and health services requires not only the construction of electronic health record and electronic medical record databases but also a big data application system for medical and health management and services covering public health, medical services, medical security, drug supply, family planning, and integrated management services.
  • we need to make full use of information technologies such as big data, cloud computing, and the mobile Internet to promote effective interoperability and beneficial interaction between electronic medical record databases and electronic health record databases, so as to implement the big data project for medical and health services.
  • the present invention discloses a multi-granular Spark super-trust fuzzy method for large-scale brain medical record segmentation.
  • the large-scale brain medical record data attribute set is divided into different multi-granularity evolutionary subpopulations Granu-population_i on the Spark cloud platform; a multi-granularity Spark-based super-trust model is designed to build trust among different super elites in multi-granularity populations; the multi-granularity center threshold is adjusted, the super elites are dynamically updated with a multi-granularity subpopulation balance adjustment strategy, and global search segmentation and local refinement segmentation are performed on the large-scale brain medical records.
  • the invention can stably segment large-scale brain medical record knowledge reduction sets and provides an important diagnostic basis for the intelligent diagnosis and auxiliary treatment of brain diseases.
  • the specific steps of step B are as follows:
  • the population trust between the h-th multi-granularity population and the u-th multi-granularity population center is calculated as follows:
  • ε is the similarity threshold, with ε∈[0,1]; if the condition holds, the multi-granularity population conforms to the subpopulation trust relationship in different granularity spaces;
  • λ is the confidence factor of the direct trust between super elites.
  • the value of λ depends on the number of interactions between super elites: the more interactions, the larger λ, with 0≤λ≤1.
  • the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in different granularity spaces.
  • A further improvement of the present invention is that step C comprises the following specific steps:
  • after the first iteration, the distance between the granularity center c_1 of a granularity subpopulation and the initial granularity center c_0 is d(c_1, c_0), and after the i-th iteration the distance between the new granularity center c′ and the original granularity center c is d(c, c′);
  • the specific steps of step E are as follows:
  • the global optimal consensus probability of all super elites is obtained for t∈{1,2,...,s}, and the pair of optimal consistent equilibrium degree and consensus probability for large-scale brain medical record attribute segmentation is constructed for t∈{1,2,...,s};
  • the present invention builds, on the Spark cloud platform, a dynamic cooperative operation mechanism for the super elites of multi-granularity populations based on dynamic elite dominance regions, achieves an optimal, consistent balance in large-scale brain medical record segmentation, and reduces the complexity cost of large-scale brain medical record feature segmentation. It further improves the granularity and robustness of large-scale parallel feature extraction of brain medical records on the Spark cloud computing platform and lays a solid foundation for intelligent services such as brain medical record feature selection, rule mining, and clinical decision support.
  • Figure 1 is the overall flow chart of the system
  • Figures 3-5 are diagrams of the dynamic fuzzy collaborative operation process of the super elites of multi-granularity populations
  • n is the total number of elites
  • SP_i is the i-th super elite
  • P_ij is the j-th ordinary elite in the i-th multi-granularity population
  • Re_ij is the credibility of the i-th super elite with respect to the j-th super elite
  • R_mj is the local trust recommended for the j-th super elite by the m-th ordinary elite in the population
  • I(j) is the set of all elites in the j-th multi-granularity population GP_j,
  • |I(j)| is the cardinality of the set;
  • the invention adopts a multi-granularity Spark super-trust model to build trust among different super elites in a multi-granularity population, dynamically updates the super elites with different multi-granularity subpopulation balance adjustment strategies, and performs global search segmentation and local refinement segmentation on large-scale brain medical records, so that the super elites can collaboratively extract knowledge reduction subsets within their respective regions, which greatly reduces execution time and improves the accuracy of large-scale brain medical record segmentation.
  • the present invention builds, on the Spark cloud platform, a dynamic cooperative operation mechanism for the super elites of multi-granularity populations based on dynamic elite dominance regions, achieves an optimal, consistent balance in large-scale brain medical record segmentation, reduces the complexity cost of large-scale brain medical record feature segmentation, and further improves the granularity and robustness of large-scale parallel feature extraction of brain medical records on the Spark cloud computing platform, laying a solid foundation for intelligent services such as brain medical record feature selection, rule mining, and clinical decision support.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

A multi-granulation Spark-based super-trust fuzzy method for large-scale brain medical record segmentation, comprising: first, dividing a large-scale brain medical record data attribute set into different multi-granulation evolutionary subpopulations (Granu-population_i) on a Spark cloud platform; designing a multi-granulation Spark-based super-trust model to build trust between different super elites in multi-granulation populations; adjusting a multi-granulation center threshold, dynamically updating the super elites using a multi-granulation subpopulation balance adjustment strategy, and performing global search segmentation and local refinement segmentation on large-scale brain medical records, wherein the super elites can collaboratively extract knowledge reduction subsets in their respective regions; and finally, obtaining the optimal large-scale brain medical record segmentation feature set and storing it on the Spark cloud platform. The method can stably segment large-scale brain medical record knowledge reduction sets and provides an important diagnostic basis for the intelligent diagnosis and auxiliary treatment of brain diseases.

Description

Multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation
Technical field:
The present invention relates to the field of medical informatics, and in particular to a multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation.
Background art:
The big data project for medical and health services requires not only building electronic health record and electronic medical record databases but also building a big data application system for medical and health management and services that covers public health, medical services, medical security, drug supply, family planning, and integrated management. With the existing medical resources, achieving the goals of this project requires making full use of information technologies such as big data, cloud computing, and the mobile Internet to promote effective interoperability and beneficial interaction between electronic medical record databases and electronic health record databases.
With the arrival of the cloud computing and big data era, intelligent processing of large-scale electronic medical records has become extremely complex throughout the generation and use of medical big data. The medical data stored in electronic medical record systems are characterized by large volume, scattered sources, diverse formats, fast access, and high application value. Using artificial intelligence and data mining techniques to discover and extract important medical diagnosis rules and knowledge from large-scale electronic medical records is the key to building a clinical decision support system. However, because an electronic medical record system is a special medical information system, the medical data it stores are massive, diverse, incomplete, and time-sensitive, which makes feature selection, collaborative services, knowledge discovery, and clinical decision support services considerably more difficult. Effective processing of complex, large-scale electronic medical records is therefore the key to designing future-oriented big data projects for medical and health services and intelligent clinical decision analysis systems. Given the characteristics of large-scale electronic medical record systems, applying efficient models and methods to the knowledge reduction of complex medical records is the trend of future development.
Using artificial intelligence and big data processing methods to automatically segment brain attributes from large-scale brain medical records and discover latent medical patterns plays an important role in the prevention, control, and treatment of brain diseases. Large-scale brain medical record segmentation problems arise widely in research on brain medical record feature selection, rule mining, and clinical decision support systems, and segmentation is a core technology for intelligent applications of brain medical records in the context of medical big data. There is therefore an urgent need for effective methods in cloud computing environments that solve the large-scale brain medical record segmentation problem and further improve the intelligent processing and service modes for massive brain medical records. This is a key problem in current research on intelligent auxiliary diagnosis and treatment based on brain medical records and on clinical decision support systems in the context of medical big data, and it is also a challenging research topic in the field of brain medical records. However, because large-scale brain medical records are highly incomplete and their values are ambiguous, the non-authentic characteristics of brain medical record attributes are more pronounced and their uncertainty is more significant, which greatly limits the application of traditional attribute segmentation methods.
Therefore, in the medical big data environment, an effective segmentation method tailored to the characteristics of large-scale brain medical records, one that achieves an optimal, consistent balance between global search reduction and locally refined collaborative knowledge reduction in brain medical record segmentation, is of great significance and value for decision support analysis of large-scale brain medical records.
The present invention discloses a multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation. First, the large-scale brain medical record data attribute set is divided into different multi-granularity evolutionary subpopulations Granu-population_i on the Spark cloud platform. Next, a multi-granularity Spark-based super-trust model is designed to build trust among different super elites in the multi-granularity populations. The multi-granularity center threshold is then adjusted, the super elites are dynamically updated with a multi-granularity subpopulation balance adjustment strategy, and global search segmentation and local refinement segmentation are performed on the large-scale brain medical records, so that the super elites can collaboratively extract knowledge reduction subsets within their respective regions. Finally, the optimal segmentation feature set of the large-scale brain medical records
Figure PCTCN2020094104-appb-000001
is obtained and stored on the Spark cloud platform. The invention can stably segment large-scale brain medical record knowledge reduction sets and provides an important diagnostic basis for the intelligent diagnosis and auxiliary treatment of brain diseases.
A further improvement of the present invention is that step B comprises the following specific steps:
a. Set the number of multi-granularity populations to n, with n≥2, and initialize the multi-granularity populations as GP_h, h∈{1,...,n};
b. Initialize the center of the first granularity population as
Figure PCTCN2020094104-appb-000002
then initialize the center of the second granularity population as
Figure PCTCN2020094104-appb-000003
and take it as the priority of the super elite
Figure PCTCN2020094104-appb-000004
c. For the 3rd and subsequent multi-granularity population centers
Figure PCTCN2020094104-appb-000005
compute the minimum distance between the current elite priority
Figure PCTCN2020094104-appb-000006
and the centers of all current granularity populations, as follows:
Figure PCTCN2020094104-appb-000007
Assign this minimum distance to the u-th multi-granularity population center
Figure PCTCN2020094104-appb-000008
and repeat this process until all n multi-granularity evolutionary populations have been initialized;
d. The trust degree of the i-th super elite within the same granularity subpopulation is defined as follows:
Figure PCTCN2020094104-appb-000009
where n is the total number of elites, SP_i is the i-th super elite, and P_ij is the j-th ordinary elite in the i-th multi-granularity population;
e. Compute the trust degree R_i of the i-th super elite SP_i in the h-th multi-granularity population center
Figure PCTCN2020094104-appb-000010
iteratively, as follows:
Figure PCTCN2020094104-appb-000011
where i∈{2,...,N},
Figure PCTCN2020094104-appb-000012
f. Let t, t∈{2,...,n-1}, be the current iteration count for the similarity between the multi-granularity population centers
Figure PCTCN2020094104-appb-000013
The trust degree of each multi-granularity population center
Figure PCTCN2020094104-appb-000014
is computed from the previous, (t-1)-th, iteration, so that the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in the different granularity spaces;
g. Compute the trust deviation Diff_ij between the trust degrees of different super elites SP_i and SP_j in the multi-granularity population as
Figure PCTCN2020094104-appb-000015
where Re_ij is the credibility of the i-th super elite with respect to the j-th super elite, R_mj is the local trust recommended for the j-th super elite by an arbitrarily chosen m-th ordinary elite in the population, I(j) is the set of all elites in the j-th multi-granularity population GP_j, and |I(j)| is the cardinality of that set;
h. The population trust between the h-th multi-granularity population and the u-th multi-granularity population center
Figure PCTCN2020094104-appb-000016
is computed as follows:
Figure PCTCN2020094104-appb-000017
where m is the number of iterations and
Figure PCTCN2020094104-appb-000018
is the variation range of the two multi-granularity populations at the t-th iteration, computed as
Figure PCTCN2020094104-appb-000019
i. For the h-th multi-granularity population
Figure PCTCN2020094104-appb-000020
if
Figure PCTCN2020094104-appb-000021
is satisfied, where ε is the similarity threshold with ε∈[0,1], then the multi-granularity population conforms to the subpopulation trust relationship in the different granularity spaces;
j. Construct the trust relationship formula between different super elites within a multi-granularity population, defined as
Figure PCTCN2020094104-appb-000022
where λ is the confidence factor of the direct trust between super elites. The value of λ depends on the number of interactions between the super elites: the more interactions, the larger λ, with 0≤λ≤1. We take λ=h/H_Lmt, where h is the number of interactions between super elite i and super elite j and H_Lmt is the preset interaction-count threshold. In this way, the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in the different granularity spaces.
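The center initialization of step B (steps b and c) and the confidence factor λ=h/H_Lmt can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the distance and trust formulas appear only as images, so `seed_centers` assumes a farthest-point rule (each new center is the candidate whose minimum distance to the existing centers is largest), and `combined_trust` assumes a linear blend λ·direct + (1 − λ)·mean(recommended) of direct and recommended trust. All function names, and the clipping of λ to [0, 1], are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def seed_centers(points: List[Point], n: int) -> List[Point]:
    """Pick n population centers (assumed farthest-point reading of steps b-c).

    The first two points become centers 1 and 2; each further center is the
    candidate whose minimum distance to all current centers is largest.
    """
    if not 2 <= n <= len(points):
        raise ValueError("need 2 <= n <= len(points)")
    centers = [points[0], points[1]]
    while len(centers) < n:
        # Minimum distance from each candidate to every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def confidence_factor(h: int, h_lmt: int) -> float:
    """lambda = h / H_Lmt; clipping to [0, 1] is an added assumption."""
    if h_lmt <= 0:
        raise ValueError("H_Lmt must be positive")
    return min(max(h / h_lmt, 0.0), 1.0)


def combined_trust(direct: float, recommended: List[float], h: int, h_lmt: int) -> float:
    """Hypothetical blend: lambda * direct + (1 - lambda) * mean(recommended)."""
    lam = confidence_factor(h, h_lmt)
    rec = sum(recommended) / len(recommended) if recommended else 0.0
    return lam * direct + (1.0 - lam) * rec


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
    print(seed_centers(pts, 3))
    print(combined_trust(0.8, [0.4, 0.6], h=5, h_lmt=10))
```

With this blend, a pair of super elites that has interacted only a few times leans on the recommended trust of ordinary elites, and its own direct trust dominates once h reaches H_Lmt.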
A further improvement of the present invention is that step C comprises the following specific steps:
a. Use the traditional k-means clustering method to initialize the multi-granularity centers as
Figure PCTCN2020094104-appb-000023
b. Set both the multi-granularity subpopulation set and the center set to the empty set, V=Φ and C=Φ, and set the iteration count t=1. Compute the distance between each multi-granularity subpopulation and the multi-granularity centers, and assign the large-scale brain medical record attribute set to the corresponding multi-granularity centers by the minimum-distance rule, forming k
Figure PCTCN2020094104-appb-000024
and record the number of super elites in each center
Figure PCTCN2020094104-appb-000025
Set the initial adjustment labels
Figure PCTCN2020094104-appb-000026
c. Recompute each multi-granularity center
Figure PCTCN2020094104-appb-000027
and the initial displacement d(c_1i, c_0i) of each granularity center, where |V_i| denotes the number of populations in the multi-granularity population V_i;
d. After the first iteration, the distance between the granularity center c_1 of a granularity subpopulation and the initial granularity center c_0 is d(c_1, c_0); after the i-th iteration, the distance between the new granularity center c′ and the original granularity center c is d(c, c′). If
Figure PCTCN2020094104-appb-000028
where ε is the similarity threshold with ε∈[0,1], the granularity center represented by c′ no longer participates in the next round of iterative adjustment; otherwise, iterative adjustment continues;
e. For each multi-granularity population with label f_tj=1, compute the distance between each of its super elites and the multi-granularity population centers participating in the adjustment, and assign the brain medical record attributes to the corresponding multi-granularity populations by the minimum-distance rule, forming k new multi-granularity populations {V_tj}. Record the number of super elites {N_tj} in each multi-granularity population, and obtain the adjusted number of super elites ΔN_tj used for attribute segmentation of the large-scale brain medical records;
f. Recompute the multi-granularity centers participating in the adjustment
Figure PCTCN2020094104-appb-000029
and the displacement d(c_tj, c_(t-1)j) of each multi-granularity center;
g. Set the adjustment threshold for granularity center migration to ε and the adjustment threshold for the number of multi-granularity subpopulations to θ. If the center c_tj of the multi-granularity population V_tj satisfies
Figure PCTCN2020094104-appb-000030
and
Figure PCTCN2020094104-appb-000031
then set the adjustment label of V_tj to 0, i.e., f_tj=0, and add V_tj and c_tj to the final multi-granularity population and center sets, i.e., V=V∪{V_tj} and C=C∪{c_tj}. Once a set of k multi-granularity centers has been formed, i.e., |V|=k, the iteration terminates.
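The center adjustment of step C resembles k-means with per-center freezing, and the sketch below follows that reading. It is an illustration built on assumptions: the stopping condition exists only as a formula image, so the code freezes a center (f = 0) when its latest displacement divided by its first-iteration displacement d(c_1, c_0) is at most ε; the θ test on the change in super-elite counts is omitted; and, as a simplification, every point is reassigned against every center on each pass. The names `adjust_centers` and `mean` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def adjust_centers(points: List[Point], centers: List[Point],
                   eps: float = 0.1, max_iter: int = 100
                   ) -> Tuple[List[Point], List[List[Point]]]:
    """Hypothetical sketch of step C: k-means passes with per-center freezing."""
    k = len(centers)
    centers = [tuple(c) for c in centers]
    active = [True] * k                              # adjustment labels f_j (True means 1)
    first_shift: List[Optional[float]] = [None] * k  # d(c_1j, c_0j)
    groups: List[List[Point]] = [[] for _ in range(k)]
    for _ in range(max_iter):
        if not any(active):                          # every center frozen: |V| = k
            break
        # Minimum-distance rule: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        for j in range(k):
            if not active[j] or not groups[j]:
                continue
            new_c = mean(groups[j])
            shift = math.dist(new_c, centers[j])     # displacement d(c, c')
            centers[j] = new_c
            if first_shift[j] is None:
                first_shift[j] = shift
            # Assumed stopping rule: relative displacement at most eps.
            if first_shift[j] == 0.0 or shift / first_shift[j] <= eps:
                active[j] = False                    # f_j = 0: center is final
    return centers, groups


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    final_centers, final_groups = adjust_centers(pts, [(0.0, 0.0), (10.0, 10.0)])
    print(final_centers)
```

On two well-separated groups, both centers move once on the first pass, do not move on the second, and are then frozen, which ends the loop.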
A further improvement of the present invention is that step E comprises the following specific steps:
a. Let two adjacent super-elite clusters be
Figure PCTCN2020094104-appb-000032
and
Figure PCTCN2020094104-appb-000033
with elite membership degrees
Figure PCTCN2020094104-appb-000034
and
Figure PCTCN2020094104-appb-000035
respectively;
b. If
Figure PCTCN2020094104-appb-000036
the super elite evolves into the combination of elite clusters
Figure PCTCN2020094104-appb-000037
otherwise, it evolves into the combination of elite clusters
Figure PCTCN2020094104-appb-000038
c. Perform large-scale brain medical record segmentation in the multi-granularity subpopulations using a hybrid of competition and cooperation. Let S_i be the i-th super elite; for i=1 to |S_i|, perform the following operations:
(1) Insert the representative S_i,rep of super elite S_i into P_i^t;
(2) If n_x>|S_i|, select super elites P_i^t from the multi-granularity subpopulation Granu-subpopulation_i;
(3) Combine all S_i,j with the solutions of the other multi-granularity subpopulations Granu-subpopulation_i, compute their rank values, and compute the niche count of S_i,j;
(4) Update the super-elite representative of S_i to the non-dominated solution within the Pareto dominance region, determine the winning multi-granularity subpopulation, and update S_i=S_k;
d. The fuzzy membership degree u_Ch(P_i) of a super elite is computed with a similarity-based membership scheme, where the distance between the reference value P_i and the super-elite center C_h is defined as d(P_i, C_h);
e. For the super elites of each multi-granularity subpopulation, compute the equilibrium degree CI as
Figure PCTCN2020094104-appb-000039
and the consensus probability CR as
Figure PCTCN2020094104-appb-000040
where t∈{1,2,...,s};
f. For any inconsistent equilibrium degree
Figure PCTCN2020094104-appb-000041
obtain the optimal consistent equilibrium degree of the super elites of the t-th multi-granularity subpopulation as
Figure PCTCN2020094104-appb-000042
where
Figure PCTCN2020094104-appb-000043
g. Obtain the global optimal consensus probability of all super elites as
Figure PCTCN2020094104-appb-000044
t∈{1,2,...,s}, and construct the pair of optimal consistent equilibrium degree and consensus probability for large-scale brain medical record attribute segmentation as
Figure PCTCN2020094104-appb-000045
t∈{1,2,...,s};
h. Based on the pair of optimal consistent equilibrium degree and consensus probability
Figure PCTCN2020094104-appb-000046
the super elites segment the feature sets of the different attribute regions of the brain medical records as F_1, F_2, ..., F_n, and the optimal feature set of the large-scale brain medical records is obtained as
Figure PCTCN2020094104-appb-000047
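Step E relies on Pareto non-dominance, a niche count, and a distance-based fuzzy membership. The sketch below uses standard textbook forms of these three ideas as stand-ins, not the patented formulas (which, like CI, CR, and the final feature-set formula, are given only as images): Pareto dominance under minimization, a triangular fitness-sharing niche count with a hypothetical radius `sigma_share`, and a fuzzy c-means style membership with a hypothetical fuzzifier `m`. All function and parameter names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dominates(a: Vec, b: Vec) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions: List[Vec]) -> List[int]:
    """Indices of the solutions that no other solution dominates."""
    return [i for i, s in enumerate(solutions)
            if not any(dominates(o, s) for j, o in enumerate(solutions) if j != i)]


def niche_count(solutions: List[Vec], i: int,
                sigma_share: float = 1.0, alpha: float = 1.0) -> float:
    """Fitness-sharing niche count of solution i (it counts itself with share 1)."""
    total = 0.0
    for s in solutions:
        d = math.dist(solutions[i], s)
        if d < sigma_share:
            total += 1.0 - (d / sigma_share) ** alpha
    return total


def fuzzy_membership(p: Vec, centers: List[Vec], m: float = 2.0) -> List[float]:
    """Fuzzy c-means style membership of point p in each center C_h (m > 1)."""
    dists = [math.dist(p, c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:  # p sits on one or more centers: split membership among them
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dh / dk) ** exp for dk in dists) for dh in dists]


if __name__ == "__main__":
    objs = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
    print(non_dominated(objs))
    print(fuzzy_membership((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

The memberships returned for a point sum to 1, so a reference value P_i near one super-elite center receives most of its membership there; a larger niche count marks a crowded region that a sharing scheme would penalize.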
Compared with the prior art, the present invention has the following advantages:
1) The present invention adopts a multi-granularity Spark super-trust model to build trust degrees between different super elites within a multi-granularity population, dynamically updates the super elites with different multi-granularity subpopulation equilibrium adjustment strategies, and performs global search segmentation and local refined segmentation of large-scale brain medical records. The super elites can collaboratively extract knowledge reduction subsets within their respective regions, which greatly reduces execution time and improves the segmentation accuracy for large-scale brain medical records.
2) On the Spark cloud platform, the present invention builds a dynamic collaborative operation mechanism for the super elites of multi-granularity populations based on dynamic elite dominance regions, achieves an optimal consistent equilibrium for large-scale brain medical record segmentation, and reduces the complexity cost of feature segmentation. It further improves the fine granularity and robustness of parallel feature extraction from large-scale brain medical records on the Spark cloud computing platform, laying a good foundation for intelligent services such as brain medical record feature selection, rule mining, and clinical decision support.
Description of the drawings:
Figure 1 is the overall flow chart of the system;
Figure 2 is a diagram of the dynamic execution process of the multi-granularity super-trust Spark model;
Figures 3-5 are diagrams of the dynamic fuzzy collaborative operation process of the super elites of multi-granularity populations.
Detailed description:
To aid understanding of the present invention, it is described in further detail below with reference to an embodiment. The embodiment serves only to explain the present invention and does not limit its scope of protection.
As shown in Figures 1 to 5, a specific embodiment of the multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation comprises the following steps:
A. On the big-data Spark cloud platform, partition the large-scale brain medical record attribute set into different multi-granularity evolutionary populations Granu-population_i, i = 1, 2, ..., n. Decompose the attribute segmentation task into multiple parallel jobs, then compute the equivalence classes of the different candidate attribute sets within the decomposed jobs;
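Step A leaves the grouping rule implicit. In rough-set attribute reduction, the equivalence classes of a candidate attribute set are the groups of records that share identical values on every attribute in that set. The Python sketch below shows that grouping together with a round-robin split of the attribute set into n groups; the record fields (`age`, `lesion`, `edema`) and both helper names are hypothetical, and the round-robin split is an assumption standing in for the patent's unspecified partitioning rule. On Spark, the same grouping could be expressed by keying each record by its attribute-value tuple and grouping by key within each job.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Record = Dict[str, Hashable]


def equivalence_classes(records: Sequence[Record],
                        attributes: Sequence[str]) -> List[List[int]]:
    """Group record indices that agree on every attribute in `attributes`
    (the rough-set indiscernibility relation)."""
    classes: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec[a] for a in attributes)
        classes[key].append(idx)
    # Order classes by their smallest record index for deterministic output.
    return sorted(classes.values(), key=lambda members: members[0])


def partition_attributes(attributes: Sequence[str], n: int) -> List[List[str]]:
    """Round-robin split of the attribute set into n groups, one per
    hypothetical Granu-population_i (an assumed partitioning rule)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    groups: List[List[str]] = [[] for _ in range(n)]
    for k, attr in enumerate(attributes):
        groups[k % n].append(attr)
    return groups


records = [
    {"age": "old", "lesion": "yes", "edema": "no"},
    {"age": "old", "lesion": "yes", "edema": "yes"},
    {"age": "young", "lesion": "no", "edema": "no"},
    {"age": "old", "lesion": "yes", "edema": "no"},
]
print(equivalence_classes(records, ["age", "lesion"]))      # [[0, 1, 3], [2]]
print(partition_attributes(["age", "lesion", "edema"], 2))  # [['age', 'edema'], ['lesion']]
```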
B. Design a multi-granularity super-trust model: use the i-th multi-granularity evolutionary population Granu-population_i to reduce and segment the i-th attribute set of the brain medical records, construct trust degrees between different super elites within each multi-granularity population, and calculate the trust deviation of the multi-granularity populations. The size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in different granularity spaces. The specific steps are as follows:
a. Set the number of multi-granularity populations to n, with n ≥ 2, and initialize the multi-granularity populations as GP_h, h ∈ {1, ..., n};
b. Initialize the center of the first granularity population as
Figure PCTCN2020094104-appb-000048
then initialize the center of the second granularity population as
Figure PCTCN2020094104-appb-000049
and take it as the super-elite priority
Figure PCTCN2020094104-appb-000050
c. For the 3rd and subsequent multi-granularity population centers
Figure PCTCN2020094104-appb-000051
compute the minimum distance between the current elite priority
Figure PCTCN2020094104-appb-000052
and the centers of all current granularity populations as follows:
Figure PCTCN2020094104-appb-000053
Assign this minimum distance to the u-th multi-granularity population center
Figure PCTCN2020094104-appb-000054
and repeat this process until all n multi-granularity evolutionary populations are initialized;
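The formulas for steps b and c survive only as images, so their exact form cannot be recovered here. The description (seed the first centers, then repeatedly take the minimum distance from a candidate to all existing centers) resembles farthest-first traversal, a standard deterministic seeding technique. The sketch below implements that technique as a stand-in, not as the patent's formula; seeding with the first candidate and requiring distinct candidates are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def min_distance(p: Point, centers: Sequence[Point]) -> float:
    """Minimum Euclidean distance from p to any already-chosen center."""
    return min(math.dist(p, c) for c in centers)


def farthest_first_centers(candidates: Sequence[Point], n: int) -> List[Point]:
    """Seed n population centers by farthest-first traversal
    (assumes the candidates are distinct)."""
    if not 2 <= n <= len(candidates):
        raise ValueError("need 2 <= n <= len(candidates)")
    centers: List[Point] = [candidates[0]]  # assumption: first candidate seeds GP_1
    while len(centers) < n:
        remaining = [p for p in candidates if p not in centers]
        # Next center: the candidate whose nearest existing center is farthest away.
        centers.append(max(remaining, key=lambda p: min_distance(p, centers)))
    return centers


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 5.0), (0.0, 9.0)]
print(farthest_first_centers(pts, 3))  # [(0.0, 0.0), (10.0, 0.0), (0.0, 9.0)]
```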
d. The trust degree of the i-th super elite within subpopulations of the same granularity is defined as follows:
Figure PCTCN2020094104-appb-000055
where n is the total number of elites, SP_i is the i-th super elite, and P_ij is the j-th ordinary elite in the i-th multi-granularity population;
e. Compute the trust degree R_i of the i-th super elite SP_i in the h-th multi-granularity population center
Figure PCTCN2020094104-appb-000056
iteratively as follows:
Figure PCTCN2020094104-appb-000057
where i ∈ {2, ..., N},
Figure PCTCN2020094104-appb-000058
f. Let t, t ∈ {2, ..., n-1}, be the current iteration count of the similarity computation between the multi-granularity population centers
Figure PCTCN2020094104-appb-000059
The trust degree of each multi-granularity population center
Figure PCTCN2020094104-appb-000060
is computed from the previous round (iteration t-1), so that the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in different granularity spaces;
g. Calculate the trust deviation Diff_ij between the trust degrees of different super elites SP_i and SP_j in the multi-granularity population as
Figure PCTCN2020094104-appb-000061
where Re_ij is the reputation degree that the i-th super elite assigns to the j-th super elite, R_mj is the local trust degree recommended for the j-th super elite by an arbitrarily chosen m-th ordinary elite in the population, I(j) is the set of all elites in the j-th multi-granularity population GP_j, and |I(j)| is the cardinality of that set;
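The Diff_ij formula is also an image. One reading consistent with the symbol definitions is the absolute gap between SP_i's reputation score Re_ij for SP_j and the mean of the local trust values R_mj recommended by the |I(j)| elites of GP_j. The sketch below encodes only that assumed reading; the function name and the elite identifiers `m1` to `m3` are hypothetical.

```python
from typing import Mapping


def trust_deviation(re_ij: float, recommended: Mapping[str, float]) -> float:
    """Assumed reading of Diff_ij: |Re_ij - mean of R_mj over m in I(j)|.

    re_ij       -- reputation score that super elite SP_i assigns to SP_j
    recommended -- local trust R_mj recommended for SP_j by each elite m in I(j)
    """
    if not recommended:
        raise ValueError("I(j) must be non-empty")
    mean_recommended = sum(recommended.values()) / len(recommended)  # divide by |I(j)|
    return abs(re_ij - mean_recommended)


print(trust_deviation(0.9, {"m1": 0.6, "m2": 0.8, "m3": 0.7}))  # about 0.2
```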
h. The population trust degree between the h-th multi-granularity population and the u-th multi-granularity population center,
Figure PCTCN2020094104-appb-000062
is calculated as follows:
Figure PCTCN2020094104-appb-000063
where m is the number of iterations and
Figure PCTCN2020094104-appb-000064
is the variation range of the two multi-granularity populations at the t-th iteration, calculated as
Figure PCTCN2020094104-appb-000065
i. For the h-th multi-granularity population
Figure PCTCN2020094104-appb-000066
if
Figure PCTCN2020094104-appb-000067
is satisfied, where ε ∈ [0, 1] is the similarity threshold, the multi-granularity population conforms to the subpopulation trust relationship in different granularity spaces;
j. Construct the trust relationship formula between different super elites within the multi-granularity population, defined as
Figure PCTCN2020094104-appb-000068
where λ is the confidence factor for direct trust between super elites. Its value depends on the number of interactions between the super elites: the more interactions, the larger λ, with 0 ≤ λ ≤ 1. We take λ = h/H_Lmt, where h is the number of interactions between super elite i and super elite j and H_Lmt is the preset interaction-count threshold. The size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in different granularity spaces.
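Only λ = h/H_Lmt and 0 ≤ λ ≤ 1 are stated in the text; the relationship formula itself is an image. A common way such a confidence factor is used in trust models is to blend direct trust with recommended trust as λ·direct + (1 − λ)·recommended. The sketch below assumes that blend and clips λ to 1 once h exceeds H_Lmt so that the stated bound holds; both choices, and the function names, are assumptions rather than the patent's definition.

```python
def confidence_factor(h: int, h_lmt: int) -> float:
    """lambda = h / H_Lmt, clipped to [0, 1] (clipping is an assumption)."""
    if h_lmt <= 0:
        raise ValueError("H_Lmt must be positive")
    return max(0.0, min(1.0, h / h_lmt))


def combined_trust(direct: float, recommended: float, h: int, h_lmt: int) -> float:
    """Assumed blend: lambda * direct trust + (1 - lambda) * recommended trust."""
    lam = confidence_factor(h, h_lmt)
    return lam * direct + (1.0 - lam) * recommended


print(confidence_factor(5, 20))          # 0.25
print(combined_trust(0.8, 0.4, 5, 20))   # about 0.5
print(combined_trust(0.8, 0.4, 30, 20))  # 0.8 (lambda capped at 1)
```

With few interactions the recommended trust dominates; as h approaches H_Lmt, direct experience takes over, which matches the stated rule that more interactions mean a larger λ.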
C. Set the adjustment threshold of the multi-granularity Spark super-trust centers for large-scale brain medical record segmentation to λ. After the i-th iteration, every multi-granularity subpopulation Granu-population_i whose granularity-center adjustment exceeds λ proceeds to the next round of iterative adjustment. Set the adjustment threshold for granularity-center migration to ε and the adjustment threshold for the number of multi-granularity subpopulations to θ, optimize the center c_tj of each multi-granularity population V_tj, and add it to the final set of multi-granularity population centers, forming a set of k multi-granularity centers. The specific steps are as follows:
a. Initialize the multi-granularity centers with the conventional k-means clustering method as
Figure PCTCN2020094104-appb-000069
b. Initialize the multi-granularity subpopulation set and the center set as empty sets, V = Φ and C = Φ, and set the iteration count t = 1. Compute the distance between each multi-granularity subpopulation and each multi-granularity center, and assign the large-scale brain medical record attribute set to the corresponding centers by the minimum-distance principle, forming k
Figure PCTCN2020094104-appb-000070
Record the number of super elites in each center,
Figure PCTCN2020094104-appb-000071
and set the initial adjustment labels
Figure PCTCN2020094104-appb-000072
c. Recompute each multi-granularity center
Figure PCTCN2020094104-appb-000073
and the initial displacement d(c_1i, c_0i) of each granularity center, where |V_i| is the number of populations in the multi-granularity population V_i;
d. After the first iteration, let d(c_1, c_0) be the distance between the granularity center c_1 of a granularity subpopulation and its initial center c_0; after the i-th iteration, let d(c, c′) be the distance between the new granularity center c′ and the previous center c. If
Figure PCTCN2020094104-appb-000074
where ε ∈ [0, 1] is the similarity threshold, the granularity center represented by c′ no longer participates in the next round of iterative adjustment; otherwise, iterative adjustment continues;
e. For the multi-granularity populations labeled f_tj = 1, compute the distance between each super elite and each multi-granularity population center still being adjusted, and assign the brain medical record attributes to the corresponding multi-granularity populations by the minimum-distance principle, forming k new multi-granularity populations {V_tj}. Record the number of super elites {N_tj} in each multi-granularity population and obtain the adjusted number of super elites ΔN_tj used for large-scale brain medical record attribute segmentation;
f. Recompute the multi-granularity centers participating in the adjustment,
Figure PCTCN2020094104-appb-000075
and the displacement d(c_tj, c_tj) of each multi-granularity center's movement;
g. Set the adjustment threshold for granularity-center migration to ε and the adjustment threshold for the number of multi-granularity subpopulations to θ. If the center c_tj of the multi-granularity population V_tj satisfies
Figure PCTCN2020094104-appb-000076
and
Figure PCTCN2020094104-appb-000077
set the adjustment label of V_tj to 0, i.e., f_tj = 0, and add V_tj and c_tj to the final set of multi-granularity population centers, i.e., V = V ∪ {V_tj} and C = C ∪ {c_tj}. Once a set of k multi-granularity centers has been formed, i.e., |V| = k, the iteration terminates.
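Step C amounts to k-means with per-center freezing. The sketch below assumes that the image conditions in steps d and g compare each center's latest shift with its first-iteration shift d(c_1, c_0), freeze a center (f_j = 0) once that shift falls to at most ε times the first shift and its member count changes by at most θ, and stop when every center is frozen (|V| = k). For simplicity it reassigns all points to all centers each round, whereas step e reassigns only members of populations still labeled f_tj = 1. All names and default values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Minimum-distance principle: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def adjust_centers(points: Sequence[Point], init_centers: Sequence[Point],
                   eps: float = 0.05, theta: int = 0,
                   max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    k = len(init_centers)
    centers = list(init_centers)
    active = [True] * k          # adjustment labels: True means f_j = 1
    first_shift = [0.0] * k      # d(c_1, c_0) for each center
    counts = [0] * k             # member counts from the previous round
    for t in range(1, max_iter + 1):
        if not any(active):      # |V| = k: every center is final
            break
        labels = assign(points, centers)
        new_counts = [labels.count(j) for j in range(k)]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not active[j] or not members:
                continue         # frozen or empty: keep the current center
            new_center = centroid(members)
            shift = math.dist(new_center, centers[j])
            if t == 1:
                first_shift[j] = shift
            elif (shift <= eps * first_shift[j]
                  and abs(new_counts[j] - counts[j]) <= theta):
                active[j] = False  # f_j = 0: (V_j, c_j) joins the final set
            centers[j] = new_center
        counts = new_counts
    return centers, assign(points, centers)


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = adjust_centers(pts, [(0.0, 0.0), (10.0, 10.0)])
print(labels)                                             # [0, 0, 0, 1, 1, 1]
print([tuple(round(x, 3) for x in c) for c in centers])  # [(0.333, 0.333), (10.333, 10.333)]
```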
D. Dynamically update the super elites in the multi-granularity subpopulations with an equilibrium adjustment strategy: place the super elites of the multi-granularity subpopulations within an isosceles right triangle region and calculate their respective granularity values
Figure PCTCN2020094104-appb-000078
If two super elites have the same lower granularity
Figure PCTCN2020094104-appb-000079
their approximation attribute values converge to the equilibrium pair
Figure PCTCN2020094104-appb-000080
If two super elites have the same higher granularity
Figure PCTCN2020094104-appb-000081
their approximation attribute values converge to the equilibrium pair
Figure PCTCN2020094104-appb-000082
This equilibrium adjustment strategy helps increase the optimal consistent equilibrium degree of the multi-granularity subpopulations.
E. Construct a dynamic fuzzy collaborative segmentation strategy for the super elites of multi-granularity subpopulations: within the dynamic elite dominance regions, perform global search segmentation and local refined segmentation of the large-scale brain medical record attributes; carry out hybrid competitive-cooperative collaboration among the multi-granularity subpopulations; and construct the optimal consistent equilibrium degree and probability degree for attribute segmentation. The super elites thereby collaboratively extract knowledge reduction subsets within their corresponding Pareto dominance regions, stably segment the different attribute regions of the large-scale brain medical records, and obtain the optimal feature set for large-scale brain medical records
Figure PCTCN2020094104-appb-000083
The specific steps are as follows:
a. Let two adjacent super-elite clusters be
Figure PCTCN2020094104-appb-000084
and
Figure PCTCN2020094104-appb-000085
with elite membership degrees
Figure PCTCN2020094104-appb-000086
and
Figure PCTCN2020094104-appb-000087
respectively;
b. If
Figure PCTCN2020094104-appb-000088
the super elite evolves into a combination of the elite clusters
Figure PCTCN2020094104-appb-000089
otherwise, it evolves into a combination of the elite clusters
Figure PCTCN2020094104-appb-000090
c. Perform large-scale brain medical record segmentation with hybrid competitive-cooperative collaboration among the multi-granularity subpopulations. Let S_i be the i-th super elite; for i = 1 to |S_i|, perform the following operations:
(1) Insert the representative S_i,rep of super elite S_i into P_i^t;
(2) If n_x > |S_i|, select super elites P_i^t from the multi-granularity subpopulation Granu-subpopulation_i;
(3) Combine all S_i,j with the solutions of the other multi-granularity subpopulations Granu-subpopulation_i, compute their ranking values, and calculate the niche count of S_i,j;
(4) Update the super-elite representative of S_i to obtain the non-dominated solutions within the Pareto dominance region, determine the winning multi-granularity subpopulation, and update S_i = S_k;
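Operations (3) and (4) rely on two standard multi-objective notions: niche counts and Pareto non-dominated solutions. The sketch below shows both under minimization, using the classic fitness-sharing kernel sh(d) = 1 − (d/σ)^α for d < σ. The objective vectors (for example, reduct size and segmentation error) and the sharing radius σ are hypothetical, and the patent's own ranking and niche formulas are not given in the text.

```python
import math
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b (minimization): no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions: Sequence[Objectives]) -> List[int]:
    """Indices of solutions that no other solution dominates."""
    return [i for i, s in enumerate(solutions)
            if not any(dominates(o, s) for j, o in enumerate(solutions) if j != i)]


def niche_counts(solutions: Sequence[Objectives], sigma: float,
                 alpha: float = 1.0) -> List[float]:
    """Fitness-sharing niche count of each solution, self included:
    sum of sh(d) = 1 - (d / sigma) ** alpha over neighbours with d < sigma."""
    counts = []
    for a in solutions:
        total = 0.0
        for b in solutions:
            d = math.dist(a, b)
            if d < sigma:
                total += 1.0 - (d / sigma) ** alpha
        counts.append(total)
    return counts


print(non_dominated([(1, 5), (2, 2), (3, 3), (5, 1)]))    # [0, 1, 3]
print(niche_counts([(0, 0), (1, 0), (5, 0)], sigma=2.0))  # [1.5, 1.5, 1.0]
```

A larger niche count marks a crowded region; dividing fitness by it is the usual way to keep the non-dominated set spread out.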
d. The fuzzy membership degree u_Ch(P_i) of a super elite is calculated with a similarity-based membership scheme, where the distance between the reference value P_i and the super-elite center C_h is defined as d(P_i, C_h);
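The similarity-based membership formula in step d is not reproduced in the text. The fuzzy c-means membership is a widely used distance-based scheme of this kind: u_h(P_i) = 1 / Σ_k (d(P_i, C_h) / d(P_i, C_k))^(2/(m−1)), with fuzzifier m > 1. The sketch below computes it as a plausible stand-in, not as the patent's confirmed formula; the function name and the default m = 2 are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def fuzzy_memberships(p: Point, centers: Sequence[Point],
                      m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of point p in each center C_h (m > 1)."""
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(p, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # p coincides with one or more centers: share membership among them only.
        hits = [d == 0.0 for d in dists]
        return [1.0 / sum(hits) if hit else 0.0 for hit in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dh / dk) ** exponent for dk in dists) for dh in dists]


print(fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))  # about [0.8, 0.2]
```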
e. For the super elites of each multi-granularity subpopulation, calculate the equilibrium CI as
Figure PCTCN2020094104-appb-000091
and the consensus probability CR as
Figure PCTCN2020094104-appb-000092
where t ∈ {1, 2, ..., s};
f. For any inconsistent equilibrium degree
Figure PCTCN2020094104-appb-000093
the optimal consistent equilibrium degree of the super elites of the t-th multi-granularity subpopulation is obtained as
Figure PCTCN2020094104-appb-000094
where
Figure PCTCN2020094104-appb-000095
g. The global optimal consensus probability degree over all super elites is obtained as
Figure PCTCN2020094104-appb-000096
t ∈ {1, 2, ..., s}, and the optimal consistent equilibrium-degree/probability-degree pair for large-scale brain medical record attribute segmentation is constructed as
Figure PCTCN2020094104-appb-000097
t ∈ {1, 2, ..., s};
h. Based on the optimal consistent equilibrium-degree/probability-degree pair
Figure PCTCN2020094104-appb-000098
the super elites segment the different attribute regions of the brain medical records into the feature sets F_1, F_2, ..., F_n and obtain the optimal feature set for large-scale brain medical records
Figure PCTCN2020094104-appb-000099
F. Compare the large-scale brain medical record segmentation accuracy RC obtained above with the preset accuracy value η. If RC ≥ η, output the optimal segmentation knowledge set for the large-scale brain medical records; otherwise, repeat steps C, D, and E until the segmentation accuracy satisfies RC ≥ η;
G. Store the optimal feature set for big-data brain medical record segmentation,
Figure PCTCN2020094104-appb-000100
in the Spark cloud platform, providing an important knowledge basis for intelligent auxiliary diagnosis in the clinical diagnosis and treatment of diseases related to large-scale brain medical records.
The present invention adopts a multi-granularity Spark super-trust model to build trust degrees between different super elites within a multi-granularity population, dynamically updates the super elites with different multi-granularity subpopulation equilibrium adjustment strategies, and performs global search segmentation and local refined segmentation of large-scale brain medical records. The super elites can collaboratively extract knowledge reduction subsets within their respective regions, which greatly reduces execution time and improves the segmentation accuracy for large-scale brain medical records.
On the Spark cloud platform, the present invention builds a dynamic collaborative operation mechanism for the super elites of multi-granularity populations based on dynamic elite dominance regions, achieves an optimal consistent equilibrium for large-scale brain medical record segmentation, and reduces the complexity cost of feature segmentation. It further improves the fine granularity and robustness of parallel feature extraction from large-scale brain medical records on the Spark cloud computing platform, laying a good foundation for intelligent services such as brain medical record feature selection, rule mining, and clinical decision support.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention.
Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

  1. A multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation, characterized in that it comprises the following steps:
    A. On the big-data Spark cloud platform, partition the large-scale brain medical record attribute set into different multi-granularity evolutionary populations Granu-population_i, i = 1, 2, ..., n. Decompose the attribute segmentation task into multiple parallel jobs, then compute the equivalence classes of the different candidate attribute sets within the decomposed jobs;
    B. Design a multi-granularity super-trust model: use the i-th multi-granularity evolutionary population Granu-population_i to reduce and segment the i-th attribute set of the brain medical records, construct trust degrees between different super elites within each multi-granularity population, and calculate the trust deviation of the multi-granularity populations. The size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in different granularity spaces;
    C. Set the adjustment threshold of the multi-granularity Spark super-trust centers for large-scale brain medical record segmentation to λ. After the i-th iteration, every multi-granularity subpopulation Granu-population_i whose granularity-center adjustment exceeds λ proceeds to the next round of iterative adjustment. Set the adjustment threshold for granularity-center migration to ε and the adjustment threshold for the number of multi-granularity subpopulations to θ, optimize the center c_tj of each multi-granularity population V_tj, and add it to the final set of multi-granularity population centers, forming a set of k multi-granularity centers;
    D. Dynamically update the super elites in the multi-granularity subpopulations with an equilibrium adjustment strategy: place the super elites of the multi-granularity subpopulations within an isosceles right triangle region and calculate their respective granularity values
    Figure PCTCN2020094104-appb-100001
    If two super elites have the same lower granularity
    Figure PCTCN2020094104-appb-100002
    their approximation attribute values converge to the equilibrium pair
    Figure PCTCN2020094104-appb-100003
    If two super elites have the same higher granularity
    Figure PCTCN2020094104-appb-100004
    their approximation attribute values converge to the equilibrium pair
    Figure PCTCN2020094104-appb-100005
    This equilibrium adjustment strategy helps increase the optimal consistent equilibrium degree of the multi-granularity subpopulations.
    E. Construct a dynamic fuzzy collaborative segmentation strategy for the super elites of multi-granularity subpopulations: within the dynamic elite dominance regions, perform global search segmentation and local refined segmentation of the large-scale brain medical record attributes; carry out hybrid competitive-cooperative collaboration among the multi-granularity subpopulations; and construct the optimal consistent equilibrium degree and probability degree for attribute segmentation. The super elites thereby collaboratively extract knowledge reduction subsets within their corresponding Pareto dominance regions, stably segment the different attribute regions of the large-scale brain medical records, and obtain the optimal feature set for large-scale brain medical records
    Figure PCTCN2020094104-appb-100006
    F. Compare the large-scale brain medical record segmentation accuracy RC obtained above with the preset accuracy value η. If RC ≥ η, output the optimal segmentation knowledge set for the large-scale brain medical records; otherwise, repeat steps C, D, and E until the segmentation accuracy satisfies RC ≥ η;
    G. Store the optimal feature set for big-data brain medical record segmentation,
    Figure PCTCN2020094104-appb-100007
    in the Spark cloud platform, providing an important knowledge basis for intelligent auxiliary diagnosis in the clinical diagnosis and treatment of diseases related to large-scale brain medical records.
  2. The multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation according to claim 1, characterized in that step B specifically comprises the following steps:
    a. Set the number of multi-granularity populations to n, with n ≥ 2, and initialize the multi-granularity populations as GP_h, h ∈ {1, ..., n};
    b. Initialize the center of the first granularity population as
    Figure PCTCN2020094104-appb-100008
    then initialize the center of the second granularity population as
    Figure PCTCN2020094104-appb-100009
    and take it as the super-elite priority
    Figure PCTCN2020094104-appb-100010
    c. For the 3rd and subsequent multi-granularity population centers
    Figure PCTCN2020094104-appb-100011
    compute the minimum distance between the current elite priority
    Figure PCTCN2020094104-appb-100012
    and the centers of all current granularity populations as follows:
    Figure PCTCN2020094104-appb-100013
    Assign this minimum distance to the u-th multi-granularity population center
    Figure PCTCN2020094104-appb-100014
    and repeat this process until all n multi-granularity evolutionary populations are initialized;
    d. The trust degree of the i-th super elite within subpopulations of the same granularity is defined as follows:
    Figure PCTCN2020094104-appb-100015
    where n is the total number of elites, SP_i is the i-th super elite, and P_ij is the j-th ordinary elite in the i-th multi-granularity population;
    e. Compute the trust degree R_i of the i-th super elite SP_i in the h-th multi-granularity population center
    Figure PCTCN2020094104-appb-100016
    iteratively as follows:
    Figure PCTCN2020094104-appb-100017
    where i ∈ {2, ..., N},
    Figure PCTCN2020094104-appb-100018
    f. Let t, t ∈ {2, ..., n-1}, be the current iteration count of the similarity computation between the multi-granularity population centers
    Figure PCTCN2020094104-appb-100019
    The trust degree of each multi-granularity population center
    Figure PCTCN2020094104-appb-100020
    is computed from the previous round (iteration t-1), so that the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in different granularity spaces;
    g. Calculate the trust deviation Diff_{ij} between the trust degrees of different super elites SP_i and SP_j in the multi-granularity population as follows:
    Figure PCTCN2020094104-appb-100021
    where Re_{ij} is the credibility of the i-th super elite with respect to the j-th super elite, R_{mj} is the local trust in the j-th super elite recommended by an arbitrarily chosen m-th ordinary elite in the population, I(j) is the set of all elites in the j-th multi-granularity population GP_j, and |I(j)| is the cardinality of that set;
    h. The population trust between the h-th multi-granularity population and the u-th multi-granularity population center,
    Figure PCTCN2020094104-appb-100022
    is calculated as follows:
    Figure PCTCN2020094104-appb-100023
    where m is the number of iterations and
    Figure PCTCN2020094104-appb-100024
    is the variation range of the two multi-granularity populations at the t-th iteration, calculated as
    Figure PCTCN2020094104-appb-100025
    i. For the h-th multi-granularity population
    Figure PCTCN2020094104-appb-100026
    if
    Figure PCTCN2020094104-appb-100027
    is satisfied, where ε∈[0,1] is the similarity threshold, then the multi-granularity population conforms to the subpopulation trust relationships in the different granularity spaces;
    j. Construct the trust relationship formula between different super elites within a multi-granularity population, defined as
    Figure PCTCN2020094104-appb-100028
    where λ, 0≤λ≤1, is the confidence factor for direct trust between super elites. Its value depends on the number of interactions between super elites: the more interactions, the larger λ. We take λ=h/H_{Lmt}, where h is the number of interactions between super elite i and super elite j and H_{Lmt} is the preset threshold on the number of interactions. In this way, the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the subpopulation trust relationships in the different granularity spaces.
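The formulas for steps c, g and j above survive only as images, so the following is a minimal Python sketch that fills them in with assumed, commonly used forms rather than the patent's own: a max-min (farthest-point) rule for choosing the 3rd and later centers, an absolute deviation between Re_{ij} and the mean recommended trust over I(j) for Diff_{ij}, and a linear blend of direct and recommended trust weighted by λ=h/H_{Lmt}. The function names, the choice of elite 0 as the first center, and the clipping of λ at 1 are hypothetical.

```python
import math
from typing import List, Mapping, Sequence

Point = Sequence[float]


def maxmin_centers(elites: List[Point], n: int) -> List[int]:
    """Step c (assumed max-min rule): indices of the n elites chosen as centers."""
    if not 1 <= n <= len(elites):
        raise ValueError("n must lie in [1, len(elites)]")
    centers = [0]  # hypothetical: elite 0 is taken as the first center
    # minimum distance from every elite to the centers chosen so far
    min_dist = [math.dist(e, elites[0]) for e in elites]
    while len(centers) < n:
        chosen = set(centers)
        # the candidate farthest from all current centers becomes the u-th center
        u = max((i for i in range(len(elites)) if i not in chosen),
                key=lambda i: min_dist[i])
        centers.append(u)
        min_dist = [min(d, math.dist(e, elites[u])) for d, e in zip(min_dist, elites)]
    return centers


def trust_deviation(re_ij: float, recommended: Mapping[str, float]) -> float:
    """Step g (assumed form): |Re_ij - mean of R_mj over m in I(j)|."""
    if not recommended:
        raise ValueError("I(j) must be non-empty")
    return abs(re_ij - sum(recommended.values()) / len(recommended))


def combined_trust(direct: float, recommended: float, h: int, h_lmt: int) -> float:
    """Step j (assumed form): lambda * direct + (1 - lambda) * recommended."""
    if h_lmt <= 0:
        raise ValueError("H_Lmt must be positive")
    lam = min(h / h_lmt, 1.0)  # lambda = h / H_Lmt, clipped so that 0 <= lambda <= 1
    return lam * direct + (1.0 - lam) * recommended
```

For example, `maxmin_centers([(0, 0), (10, 0), (0, 1), (5, 5)], 3)` picks elites 0, 1 and 3, because (5, 5) is farther from both existing centers than (0, 1) is. The linear blend means that once two super elites have interacted H_{Lmt} times, only their direct trust counts.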
  3. The multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation according to claim 1, wherein the specific steps of step C are as follows:
    a. Initialize the multi-granularity centers with the traditional k-means clustering method as
    Figure PCTCN2020094104-appb-100029
    b. Set both the multi-granularity subpopulation set and the center set to the empty set, V=Φ and C=Φ, and set the iteration count t=1. Calculate the distance between each multi-granularity subpopulation and each multi-granularity center, and assign the large-scale brain medical record attribute set to the corresponding multi-granularity centers by the minimum-distance principle, forming k
    Figure PCTCN2020094104-appb-100030
    and record the number of super elites in each center
    Figure PCTCN2020094104-appb-100031
    Set the initial adjustment labels
    Figure PCTCN2020094104-appb-100032
    c. Recalculate each multi-granularity center
    Figure PCTCN2020094104-appb-100033
    and the initial displacement d(c_{1i}, c_{0i}) of each granularity center, where |V_i| is the number of populations in multi-granularity population V_i;
    d. After the first iteration, the distance between the granularity center c_1 of a granularity subpopulation and the initial granularity center c_0 is d(c_1, c_0); after the i-th iteration, the distance between the new granularity center c′ and the original granularity center c is d(c, c′). If
    Figure PCTCN2020094104-appb-100034
    where ε∈[0,1] is the similarity threshold, then the granularity center represented by c′ no longer participates in the next round of iterative adjustment; otherwise, iterative adjustment continues;
    e. For each multi-granularity population labeled f_{tj}=1, calculate the distance between each of its super elites and the multi-granularity population centers participating in the adjustment; assign the brain medical record attributes to the corresponding multi-granularity populations by the minimum-distance principle to form k new multi-granularity populations {V_{tj}}; record the number of super elites in each multi-granularity population {N_{tj}}; and obtain the adjusted number of super elites ΔN_{tj} used for attribute segmentation of large-scale brain medical records;
    f. Recalculate the multi-granularity centers participating in the adjustment
    Figure PCTCN2020094104-appb-100035
    and the displacement d(c_{tj}, c_{tj}) of each multi-granularity center;
    g. Set the adjustment threshold for granularity-center migration to ε and the adjustment threshold for the number of multi-granularity subpopulations to θ. If the center c_{tj} of multi-granularity population V_{tj} satisfies
    Figure PCTCN2020094104-appb-100036
    and
    Figure PCTCN2020094104-appb-100037
    then set the adjustment label of V_{tj} to 0, i.e. f_{tj}=0, and add V_{tj} and c_{tj} to the final multi-granularity population and center sets, i.e. V=V∪{V_{tj}} and C=C∪{c_{tj}}. Once a set of k multi-granularity centers has been formed, i.e. |V|=k, the iteration terminates.
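Claim 3 describes a k-means variant in which a center stops being adjusted once its movement has become small. Below is a minimal single-process Python sketch of that idea, not a Spark implementation. Because the freeze conditions of step g exist only as images, it assumes they are d(c, c′) ≤ ε·d(c_1, c_0) and |ΔN| ≤ θ·N; the function names and default thresholds are hypothetical. Mirroring step e, only points of still-active clusters are reassigned, and only to still-active centers.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def nearest(p: Point, centers: List[Point], candidates: Sequence[int]) -> int:
    # minimum-distance principle, restricted to the given candidate centers
    return min(candidates, key=lambda j: math.dist(p, centers[j]))


def freezing_kmeans(points: Sequence[Point], init_centers: Sequence[Point],
                    eps: float = 0.05, theta: float = 0.05,
                    max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    k = len(init_centers)
    centers = [tuple(map(float, c)) for c in init_centers]   # step a: centers c_0
    labels = [nearest(p, centers, range(k)) for p in points]  # step b: first assignment
    counts = [labels.count(j) for j in range(k)]             # super elites per center
    active = [True] * k                                      # adjustment labels f_j = 1
    first_shift: List[Optional[float]] = [None] * k          # d(c_1j, c_0j)
    for _ in range(max_iter):
        act = [j for j in range(k) if active[j]]
        if not act:
            break  # every center frozen, i.e. |V| = k
        shifts = {}
        for j in act:  # steps c/f: recompute active centers and their displacement
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_c = centroid(members) if members else centers[j]
            shifts[j] = math.dist(new_c, centers[j])
            centers[j] = new_c
        for i, p in enumerate(points):  # step e: reassign points of active clusters only
            if active[labels[i]]:
                labels[i] = nearest(p, centers, act)
        new_counts = [labels.count(j) for j in range(k)]
        for j in act:  # steps d/g: freeze centers whose movement has become small
            if first_shift[j] is None:
                first_shift[j] = shifts[j]  # first displacement is the reference
                continue
            delta_n = abs(new_counts[j] - counts[j])
            if shifts[j] <= eps * first_shift[j] and delta_n <= theta * max(counts[j], 1):
                active[j] = False  # f_j = 0: (V_j, c_j) join the final sets
        counts = new_counts
    return centers, labels
```

On two well-separated pairs of points, `freezing_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` moves each center once, sees no further movement, freezes both and returns centers (0.0, 0.5) and (10.0, 10.5).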
  4. The multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation according to claim 1, wherein the specific steps of step E are as follows:
    a. Let two adjacent super elite clusters be
    Figure PCTCN2020094104-appb-100038
    and
    Figure PCTCN2020094104-appb-100039
    with elite membership degrees
    Figure PCTCN2020094104-appb-100040
    and
    Figure PCTCN2020094104-appb-100041
    respectively;
    b. If
    Figure PCTCN2020094104-appb-100042
    the super elite evolves into a combination of the elite clusters
    Figure PCTCN2020094104-appb-100043
    ; otherwise, it evolves into a combination of the elite clusters
    Figure PCTCN2020094104-appb-100044
    ;
    c. Perform hybrid competitive-cooperative co-evolutionary segmentation of large-scale brain medical records in the multi-granularity subpopulations. Let S_i be the i-th super elite; for i=1 to |S_i|, perform the following operations:
    (1) Insert the representative S_{i,rep} of super elite S_i into P_i^t;
    (2) If n_x > |S_i|, select super elites P_i^t from multi-granularity subpopulation Granu-subpopulation_i;
    (3) Combine all S_{i,j} with the solutions of the other multi-granularity subpopulations Granu-subpopulation_i, sort them, and calculate the niche count of S_{i,j};
    (4) Update the super elite representative of S_i to obtain the non-dominated solutions in the Pareto dominance region, determine the winning multi-granularity subpopulation, and update S_i=S_k;
    d. The fuzzy membership degree u_{C_h}(P_i) of a super elite is calculated by the similarity-membership approach, where the distance between the reference value P_i and the super elite center C_h is defined as d(P_i, C_h);
    e. For the super elite of each multi-granularity subpopulation, calculate the equilibrium degree CI as
    Figure PCTCN2020094104-appb-100045
    and the consistency probability CR as
    Figure PCTCN2020094104-appb-100046
    where t∈{1,2,...,s};
    f. For any inconsistent equilibrium degree
    Figure PCTCN2020094104-appb-100047
    obtain the optimal consistent equilibrium degree of the super elite of the t-th multi-granularity subpopulation as
    Figure PCTCN2020094104-appb-100048
    where
    Figure PCTCN2020094104-appb-100049
    g. Obtain the global optimal consistency probability of all super elites as
    Figure PCTCN2020094104-appb-100050
    and construct the optimal pair of consistent equilibrium degree and probability degree for attribute segmentation of large-scale brain medical records as
    Figure PCTCN2020094104-appb-100051
    t∈{1,2,...,s};
    h. Based on the optimal pair of consistent equilibrium degree and probability degree
    Figure PCTCN2020094104-appb-100052
    the super elites segment the feature sets of the different attribute regions of the brain medical records as F_1, F_2, ..., F_n, and obtain the optimal feature set of the large-scale brain medical records
    Figure PCTCN2020094104-appb-100053
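Claim 4 relies on fuzzy membership degrees, Pareto dominance and niche counts, but its formulas are images. The Python sketch below substitutes textbook forms, named plainly as stand-ins: the fuzzy c-means membership u_h = 1 / Σ_k (d_h/d_k)^(2/(m-1)) for the "similarity membership" of step d, a "larger membership wins" rule for the unspecified condition of step b, and standard Pareto dominance (minimization) with a triangular sharing function for the niche count of step c. All function names, the fuzzifier m=2 and the sharing radius sigma are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def memberships(p: Vec, centers: List[Vec], m: float = 2.0) -> List[float]:
    """Step d stand-in: fuzzy c-means membership of p in each center C_h (m > 1)."""
    d = [math.dist(p, c) for c in centers]
    for h, dh in enumerate(d):
        if dh == 0.0:  # p coincides with a center: crisp membership
            return [1.0 if k == h else 0.0 for k in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dh / dk) ** exp for dk in d) for dh in d]


def choose_cluster(p: Vec, center_a: Vec, center_b: Vec) -> int:
    """Step b stand-in: 0 if p belongs more to cluster a than to cluster b, else 1."""
    u_a, u_b = memberships(p, [center_a, center_b])
    return 0 if u_a >= u_b else 1


def dominates(a: Vec, b: Vec) -> bool:
    # Pareto dominance for minimization objectives
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions: List[Vec]) -> List[Vec]:
    # keep every solution that no other solution dominates (order preserved)
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


def niche_count(s: Vec, solutions: List[Vec], sigma: float) -> float:
    # sum of the sharing function sh(d) = 1 - d / sigma over neighbours with d < sigma
    return sum(1.0 - math.dist(s, o) / sigma for o in solutions if math.dist(s, o) < sigma)
```

For instance, a point at (1, 0) with centers (0, 0) and (3, 0) gets memberships 0.8 and 0.2, and among the objective vectors (1, 5), (2, 2), (3, 1), (4, 4) only (4, 4) is dominated. A lower niche count marks a solution in a less crowded region, which is the usual tie-breaker among non-dominated candidates.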
PCT/CN2020/094104 2019-10-28 2020-06-03 Multi-granulation spark-based super-trust fuzzy method for large-scale brain medical record segmentation WO2021082444A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020286320A AU2020286320B2 (en) 2019-10-28 2020-06-03 Multi-granularity spark super trust fuzzy method applied to large-scale brain medical record segmentation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911030948.0A CN110867224B (en) 2019-10-28 2019-10-28 Multi-granularity Spark super-trust fuzzy method for large-scale brain pathology segmentation
CN201911030948.0 2019-10-28

Publications (1)

Publication Number Publication Date
WO2021082444A1 true WO2021082444A1 (en) 2021-05-06

Family

ID=69653442

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/094104 WO2021082444A1 (en) 2019-10-28 2020-06-03 Multi-granulation spark-based super-trust fuzzy method for large-scale brain medical record segmentation

Country Status (3)

Country Link
CN (1) CN110867224B (en)
AU (1) AU2020286320B2 (en)
WO (1) WO2021082444A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110867224B (en) * 2019-10-28 2022-02-08 南通大学 Multi-granularity Spark super-trust fuzzy method for large-scale brain pathology segmentation
CN113012775B (en) * 2021-03-30 2021-10-08 南通大学 Incremental attribute reduction Spark method for classifying red spot electronic medical record pathological changes

Citations (8)

Publication number Priority date Publication date Assignee Title
US20120130929A1 (en) * 2010-11-24 2012-05-24 International Business Machines Corporation Controlling quarantining and biasing in cataclysms for optimization simulations
CN104462853A (en) * 2014-12-29 2015-03-25 南通大学 Population elite distribution cloud collaboration equilibrium method used for feature extraction of electronic medical record
CN105279388A (en) * 2015-11-17 2016-01-27 南通大学 Multilayer cloud computing framework coordinated integrated reduction method for gestational-age newborn brain medical records
CN105719004A (en) * 2016-01-18 2016-06-29 合肥工业大学 Coevolution-based particle swarm optimization for solving multitask problems
CN108133260A (en) * 2018-01-17 2018-06-08 浙江理工大学 The workflow schedule method of multi-objective particle swarm optimization based on real-time status monitoring
CN108986872A (en) * 2018-06-21 2018-12-11 南通大学 More granularity attribute weight Spark methods for big data electronic health record reduction
CN109120017A (en) * 2017-06-22 2019-01-01 南京理工大学 A kind of Method for Reactive Power Optimization in Power based on improvement particle swarm algorithm
CN110867224A (en) * 2019-10-28 2020-03-06 南通大学 Multi-granularity Spark super-trust fuzzy method for large-scale brain pathology segmentation

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
CN201788510U (en) * 2010-07-13 2011-04-06 南通大学 Dynamic EMR collaborative mining system with particle swarm and extension rough set/concept lattice theories integrated together
EP2784748B1 (en) * 2013-03-28 2017-11-01 Expert Ymaging, SL A computer implemented method for assessing vascular networks from medical images and uses thereof
CN103838972B (en) * 2014-03-13 2016-08-24 南通大学 A kind of quantum coordinating game model implementation method for MRI case history attribute reduction
CN105069503A (en) * 2015-07-30 2015-11-18 重庆邮电大学 Cooperation degree-based hetero-population parallel particle swarm algorithm and implementation method of MapReduce model
CN106157370B (en) * 2016-03-03 2019-04-02 重庆大学 A kind of triangle gridding normalization method based on particle swarm algorithm
US20180108430A1 (en) * 2016-09-30 2018-04-19 Board Of Regents, The University Of Texas System Method and system for population health management in a captivated healthcare system
CN107257307B (en) * 2017-06-29 2020-06-02 中国矿业大学 Spark-based method for solving multi-terminal cooperative access network by parallelization genetic algorithm
CN108446740B (en) * 2018-03-28 2019-06-14 南通大学 A kind of consistent Synergistic method of multilayer for brain image case history feature extraction
CN109117864B (en) * 2018-07-13 2020-02-28 华南理工大学 Coronary heart disease risk prediction method, model and system based on heterogeneous feature fusion
CN109840551B (en) * 2019-01-14 2022-03-15 湖北工业大学 Method for optimizing random forest parameters for machine learning model training
CN109871995B (en) * 2019-02-02 2021-03-26 浙江工业大学 Quantum optimization parameter adjusting method for distributed deep learning under Spark framework

Non-Patent Citations (2)

Title
DING WEIPING , WANG JIANDONG , ZHANG XIAOFENG GUAN ZHIJIN: "Co-evolutionary cloud-based attribute ensemble multi-agent reduction algorithm", JOURNAL OF SOUTHEAST UNIVERSITY ( ENGLISH EDITION), vol. 32, no. 4, 15 December 2016 (2016-12-15), pages 432 - 438, XP055809178, ISSN: 1003-7985, DOI: 10.3969/j.issn.1003-7985.2016.04.007 *
XU MING-JIE; WEI CHENG-JIAN; SHEN HANG: "Research on Parallel K-means Algorithm Based on Spark", MICROELECTRONICS & COMPUTER, vol. 35, no. 5, 1 February 2019 (2019-02-01), pages 95 - 99, XP009527906, DOI: 10.19304/j.cnki.issn1000-7180.2018.05.018 *

Also Published As

Publication number Publication date
AU2020286320B2 (en) 2022-10-20
CN110867224B (en) 2022-02-08
CN110867224A (en) 2020-03-06
AU2020286320A1 (en) 2021-05-27

Legal Events

Date Code Title Description
ENP Entry into the national phase
Ref document number: 2020286320; Country of ref document: AU; Date of ref document: 20200603; Kind code of ref document: A

121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20880867; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 20880867; Country of ref document: EP; Kind code of ref document: A1