Multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation
Technical field:
The present invention relates to the field of medical informatics, and in particular to a multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation.
Background:
Big-data engineering for medical and health services requires not only building electronic health record and electronic medical record databases, but also building a big-data application system for health management and services that covers public health, medical services, medical insurance, drug supply, family planning, and integrated management. To achieve this goal with existing medical resources, information technologies such as big data, cloud computing, and the mobile Internet must be fully exploited so that electronic medical record databases and electronic health record databases interoperate effectively and interact constructively.
With the arrival of the cloud computing and big-data era, intelligent processing of large-scale electronic medical records has become exceptionally complex across the whole process of generating and using medical big data. The medical data stored in electronic medical record systems are large in volume, scattered in origin, diverse in format, fast to access, and high in application value. Using artificial intelligence and data mining techniques to discover and extract the important medical diagnosis rules and knowledge in large-scale electronic medical records is the key to building a clinical decision support system. However, an electronic medical record system is a special kind of medical information system: its data are massive, heterogeneous, incomplete, and time-sensitive, which makes feature selection, collaborative services, knowledge discovery, and clinical decision support considerably more difficult. Processing complex large-scale electronic medical records effectively is therefore central to the design of future medical and health service big-data systems and clinical intelligent decision analysis systems. Given the characteristics of large-scale electronic medical record systems, applying efficient models and methods for knowledge reduction of complex medical records is the direction of future development.
Using artificial intelligence and big-data processing methods to segment brain attributes automatically from large-scale brain medical record data and to discover latent medical regularities plays an important role in the prevention, control, and treatment of brain diseases. The large-scale brain medical record segmentation problem arises widely in research on brain medical record feature selection, rule mining, and clinical decision support systems, and it is a core technology for intelligent applications of brain medical records in the context of medical big data. An effective method under a cloud computing environment is therefore urgently needed to solve this problem and to further improve the intelligent processing and service model for massive brain medical records. This is a key open problem in research on intelligent computer-aided diagnosis and treatment and on clinical decision support systems, and a challenging research topic in the brain medical record field. However, large-scale brain medical records are highly incomplete and their values are fuzzy, so the unreliability and uncertainty of their data attributes are more pronounced, which greatly limits the applicability of traditional attribute segmentation methods. In the medical big-data environment, proposing an effective segmentation method tailored to the characteristics of large-scale brain medical records, one that achieves an optimal consistent equilibrium between global search reduction and locally refined collaborative knowledge reduction, is therefore of great significance and value for decision support analysis of large-scale brain medical records.
The present invention discloses a multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation. First, the attribute set of the large-scale brain medical record data is partitioned on the Spark cloud platform into different multi-granularity evolutionary sub-populations Granu-population$_i$. A multi-granularity Spark super-trust model is then designed to establish the trust degrees between different super elites within the multi-granularity populations. The multi-granularity center thresholds are adjusted, the super elites are dynamically updated with the multi-granularity sub-population equilibrium adjustment strategy, and global search segmentation and local refinement segmentation are performed on the large-scale brain medical records, so that the super elites collaboratively extract knowledge reduction subsets within their respective regions. Finally, the optimal segmentation feature set of the large-scale brain medical records is obtained and stored on the Spark cloud platform. The invention can stably segment the knowledge reduction sets of large-scale brain medical records and provides an important diagnostic basis for the intelligent diagnosis and assisted treatment of brain diseases.
A further improvement of the present invention is that step B comprises the following specific steps:
a. Set the number of multi-granularity populations to $n$, with $n \ge 2$, and initialize the multi-granularity populations as $GP_h$, $h \in \{1, \dots, n\}$;
b. Initialize the center of the first granularity population, then initialize the center of the second granularity population and take it as the priority of the super elite;
c. For the third and each subsequent multi-granularity population center, compute the minimum distance between the current elite priority and the centers of all current granularity populations, and assign this minimum distance to the $u$-th multi-granularity population center; repeat this process until all $n$ multi-granularity evolutionary populations have been initialized;
d. Define the trust degree of the $i$-th super elite within the same granularity sub-population, where $n$ is the total number of elites, $SP_i$ is the $i$-th super elite, and $P_{ij}$ is the $j$-th ordinary elite in the $i$-th multi-granularity population;
e. Compute iteratively the trust degree $R_i$ of the $i$-th super elite $SP_i$ in the $h$-th multi-granularity population center, where $i \in \{2, \dots, N\}$;
f. Let $t$, $t \in \{2, \dots, n-1\}$, be the current iteration count for the similarity between multi-granularity population centers. The trust degree of each multi-granularity population center is computed from iteration $t-1$ of the previous round, so that the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the trust relationships among sub-populations in different granularity spaces;
g. Compute the trust deviation $Diff_{ij}$ between the trust degrees of different super elites $SP_i$ and $SP_j$ in the multi-granularity population, where $Re_{ij}$ is the reputation that the $i$-th super elite assigns to the $j$-th super elite, $R_{mj}$ is the local trust degree recommended for the $j$-th super elite by an arbitrarily chosen $m$-th ordinary elite in the population, $I(j)$ is the set of all elites in the $j$-th multi-granularity population $GP_j$, and $|I(j)|$ is the cardinality of that set;
h. Compute the population trust degree between the $h$-th multi-granularity population and the $u$-th multi-granularity population center, where $m$ is the number of iterations and the computation uses the range of variation of the two multi-granularity populations at the $t$-th iteration;
i. For the $h$-th multi-granularity population, if the similarity condition with threshold $\varepsilon \in [0, 1]$ is satisfied, the multi-granularity population conforms to the trust relationship among sub-populations in different granularity spaces;
j. Construct the formula for the trust relationship between different super elites within a multi-granularity population, where $\lambda$ is the confidence factor of the direct trust between super elites. The value of $\lambda$ depends on the number of interactions between super elites: the more interactions, the larger $\lambda$, with $0 \le \lambda \le 1$. We take $\lambda = h / H_{Lmt}$, where $h$ is the number of interactions between super elite $i$ and super elite $j$ and $H_{Lmt}$ is the preset threshold on the number of interactions. The size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the trust relationships among sub-populations in different granularity spaces.
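The center initialization of steps a to c can be illustrated with a farthest-first sketch in Python. The selection rule is an assumption, since the text only states that a minimum distance to the current centers is computed: here each new center is the candidate whose minimum distance to the already chosen centers is largest. The function name `init_centers`, the Euclidean distance, and the way the first two centers are picked are hypothetical.

```python
import math


def init_centers(points, n):
    """Pick n centers from points with a farthest-first rule.

    Assumption (not from the source): the first center is the first point,
    and every later center maximises its minimum distance to the centers
    chosen so far.
    """
    if n < 2 or n > len(points):
        raise ValueError("need 2 <= n <= len(points)")
    centers = [points[0]]
    # Second center: the point farthest from the first one.
    centers.append(max(points, key=lambda p: math.dist(p, centers[0])))
    while len(centers) < n:
        # Minimum distance from a candidate to all current centers.
        def min_dist(p):
            return min(math.dist(p, c) for c in centers)

        centers.append(max(points, key=min_dist))
    return centers
```

Calling `init_centers(points, n)` with the attribute vectors as `points` returns `n` well-separated starting centers, one per multi-granularity population.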
A further improvement of the present invention is that step C comprises the following specific steps:
a. Initialize the multi-granularity centers using the conventional k-means clustering method;
b. Let the set of multi-granularity sub-populations and the set of centers both be empty, $V = \Phi$ and $C = \Phi$, and set the iteration count $t = 1$. Compute the distance between each multi-granularity sub-population and the multi-granularity centers, assign the large-scale brain medical record attribute set to the corresponding multi-granularity centers by the minimum-distance principle to form $k$ multi-granularity populations, record the number of super elites in each center, and set the initial adjustment labels;
c. Recompute each multi-granularity center and the initial displacement $d(c_{1i}, c_{0i})$ of each granularity center, where $|V_i|$ denotes the size of the multi-granularity population $V_i$;
d. After the first iteration, the distance between the granularity center $c_1$ of a granularity sub-population and its initial granularity center $c_0$ is $d(c_1, c_0)$; after the $i$-th iteration, the distance between the new granularity center $c'$ and the previous granularity center $c$ is $d(c, c')$. If the similarity condition with threshold $\varepsilon \in [0, 1]$ is satisfied, the granularity center represented by $c'$ no longer takes part in the next round of iterative adjustment; otherwise iterative adjustment continues;
e. For the multi-granularity populations with label $f_{tj} = 1$, compute the distance between each super elite and the multi-granularity population centers that take part in the adjustment, assign the brain medical record attributes to the corresponding multi-granularity populations by the minimum-distance principle to form $k$ new multi-granularity populations $\{V_{tj}\}$, record the number of super elites $\{N_{tj}\}$ in each multi-granularity population, and obtain the adjusted number of super elites $\Delta N_{tj}$ used for large-scale brain medical record attribute segmentation;
f. Recompute the multi-granularity centers that take part in the adjustment and the displacement $d(c_{tj}, c_{(t-1)j})$ of each multi-granularity center;
g. Set the adjustment threshold for granularity-center migration to $\varepsilon$ and the adjustment threshold for the number of multi-granularity sub-populations to $\theta$. If the center $c_{tj}$ of the multi-granularity population $V_{tj}$ satisfies both threshold conditions, set the adjustment label of $V_{tj}$ to 0, that is, $f_{tj} = 0$, and add $V_{tj}$ and $c_{tj}$ to the final multi-granularity population and center sets, that is, $V = V \cup \{V_{tj}\}$ and $C = C \cup \{c_{tj}\}$. Once a set containing $k$ multi-granularity centers has been formed, so that $|V| = k$, the iteration terminates.
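The adjustment loop of step C resembles a k-means refinement in which each center is frozen once it has stabilised. The Python sketch below is a simplified illustration under stated assumptions, not the method's exact rule: a center $j$ is frozen when its displacement is at most $\varepsilon$ times its first-iteration displacement and its member count changes by at most $\theta$ times its size. Frozen centers still receive points during assignment. The function name `refine_centers` and all default values are hypothetical.

```python
import math


def refine_centers(points, centers, eps=0.05, theta=0.05, max_iter=100):
    """Lloyd-style refinement with per-center adjustment labels.

    Assumed freeze test (not taken from the source):
    d(c, c') <= eps * d(c1, c0) and |delta N| <= theta * N.
    """
    k = len(centers)
    centers = [tuple(c) for c in centers]
    active = [True] * k          # adjustment labels f_j (True means f_j = 1)
    first_move = [None] * k      # d(c1, c0) for each center
    counts = [0] * k             # N_j from the previous iteration
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Minimum-distance principle: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda idx: math.dist(p, centers[idx]))
            groups[nearest].append(p)
        for j in range(k):
            if not active[j] or not groups[j]:
                continue
            new_c = tuple(sum(x) / len(groups[j]) for x in zip(*groups[j]))
            move = math.dist(new_c, centers[j])
            if first_move[j] is None:
                first_move[j] = move
            delta_n = abs(len(groups[j]) - counts[j])
            centers[j] = new_c
            counts[j] = len(groups[j])
            if move <= eps * first_move[j] and delta_n <= theta * counts[j]:
                active[j] = False  # f_j = 0: center leaves the adjustment
        if not any(active):
            break
    return centers, groups
```

Freezing stable centers early is what reduces the work per iteration: only centers whose label is still 1 are recomputed.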
A further improvement of the present invention is that step E comprises the following specific steps:
a. Let two adjacent super-elite clusters be given, together with their respective elite membership degrees;
b. If the given condition holds, the super elite evolves into the first combination of elite clusters; otherwise it evolves into the alternative combination;
c. Perform hybrid competitive and cooperative collaborative segmentation of the large-scale brain medical records in the multi-granularity sub-populations. Let $S_i$ be the $i$-th super elite; for $i = 1$ to $|S_i|$, perform the following operations:
(1) Insert the representative $S_{i,rep}$ of super elite $S_i$ into $P_i^t$;
(2) If $n_x > |S_i|$, select the super elites $P_i^t$ from the multi-granularity sub-population Granu-subpopulation$_i$;
(3) Combine all $S_{i,j}$ with the solutions of the other multi-granularity sub-populations Granu-subpopulation$_i$, rank them, and compute the niche count of $S_{i,j}$;
(4) Update the super-elite representative of $S_i$, obtain the non-dominated solutions within the Pareto dominance region, determine the winning multi-granularity sub-population, and update $S_i = S_k$;
d. Compute the fuzzy membership degree $u_{C_h}(P_i)$ of a super elite by the similar-membership method, where the distance between the reference value $P_i$ and the super-elite center $C_h$ is defined as $d(P_i, C_h)$;
e. For each multi-granularity sub-population super elite, compute the equilibrium degree CI and the consistency probability CR, where $t \in \{1, 2, \dots, s\}$;
f. For any inconsistent equilibrium degree, obtain the optimal consistent equilibrium degree of the super elite of the $t$-th multi-granularity sub-population;
g. Obtain the global optimal consistency probability degree of all super elites, $t \in \{1, 2, \dots, s\}$, and construct the pair of optimal consistent equilibrium degree and probability degree for large-scale brain medical record attribute segmentation, $t \in \{1, 2, \dots, s\}$;
h. Based on this pair of optimal consistent equilibrium degree and probability degree, the super elites segment the feature sets of the different attribute regions of the brain medical records as $F_1, F_2, \dots, F_n$, from which the optimal feature set of the large-scale brain medical records is obtained.
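As an illustration of step d, a distance-based fuzzy membership can be sketched as follows. The formula is an assumption: it is the standard fuzzy c-means membership, $u_{C_h}(P_i) = 1 / \sum_k \left( d(P_i, C_h) / d(P_i, C_k) \right)^{2/(m-1)}$, and may differ from the similar-membership formula intended by the method. The function name `fuzzy_memberships` and the fuzzifier `m = 2` are hypothetical.

```python
import math


def fuzzy_memberships(p, centers, m=2.0):
    """Membership of point p in each center, fuzzy c-means style (assumed form)."""
    dists = [math.dist(p, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dh / dk) ** exponent for dk in dists) for dh in dists]
```

The memberships of one point sum to 1, so a reference value close to one super-elite center receives a membership near 1 for that center and near 0 for the others.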
Compared with the prior art, the present invention has the following advantages:
1) The present invention uses a multi-granularity Spark super-trust model to establish the trust degrees between different super elites within the multi-granularity populations, dynamically updates the super elites with different multi-granularity sub-population equilibrium adjustment strategies, and performs global search segmentation and local refinement segmentation on large-scale brain medical records. The super elites collaboratively extract knowledge reduction subsets within their respective regions, which greatly reduces execution time and improves the accuracy of large-scale brain medical record segmentation.
2) On the Spark cloud platform, the present invention builds a dynamic collaborative operation mechanism for multi-granularity population super elites based on dynamic elite dominance regions. It achieves an optimal consistent equilibrium in large-scale brain medical record segmentation, reduces the complexity cost of feature segmentation, and further improves the fine granularity and robustness of parallel feature extraction from large-scale brain medical records on the Spark cloud platform, laying a good foundation for intelligent services such as brain medical record feature selection, rule mining, and clinical decision support.
Description of the drawings:
Figure 1 is the overall flow chart of the system;
Figure 2 is a diagram of the dynamic execution process of the multi-granularity super-trust Spark model;
Figures 3 to 5 are diagrams of the dynamic fuzzy collaborative operation process of the multi-granularity population super elites.
Detailed description:
To deepen the understanding of the present invention, it is described in further detail below with reference to an embodiment. The embodiment serves only to explain the present invention and does not limit its scope of protection.
As shown in Figures 1 to 5, a specific embodiment of the multi-granularity Spark super-trust fuzzy method for large-scale brain medical record segmentation comprises the following steps:
A. On the big-data Spark cloud platform, partition the large-scale brain medical record attribute set into different multi-granularity evolutionary populations Granu-population$_i$, $i = 1, 2, \dots, n$. The brain medical record attribute segmentation task is decomposed into multiple parallel jobs, and the equivalence classes of the different candidate attribute sets of the brain medical records are then computed within these jobs;
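The equivalence-class computation in step A can be sketched on a single machine as follows. Two records fall into the same equivalence class for a candidate attribute set when they agree on every attribute in that set. This sketch uses no Spark API; treating each candidate attribute set as one independent job is an assumption about how the work would be distributed, and the function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, attrs):
    """Group record indices by their values on attrs (indiscernibility relation)."""
    classes = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec[a] for a in attrs)
        classes[key].append(idx)
    return list(classes.values())


def classes_per_candidate(records, candidates):
    """One independent computation per candidate attribute set.

    Each entry could run as its own parallel job (assumption).
    """
    return {tuple(c): equivalence_classes(records, c) for c in candidates}
```

Because each candidate attribute set is processed independently of the others, the per-candidate computations need no shared state.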
B. Design a multi-granularity super-trust model in which the $i$-th multi-granularity evolutionary population Granu-population$_i$ handles the reduction and segmentation of the $i$-th attribute set of the brain medical records. Establish the trust degrees between different super elites within the multi-granularity populations and compute the trust deviation of the multi-granularity populations; the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the trust relationships among sub-populations in different granularity spaces. The specific steps are as follows:
a. Set the number of multi-granularity populations to $n$, with $n \ge 2$, and initialize the multi-granularity populations as $GP_h$, $h \in \{1, \dots, n\}$;
b. Initialize the center of the first granularity population, then initialize the center of the second granularity population and take it as the priority of the super elite;
c. For the third and each subsequent multi-granularity population center, compute the minimum distance between the current elite priority and the centers of all current granularity populations, and assign this minimum distance to the $u$-th multi-granularity population center; repeat this process until all $n$ multi-granularity evolutionary populations have been initialized;
d. Define the trust degree of the $i$-th super elite within the same granularity sub-population, where $n$ is the total number of elites, $SP_i$ is the $i$-th super elite, and $P_{ij}$ is the $j$-th ordinary elite in the $i$-th multi-granularity population;
e. Compute iteratively the trust degree $R_i$ of the $i$-th super elite $SP_i$ in the $h$-th multi-granularity population center, where $i \in \{2, \dots, N\}$;
f. Let $t$, $t \in \{2, \dots, n-1\}$, be the current iteration count for the similarity between multi-granularity population centers. The trust degree of each multi-granularity population center is computed from iteration $t-1$ of the previous round, so that the size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the trust relationships among sub-populations in different granularity spaces;
g. Compute the trust deviation $Diff_{ij}$ between the trust degrees of different super elites $SP_i$ and $SP_j$ in the multi-granularity population, where $Re_{ij}$ is the reputation that the $i$-th super elite assigns to the $j$-th super elite, $R_{mj}$ is the local trust degree recommended for the $j$-th super elite by an arbitrarily chosen $m$-th ordinary elite in the population, $I(j)$ is the set of all elites in the $j$-th multi-granularity population $GP_j$, and $|I(j)|$ is the cardinality of that set;
h. Compute the population trust degree between the $h$-th multi-granularity population and the $u$-th multi-granularity population center, where $m$ is the number of iterations and the computation uses the range of variation of the two multi-granularity populations at the $t$-th iteration;
i. For the $h$-th multi-granularity population, if the similarity condition with threshold $\varepsilon \in [0, 1]$ is satisfied, the multi-granularity population conforms to the trust relationship among sub-populations in different granularity spaces;
j. Construct the formula for the trust relationship between different super elites within a multi-granularity population, where $\lambda$ is the confidence factor of the direct trust between super elites. The value of $\lambda$ depends on the number of interactions between super elites: the more interactions, the larger $\lambda$, with $0 \le \lambda \le 1$. We take $\lambda = h / H_{Lmt}$, where $h$ is the number of interactions between super elite $i$ and super elite $j$ and $H_{Lmt}$ is the preset threshold on the number of interactions. The size of the large-scale brain medical record attribute set is dynamically and iteratively updated through the trust relationships among sub-populations in different granularity spaces.
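The role of the confidence factor $\lambda = h / H_{Lmt}$ can be illustrated with a short Python sketch. Only the definition of $\lambda$ and its range come from the text. The rest is assumed for illustration: $\lambda$ is clipped to 1 when $h$ exceeds $H_{Lmt}$, the recommended trust is taken as the mean of the local trust values $R_{mj}$ over $I(j)$, the combined trust is the weighted sum $\lambda \cdot \text{direct} + (1 - \lambda) \cdot \text{recommended}$, and the trust deviation is the absolute difference between $Re_{ij}$ and the recommended trust. All function names are hypothetical.

```python
def confidence_factor(h, h_limit):
    """lambda = h / H_Lmt, clipped to [0, 1] (the clipping is an assumption)."""
    if h_limit <= 0:
        raise ValueError("h_limit must be positive")
    return max(0.0, min(1.0, h / h_limit))


def recommended_trust(local_trusts):
    """Mean of the local trust values R_mj over the elite set I(j) (assumed form)."""
    return sum(local_trusts) / len(local_trusts)


def combined_trust(direct, local_trusts, h, h_limit):
    """Assumed blend: lambda * direct trust + (1 - lambda) * recommended trust."""
    lam = confidence_factor(h, h_limit)
    return lam * direct + (1.0 - lam) * recommended_trust(local_trusts)


def trust_deviation(reputation, local_trusts):
    """Assumed form of Diff_ij: |Re_ij - mean of R_mj over I(j)|."""
    return abs(reputation - recommended_trust(local_trusts))
```

With few interactions, $\lambda$ is small and the recommended trust from ordinary elites dominates; as interactions accumulate, the direct trust between the two super elites takes over.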
C. Set the multi-granularity Spark super-trust center adjustment threshold for large-scale brain medical record segmentation to $\lambda$. After the $i$-th iteration is completed, the multi-granularity sub-populations Granu-population$_i$ whose granularity-center adjustment exceeds the threshold $\lambda$ undergo the next iterative adjustment. Set the adjustment threshold for granularity-center migration to $\varepsilon$ and the adjustment threshold for the number of multi-granularity sub-populations to $\theta$, optimize the center $c_{tj}$ of the multi-granularity population $V_{tj}$, and add it to the final set of multi-granularity population centers, forming a set of $k$ multi-granularity centers. The specific steps are as follows:
a. Initialize the multi-granularity centers using the conventional k-means clustering method;
b. Let the set of multi-granularity sub-populations and the set of centers both be empty, $V = \Phi$ and $C = \Phi$, and set the iteration count $t = 1$. Compute the distance between each multi-granularity sub-population and the multi-granularity centers, assign the large-scale brain medical record attribute set to the corresponding multi-granularity centers by the minimum-distance principle to form $k$ multi-granularity populations, record the number of super elites in each center, and set the initial adjustment labels;
c. Recompute each multi-granularity center and the initial displacement $d(c_{1i}, c_{0i})$ of each granularity center, where $|V_i|$ denotes the size of the multi-granularity population $V_i$;
d. After the first iteration, the distance between the granularity center $c_1$ of a granularity sub-population and its initial granularity center $c_0$ is $d(c_1, c_0)$; after the $i$-th iteration, the distance between the new granularity center $c'$ and the previous granularity center $c$ is $d(c, c')$. If the similarity condition with threshold $\varepsilon \in [0, 1]$ is satisfied, the granularity center represented by $c'$ no longer takes part in the next round of iterative adjustment; otherwise iterative adjustment continues;
e. For the multi-granularity populations with label $f_{tj} = 1$, compute the distance between each super elite and the multi-granularity population centers that take part in the adjustment, assign the brain medical record attributes to the corresponding multi-granularity populations by the minimum-distance principle to form $k$ new multi-granularity populations $\{V_{tj}\}$, record the number of super elites $\{N_{tj}\}$ in each multi-granularity population, and obtain the adjusted number of super elites $\Delta N_{tj}$ used for large-scale brain medical record attribute segmentation;
f. Recompute the multi-granularity centers that take part in the adjustment and the displacement $d(c_{tj}, c_{(t-1)j})$ of each multi-granularity center;
g. Set the adjustment threshold for granularity-center migration to $\varepsilon$ and the adjustment threshold for the number of multi-granularity sub-populations to $\theta$. If the center $c_{tj}$ of the multi-granularity population $V_{tj}$ satisfies both threshold conditions, set the adjustment label of $V_{tj}$ to 0, that is, $f_{tj} = 0$, and add $V_{tj}$ and $c_{tj}$ to the final multi-granularity population and center sets, that is, $V = V \cup \{V_{tj}\}$ and $C = C \cup \{c_{tj}\}$. Once a set containing $k$ multi-granularity centers has been formed, so that $|V| = k$, the iteration terminates.
D. Dynamically update the super elites in the multi-granularity sub-populations using the equilibrium adjustment strategy: place the super elites of the multi-granularity sub-populations within an isosceles right triangle and compute their respective granularity values. If two super elites have the same lower granularity, their approximation-degree attribute values converge to one equilibrium pair; if two super elites have the same higher granularity, their approximation-degree attribute values converge to another equilibrium pair. This equilibrium adjustment strategy helps increase the optimal consistent equilibrium degree of the multi-granularity sub-populations.
E.构建多粒度子种群超级精英动态模糊协同分割策略,在动态精英优势区域内对大规模脑病历属性进行全局搜索分割与局部精化分割,在多粒度子种群中执行竞争和合作的混合协同,构建大规模脑病历属性分割最优一致均衡度和概率度,使超级精英在各自对应的Pareto优势区域内协同提取知识约简子集,并能稳定分割大规模脑病历不同的属性区域,求得大规模脑病历最优特征集
具体包括以下步骤:
E. Construct a multi-granularity subpopulation super-elite dynamic fuzzy collaborative segmentation strategy, perform global search segmentation and local refinement segmentation on large-scale brain medical record attributes in the dynamic elite dominance area, and perform a hybrid collaboration of competition and cooperation in multi-granularity subpopulations , To construct the optimal uniformity and probability of large-scale brain medical record attribute segmentation, so that super elites can collaboratively extract knowledge reduction subsets in their corresponding Pareto superior areas, and can stably segment large-scale brain medical records with different attribute areas. Optimal feature set of large-scale brain medical records It includes the following steps:
a.设两个相邻的超级精英聚类为
和
它们的精英成员关系度分别为
和
a. Suppose two adjacent super elite clusters are with Their elite membership degrees are respectively with
b.如果
则超级精英将演变成精英聚类
的组合;否则将演变成精英聚类
的组合;
b. If Super elites will evolve into elite clusters The combination of; otherwise it will evolve into an elite cluster The combination;
c.在多粒度子种群中执行竞争和合作的混合协同的大规模脑病历分割,假设S
i为第i个超级精英,在i=1至|S
i|执行如下操作:
. c execution Competition and Cooperation in Multi-granularity subpopulation mixed synergistic medical split brain mass, assuming S i is the i-th super elite, the i = 1 to | perform operations | S i:
(1) Insert the representative $S_{i,\mathrm{rep}}$ of super elite $S_i$ into $P_i^t$;
(2) If $n_x > |S_i|$, select the super elite $P_i^t$ from the multi-granularity subpopulation Granu-subpopulation$_i$;
(3) Combine all $S_{i,j}$ with the solutions of the other multi-granularity subpopulations Granu-subpopulation$_i$, rank them, and calculate the niche count of $S_{i,j}$;
(4) Update the super-elite representative of $S_i$ to obtain the non-dominated solutions within the Pareto dominance region, determine the winning multi-granularity subpopulation, and update $S_i = S_k$;
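Operations (3) and (4) rely on ranking solutions, counting niches, and keeping the non-dominated solutions. The source does not reproduce its exact ranking or niche formulas, so the following is only a minimal sketch under stated assumptions: every objective is minimized, the niche count uses the standard linear sharing function from niched genetic algorithms, and the names `dominates`, `non_dominated`, `niche_count`, and `sigma_share` are hypothetical.

```python
from math import dist


def dominates(a, b):
    """Return True if a Pareto-dominates b (assumption: all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def niche_count(p, points, sigma_share=1.0):
    """Sum of the linear sharing function max(0, 1 - d/sigma_share) over all points.

    Assumption: this sharing function stands in for the niche count,
    which the source names but does not define.
    """
    return sum(max(0.0, 1.0 - dist(p, q) / sigma_share) for q in points)


# Toy objective vectors for five candidate solutions.
candidates = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
front = non_dominated(candidates)
crowding = niche_count((2, 2), front, sigma_share=2.0)
```

A solution with a low niche count sits in a sparsely populated part of the front, which is why niche counts are commonly used to favor diverse representatives.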
d. The fuzzy membership degree $u_{C_h}(P_i)$ of a super elite is calculated with a similarity-based membership measure, where the distance between the reference value $P_i$ and the super-elite center $C_h$ is defined as $d(P_i, C_h)$;
e. For each multi-granularity subpopulation super elite, calculate the equilibrium degree CI as [formula not reproduced] and the consistency probability CR as [formula not reproduced], where t∈{1,2,...,s};
f. For any inconsistent equilibrium degree [formula not reproduced], obtain the optimal consistent equilibrium degree of the t-th multi-granularity subpopulation super elite as [formula not reproduced], where [formula not reproduced];
g. Obtain the global optimal consistent probability degree of all super elites as [formula not reproduced], t∈{1,2,...,s}, and construct the pair of optimal consistent equilibrium degree and probability degree for attribute segmentation of the large-scale brain medical records as [formula not reproduced], t∈{1,2,...,s};
h. Based on the pair of optimal consistent equilibrium degree and probability degree [symbol not reproduced], the super elites segment the different attribute regions of the brain medical records into the feature sets $F_1, F_2, \ldots, F_n$, yielding the optimal feature set [symbol not reproduced] of the large-scale brain medical records;
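Step d defines the fuzzy membership of a reference value in terms of its distance to each super-elite center, but the membership formula itself is not reproduced in the source. As an illustration only, the sketch below swaps in the well-known fuzzy c-means membership rule, with Euclidean distance for $d(P_i, C_h)$; the function name `fuzzy_memberships` and the fuzzifier `m` are assumptions, not the patent's definitions.

```python
from math import dist


def fuzzy_memberships(p, centers, m=2.0):
    """Membership of point p in each center, fuzzy c-means style.

    Assumptions: Euclidean distance, fuzzifier m > 1. The source's own
    similarity-based formula is not available, so this rule is a stand-in.
    """
    distances = [dist(p, c) for c in centers]
    if any(d == 0.0 for d in distances):
        # p coincides with one or more centers: split full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_h / d_k) ** exponent for d_k in distances)
        for d_h in distances
    ]


# A reference value twice as far from the second center as from the first.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

Under this rule the memberships of one point always sum to 1, and a closer center receives the larger share.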
F. Compare the segmentation accuracy RC obtained above for the large-scale brain medical records with the preset accuracy value η. If RC≥η, output the optimal segmentation knowledge set of the large-scale brain medical records; otherwise, repeat steps C, D, and E until the segmentation accuracy satisfies RC≥η;
G. Store the optimal feature set [symbol not reproduced] of the big-data brain medical record segmentation in the Spark cloud platform, providing an important knowledge basis for intelligent auxiliary diagnosis in the clinical diagnosis and treatment of diseases related to large-scale brain medical records.
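The control flow of step F (repeat steps C, D, and E until RC≥η) can be sketched as a simple loop. This is a hedged illustration, not the patented implementation: `refine` stands in for one pass of steps C to E, `accuracy` for the computation of RC, and the `max_rounds` safety cap is an added assumption that the source does not mention.

```python
def run_until_accurate(refine, accuracy, state, eta, max_rounds=100):
    """Repeat refine(state) until accuracy(state) >= eta.

    refine     -- hypothetical callable performing one pass of steps C, D, E
    accuracy   -- hypothetical callable returning the segmentation accuracy RC
    eta        -- preset accuracy value from step F
    max_rounds -- assumed safety cap; the source loops until RC >= eta
    """
    for round_no in range(1, max_rounds + 1):
        state = refine(state)
        rc = accuracy(state)
        if rc >= eta:
            return state, rc, round_no
    raise RuntimeError(f"RC did not reach eta={eta} within {max_rounds} rounds")


# Toy stand-ins: each pass improves the state by one unit of quality.
final_state, final_rc, rounds = run_until_accurate(
    refine=lambda s: s + 1,
    accuracy=lambda s: s / 10,
    state=0,
    eta=0.5,
)
```

In a real deployment the returned state would correspond to the optimal segmentation knowledge set that step G stores on the Spark platform.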
The invention uses a multi-granularity Spark super-trust model to establish trust degrees between the different super elites within the multi-granularity population, dynamically updates the super elites with different multi-granularity subpopulation equilibrium adjustment strategies, and performs global search segmentation and local refinement segmentation on the large-scale brain medical records. The super elites collaboratively extract knowledge reduction subsets within their respective regions, which greatly reduces execution time and improves the segmentation accuracy of large-scale brain medical records.
On the Spark cloud platform, the invention builds a dynamic cooperative operation mechanism for the multi-granularity population super elites based on the dynamic elite dominance region. It achieves an optimal consistent equilibrium for large-scale brain medical record segmentation, reduces the complexity cost of feature segmentation, and further improves the granularity and robustness of parallel feature extraction from large-scale brain medical records on the Spark cloud platform, laying a sound foundation for intelligent services such as brain medical record feature selection, rule mining, and clinical decision support.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention.
Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.