CN111563549A - Medical image clustering method based on multitask evolutionary algorithm - Google Patents
Medical image clustering method based on multitask evolutionary algorithm
- Publication number
- CN111563549A
- Authority
- CN
- China
- Prior art keywords
- roi
- medical image
- clustering
- image
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The invention relates to the fields of medical technology and intelligent computing, and in particular to a medical image clustering method based on a multi-task evolutionary algorithm.
Background
Over the past 20 years, medical imaging has become one of the fastest-developing areas of medical technology, and medical images have become increasingly easy to acquire and store. Medical images account for a large share of all image data and play an important role in the healthcare system. By examining medical images, clinicians can assess a lesion more directly and clearly and thus reach a more accurate diagnosis. Accurate clustering of medical images gives medical staff a better scientific reference when judging and diagnosing a condition and its cause. It can greatly reduce misdiagnoses caused by the limited resolving power of human vision or by a clinician's insufficient clinical experience, and it further improves the utilization of medical images.
At present, traditional clustering algorithms have already been applied to medical images in China and abroad, for example K-means and its many variants. However, these approaches all have drawbacks: K-means-based methods are sensitive to the initial centers and to the value of K; density-based methods are mostly sensitive to the neighborhood radius and MinPts and are strongly affected by noise; and grid-based methods lose accuracy.
Evolutionary algorithms are also used for clustering, commonly genetic algorithms or differential evolution, but they are built on a single-task framework: one run can optimize only a single objective and yields a single optimization result.
Moreover, for medical staff not every pixel in a medical image is worth examining. Doctors pay more attention to distinctive pixel regions, called regions of interest (ROIs). In earlier image research, ROIs were extracted using conventional image features such as color, texture, and shape. This does not suit medical images, which differ from ordinary image data in many respects.
Summary of the Invention
The purpose of the invention is to overcome the deficiencies of the prior art and to provide a medical image clustering method based on a multi-task evolutionary algorithm that expresses the content of medical images more fully, optimizes a single population to obtain multiple clustering results simultaneously, converges to the global optimum more easily through cross-domain communication, and produces a clearer clustering effect.
To achieve this purpose, the technical solution provided by the invention is as follows:
A medical image clustering method based on a multi-task evolutionary algorithm, comprising the following steps:
S1. Extract the ROI feature description data of the medical image;
S2. Read in the extracted ROI feature description data, apply the NMP clustering rule, and optimize several internal clustering indices to obtain multiple clustering results under a multi-task framework;
S3. Use the doctor's expert knowledge to select the best of these results.
Further, the ROI feature description data extracted in step S1 comprise the relative gray level s1, relative area s2, relative centroid coordinates s3, circularity s4, angle s5, and symmetry s6.
Further, step S1 extracts the ROI feature description data of the medical image as follows:
S1-1. Read in the medical image;
S1-2. Scan the medical image to obtain its pixel count, length, width, and maximum gray level;
S1-3. Detect the symmetry s6 of the medical image;
S1-4. Extract the ROI of the medical image according to a gray-level range;
S1-5. For the ROI obtained in step S1-4, extract the mean gray level, pixel count, longest axis, shortest axis, centroid coordinates, and angle;
S1-6. Compute the feature description data of the ROI.
Further, step S1-3 detects the symmetry s6 of the medical image as follows:
Fold the medical image in half and take the difference of the two halves, then binarize the difference with a gray-level threshold. If fewer pixels remain than a set value, the image is judged symmetric; otherwise it is judged asymmetric;
Step S1-6 computes the feature description data of the ROI as follows:
Relative gray level s1: computed from ROI.gray and IMAGE.gray, where ROI.gray is the mean gray level of the ROI and IMAGE.gray is the mean gray level of the whole image;
Relative area s2: s2 = ROI.area / IMAGE.area, where ROI.area is the number of pixels in the ROI and IMAGE.area is the number of pixels in the whole image;
Relative centroid coordinates s3: s3 = (ROI.x / IMAGE.length, ROI.y / IMAGE.height), where ROI.x is the abscissa of the ROI centroid, IMAGE.length is the length of the original image, ROI.y is the ordinate of the ROI centroid, and IMAGE.height is the height of the original image;
Circularity s4: s4 = 4 × π × ROI.area / ROI.perimeter^2, where ROI.area is the number of pixels in the ROI and ROI.perimeter is the number of pixels on the boundary of the ROI;
Angle s5: s5 = (Orientation + 90) / 180, where Orientation is the angle from the major axis of the ROI to the X axis.
Further, step S2 proceeds as follows:
S2-1. Read in the extracted ROI feature description data of the medical image;
S2-2. Initialize the population and the maximum number of iterations n, and set k = 0;
S2-3. Perform clustering with the NMP clustering rule and evaluate the result with the clustering indices;
S2-4. Compute the skill factor τ of each individual;
S2-5. Generate offspring: randomly select two individuals a and b from the population as parents and draw a random number rand between 0 and 1. If rand is less than the algorithm parameter rmp, or the two individuals have the same skill factor τ, perform simulated binary crossover; otherwise mutate each of the two individuals separately. Repeat the crossover or mutation step until the number of offspring equals the population size, then go to step S2-6;
S2-6. Compute the fitness values of the generated offspring under each clustering-index optimization task;
S2-7. Merge the parents and offspring into a new population; after clustering, recompute from the fitness values the skill factor τ and scalar fitness φ of every individual in it;
S2-8. Sort the individuals in the population by scalar fitness φ and select them from best to worst for the next generation, so that individuals with larger φ are chosen first; set k = k + 1;
S2-9. If k < n, return to step S2-3; otherwise stop iterating;
S2-10. Find the individuals whose skill factor τ equals 1 and record them as the current best individual for each task.
Further, in step S2-2 the population is initialized with the following encoding:
Set the maximum number of clusters K_max. Each individual in the population is encoded as a vector of dimension K_max + K_max*d, made up of the K_max activation thresholds T_ij (j = 1, ..., K_max) and the K_max cluster-center coordinate vectors m_ij, where d is the number of data dimensions and T_ij determines whether the corresponding centroid is activated;
Centroid activation is defined as follows:
If T_ij is greater than 0.5, the centroid m_ij of the corresponding cluster is activated; otherwise it is not. A T_ij greater than 1 is reset to 1 and a negative T_ij is reset to 0. If the number of activated centroids is smaller than the configured minimum number of clusters, additional activation thresholds are chosen at random and activated until the minimum is met.
Further, the NMP clustering rule used in step S2-3 is as follows:
Given a dataset of N samples X = (x_1, ..., x_N), let the K clusters be C = {C_1, ..., C_K} with centroids m_1, ..., m_K, and let D denote distance. Under the NMP rule, the distance between a sample x_i (i = 1, ..., N) and a cluster C_h (h = 1, ..., K) is defined as:
D(x_i, C_h) = min{ D(x_i, x_j), D(x_i, m_h) | x_j ∈ C_h }
That is, the distance from a sample to a cluster is the smallest of its distances to the cluster's centroid and to the points already in the cluster;
Each sample is first assigned to its nearest cluster. All samples assigned to the same cluster form a candidate sample set, and that cluster is called the undetermined cluster of those samples. Then, for each cluster, the nearest sample is chosen from its candidate sample set; these nearest samples from all clusters together form the nearest sample set. Finally, the sample in the nearest sample set whose distance to its undetermined cluster is the smallest is assigned to that cluster. These steps are repeated until every sample has been assigned.
Further, the clustering indices in step S2-3 comprise the CH index, the Dunn index, and the SIL index, each corresponding to one optimization task. The indices are as follows:
CH index:
In the CH index, the trace term is the trace of the between-cluster scatter matrix, and m is the mean vector of the whole dataset;
Dunn index:
In the Dunn index, D(C_i, C_j) is the distance between two different clusters, taken as the distance between their two closest data points, and δ(C_i) is the distance between the two farthest points within cluster C_i;
SIL index:
In the SIL index, s_j = (b_j - a_j) / max(a_j, b_j) is the silhouette width of data point x_j, where a_j is the average distance from x_j to the other data points in its own cluster and b_j is the minimum distance from x_j to the data points of the other clusters.
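The formula images for the three indices did not survive in this text. As a reference point only, the textbook definitions that match the symbols described above are sketched below; treating them as the patent's exact formulas is an assumption. Here n_k and m_k denote the size and mean of cluster C_k, d is the underlying point-to-point distance, and b_j follows the usual silhouette convention (smallest mean distance to another cluster).

```latex
\begin{aligned}
\mathrm{CH} &= \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(N-K)},
 & \operatorname{tr}(B) &= \sum_{k=1}^{K} n_k \lVert m_k - m \rVert^2,
 & \operatorname{tr}(W) &= \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^2 \\
\mathrm{Dunn} &= \frac{\min_{i \neq j} D(C_i, C_j)}{\max_{k} \delta(C_k)},
 & D(C_i, C_j) &= \min_{x \in C_i,\; y \in C_j} d(x, y),
 & \delta(C_i) &= \max_{x, y \in C_i} d(x, y) \\
\mathrm{SIL} &= \frac{1}{N} \sum_{j=1}^{N} s_j,
 & a_j &= \frac{1}{n_h - 1} \sum_{\substack{x_l \in C_h \\ l \neq j}} d(x_j, x_l),
 & b_j &= \min_{g \neq h} \frac{1}{n_g} \sum_{x_l \in C_g} d(x_j, x_l)
\end{aligned}
```

All three indices are larger for better partitions, which is consistent with selecting individuals with larger fitness.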
Further, the simulated binary crossover in step S2-5 operates as follows:
Given two parents x_a = [x_a(1), ..., x_a(d)] and x_b = [x_b(1), ..., x_b(d)], where d is the dimension of the data, first compute the spread factor c(j) for each dimension j from a parameter β and a random number r,
where β is a system parameter greater than 0 and r is a random number between 0 and 1 drawn for each dimension;
The offspring are then:
x_e(j) = [(1 + c(j)) x_a(j) + (1 - c(j)) x_b(j)] / 2
x_f(j) = [(1 + c(j)) x_b(j) + (1 - c(j)) x_a(j)] / 2.
Compared with the prior art, the principle and advantages of this solution are as follows:
1. Using a multifactorial evolutionary algorithm, a single population is optimized simultaneously against internal clustering indices such as the CH, Dunn, and SIL indices to obtain multiple clustering results, and cross-domain communication makes convergence to the global optimum easier.
2. Traditional image feature mining based on color, texture, and shape tends to ignore the information carried by the distinctive attributes of medical image ROIs. The ROI attributes used in the invention are relative gray level, relative area, relative centroid coordinates, circularity, angle, and symmetry, so features specific to medical images are considered alongside conventional image features.
3. Clustering rule based on NMP (nearest multiple prototypes): when traditional particle swarm optimization is applied to clustering, it performs poorly on clusters whose shape is not circular. Although it is also a distance-based clustering rule, the NMP rule is a dynamic process, is more flexible, and usually gives better clustering results.
Description of the Drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the overall flowchart of the medical image clustering method based on a multi-task evolutionary algorithm;
Fig. 2 is the flowchart of ROI data extraction from a medical image;
Fig. 3 is a schematic diagram of the encoding used by the multi-task clustering evolutionary algorithm.
Detailed Description
The invention is further described below with reference to a specific embodiment:
As shown in Fig. 1, the medical image clustering method based on a multi-task evolutionary algorithm in this embodiment comprises the following steps:
S1. Extract the ROI feature description data of the medical image, comprising the relative gray level s1, relative area s2, relative centroid coordinates s3, circularity s4, angle s5, and symmetry s6.
As shown in Fig. 2, the extraction proceeds as follows:
S1-1. Read in the medical image;
S1-2. Scan the medical image to obtain its pixel count, length, width, and maximum gray level;
S1-3. Detect the symmetry s6 of the medical image:
Fold the medical image in half and take the difference of the two halves, then binarize the difference with a gray-level threshold. If fewer pixels remain than a set value, the image is judged symmetric; otherwise it is judged asymmetric;
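The folding test in step S1-3 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the fold axis (the vertical midline), the function name `is_symmetric`, and the default values of `gray_threshold` and `max_residual_pixels` are all assumptions, since the text only speaks of "a gray-level threshold" and "a set value".

```python
import numpy as np

def is_symmetric(image, gray_threshold=30, max_residual_pixels=500):
    """Fold a grayscale image along its vertical midline and compare halves.

    gray_threshold and max_residual_pixels are hypothetical tuning values.
    """
    img = np.asarray(image, dtype=np.int32)
    half = img.shape[1] // 2
    left = img[:, :half]
    # Mirror the right half so it overlays the left half; for odd widths
    # the middle column is ignored.
    right_mirrored = img[:, img.shape[1] - half:][:, ::-1]
    # Difference of the two halves, binarized with the gray-level threshold.
    diff = np.abs(left - right_mirrored)
    residual = int(np.count_nonzero(diff > gray_threshold))
    # Few remaining pixels means the two halves match, i.e. symmetric.
    return residual < max_residual_pixels
```

The Boolean result can be stored as 0 or 1 to serve as the symmetry feature s6.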
S1-4. Extract the ROI of the medical image according to a gray-level range;
S1-5. For the ROI obtained in step S1-4, extract the mean gray level, pixel count, longest axis, shortest axis, centroid coordinates, and angle;
S1-6. Compute the feature description data of the ROI:
Relative gray level s1: computed from ROI.gray and IMAGE.gray, where ROI.gray is the mean gray level of the ROI and IMAGE.gray is the mean gray level of the whole image;
Relative area s2: s2 = ROI.area / IMAGE.area, where ROI.area is the number of pixels in the ROI and IMAGE.area is the number of pixels in the whole image;
Relative centroid coordinates s3: s3 = (ROI.x / IMAGE.length, ROI.y / IMAGE.height), where ROI.x is the abscissa of the ROI centroid, IMAGE.length is the length of the original image, ROI.y is the ordinate of the ROI centroid, and IMAGE.height is the height of the original image;
Circularity s4: s4 = 4 × π × ROI.area / ROI.perimeter^2, where ROI.area is the number of pixels in the ROI and ROI.perimeter is the number of pixels on the boundary of the ROI;
Angle s5: s5 = (Orientation + 90) / 180, where Orientation is the angle from the major axis of the ROI to the X axis.
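The feature formulas of step S1-6 can be sketched with NumPy as below. Several points are assumptions rather than statements from the text: the s1 formula is missing from the source, so the ratio ROI.gray / IMAGE.gray is assumed by analogy with s2; boundary pixels are taken to be ROI pixels with at least one 4-neighbor outside the ROI; and the orientation is passed in as a precomputed angle in degrees. The function name `roi_features` is hypothetical.

```python
import numpy as np

def roi_features(image, roi_mask, orientation_deg):
    """Return (s1, s2, s3, s4, s5) for one ROI given as a Boolean mask."""
    img = np.asarray(image, dtype=float)
    mask = np.asarray(roi_mask, dtype=bool)
    height, length = img.shape
    area = int(mask.sum())

    # s1: ASSUMED to be the ratio of mean gray levels (formula missing in source).
    s1 = img[mask].mean() / img.mean()
    # s2: ROI pixel count over image pixel count.
    s2 = area / img.size
    # s3: centroid coordinates normalized by image length and height.
    ys, xs = np.nonzero(mask)
    s3 = (xs.mean() / length, ys.mean() / height)

    # Boundary pixels: ROI pixels with at least one 4-neighbor outside the ROI.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    # s4: circularity 4*pi*area / perimeter^2.
    s4 = 4 * np.pi * area / perimeter ** 2
    # s5: orientation mapped from [-90, 90] degrees to [0, 1].
    s5 = (orientation_deg + 90) / 180
    return s1, s2, s3, s4, s5
```

Because the perimeter is a pixel count rather than a geometric length, s4 can exceed 1 for small compact regions.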
S2. After the ROI feature description data of the medical image have been extracted, read in the data, apply the NMP clustering rule, and optimize several internal clustering indices to obtain multiple clustering results under a multi-task framework. The process is as follows:
S2-1. Read in the extracted ROI feature description data of the medical image;
S2-2. Initialize the population and the maximum number of iterations n, and set k = 0. This step uses the following encoding for initialization:
Set the maximum number of clusters K_max. Each individual in the population is encoded as a vector of dimension K_max + K_max*d, made up of the K_max activation thresholds T_ij (j = 1, ..., K_max) and the K_max cluster-center coordinate vectors m_ij, where d is the number of data dimensions and T_ij determines whether the corresponding centroid is activated;
Centroid activation is defined as follows:
If T_ij is greater than 0.5, the centroid m_ij of the corresponding cluster is activated; otherwise it is not. A T_ij greater than 1 is reset to 1 and a negative T_ij is reset to 0. If the number of activated centroids is smaller than the configured minimum number of clusters, additional activation thresholds are chosen at random and activated until the minimum is met. Fig. 3 shows an individual with dimension d = 2 and maximum number of clusters K_max = 4. The leading entries are the activation thresholds; the second threshold, 0.4, is below 0.5, so the second centroid is inactive, while the thresholds at the other positions exceed 0.5 and their centroids are activated. This individual therefore encodes 3 clusters. In practice, d is set to 6.
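Decoding such an individual can be sketched as follows, assuming the layout described for Fig. 3 (thresholds first, then the K_max centroids of d coordinates each). The function name `decode_individual`, the `k_min` parameter name, and the choice to flip the activation flag directly instead of rewriting the threshold value are illustrative assumptions; the sketch also assumes k_min does not exceed K_max.

```python
import numpy as np

def decode_individual(individual, k_max, d, k_min=2, rng=None):
    """Split a genome into thresholds and centroids and apply activation."""
    rng = np.random.default_rng() if rng is None else rng
    genome = np.asarray(individual, dtype=float)
    # Thresholds above 1 are reset to 1, negative ones to 0.
    thresholds = np.clip(genome[:k_max], 0.0, 1.0)
    centroids = genome[k_max:].reshape(k_max, d)
    # A centroid is active when its threshold exceeds 0.5.
    active = thresholds > 0.5
    # Repair: randomly activate extra centroids to reach the minimum count.
    shortfall = k_min - int(active.sum())
    if shortfall > 0:
        inactive = np.flatnonzero(~active)
        chosen = rng.choice(inactive, size=shortfall, replace=False)
        active[chosen] = True
    return centroids[active], active
```

For a genome with thresholds such as 0.7, 0.4, 0.9, 0.6 (only the 0.4 comes from the text, the others are made up), the second centroid is dropped and three clusters remain.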
S2-3. Perform clustering with the NMP clustering rule and evaluate the result with the clustering indices;
The NMP clustering rule used in this step is as follows:
Given a dataset of N samples X = (x_1, ..., x_N), let the K clusters be C = {C_1, ..., C_K} with centroids m_1, ..., m_K, and let D denote distance. Under the NMP rule, the distance between a sample x_i (i = 1, ..., N) and a cluster C_h (h = 1, ..., K) is defined as:
D(x_i, C_h) = min{ D(x_i, x_j), D(x_i, m_h) | x_j ∈ C_h }
That is, the distance from a sample to a cluster is the smallest of its distances to the cluster's centroid and to the points already in the cluster;
Each sample is first assigned to its nearest cluster. All samples assigned to the same cluster form a candidate sample set, and that cluster is called the undetermined cluster of those samples. Then, for each cluster, the nearest sample is chosen from its candidate sample set; these nearest samples from all clusters together form the nearest sample set. Finally, the sample in the nearest sample set whose distance to its undetermined cluster is the smallest is assigned to that cluster. These steps are repeated until every sample has been assigned.
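The NMP assignment loop can be sketched as below. The sketch relies on the observation that, ties aside, the three-stage selection described above (nearest cluster per sample, nearest candidate per cluster, smallest among those) picks the globally closest unassigned (sample, cluster) pair. Euclidean distance and the function name `nmp_assign` are assumptions.

```python
import numpy as np

def nmp_assign(X, centroids):
    """Assign samples to clusters one at a time under the NMP distance."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centroids, dtype=float)
    n = len(X)
    # dist[i, h] holds the current D(x_i, C_h); initially the distance to m_h.
    dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    labels = np.full(n, -1)
    unassigned = np.ones(n, dtype=bool)
    for _ in range(n):
        idx = np.flatnonzero(unassigned)
        sub = dist[idx]
        # Globally closest (sample, cluster) pair among unassigned samples.
        row, h = divmod(int(np.argmin(sub)), sub.shape[1])
        i = idx[row]
        labels[i] = h
        unassigned[i] = False
        # x_i now acts as an extra prototype of cluster h.
        dist[:, h] = np.minimum(dist[:, h], np.linalg.norm(X - X[i], axis=1))
    return labels
```

Because assigned samples become prototypes, a cluster can grow along a chain of nearby points, which is how the rule follows elongated, non-circular clusters that plain nearest-centroid assignment would split.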
The clustering indices comprise the CH index, the Dunn index, and the SIL index, each corresponding to one optimization task. The indices are as follows:
CH index:
In the CH index, the trace term is the trace of the between-cluster scatter matrix, and m is the mean vector of the whole dataset;
Dunn index:
In the Dunn index, D(C_i, C_j) is the distance between two different clusters, taken as the distance between their two closest data points, and δ(C_i) is the distance between the two farthest points within cluster C_i;
SIL index:
In the SIL index, s_j = (b_j - a_j) / max(a_j, b_j) is the silhouette width of data point x_j, where a_j is the average distance from x_j to the other data points in its own cluster and b_j is the minimum distance from x_j to the data points of the other clusters.
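A compact way to evaluate a partition against the three tasks is sketched below. Since the index formulas are not reproduced in this text, the code assumes the standard Calinski-Harabasz, Dunn, and silhouette definitions with Euclidean distance; in particular b_j is taken as the smallest mean distance to another cluster, the usual silhouette convention. The function names are hypothetical, and at least two non-degenerate clusters are assumed.

```python
import numpy as np

def pairwise(X):
    """Euclidean distance matrix between all rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def ch_index(X, labels):
    n, ks, m = len(X), np.unique(labels), X.mean(axis=0)
    # Traces of the between-cluster and within-cluster scatter matrices.
    tr_b = sum((labels == k).sum() * np.sum((X[labels == k].mean(axis=0) - m) ** 2)
               for k in ks)
    tr_w = sum(np.sum((X[labels == k] - X[labels == k].mean(axis=0)) ** 2)
               for k in ks)
    return (tr_b / (len(ks) - 1)) / (tr_w / (n - len(ks)))

def dunn_index(X, labels):
    D, ks = pairwise(X), np.unique(labels)
    # Smallest gap between two clusters over the largest cluster diameter.
    min_sep = min(D[np.ix_(labels == a, labels == b)].min()
                  for a in ks for b in ks if a < b)
    max_diam = max(D[np.ix_(labels == k, labels == k)].max() for k in ks)
    return min_sep / max_diam

def silhouette_index(X, labels):
    D, ks = pairwise(X), np.unique(labels)
    scores = []
    for j in range(len(X)):
        own = labels == labels[j]
        if own.sum() == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = D[j, own].sum() / (own.sum() - 1)
        b = min(D[j, labels == k].mean() for k in ks if k != labels[j])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Each function returns a value that grows with partition quality, so all three tasks can be treated as maximization problems.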
S2-4. Compute the skill factor τ of each individual, which records the optimization task on which individual i performs best among all tasks. If individual i achieves its best rank in the ordering for the j-th task, then τ_i = j. To compute the skill factor, all individuals in the population are sorted by fitness value under each task; the position of individual i in the sorted sequence for task j is its factorial rank r_i^j.
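Factorial ranks and skill factors can be computed from a fitness table as in the following sketch. It assumes `fitness[i, j]` is the index value of individual i on task j and that larger values are better; the function names, the 1-based task numbering, and breaking ties in favor of the lower task number are illustrative choices.

```python
import numpy as np

def factorial_ranks(fitness):
    """ranks[i, j] = 1-based position of individual i when sorted on task j."""
    fitness = np.asarray(fitness, dtype=float)
    n, n_tasks = fitness.shape
    # Best (largest) fitness first, separately for each task.
    order = np.argsort(-fitness, axis=0, kind="stable")
    ranks = np.empty_like(order)
    for j in range(n_tasks):
        ranks[order[:, j], j] = np.arange(1, n + 1)
    return ranks

def skill_factors(ranks):
    """tau_i = 1-based index of the task where individual i ranks best."""
    return np.argmin(ranks, axis=1) + 1
```

With three tasks (for example CH, Dunn, and SIL), each individual ends up tagged with the one task it is currently best suited to.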
S2-5. Generate offspring: randomly select two individuals a and b from the population as parents and draw a random number rand between 0 and 1. If rand is less than the algorithm parameter rmp (random mating probability), or the two individuals have the same skill factor τ, perform simulated binary crossover; otherwise mutate each of the two individuals separately. Repeat the crossover or mutation step until the number of offspring equals the population size, then go to step S2-6;
The simulated binary crossover operates as follows:
Given two parents x_a = [x_a(1), ..., x_a(d)] and x_b = [x_b(1), ..., x_b(d)], where d is the dimension of the data, first compute the spread factor c(j) for each dimension j from a parameter β and a random number r,
where β is a system parameter greater than 0 and r is a random number between 0 and 1 drawn for each dimension;
The offspring are then:
x_e(j) = [(1 + c(j)) x_a(j) + (1 - c(j)) x_b(j)] / 2
x_f(j) = [(1 + c(j)) x_b(j) + (1 - c(j)) x_a(j)] / 2.
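The mating decision and the crossover can be sketched together as follows. The spread-factor formula is missing from this text, so the sketch assumes the standard simulated binary crossover form, c(j) = (2r)^(1/(β+1)) for r ≤ 0.5 and (1/(2(1-r)))^(1/(β+1)) otherwise; the offspring equations are the ones given above. The function names and the default β are hypothetical.

```python
import numpy as np

def should_crossover(tau_a, tau_b, rmp, rng):
    """Crossover if the parents share a skill factor or rand < rmp."""
    return tau_a == tau_b or rng.random() < rmp

def sbx_crossover(xa, xb, beta=2.0, rng=None):
    """Simulated binary crossover with an ASSUMED standard spread factor."""
    rng = np.random.default_rng() if rng is None else rng
    xa, xb = np.asarray(xa, dtype=float), np.asarray(xb, dtype=float)
    r = rng.random(xa.shape)  # one random number in [0, 1) per dimension
    c = np.where(r <= 0.5,
                 (2 * r) ** (1 / (beta + 1)),
                 (1 / (2 * (1 - r))) ** (1 / (beta + 1)))
    # Offspring equations from the text.
    xe = ((1 + c) * xa + (1 - c) * xb) / 2
    xf = ((1 + c) * xb + (1 - c) * xa) / 2
    return xe, xf
```

Whatever spread factor is used, the two offspring equations guarantee x_e(j) + x_f(j) = x_a(j) + x_b(j), so the children stay centered on the parents' midpoint in every dimension.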
S2-6. Compute the fitness values of the generated offspring under each clustering-index optimization task;
S2-7. Merge the parents and offspring into a new population; after clustering, recompute from the fitness values the skill factor τ and scalar fitness φ of every individual in it. The scalar fitness of the i-th individual is computed from its factorial ranks over all K tasks; that is, it is determined by the individual's factorial rank on the task where it performs best;
S2-8. Sort the individuals in the population by scalar fitness φ and select them from best to worst for the next generation, so that individuals with larger φ are chosen first; set k = k + 1;
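Steps S2-7 and S2-8 can be sketched as below. The scalar-fitness formula is missing from this text; the sketch assumes the usual multifactorial form φ_i = 1 / min_j r_i^j, which agrees with the statements that φ depends on the rank in the best task and that larger φ is preferred. The function names are hypothetical.

```python
import numpy as np

def scalar_fitness(ranks):
    """phi_i = 1 / (best factorial rank of individual i); ASSUMED form."""
    return 1.0 / np.min(np.asarray(ranks), axis=1)

def select_survivors(population, ranks, n_keep):
    """Keep the n_keep individuals with the largest scalar fitness."""
    phi = scalar_fitness(ranks)
    order = np.argsort(-phi, kind="stable")[:n_keep]
    return [population[i] for i in order], phi[order]
```

An individual ranked first on any one task gets φ = 1 and survives regardless of how poorly it does on the other tasks, which keeps a specialist for each index in the population.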
S2-9、若k<n,则返回步骤S2-3,否则停止迭代;S2-9, if k<n, return to step S2-3, otherwise stop the iteration;
S2-10、找出技能因子τ等于1的个体,记录作为当前各个任务最优的个体。S2-10, find out the individual whose skill factor τ is equal to 1, and record the individual as the current optimal individual for each task.
S3、步骤S2结束后得到3个聚类结果,利用医生的专家知识从中选出一个最优的结果。S3. Three clustering results are obtained after step S2, and an optimal result is selected from the expert knowledge of the doctor.
实施例通过多任务聚类可以将对不同指标的优化视作不同的任务,分别找到各个任务下最优个体,并用专家知识找出最适合该类医学图像的那个聚类指标。聚类指标可以选择适合实际情况的指标,以达到更好的效果。该方法利用了多任务学习的迁移过程,其不同学习任务间的交流有利于突破局部最优以更加接近全局最优,而且其优化速度比用同一个方法对不同的优化目标接连运行多次更加有效率。In the embodiment, the optimization of different indicators can be regarded as different tasks through multi-task clustering, and the optimal individuals under each task can be found respectively, and expert knowledge can be used to find out which clustering indicator is most suitable for this type of medical image. Clustering indicators can choose indicators suitable for the actual situation to achieve better results. This method utilizes the migration process of multi-task learning, and the communication between different learning tasks is conducive to breaking through the local optimum to get closer to the global optimum, and its optimization speed is faster than using the same method to run multiple times for different optimization objectives. Efficient.
The embodiments described above are only preferred embodiments of the present invention and do not limit its scope of implementation; any change made according to the shape and principle of the present invention shall therefore fall within the scope of protection of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010364563.4A CN111563549B (en) | 2020-04-30 | 2020-04-30 | Medical image clustering method based on multitasking evolutionary algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111563549A (en) | 2020-08-21 |
CN111563549B (en) | 2023-07-28 |
Family ID: 72070695
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010364563.4A Active CN111563549B (en) | 2020-04-30 | 2020-04-30 | Medical image clustering method based on multitasking evolutionary algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111563549B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN111563549B (en) | 2023-07-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||