CN111563549A

CN111563549A - Medical image clustering method based on multitask evolutionary algorithm

Info

Publication number: CN111563549A
Application number: CN202010364563.4A
Authority: CN
Inventors: 胡晓敏; 颜志鹏; 陈伟能; 李敏
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2020-08-21
Anticipated expiration: 2040-04-30
Also published as: CN111563549B

Abstract

The invention discloses a medical image clustering method based on a multitask evolutionary algorithm, comprising the following steps: S1, extracting ROI feature description data of a medical image; S2, reading in the extracted ROI feature description data and, applying the NMP clustering rule, optimizing a plurality of internal clustering indexes to obtain a plurality of clustering results under a multi-task framework; and S3, selecting an optimal result from these results using the doctor's expert knowledge. The method expresses the content of the medical image more fully, optimizes a single population to obtain a plurality of clustering results simultaneously, converges to the global optimum more easily through cross-domain communication, and achieves a more pronounced clustering effect.

Description

Medical Image Clustering Method Based on Multi-task Evolutionary Algorithm

Technical Field

The invention relates to the fields of medical technology and intelligent computing, and in particular to a medical image clustering method based on a multi-task evolutionary algorithm.

Background

Over the past 20 years, medical imaging has become one of the fastest-developing fields of medical technology, and medical images have become increasingly easy to acquire and store. Medical images account for a large proportion of all image data and play an important role in the medical system. By observing medical images, a lesion can be assessed more directly and clearly, leading to a more accurate diagnosis. Accurate clustering of medical images provides a better scientific reference for medical staff when judging and diagnosing a condition and its cause, which greatly reduces the misdiagnosis rate caused by the limited resolving power of human vision or by insufficient clinical experience, and further improves the utilization of medical images.

At present, traditional clustering algorithms have been transplanted to medical images both domestically and internationally, but each has shortcomings: methods based on K-means and its many variants are sensitive to the initial centers and to the value of K; density-based methods are mostly sensitive to the neighborhood radius and the value of MinPtr, and are strongly affected by noise; and grid-based methods lose accuracy.

In addition, evolutionary algorithms have been used for clustering, commonly genetic algorithms or differential evolution. However, these are built on a single-task framework: one run can optimize only a single objective and yields a single optimization result.

Furthermore, for medical staff, not every pixel in a medical image is worth observing. Doctors pay more attention to distinctive pixel regions, which are called regions of interest (ROI). In previous image research, ROIs were extracted using traditional image features such as color, texture and shape. This does not suit medical images, which have many characteristics that distinguish them from ordinary image data.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing a medical image clustering method based on a multi-task evolutionary algorithm that expresses the content of medical images more fully, optimizes a single population to obtain multiple clustering results simultaneously, converges to the global optimum more easily through cross-domain communication, and achieves a more pronounced clustering effect.

To achieve the above object, the technical solution provided by the present invention is as follows:

A medical image clustering method based on a multi-task evolutionary algorithm comprises the following steps:

S1. Extract the ROI feature description data of the medical image;

S2. Read in the extracted ROI feature description data and, applying the NMP clustering rule, optimize multiple internal clustering indexes to obtain multiple clustering results under a multi-task framework;

S3. Use the doctor's expert knowledge to select an optimal result from them.

Further, the ROI feature description data extracted in step S1 comprises relative grayscale s1, relative area s2, relative centroid coordinates s3, circularity s4, angle s5 and symmetry s6.

Further, the specific process of extracting the ROI feature description data in step S1 is as follows:

S1-1. Read in the medical image;

S1-2. Scan the medical image to obtain its pixel count, length, width and maximum gray value;

S1-3. Detect the symmetry s6 of the medical image;

S1-4. Extract the ROI of the medical image according to a grayscale range;

S1-5. From the ROI obtained in step S1-4, extract the mean gray value, the number of pixels, the longest axis, the shortest axis, the centroid coordinates and the angle;

S1-6. Calculate the feature description data of the ROI.

Further, the process of detecting the symmetry s6 of the medical image in step S1-3 is:

The medical image is folded in half and the two halves are subtracted; the difference is binarized with a gray threshold. If fewer pixels remain than a set value, the medical image is judged symmetric; otherwise it is judged asymmetric.

The specific process of calculating the feature description data of the ROI in step S1-6 is as follows:

Relative grayscale s1: [equation omitted in source], where ROI.gray is the average gray level of the ROI and IMAGE.gray is the average gray level of the entire image;

Relative area s2: s2 = ROI.area/IMAGE.area, where ROI.area is the number of pixels in the ROI and IMAGE.area is the number of pixels in the entire image;

Relative centroid coordinates s3: s3 = (ROI.x/IMAGE.length, ROI.y/IMAGE.height), where ROI.x is the abscissa of the ROI centroid, IMAGE.length is the length of the original image, ROI.y is the ordinate of the ROI centroid, and IMAGE.height is the height of the original image;

Circularity s4: s4 = 4×π×ROI.area/ROI.perimeter², where ROI.area is the number of pixels in the ROI and ROI.perimeter is the number of pixels on the ROI boundary;

Angle s5: s5 = (Orientation+90)/180, where Orientation is the angle from the long axis of the ROI to the X axis.
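The symmetry test of step S1-3 and the closed-form features s2 to s5 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the fold axis (the vertical midline), the default threshold values, and all function and parameter names are hypothetical and do not come from the source; IMAGE.area is taken to be length × height; and s1 is left out because its formula is not given in the text.

```python
import math

def is_symmetric(image, gray_threshold=30, max_remaining=10):
    """Fold a grayscale image about its vertical midline and test symmetry.

    image: list of rows, each a list of integer gray levels.
    gray_threshold and max_remaining are illustrative values (assumptions).
    """
    width = len(image[0])
    remaining = 0
    for row in image:
        for x in range(width // 2):
            # Difference between a pixel and its mirror across the fold line.
            diff = abs(row[x] - row[width - 1 - x])
            # Binarize: only differences above the gray threshold survive.
            if diff > gray_threshold:
                remaining += 1
    # Symmetric when fewer pixels remain than the set value.
    return remaining < max_remaining

def roi_features(roi_area, roi_perimeter, roi_x, roi_y, orientation_deg,
                 image_length, image_height):
    """Return (s2, s3, s4, s5) computed from the formulas in the text."""
    image_area = image_length * image_height  # assumption: full pixel count
    s2 = roi_area / image_area
    s3 = (roi_x / image_length, roi_y / image_height)
    s4 = 4 * math.pi * roi_area / roi_perimeter ** 2
    s5 = (orientation_deg + 90) / 180
    return s2, s3, s4, s5
```

Because every feature is divided by a property of the whole image or mapped onto a fixed range, the resulting values are comparable across images of different sizes.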

Further, the specific process of step S2 is as follows:

S2-1. Read in the extracted ROI feature description data of the medical image;

S2-2. Initialize the population and the maximum number of iterations n, and set k = 0;

S2-3. Perform clustering with the NMP clustering rule and evaluate the result with the clustering indexes;

S2-4. Calculate the skill factor τ of each individual;

S2-5. Generate offspring: randomly select two individuals a and b from the population as parents and generate a random number rand greater than 0 and less than 1. If rand is less than the algorithm parameter rmp, or the skill factors τ of the two individuals are equal, perform simulated binary crossover; otherwise mutate each of the two individuals separately. Repeat the crossover or mutation step until the number of offspring equals the number of individuals in the population, then proceed to step S2-6;

S2-6. Calculate the fitness values of the generated offspring under each clustering index optimization task;

S2-7. Merge the parents and offspring into a new population and, after clustering, recalculate and update the skill factor τ and scalar fitness φ of every individual according to the fitness values;

S2-8. Sort the individuals in the population by scalar fitness φ, then select individuals from best to worst for the next-generation population, with individuals of larger φ selected first; set k = k+1;

S2-9. If k < n, return to step S2-3; otherwise stop iterating;

S2-10. Find the individuals whose skill factor τ equals 1 and record them as the current best individuals for each task.
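The mating rule of step S2-5 can be sketched as the loop below. It is a minimal sketch rather than the patented implementation: `make_offspring`, `crossover` and `mutate` are hypothetical names, and the two operators are passed in as callables because the text does not specify the mutation operator.

```python
import random

def make_offspring(population, skill_factors, rmp, crossover, mutate, rng=random):
    """Produce as many offspring as there are individuals in the population.

    population: list of individuals (lists of floats).
    skill_factors: skill factor tau of each individual, same order.
    crossover(a, b) returns two children; mutate(a) returns one child.
    """
    offspring = []
    while len(offspring) < len(population):
        # Pick two distinct parents at random.
        a, b = rng.sample(range(len(population)), 2)
        if rng.random() < rmp or skill_factors[a] == skill_factors[b]:
            # Same task, or cross-task mating allowed by rmp: recombine.
            children = crossover(population[a], population[b])
        else:
            # Different tasks and no mating: mutate each parent separately.
            children = (mutate(population[a]), mutate(population[b]))
        offspring.extend(children)
    # Two children are added per round, so trim if the size is odd.
    return offspring[:len(population)]
```

A larger rmp lets parents specialized in different clustering indexes recombine more often, which is the channel through which information crosses between tasks.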

Further, in step S2-2, the population is initialized with the following encoding:

Set the maximum number of cluster categories K_max. Each individual of the population is encoded as a vector of dimension K_max+K_max×d: [equation omitted in source]

where d is the number of data dimensions, m_ij is the coordinate vector of a cluster center, and T_ij (j = 1,...,K_max) is the activation threshold that determines whether the corresponding class centroid is used;

The activation of centroids is defined as follows:

If T_ij is greater than 0.5, the centroid m_ij of the corresponding cluster is activated; otherwise it is not activated. If T_ij is greater than 1 or negative, it is reset to 1 or 0 respectively. If the number of activated centroids is less than the configured minimum number of categories, several activation thresholds are randomly selected and activated to meet the minimum.
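Decoding an individual into its active centroids might look like the following sketch. The layout is an assumption taken from the description of Fig. 3 (the K_max thresholds first, then K_max centroids of d coordinates each); the function name, the `k_min` default and the way extra centroids are chosen at random are hypothetical.

```python
import random

def decode_individual(individual, k_max, d, k_min=2, rng=random):
    """Return the list of activated centroids encoded by one individual."""
    # Reset thresholds above 1 to 1 and negative thresholds to 0.
    thresholds = [min(1.0, max(0.0, t)) for t in individual[:k_max]]
    # A centroid is active when its threshold exceeds 0.5.
    active = [j for j, t in enumerate(thresholds) if t > 0.5]
    inactive = [j for j in range(k_max) if j not in active]
    # Too few clusters: activate randomly chosen inactive centroids.
    while len(active) < k_min and inactive:
        active.append(inactive.pop(rng.randrange(len(inactive))))
    active.sort()
    # Centroid j occupies d consecutive entries after the thresholds.
    return [individual[k_max + j * d: k_max + (j + 1) * d] for j in active]
```

For example, with `k_max=4`, `d=2` and illustrative thresholds `[0.9, 0.4, 0.7, 0.6]`, the second centroid stays inactive and three centroids are returned, so a single fixed-length vector can represent clusterings with different numbers of clusters.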

Further, in step S2-3, the NMP clustering rule is as follows:

Given a data set of N samples X = (x₁,...,x_N), let the centroids of the K clusters be C = {C₁,...,C_K} and let D denote distance. In the NMP rule, the distance between a sample x_i (i = 1,...,N) and a cluster C_h (h = 1,...,K) is defined as:

D(x_i,C_h) = min{D(x_i,x_j), D(x_i,m_h) | x_j ∈ C_h}

That is, the distance from a sample to a cluster is the smallest of its distances to the points of that cluster;

Each sample is assigned to its nearest cluster; all samples assigned to the same cluster form a candidate sample set, and that cluster is called the undetermined cluster of those samples. Then, for each cluster, the nearest sample is selected from its candidate sample set, and these nearest samples from all clusters together form the nearest sample set. Finally, the sample in the nearest sample set whose distance to its undetermined cluster is the smallest is assigned to that cluster. These steps are repeated until all samples have been assigned.
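A direct, unoptimized reading of the NMP rule is sketched below. It assumes Euclidean distance and uses hypothetical names (`nmp_assign`, `euclid`). Selecting the nearest candidate per cluster and then the smallest among those is equivalent to picking the unassigned sample with the globally smallest sample-to-cluster distance, which is what the loop does.

```python
import math

def euclid(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nmp_assign(points, centroids):
    """Assign every point to a cluster with the NMP rule; returns labels."""
    k = len(centroids)
    # Each cluster starts with its centroid m_h as the only prototype.
    prototypes = [[tuple(c)] for c in centroids]
    labels = [None] * len(points)
    unassigned = set(range(len(points)))

    def dist_to_cluster(p, h):
        # D(x_i, C_h): smallest distance to the centroid or any member.
        return min(euclid(p, q) for q in prototypes[h])

    while unassigned:
        best = None  # (distance, point index, cluster index)
        for i in unassigned:
            # Undetermined cluster of point i: its nearest cluster right now.
            d, h = min((dist_to_cluster(points[i], h), h) for h in range(k))
            if best is None or (d, i, h) < best:
                best = (d, i, h)
        _, i, h = best
        # Fix the closest sample; it becomes a prototype of its cluster.
        labels[i] = h
        prototypes[h].append(tuple(points[i]))
        unassigned.remove(i)
    return labels
```

Because assigned samples become prototypes, a cluster can grow along a chain of nearby points, which is why the rule copes with elongated clusters better than a plain nearest-centroid assignment.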

Further, in step S2-3, the clustering indexes comprise the CH index, the Dunn index and the SIL index, each corresponding to one optimization task. The indexes are as follows:

CH index: [equation omitted in source]

In the above formula, [term omitted in source] denotes the trace of the between-class scatter matrix, and m denotes the mean vector of the entire data set;

Dunn index: [equation omitted in source]

In the above formula, D(C_i,C_j) is the distance between two different classes, taken as the distance between their two closest data points: D(C_i,C_j) = min{D(x,y) | x ∈ C_i, y ∈ C_j}

δ(C_i) is the distance between the two farthest points of class C_i: δ(C_i) = max{D(x,y) | x, y ∈ C_i}

SIL index: [equation omitted in source]

In the above formula, s_j = (b_j-a_j)/max(a_j,b_j) is the silhouette width of data point x_j. The average distance a_j from x_j to the other data points of its own class and the minimum distance b_j from x_j to the data points of other classes are calculated as follows: [equation omitted in source]
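Since the index formulas themselves are missing from the text, the sketch below uses the standard textbook definitions of the Calinski-Harabasz, Dunn and silhouette indexes as an assumption; the patent's exact formulas may differ. In particular, b_j is taken here as the smallest mean distance to another cluster, which is the usual silhouette convention. Function names are hypothetical, distance is Euclidean, and degenerate inputs (a single cluster, coincident points) are not handled.

```python
import math

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean(vectors):
    n = len(vectors)
    return tuple(sum(v[t] for v in vectors) / n for t in range(len(vectors[0])))

def ch_index(clusters):
    """(tr(B)/(K-1)) / (tr(W)/(N-K)); clusters is a list of point lists."""
    k = len(clusters)
    all_points = [p for c in clusters for p in c]
    n, m = len(all_points), mean(all_points)
    # Between-class scatter: size-weighted squared distance to the global mean.
    tr_b = sum(len(c) * euclid(mean(c), m) ** 2 for c in clusters)
    # Within-class scatter: squared distance of each point to its class mean.
    tr_w = sum(euclid(p, mean(c)) ** 2 for c in clusters for p in c)
    return (tr_b / (k - 1)) / (tr_w / (n - k))

def dunn_index(clusters):
    """Smallest between-class distance over largest class diameter."""
    k = len(clusters)
    min_between = min(euclid(p, q)
                      for a in range(k) for b in range(a + 1, k)
                      for p in clusters[a] for q in clusters[b])
    max_diameter = max(euclid(p, q) for c in clusters for p in c for q in c)
    return min_between / max_diameter

def sil_index(clusters):
    """Mean silhouette width s_j = (b_j - a_j) / max(a_j, b_j)."""
    widths = []
    for a, c in enumerate(clusters):
        for idx, p in enumerate(c):
            others = [q for t, q in enumerate(c) if t != idx]
            a_j = sum(euclid(p, q) for q in others) / len(others) if others else 0.0
            b_j = min(sum(euclid(p, q) for q in o) / len(o)
                      for b, o in enumerate(clusters) if b != a)
            widths.append((b_j - a_j) / max(a_j, b_j))
    return sum(widths) / len(widths)
```

All three indexes grow as clusters become more compact and better separated, so each can serve directly as a fitness value to be maximized in its own task.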

Further, in step S2-5, the simulated binary crossover operates as follows:

Given two parents x_a = [x_a(1),...,x_a(d)] and x_b = [x_b(1),...,x_b(d)], where d is the dimension of the data, first calculate the spread factor c(j): [equation omitted in source]

where β is a system parameter greater than 0 and r is a random number greater than 0 and less than 1 drawn for each dimension;

The offspring are then:

x_e(j) = [(1+c(j))x_a(j)+(1-c(j))x_b(j)]/2

x_f(j) = [(1+c(j))x_b(j)+(1-c(j))x_a(j)]/2.
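The two offspring formulas can be sketched as follows. The spread factor c(j) is not reproduced in the text, so the sketch assumes the commonly used simulated binary crossover form, c = (2r)^(1/(β+1)) for r ≤ 0.5 and c = (1/(2(1-r)))^(1/(β+1)) otherwise; the function name and the default β are hypothetical.

```python
import random

def sbx_crossover(parent_a, parent_b, beta=2.0, rng=random):
    """Simulated binary crossover; returns the two children x_e and x_f."""
    child_e, child_f = [], []
    for xa, xb in zip(parent_a, parent_b):
        r = rng.random()  # one random number per dimension
        # Spread factor (assumed standard SBX form, not taken from the source).
        if r <= 0.5:
            c = (2 * r) ** (1 / (beta + 1))
        else:
            c = (1 / (2 * (1 - r))) ** (1 / (beta + 1))
        # Offspring formulas as given in the text.
        child_e.append(((1 + c) * xa + (1 - c) * xb) / 2)
        child_f.append(((1 + c) * xb + (1 - c) * xa) / 2)
    return child_e, child_f
```

Whatever value c(j) takes, x_e(j) + x_f(j) = x_a(j) + x_b(j), so the children are placed symmetrically around the parents' midpoint in every dimension.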

Compared with the prior art, the principle and advantages of this solution are as follows:

1. Using a multifactorial evolutionary algorithm, a single population is optimized simultaneously for internal clustering indexes such as the CH index, Dunn index and SIL index to obtain multiple clustering results, and cross-domain communication makes it easier to converge to the global optimum.

2. Traditional image feature mining methods based on color, texture and shape often overlook the information carried by the unique attributes of medical image ROIs. The ROI attributes used in the present invention are relative grayscale, relative area, relative centroid coordinates, circularity, angle and symmetry, so features specific to medical images are considered alongside traditional image features.

3. Clustering rule based on NMP (nearest multiple prototypes): when traditional particle swarm optimization is applied to clustering, it performs poorly on clusters whose shape is not circular. Although NMP is also a distance-based clustering rule, it is a dynamic process, is more flexible, and usually gives better clustering results.

Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the overall flow chart of the medical image clustering method based on a multi-task evolutionary algorithm according to the present invention;

Fig. 2 is the flow chart of medical image ROI data extraction;

Fig. 3 is a schematic diagram of the encoding used by the multi-task clustering evolutionary algorithm.

Detailed Description

The present invention is further described below with reference to a specific embodiment:

As shown in Fig. 1, the medical image clustering method based on a multi-task evolutionary algorithm described in this embodiment comprises the following steps:

S1. Extract the ROI feature description data of the medical image, comprising relative grayscale s1, relative area s2, relative centroid coordinates s3, circularity s4, angle s5 and symmetry s6.

As shown in Fig. 2, the specific extraction process is as follows:

S1-1. Read in the medical image;

S1-3. Detect the symmetry s6 of the medical image:

S1-6. Calculate the feature description data of the ROI:

Relative grayscale s1:

S2. After the ROI feature description data of the medical image has been extracted, read in the data and, applying the NMP clustering rule, optimize multiple internal clustering indexes to obtain multiple clustering results under a multi-task framework. The specific process is as follows:

S2-2. Initialize the population and the maximum number of iterations n, and set k = 0. This step completes initialization with the following encoding:

If T_ij is greater than 0.5, the centroid m_ij of the corresponding cluster is activated; otherwise it is not activated. If T_ij is greater than 1 or negative, it is reset to 1 or 0 respectively. If the number of activated centroids is less than the configured minimum number of categories, several activation thresholds are randomly selected and activated to meet the minimum. As shown in Fig. 3, for an individual with dimension d = 2 and maximum number of cluster categories K_max = 4, the activation thresholds come first. The threshold in the second position, 0.4, is less than 0.5, so the corresponding second centroid is inactive; the thresholds in the other positions are greater than 0.5, so their centroids are activated. This individual therefore has 3 clusters. In practice, d is set to 6.

The NMP clustering rule described in this step is as follows:

The clustering indexes comprise the CH index, the Dunn index and the SIL index, each corresponding to one optimization task. The indexes are as follows:

CH index: [equation omitted in source]

Dunn index: [equation omitted in source]

δ(C_i) is the distance between the two farthest points of class C_i: [equation omitted in source]

SIL index: [equation omitted in source]

S2-4. Calculate the skill factor τ of each individual. This value records the optimization task on which individual i performs best among all tasks: if individual i ranks highest in the ordering for task j, then τ_i = j. To calculate the skill factor, all individuals in the population are sorted by fitness value under a given task; the position of individual i in that ordering is its factorial rank r_i^j on task j.

S2-5. Generate offspring: randomly select two individuals a and b from the population as parents and generate a random number rand greater than 0 and less than 1. If rand is less than the algorithm parameter rmp (random mating probability), or the skill factors τ of the two individuals are equal, perform simulated binary crossover; otherwise mutate each of the two individuals separately. Repeat the crossover or mutation step until the number of offspring equals the number of individuals in the population, then proceed to step S2-6;

The simulated binary crossover operates as follows:

Given two parents x_a = [x_a(1),...,x_a(d)] and x_b = [x_b(1),...,x_b(d)], where d is the dimension of the data, first calculate the spread factor c(j): [equation omitted in source]

The offspring are then: [equation omitted in source]

S2-7. Merge the parents and offspring into a new population and, after clustering, recalculate and update the skill factor τ and scalar fitness φ of every individual according to the fitness values. If the factorial ranks of the i-th individual on all K tasks are r_i^1, ..., r_i^K, then the scalar fitness of that individual is [equation omitted in source]; that is, an individual's scalar fitness is determined by its factorial rank on the task where it performs best;

S3. After step S2, three clustering results are obtained, and the doctor's expert knowledge is used to select an optimal result from them.
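The factorial rank, skill factor and scalar fitness used in steps S2-4 and S2-7 can be sketched as below. The scalar fitness formula is not reproduced in the text; the sketch assumes the usual multifactorial definition φ_i = 1 / min_j r_i^j, which is consistent with φ being determined by the best factorial rank and with larger φ being preferred. Names are hypothetical, and fitness is treated as larger-is-better.

```python
def rank_population(fitness):
    """fitness[i][j] is the fitness of individual i on task j.

    Returns (ranks, skill, scalar): factorial ranks r_i^j (1 = best),
    skill factors tau_i (1-based task index) and scalar fitness phi_i.
    """
    n, k = len(fitness), len(fitness[0])
    ranks = [[0] * k for _ in range(n)]
    for j in range(k):
        # Sort individuals on task j from best to worst fitness.
        order = sorted(range(n), key=lambda i: fitness[i][j], reverse=True)
        for position, i in enumerate(order, start=1):
            ranks[i][j] = position
    # Skill factor: the task on which the individual ranks best.
    skill = [min(range(k), key=lambda j: ranks[i][j]) + 1 for i in range(n)]
    # Scalar fitness (assumed form): reciprocal of the best factorial rank.
    scalar = [1 / min(ranks[i]) for i in range(n)]
    return ranks, skill, scalar
```

With fitness `[[0.9, 0.1], [0.5, 0.8], [0.1, 0.5]]`, the first individual leads task 1, the second leads task 2, and the third is second-best on task 2, giving skill factors `[1, 2, 2]` and scalar fitness `[1.0, 1.0, 0.5]`.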

By means of multi-task clustering, the embodiment treats the optimization of different indexes as different tasks, finds the best individual for each task, and uses expert knowledge to identify the clustering index best suited to that type of medical image. Clustering indexes suited to the actual situation can be chosen to achieve better results. The method exploits the transfer process of multi-task learning: communication between the different learning tasks helps escape local optima and approach the global optimum, and optimization is more efficient than running the same method several times in succession for the different objectives.

The embodiment described above is only a preferred embodiment of the present invention and does not limit its scope of implementation; any change made according to the shape and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A medical image clustering method based on a multitask evolutionary algorithm, characterized by comprising the following steps:

S1, extracting ROI feature description data of the medical image;

S2, reading in the extracted ROI feature description data and, applying the NMP clustering rule, optimizing a plurality of internal clustering indexes to obtain a plurality of clustering results under a multi-task framework;

and S3, selecting an optimal result from the results by using the expert knowledge of a doctor.

2. The medical image clustering method based on the multitask evolutionary algorithm according to claim 1, wherein the ROI feature description data of the medical image extracted in step S1 comprises relative grayscale s1, relative area s2, relative centroid coordinates s3, circularity s4, angle s5 and symmetry s6.

3. The medical image clustering method based on the multitask evolutionary algorithm according to claim 2, wherein the specific process of extracting the ROI feature description data of the medical image in step S1 is as follows:

S1-1, reading in a medical image;

S1-2, scanning the medical image to obtain the pixel count, the length, the width and the maximum gray value of the image;

S1-3, detecting the symmetry s6 of the medical image;

S1-4, extracting an ROI (region of interest) of the medical image according to a grayscale range;

S1-5, extracting the mean gray value, the number of pixels, the longest axis, the shortest axis, the centroid coordinates and the angle from the ROI obtained in step S1-4;

and S1-6, calculating the feature description data of the ROI.

4. The medical image clustering method based on the multitask evolutionary algorithm according to claim 3, wherein the symmetry s6 of the medical image is detected in step S1-3 by:

folding the medical image in half and subtracting the two halves, binarizing the difference with a gray threshold, and judging the medical image symmetric if fewer pixels remain than a set value and asymmetric otherwise;

the specific process of calculating the feature description data of the ROI in step S1-6 is as follows:

relative grayscale s1: [equation omitted in source]

wherein ROI.gray is the average gray level of the ROI, and IMAGE.gray is the average gray level of the whole image;

relative area s2: s2 = ROI.area/IMAGE.area, wherein ROI.area is the number of pixels in the ROI, and IMAGE.area is the number of pixels in the entire image;

relative centroid coordinates s3: s3 = (ROI.x/IMAGE.length, ROI.y/IMAGE.height), wherein ROI.x is the abscissa of the ROI centroid, IMAGE.length is the length of the original image, ROI.y is the ordinate of the ROI centroid, and IMAGE.height is the height of the original image;

circularity s4: s4 = 4×π×ROI.area/ROI.perimeter², wherein ROI.area is the number of pixels in the ROI, and ROI.perimeter is the number of pixels on the ROI boundary;

angle s5: s5 = (Orientation+90)/180, wherein Orientation is the angle from the long axis of the ROI to the X axis.

5. The medical image clustering method based on the multitask evolutionary algorithm according to claim 1, wherein the specific process of step S2 is as follows:

S2-1, reading in the extracted ROI feature description data of the medical image;

S2-2, initializing a population and the maximum number of iterations n, and setting k = 0;

S2-3, performing clustering with the NMP clustering rule and evaluating the result with the clustering indexes;

S2-4, calculating the skill factor τ of each individual;

S2-5, generating offspring: randomly selecting two individuals a and b from the population as parents and generating a random number rand greater than 0 and less than 1; if rand is less than the algorithm parameter rmp or the skill factors τ of the two individuals are equal, performing simulated binary crossover, and otherwise mutating each of the two individuals separately; repeating the crossover or mutation step until the number of offspring equals the number of individuals in the population, and then proceeding to step S2-6;

S2-6, calculating the fitness values of the generated offspring under each clustering index optimization task;

S2-7, merging the parents and the offspring to form a new population and, after clustering, recalculating and updating the skill factor τ and the scalar fitness φ of every individual according to the fitness values;

S2-8, sorting the individuals in the population by scalar fitness φ, then selecting individuals from best to worst to enter the next-generation population, individuals with larger φ being selected first; k = k+1;

S2-9, if k < n, returning to step S2-3, and otherwise stopping the iteration;

and S2-10, finding the individuals whose skill factor τ equals 1 and recording them as the current best individuals for each task.

6. The medical image clustering method based on the multitask evolutionary algorithm according to claim 5, wherein in step S2-2 the population is initialized with the following encoding:

setting the maximum number of cluster categories K_max, each individual of the population being encoded as a vector of dimension K_max+K_max×d: [equation omitted in source]

wherein d is the number of data dimensions, m_ij is the coordinate vector of a cluster center, and T_ij (j = 1,...,K_max) is the activation threshold that determines whether the corresponding class centroid is used;

the activation of centroids is defined as follows:

if T_ij is greater than 0.5, the centroid m_ij of the corresponding cluster is activated, and otherwise it is not activated; if T_ij is greater than 1 or negative, it is reset to 1 or 0 respectively; and if the number of activated centroids is less than the configured minimum number of categories, several activation thresholds are randomly selected and activated to meet the minimum.

7. The medical image clustering method based on the multitask evolutionary algorithm according to claim 5, wherein in step S2-3 the NMP clustering rule is as follows:

given a data set of N samples X = (x₁,...,x_N), with the centroids of the K clusters being C = {C₁,...,C_K} and D denoting distance, the distance in the NMP rule between a sample x_i (i = 1,...,N) and a cluster C_h (h = 1,...,K) is defined as:

D(x_i,C_h) = min{D(x_i,x_j), D(x_i,m_h) | x_j ∈ C_h}

that is, the distance from a sample to a cluster is the smallest of its distances to the points of that cluster;

each sample is assigned to its nearest cluster, all samples assigned to the same cluster form a candidate sample set, and that cluster is called the undetermined cluster of those samples; then, for each cluster, the nearest sample is selected from its candidate sample set, and these nearest samples from all clusters together form the nearest sample set; finally, the sample in the nearest sample set whose distance to its undetermined cluster is the smallest is assigned to that cluster; these steps are repeated until all samples have been assigned.

8. The medical image clustering method based on the multitask evolutionary algorithm according to claim 5, wherein in step S2-3 the clustering indexes comprise a CH index, a Dunn index and a SIL index, each corresponding to one optimization task; the indexes are as follows:

CH index: [equation omitted in source]

in the above formula, [term omitted in source] denotes the trace of the between-class scatter matrix, and m denotes the mean vector of the entire data set;

Dunn index: [equation omitted in source]

in the above formula, D(C_i,C_j) is the distance between two different classes, taken as the distance between their two closest data points: D(C_i,C_j) = min{D(x,y) | x ∈ C_i, y ∈ C_j}

δ(C_i) is the distance between the two farthest points of class C_i: δ(C_i) = max{D(x,y) | x, y ∈ C_i}

SIL index: [equation omitted in source]

in the above formula, s_j = (b_j-a_j)/max(a_j,b_j) is the silhouette width of data point x_j; the average distance a_j from x_j to the other data points of its own class and the minimum distance b_j from x_j to the data points of other classes are calculated as follows: [equation omitted in source]

9. The medical image clustering method based on the multitask evolutionary algorithm according to claim 5, wherein in step S2-5 the simulated binary crossover operates as follows:

given two parents x_a = [x_a(1),...,x_a(d)] and x_b = [x_b(1),...,x_b(d)], where d is the dimension of the data, the spread factor c(j) is first calculated: [equation omitted in source]

wherein β is a system parameter greater than 0, and r is a random number greater than 0 and less than 1 drawn for each dimension;

the offspring are then:

x_e(j) = [(1+c(j))x_a(j)+(1-c(j))x_b(j)]/2

x_f(j) = [(1+c(j))x_b(j)+(1-c(j))x_a(j)]/2.