CN111563549B

CN111563549B - Medical image clustering method based on multitasking evolutionary algorithm

Info

Publication number: CN111563549B
Application number: CN202010364563.4A
Authority: CN
Inventors: 胡晓敏; 颜志鹏; 陈伟能; 李敏
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2023-07-28
Anticipated expiration: 2040-04-30
Also published as: CN111563549A

Abstract

The invention discloses a medical image clustering method based on a multitasking evolutionary algorithm, comprising the following steps: S1, extracting ROI feature description data from a medical image; S2, reading in the extracted ROI feature description data and, under a multi-task framework, obtaining a plurality of clustering results by optimizing a plurality of internal clustering indexes using the NMP clustering rule; S3, selecting the best result from these clustering results using the expert knowledge of doctors. The invention expresses the content of medical images more fully, obtains a plurality of clustering results by optimizing a single population, and converges to the global optimum more easily through cross-domain communication, so that the clustering effect is more pronounced.

Description

Medical image clustering method based on multitasking evolutionary algorithm

Technical Field

The invention relates to the two fields of medical technology and intelligent computation, in particular to a medical image clustering method based on a multitasking evolution algorithm.

Background

Over the past 20 years, medical imaging has become one of the fastest-developing areas of medical technology, and medical images have become increasingly easy to acquire and store. Medical image data now accounts for a large proportion of all image data and plays an important role in medical systems. By observing a medical image, a lesion can be assessed more directly and clearly, yielding a more accurate diagnosis. Accurate clustering of medical images gives medical staff a better scientific reference when judging and diagnosing the cause of a disease, which greatly reduces the misdiagnosis rate caused by the limited resolution of the human eye or by insufficient clinical experience, and improves the utilization of medical images.

At present, traditional clustering algorithms have been applied to medical images both domestically and abroad, but each family has drawbacks: K-means-based methods and their many variants are sensitive to the initial centers and to the value of K; density-based methods are sensitive to the neighborhood radius and the MinPts value; and grid-based methods are strongly affected by noise and lack accuracy.

In addition, evolutionary algorithms have been used for clustering, commonly genetic algorithms or differential evolution. However, such clustering is based on a single-task framework: a single run can optimize only one objective and therefore yields only one optimization result.

Moreover, not every pixel in a medical image deserves the attention of medical personnel, who are more interested in distinctive pixel regions referred to as ROIs (regions of interest). In previous image studies, researchers extracted ROIs based on conventional image features such as color, texture, and shape. This approach does not transfer well to medical images, which have many characteristics that distinguish them from ordinary image data.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a medical image clustering method based on a multitasking evolutionary algorithm that expresses the content of medical images more fully, optimizes a single population to obtain a plurality of clustering results simultaneously, converges to the global optimum more easily through cross-domain communication, and achieves a more pronounced clustering effect.

To achieve the above purpose, the technical scheme provided by the invention is as follows:

The medical image clustering method based on the multitasking evolutionary algorithm comprises the following steps:

S1, extracting ROI feature description data from a medical image;

S2, reading in the extracted ROI feature description data and, under a multi-task framework, obtaining a plurality of clustering results by optimizing a plurality of internal clustering indexes using the NMP clustering rule;

S3, selecting the best result from these clustering results using the expert knowledge of doctors.

Further, the ROI characterization data of the medical image extracted in the step S1 includes a relative gray scale S1, a relative area S2, a relative centroid coordinate S3, a circularity S4, an angle S5, and a symmetry S6.

Further, the specific process of extracting the ROI feature description data of the medical image in step S1 is as follows:

S1-1, reading in a medical image;

S1-2, scanning the medical image to obtain its number of pixels, length, width, and maximum gray level;

S1-3, detecting the symmetry S6 of the medical image;

S1-4, extracting the ROI region of the medical image according to a gray-scale range;

S1-5, extracting the mean gray level, number of pixels, longest axis, shortest axis, centroid coordinates, and angle of the ROI region obtained in step S1-4;

S1-6, calculating the feature description data of the ROI region.

Further, step S1-3 detects the symmetry S6 of the medical image as follows:

the medical image is folded onto itself and the two halves are subtracted; the difference image is binarized with a gray threshold; if the number of remaining pixels is less than a set value, the medical image is judged to be symmetric, otherwise it is judged to be asymmetric;
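The fold-and-difference test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fold is taken about the vertical midline, the image is a list of rows of gray values, and the function name `is_symmetric` and the defaults `gray_threshold` and `max_residual` are hypothetical values, not figures from the patent.

```python
def is_symmetric(image, gray_threshold=30, max_residual=10):
    """Fold the image about its vertical midline and count mismatching pixels.

    image: list of rows, each row a list of gray values (assumed layout).
    gray_threshold, max_residual: hypothetical tuning values.
    """
    residual = 0
    for row in image:
        width = len(row)
        # Compare each pixel with its mirror image across the midline.
        for x in range(width // 2):
            diff = abs(row[x] - row[width - 1 - x])
            # Binarize the difference image: keep only large differences.
            if diff > gray_threshold:
                residual += 1
    # Symmetric when fewer than max_residual pixels survive binarization.
    return residual < max_residual
```

For an image with an odd width, the middle column has no mirror partner and is ignored in this sketch.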

the specific process of calculating the feature description data of the ROI region in step S1-6 is as follows:

relative gray scale s1: defined in terms of ROI.gray and IMAGE.gray, where ROI.gray is the mean gray level of the ROI region and IMAGE.gray is the mean gray level of the whole image;

relative area s2: s2 = ROI.area / IMAGE.area, where ROI.area is the number of pixels of the ROI region and IMAGE.area is the number of pixels of the whole image;

relative centroid coordinates s3: s3 = (ROI.x / IMAGE.length, ROI.y / IMAGE.height), where ROI.x is the abscissa of the ROI centroid, IMAGE.length is the length of the original image, ROI.y is the ordinate of the ROI centroid, and IMAGE.height is the height of the original image;

circularity s4: s4 = 4π × ROI.area / ROI.perimeter², where ROI.area is the number of pixels of the ROI region and ROI.perimeter is the number of pixels on the boundary of the ROI region;

angle s5: s5 = (orientation + 90) / 180, where orientation is the angle from the long axis of the ROI to the X axis.
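As a rough illustration, the features s2 to s5 can be computed as below. The function name `roi_features` and its argument names are hypothetical, and the sketch assumes that IMAGE.area equals length × height and that the orientation is given in degrees in the range [-90, 90]. The relative gray scale s1 and the symmetry s6 are left out because their formulas are not given in closed form in the text.

```python
import math

def roi_features(roi_area, roi_perimeter, roi_x, roi_y, orientation_deg,
                 image_length, image_height):
    """Return (s2, s3, s4, s5) for one ROI; all inputs are measured upstream."""
    image_area = image_length * image_height        # assumed pixel count of the image
    s2 = roi_area / image_area                      # relative area
    s3 = (roi_x / image_length, roi_y / image_height)  # relative centroid
    s4 = 4 * math.pi * roi_area / roi_perimeter ** 2   # circularity
    s5 = (orientation_deg + 90) / 180               # angle mapped to [0, 1]
    return s2, s3, s4, s5
```

With these definitions every feature is dimensionless, so ROIs from images of different sizes can be compared directly.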

Further, the specific process of step S2 is as follows:

S2-1, reading in the extracted ROI feature description data of the medical images;

S2-2, initializing the population and the maximum number of iterations n, with the current iteration number k = 0;

S2-3, performing clustering with the NMP clustering rule and evaluating the result with the clustering indexes;

S2-4, calculating the skill factor τ of each individual;

S2-5, generating offspring: two individuals a and b are randomly selected from the population as parents, and a random number rand with 0 < rand < 1 is generated; if rand is less than the algorithm parameter rmp, or the skill factors τ of the two individuals are equal, simulated binary crossover is performed; otherwise, mutation is performed on each of the two individuals; crossover or mutation is repeated until the number of offspring equals the population size, and then step S2-6 is entered;

S2-6, calculating the fitness value of each generated offspring under each clustering-index optimization task;

S2-7, merging the parents and offspring into a new population, and recalculating and updating the skill factor τ and scalar fitness value φ of every individual according to the fitness values after clustering;

S2-8, sorting the individuals in the population by scalar fitness value φ and selecting them from top to bottom into the next-generation population, individuals with larger φ being selected first; k = k + 1;

S2-9, if k < n, returning to step S2-3; otherwise stopping the iteration;

S2-10, finding the individual whose skill factor τ is equal to 1 and recording it as the current best individual of each task.

Further, in step S2-2, the population is initialized with the following encoding:

the maximum number of cluster categories $K_{max}$ is set, and each individual of the population is encoded as a vector of dimension $K_{max} + K_{max} \cdot d$, consisting of $K_{max}$ activation thresholds followed by $K_{max}$ centroid coordinate vectors, $(T_{i1}, \ldots, T_{iK_{max}}, m_{i1}, \ldots, m_{iK_{max}})$, where d is the number of data dimensions, $m_{ij}$ is the coordinate vector of a cluster center, and $T_{ij}$ ($j = 1, \ldots, K_{max}$) is the activation threshold of the corresponding class centroid;

the rule for activating centroids is as follows:

if $T_{ij}$ is greater than 0.5, the corresponding cluster centroid $m_{ij}$ is activated; otherwise it is not activated; if $T_{ij}$ is greater than 1 or negative, it is reset to 1 or 0 respectively; and if the number of activated centroids is smaller than the set minimum number of categories, several activation thresholds are randomly selected and activated so that the minimum number of categories is met.
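The decoding rule can be sketched as follows. The sketch assumes the thresholds occupy the first K_max positions of the individual and the centroid coordinates follow in order; the function name `decode`, the default `k_min=2`, and the choice of a random value in (0.5, 1] for a force-activated threshold are illustrative assumptions rather than details stated in the text.

```python
import random

def decode(individual, k_max, d, k_min=2, rng=random):
    """Split an individual into clamped thresholds and its active centroids."""
    # Reset thresholds above 1 to 1 and negative thresholds to 0.
    thresholds = [min(1.0, max(0.0, t)) for t in individual[:k_max]]
    active = [j for j, t in enumerate(thresholds) if t > 0.5]
    inactive = [j for j in range(k_max) if j not in active]
    # Too few centroids: activate randomly chosen extra ones.
    while len(active) < k_min and inactive:
        j = inactive.pop(rng.randrange(len(inactive)))
        thresholds[j] = 1.0 - 0.5 * rng.random()   # value in (0.5, 1.0]
        active.append(j)
    centroids = [individual[k_max + j * d: k_max + (j + 1) * d]
                 for j in sorted(active)]
    return thresholds, centroids

# Hypothetical individual with K_max = 4 and d = 2; the second threshold is below 0.5.
example = [0.7, 0.4, 0.9, 0.6, 1, 1, 2, 2, 3, 3, 4, 4]
_, example_centroids = decode(example, k_max=4, d=2)
print(len(example_centroids))  # number of active clusters
```

Because inactive centroids stay in the vector, crossover and mutation can later reactivate them, which is what lets the number of clusters vary during the search.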

Further, in step S2-3, the NMP clustering rule is specifically as follows:

given a data set of N samples $X = (x_1, \ldots, x_N)$, K clusters $C = \{C_1, \ldots, C_K\}$ with centroids $m_1, \ldots, m_K$, and a distance D, the distance between a sample point $x_i$ ($i = 1, \ldots, N$) and a cluster $C_h$ ($h = 1, \ldots, K$) under the NMP rule is defined as follows:

$$D(x_i, C_h) = \min\{ D(x_i, x_j),\ D(x_i, m_h) \mid x_j \in C_h \}$$

that is, the distance from a sample to a cluster is the smallest of its distances to the cluster centroid $m_h$ and to the points already in the cluster;

each unassigned sample is first associated with its nearest cluster, called the pending cluster of that sample, and all samples associated with the same cluster form that cluster's candidate sample set; then, for each cluster, the nearest sample in its candidate sample set is selected, and these nearest samples of all clusters together form the nearest sample set; finally, the sample in the nearest sample set whose distance to its pending cluster is smallest is found and assigned to that pending cluster; the above steps are repeated until all samples have been assigned.
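The assignment loop can be sketched in plain Python as below. This is a minimal, unoptimized illustration that assumes Euclidean distance for D and breaks ties by sample index; the function name `nmp_assign` is hypothetical.

```python
import math

def nmp_assign(samples, centroids):
    """Assign each sample to a cluster using the NMP rule; returns cluster labels."""
    members = [[] for _ in centroids]      # sample indices already in each cluster
    labels = [None] * len(samples)
    unassigned = set(range(len(samples)))

    def cluster_distance(i, h):
        # Minimum distance to the centroid or to any point already in the cluster.
        prototypes = [centroids[h]] + [samples[j] for j in members[h]]
        return min(math.dist(samples[i], p) for p in prototypes)

    while unassigned:
        nearest = {}                       # cluster -> (distance, sample index)
        for i in unassigned:
            # Pending cluster of sample i: the cluster it is currently closest to.
            dist_h, h = min((cluster_distance(i, h), h)
                            for h in range(len(centroids)))
            # Keep only the nearest candidate sample of each cluster.
            if h not in nearest or (dist_h, i) < nearest[h]:
                nearest[h] = (dist_h, i)
        # Assign the candidate that is closest to its pending cluster.
        h, (_, i) = min(nearest.items(), key=lambda item: item[1])
        members[h].append(i)
        labels[i] = h
        unassigned.remove(i)
    return labels
```

Only one sample is assigned per pass, so every later distance already reflects the points added before it; this is what makes the rule dynamic and lets a cluster grow along an elongated shape.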

Further, in step S2-3, the clustering indexes comprise the CH index, the Dunn index, and the SIL index, each corresponding to one optimization task; the indexes are specifically as follows:

CH index: defined using the trace of the inter-class dispersion matrix, with m denoting the mean vector of the entire data set;

Dunn index: defined using $D(C_i, C_j)$, the distance between two different clusters, taken as the distance between their two closest data points, and $\delta(C_i)$, the distance between the two farthest points within a cluster;

SIL index: defined using $s_j = (b_j - a_j) / \max(a_j, b_j)$, the silhouette width of data point $x_j$, where $a_j$ is the average distance from $x_j$ to the other data points of the class to which it belongs and $b_j$ is the minimum distance from $x_j$ to the data points of other classes.
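The formulas themselves are not reproduced in this text. For orientation only, the commonly used textbook definitions of the three indexes, which appear to match the symbol descriptions above, are given below. They are an assumption rather than a transcription of the patent's formulas; $n_k$, $m_k$, $S_W$, and $d(\cdot,\cdot)$ are notation introduced here, and $b_j$ follows the usual silhouette convention of the smallest average distance to another cluster.

```latex
\begin{align*}
\mathrm{CH} &= \frac{\operatorname{tr}(S_B)/(K-1)}{\operatorname{tr}(S_W)/(N-K)}, &
\operatorname{tr}(S_B) &= \sum_{k=1}^{K} n_k \,\lVert m_k - m \rVert^2, &
\operatorname{tr}(S_W) &= \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^2 \\[4pt]
\mathrm{Dunn} &= \frac{\min_{i \neq j} D(C_i, C_j)}{\max_{k} \delta(C_k)}, &
D(C_i, C_j) &= \min_{x \in C_i,\, y \in C_j} d(x, y), &
\delta(C_i) &= \max_{x, y \in C_i} d(x, y) \\[4pt]
\mathrm{SIL} &= \frac{1}{N} \sum_{j=1}^{N} s_j, &
a_j &= \frac{1}{n_h - 1} \sum_{\substack{x_l \in C_h \\ l \neq j}} d(x_j, x_l), &
b_j &= \min_{g \neq h} \frac{1}{n_g} \sum_{x_l \in C_g} d(x_j, x_l)
\end{align*}
```

Here $x_j \in C_h$, $n_k$ is the number of samples in cluster $C_k$, and $m_k$ is its mean vector. All three indexes are larger for better clusterings, so each task is a maximization problem.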

Further, in step S2-5, simulated binary crossover operates as follows:

given two parents $x_a = [x_a(1), \ldots, x_a(d)]$ and $x_b = [x_b(1), \ldots, x_b(d)]$, where d is the dimension of the data, the distribution factor c(j) is calculated first from a system parameter β greater than 0 and a random number r, greater than 0 and less than 1, drawn for each dimension;

the resulting offspring are:

$$x_e(j) = [(1 + c(j))\,x_a(j) + (1 - c(j))\,x_b(j)] / 2$$

$$x_f(j) = [(1 + c(j))\,x_b(j) + (1 - c(j))\,x_a(j)] / 2$$
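A minimal Python sketch of this crossover follows. Since the formula for c(j) is not reproduced in the text, the sketch assumes the standard simulated binary crossover (SBX) spread factor, c = (2r)^(1/(β+1)) for r ≤ 0.5 and c = (1/(2(1−r)))^(1/(β+1)) otherwise; the function name `sbx` and the default `beta=2.0` are hypothetical.

```python
import random

def sbx(parent_a, parent_b, beta=2.0, rng=random):
    """Simulated binary crossover; returns the two children x_e and x_f."""
    child_e, child_f = [], []
    for xa, xb in zip(parent_a, parent_b):
        r = rng.random()                   # fresh random number per dimension
        # Assumed standard SBX spread factor (not taken from the patent text).
        if r <= 0.5:
            c = (2 * r) ** (1 / (beta + 1))
        else:
            c = (1 / (2 * (1 - r))) ** (1 / (beta + 1))
        child_e.append(((1 + c) * xa + (1 - c) * xb) / 2)
        child_f.append(((1 + c) * xb + (1 - c) * xa) / 2)
    return child_e, child_f
```

In every dimension the two children sum to the same value as the two parents, so the operator preserves the parents' mean while c(j) controls how far the children spread around it.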

Compared with the prior art, the principle and advantages of this scheme are as follows:

1. Through a multifactorial evolutionary algorithm, a single population is optimized simultaneously for several internal clustering indexes, such as the CH, Dunn, and SIL indexes, to obtain a plurality of clustering results, and cross-domain communication makes convergence to the global optimum easier.

2. Traditional image feature mining methods based on color, texture, and shape often ignore the information carried by the unique attributes of the medical image ROI. The ROI attributes used by this method are relative gray scale, relative area, relative centroid coordinates, circularity, angle, and symmetry, so the distinctive characteristics of medical images are considered alongside traditional image features.

3. The clustering rule is based on NMP (nearest multiple prototypes): when a traditional particle swarm optimization approach is applied to cluster optimization, it performs poorly on clusters whose shape is not circular. Although the NMP rule is likewise distance-based, it is a dynamic process, which makes it more flexible and generally gives a better clustering effect.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is an overall flow chart of a medical image clustering method based on a multitasking evolutionary algorithm of the present invention;

FIG. 2 is a flow chart of ROI data extraction from a medical image;

FIG. 3 is a schematic diagram of the individual encoding used by the multitasking clustering evolutionary algorithm.

Detailed Description

The invention is further illustrated by the following examples:

As shown in FIG. 1, the medical image clustering method based on the multitasking evolutionary algorithm according to this embodiment comprises the following steps:

S1, extracting ROI feature description data from a medical image, the data comprising relative gray scale S1, relative area S2, relative centroid coordinates S3, circularity S4, angle S5, and symmetry S6.

As shown in FIG. 2, the specific process of extraction is as follows:

S1-1, reading in a medical image;

S1-3, detecting the symmetry S6 of the medical image:

S1-6, calculating the feature description data of the ROI region:

S2, after the ROI feature description data of the medical images has been extracted, the data is read in, and a plurality of clustering results are obtained under a multi-task framework by optimizing a plurality of internal clustering indexes using the NMP clustering rule; the specific process is as follows:

S2-1, reading in the extracted ROI feature description data of the medical images;

S2-2, initializing the population and the maximum number of iterations n, with k = 0; this step performs the initialization using the following encoding:

if $T_{ij}$ is greater than 0.5, the corresponding cluster centroid $m_{ij}$ is activated; otherwise it is not activated; if $T_{ij}$ is greater than 1 or negative, it is reset to 1 or 0 respectively; and if the number of activated centroids is smaller than the set minimum number of categories, several activation thresholds are randomly selected and activated so that the minimum number of categories is met. FIG. 3 shows an individual with dimension d = 2 and maximum number of cluster categories $K_{max}$ = 4, in which the activation thresholds come first. The threshold in the second position, 0.4, is less than 0.5, so the corresponding second centroid is inactive, while the thresholds in the other positions are greater than 0.5, so their centroids are activated. The number of clusters for this individual is therefore 3. In practice, d is set to 6.

The NMP clustering rule used in this step is as follows:

$$D(x_i, C_h) = \min\{ D(x_i, x_j),\ D(x_i, m_h) \mid x_j \in C_h \}$$

The clustering indexes comprise the CH index, the Dunn index, and the SIL index, each corresponding to one optimization task; the indexes are specifically as follows:

CH index:

Dunn index:

SIL index:
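To make the three optimization tasks concrete, the sketch below evaluates a labeled clustering with the commonly used definitions of the CH, Dunn, and silhouette indexes. It is an illustration under assumptions, not the patent's exact formulas: Euclidean distance is used, at least two clusters with non-zero within-cluster scatter are required, a singleton's silhouette width is taken as 0, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def _groups(points, labels):
    groups = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    return groups

def _mean(pts):
    return [sum(coords) / len(pts) for coords in zip(*pts)]

def ch_index(points, labels):
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    m = _mean(points)                      # mean vector of the whole data set
    tr_b = sum(len(g) * math.dist(_mean(g), m) ** 2 for g in groups.values())
    tr_w = sum(math.dist(p, _mean(g)) ** 2 for g in groups.values() for p in g)
    return (tr_b / (k - 1)) / (tr_w / (n - k))

def dunn_index(points, labels):
    groups = list(_groups(points, labels).values())
    # Smallest distance between points of two different clusters.
    min_between = min(math.dist(p, q)
                      for i, gi in enumerate(groups)
                      for gj in groups[i + 1:]
                      for p in gi for q in gj)
    # Largest distance between two points of the same cluster.
    max_diameter = max(math.dist(p, q) for g in groups for p in g for q in g)
    return min_between / max_diameter

def sil_index(points, labels):
    groups = _groups(points, labels)
    total = 0.0
    for p, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            continue                       # singleton contributes width 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != label)
        total += (b - a) / max(a, b)
    return total / len(points)
```

All three functions return larger values for more compact, better separated clusterings, so each can serve directly as the fitness of one task.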

S2-4, calculating the skill factor τ of each individual, which records the task on which individual i performs best among all tasks: if individual i attains its best rank in the ranking of the j-th task, then $\tau_i = j$. To calculate the skill factor, all individuals in the population are sorted by fitness value under each task; the position of individual i in the ordering for task j is its factorial rank $r_i^j$ for that task.
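The ranking and skill-factor computation can be sketched as follows, assuming that a larger fitness value is better on every task and that ties between tasks go to the lowest task number; the function names `factorial_ranks` and `skill_factors` are hypothetical.

```python
def factorial_ranks(fitness):
    """fitness[i][j] is the fitness of individual i on task j (larger is better).

    Returns ranks[i][j], where 1 means individual i is the best on task j.
    """
    n, k = len(fitness), len(fitness[0])
    ranks = [[0] * k for _ in range(n)]
    for j in range(k):
        # Sort individuals by their fitness on task j, best first.
        order = sorted(range(n), key=lambda i: fitness[i][j], reverse=True)
        for position, i in enumerate(order, start=1):
            ranks[i][j] = position
    return ranks

def skill_factors(ranks):
    """Task number (1-based) on which each individual has its best rank."""
    return [min(range(len(r)), key=lambda j: r[j]) + 1 for r in ranks]
```

For example, an individual ranked 3rd, 1st, and 2nd on three tasks gets skill factor 2.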

S2-5, generating offspring: two individuals a and b are randomly selected from the population as parents, and a random number rand with 0 < rand < 1 is generated; if rand is less than the algorithm parameter rmp (random mating probability), or the skill factors τ of the two individuals are equal, simulated binary crossover is performed; otherwise, mutation is performed on each of the two individuals; crossover or mutation is repeated until the number of offspring equals the population size, and then step S2-6 is entered;

where simulated binary crossover operates as follows:

given two parents $x_a = [x_a(1), \ldots, x_a(d)]$ and $x_b = [x_b(1), \ldots, x_b(d)]$, where d is the dimension of the data, a distribution factor (spread factor) c(j) is calculated first;

the resulting offspring are:

$$x_e(j) = [(1 + c(j))\,x_a(j) + (1 - c(j))\,x_b(j)] / 2$$

$$x_f(j) = [(1 + c(j))\,x_b(j) + (1 - c(j))\,x_a(j)] / 2$$
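The mating decision in this step can be sketched as below. The crossover and mutation operators are passed in as callables because the mutation operator is not specified in the text; the function name `make_offspring` and the parameter names are hypothetical.

```python
import random

def make_offspring(population, skill, pop_size, rmp, crossover, mutate, rng=random):
    """Generate pop_size offspring by crossover or mutation of random parent pairs.

    skill[i] is the skill factor of population[i]; crossover(a, b) returns two
    children and mutate(a) returns one child (both are caller-supplied).
    """
    offspring = []
    while len(offspring) < pop_size:
        a, b = rng.sample(range(len(population)), 2)   # two distinct parents
        if rng.random() < rmp or skill[a] == skill[b]:
            # Same task, or cross-task mating allowed with probability rmp.
            children = crossover(population[a], population[b])
        else:
            children = (mutate(population[a]), mutate(population[b]))
        offspring.extend(children)
    return offspring[:pop_size]                        # trim if one pair overshoots
```

A larger rmp allows more crossover between individuals specialized on different indexes, which is the channel through which information is exchanged between tasks.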

S2-7, merging the parents and offspring into a new population, and recalculating and updating the skill factor τ and scalar fitness value φ of every individual according to the fitness values after clustering; if the factorial ranks of the i-th individual in the K tasks are $r_i^1, \ldots, r_i^K$, its scalar fitness value is $\varphi_i = 1 / \min_{j \in \{1, \ldots, K\}} r_i^j$, i.e., an individual's scalar fitness value is determined by its factorial rank in the task on which it performs best;
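A short sketch of the scalar fitness and the subsequent selection follows. It assumes that the scalar fitness is the reciprocal of an individual's best factorial rank, as in the standard multifactorial evolutionary algorithm, and that rank 1 denotes the best individual on a task; the function names are hypothetical.

```python
def scalar_fitness(ranks):
    """ranks[i][j] is the factorial rank of individual i on task j (1 = best)."""
    return [1.0 / min(r) for r in ranks]

def select_next_generation(population, ranks, pop_size):
    """Keep the pop_size individuals with the largest scalar fitness."""
    phi = scalar_fitness(ranks)
    order = sorted(range(len(population)), key=lambda i: phi[i], reverse=True)
    return [population[i] for i in order[:pop_size]]
```

Because φ depends only on the best rank, an individual that excels on a single index survives even if it is poor on the other two, which keeps specialists for every task in the population.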

S3, after step S2 is finished, 3 clustering results are obtained, and the best one is selected using the expert knowledge of doctors.

In this embodiment, multi-task clustering treats the optimization of different indexes as different tasks, finds the best individual under each task, and uses expert knowledge to identify the clustering index best suited to the medical images. The clustering indexes can be chosen to suit the actual situation to achieve a better effect. The method exploits the transfer process of multi-task learning: communication among different learning tasks helps escape local optima and approach the global optimum, and the optimization is more efficient than running the same method several times in succession, once for each optimization objective.

The above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope of protection; variations made according to the shape and principle of the present invention shall therefore fall within its scope of protection.

Claims

1. The medical image clustering method based on the multitasking evolution algorithm is characterized by comprising the following steps of:

S1, extracting ROI feature description data from a medical image;

S3, selecting the best result using the expert knowledge of doctors;

the specific process of step S2 is as follows:

S2-1, reading in the extracted ROI feature description data of the medical images;

S2-2, initializing the population and the maximum number of iterations n, with the current iteration number k = 0;

S2-4, calculating the skill factor τ of each individual;

S2-10, finding the individual whose skill factor τ is equal to 1 and recording it as the current best individual of each task;

in step S2-3, the clustering indexes comprise the CH index, the Dunn index, and the SIL index, each corresponding to one optimization task; the indexes are specifically as follows:

CH index: defined using the trace of the inter-class dispersion matrix $S_B$, where m denotes the mean vector of the entire data set and K is the number of cluster categories;

Dunn index: defined using $D(C_i, C_j)$, the distance between two different clusters $C_i$ and $C_j$, taken as the distance between their two closest data points, and $\delta(C_i)$, the distance between the two farthest points within a cluster;

SIL index:

2. the medical image clustering method based on the multitasking evolution algorithm according to claim 1, wherein the ROI feature description data of the medical image extracted in the step S1 includes a relative gray scale S1, a relative area S2, a relative centroid coordinate S3, a circularity S4, an angle S5 and a symmetry S6.

3. The medical image clustering method based on the multitasking evolution algorithm according to claim 2, wherein the specific process of extracting ROI feature description data of the medical image in step S1 is as follows:

S1-1, reading in a medical image;

S1-3, detecting the symmetry S6 of the medical image;

S1-6, calculating the feature description data of the ROI region.

4. The medical image clustering method based on the multitasking evolution algorithm according to claim 3, wherein the step S1-3 of detecting the medical image symmetry S6 is:

5. The medical image clustering method based on the multitasking evolution algorithm according to claim 1, wherein in step S2-2, the population is initialized with the following encoding:

the maximum number of cluster categories $K_{max}$ is set, and each individual of the population is encoded as a vector of dimension $K_{max} + K_{max} \cdot d$, where d is the number of data dimensions, $m_{ij}$ is the coordinate vector of a cluster center, and $T_{ij}$ ($j = 1, \ldots, K_{max}$) is the activation threshold of the corresponding class centroid;

6. The medical image clustering method based on the multitasking evolution algorithm according to claim 1, wherein in step S2-3, the NMP clustering rule is specifically as follows:

$$D(x_i, C_h) = \min\{ D(x_i, x_j),\ D(x_i, m_h) \mid x_j \in C_h \}$$

7. The medical image clustering method based on the multitasking evolution algorithm according to claim 1, wherein in step S2-5, simulated binary crossover operates as follows:

the resulting offspring are:

$$x_e(j) = [(1 + c(j))\,x_a(j) + (1 - c(j))\,x_b(j)] / 2$$

$$x_f(j) = [(1 + c(j))\,x_b(j) + (1 - c(j))\,x_a(j)] / 2$$