CN115131588A

CN115131588A - A Robust Image Clustering Method Based on Fuzzy Clustering

Info

Publication number: CN115131588A
Application number: CN202210665911.0A
Authority: CN
Inventors: 王靖宇; 张欣茹; 聂飞平; 李学龙
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2022-06-13
Filing date: 2022-06-13
Publication date: 2022-09-30
Anticipated expiration: 2042-06-13
Also published as: CN115131588B

Abstract

The invention relates to a robust image clustering method based on fuzzy clustering. While clustering image data, the method screens out images contaminated by noise, retains the clean image data, and is highly robust to changes in noise. The method is unsupervised: it needs no label data and so avoids the considerable time spent acquiring labels. The graph matrix does not need to be updated during the solution process, which lowers the computational complexity and speeds up computation. Fast, effective and robust clustering of noise-contaminated images can therefore be achieved. By iteratively optimizing the objective function, the method adaptively computes the regularization parameter for each sample, which greatly reduces the difficulty of tuning the regularization parameter in practice, saves labor, and improves clustering accuracy while robustly screening out the noise-contaminated image data.

Description

A Robust Image Clustering Method Based on Fuzzy Clustering

Technical Field

The invention belongs to the fields of image recognition and classification and of pattern recognition, and relates to a robust image clustering method based on fuzzy clustering.

Background

With the development of computer technology and digital imaging systems, it has become increasingly convenient to convey information through images. In real environments, however, image information is easily contaminated by noise, which degrades image quality and makes effective recognition difficult. Because the image data to be processed is increasingly complex and labels are increasingly hard to obtain, unsupervised image clustering has received wide attention. Image clustering groups the pictures in an image database into clusters according to their similarity, so that pictures within the same cluster are as similar as possible and pictures in different clusters are as dissimilar as possible. If conventional unsupervised clustering is applied directly to noise-contaminated images, the accuracy and reliability of image retrieval suffer greatly. Once the noise-contaminated images have been screened out, a new image can be recognized and classified quickly by comparing it one by one with the most similar clusters in the database. Clustering the image data with noise suppression before retrieval therefore enables effective, fast and high-quality image retrieval.

Li Kang et al. ("A fuzzy spectral clustering algorithm for hyperspectral image classification", China Sciencepaper, 2021, 16(07): 743-747) proposed a fuzzy similarity measure spectral clustering (FSMSC) algorithm for hyperspectral remote sensing image classification. It constructs an effective fuzzy similarity matrix by introducing a fuzzy similarity measure and a robust anchor graph structure in order to improve clustering performance. However, updating the graph matrix increases the time complexity of the fuzzy clustering algorithm and slows computation. Although the algorithm integrates graph learning and fuzzy clustering into a joint learning framework, it is limited by the conventional fuzzy clustering algorithm: its robustness is poor and it cannot effectively remove noisy data, which impairs subsequent image retrieval.

Summary of the Invention

Technical problem to be solved

To overcome the shortcomings of the prior art, the present invention proposes a robust image clustering method based on fuzzy clustering. It addresses two problems: existing supervised image algorithms require a great deal of time to obtain data labels, and existing methods cannot effectively handle the effect of noise on image retrieval and therefore have poor robustness.

Technical solution

A robust image clustering method based on fuzzy clustering, characterized by the following steps:

Step 1: For $n$ images of resolution $u \times v$, flatten each image into a $1 \times d$ row vector, where $d = u \times v$; the $n$ images form the target data matrix $X = [x_1; x_2; \dots; x_n] \in \mathbb{R}^{n \times d}$, where each row $x_i \in \mathbb{R}^{1 \times d}$ represents one image, and each image is one data sample. Given the true number of classes $c$ contained in the target data, randomly initialize the centers of the $c$ clusters to obtain the initial $m = [m_1; m_2; \dots; m_c] \in \mathbb{R}^{c \times d}$, where $m_j$ is the centroid of the $j$-th cluster;
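Step 1 can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the helper names `build_data_matrix` and `init_centers` are hypothetical, and drawing the initial centers from the rows of the data matrix is only one possible reading of "randomly initialize".

```python
import numpy as np

def build_data_matrix(images):
    """Stack n images of shape (u, v) into an (n, u*v) matrix, one row per image."""
    images = np.asarray(images, dtype=np.float64)
    n = images.shape[0]
    return images.reshape(n, -1)

def init_centers(X, c, seed=0):
    """Assumption: use c distinct, randomly chosen rows of X as initial centers."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=c, replace=False)
    return X[idx].copy()

# Toy stand-in for real images: n = 6 images of resolution 4 x 3.
images = np.arange(6 * 4 * 3, dtype=np.float64).reshape(6, 4, 3)
X = build_data_matrix(images)
m = init_centers(X, c=2)
print(X.shape, m.shape)  # (6, 12) (2, 12)
```

Row-major flattening keeps each image in a single row, so every later step can treat row $i$ of the matrix as sample $x_i$.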

Step 2: Establish the robust fuzzy clustering (RFCM) framework with noise suppression

[formula omitted in source]

where $Y \in \mathbb{R}^{n \times c}$ is the fuzzy membership matrix, each element $y_{ij}$ of which is the degree of membership of the $i$-th sample in the $j$-th cluster; the centroid matrix of the clusters is $m \in \mathbb{R}^{c \times d}$; and the vector $s$, whose $i$-th element is $s_i$, is used to screen out the $n-k$ noisy samples;
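The objective function itself is not reproduced in this text. One form that agrees with the surrounding definitions (memberships $y_{ij}$, centroids $m_j$, a 0/1 selector $s$ containing $k$ ones, the constraint $s^{\top}\mathbf{1} = k$, and a regularization parameter $\gamma$) is sketched below. It is an assumption made for illustration and should not be read as the formula of the patent.

```latex
\min_{Y,\,m,\,s}\ \sum_{i=1}^{n} s_i \sum_{j=1}^{c} y_{ij}\,\lVert x_i - m_j \rVert_2^{2}
\;+\; \gamma\,\lVert Y \rVert_F^{2}
\quad \text{s.t.}\quad Y\mathbf{1} = \mathbf{1},\ \ Y \ge 0,\ \ s \in \{0,1\}^{n},\ \ s^{\top}\mathbf{1} = k
```

In this form the first term is the noise suppression term, since samples with $s_i = 0$ contribute no distance cost, and the second is the regularization term on the memberships.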

Step 3: Optimize the RFCM framework by alternating iteration

The solution steps are as follows:

Step 3.1: Randomly initialize the centers of the $c$ clusters to obtain the initial $m$, and initialize all elements of $Y$ as $y_{ij} = 1/c$. Define $e_i$ as the sum of the distances from sample $x_i$ to all cluster centers, each weighted by the corresponding membership. Sort the $e_i$ in ascending order as $e_1 \le e_2 \le \dots \le e_k \le \dots \le e_n$ and compute the corresponding $s_i$: $s_i = 1$ for the $k$ smallest $e_i$ and $s_i = 0$ otherwise;
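The selection in Step 3.1 can be sketched as below. Two assumptions are made here: the distance is taken to be the squared Euclidean distance, as is usual in fuzzy c-means, and the function names are hypothetical.

```python
import numpy as np

def squared_distances(X, m):
    """Return D with D[i, j] = ||x_i - m_j||^2, shape (n, c)."""
    D = (X**2).sum(axis=1)[:, None] + (m**2).sum(axis=1)[None, :] - 2.0 * X @ m.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by rounding

def select_clean(X, m, Y, k):
    """Return (e, s): membership-weighted costs e_i and the 0/1 vector s."""
    e = np.sum(Y * squared_distances(X, m), axis=1)
    order = np.argsort(e, kind="stable")
    s = np.zeros(X.shape[0], dtype=int)
    s[order[:k]] = 1          # the k smallest e_i are kept as real samples
    return e, s

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [9.0, 9.0]])
m = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.full((4, 2), 0.5)      # y_ij = 1/c with c = 2
e, s = select_clean(X, m, Y, k=3)
print(s)  # [1 1 1 0]
```

The far-away point at (9, 9) has by far the largest weighted cost and is the one sample marked with $s_i = 0$.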

Step 3.2: Optimize $Y$ separately for the real samples and the noisy data

The $k$ samples closest to the cluster centers, that is, those with the smallest $e_i$, have $s_i = 1$, and the remaining samples have $s_i = 0$. Reorder the samples $x_i$ to follow the sorted $e_i$, which gives the sorted data matrix and the correspondingly sorted membership matrix;

Step 3.2.1: For a sample with $s_i = 1$ (a real sample), consider only the $q$ clusters closest to it; this sparsifies the membership matrix by constraining the $\ell_0$ norm of $y_i$ to $q$. Define the squared distance from the $i$-th selected real sample point to the center of the $j$-th cluster [symbol omitted in source]; the RFCM framework is then equivalently transformed into

[formula omitted in source]

The parameter $\gamma$ in the objective function is computed adaptively at its optimal value.

Optimization yields the optimal memberships of the selected real samples, $i \in \{1, 2, \dots, k\}$:

[formula omitted in source]
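The closed-form membership is not reproduced in this text. The sketch below assumes that the per-sample subproblem is a distance term plus a quadratic regularizer $\gamma \sum_j y_{ij}^2$ on the probability simplex, for which a well-known closed form with exactly $q$ nonzero entries exists. Both that assumption and the name `sparse_membership` are illustrative and are not taken from the patent.

```python
import numpy as np

def sparse_membership(d_row, q):
    """Membership of one real sample, nonzero only on its q nearest clusters.

    d_row holds the squared distances to the c centers; requires 1 <= q < c.
    Returns (y, gamma), where gamma is the per-sample regularization
    parameter implied by the assumed closed form.
    """
    d_row = np.asarray(d_row, dtype=np.float64)
    c = d_row.size
    if not 1 <= q < c:
        raise ValueError("q must satisfy 1 <= q < c")
    order = np.argsort(d_row, kind="stable")
    nearest = order[:q]
    d_next = d_row[order[q]]                  # (q+1)-th smallest distance
    denom = q * d_next - d_row[nearest].sum()
    y = np.zeros(c)
    if denom > 0:
        y[nearest] = (d_next - d_row[nearest]) / denom
    else:                                     # all q+1 distances tie: equal shares
        y[nearest] = 1.0 / q
    return y, 0.5 * denom

y, gamma = sparse_membership([1.0, 4.0, 2.0, 9.0], q=2)
print(y, gamma)  # [0.6 0.  0.4 0. ] 2.5
```

Under this assumption no $\gamma$ has to be tuned by hand: it follows from the distances of each sample, and only $q$ entries per row need to be stored.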

Step 3.2.2: For the noisy data, that is, samples with $s_i = 0$, the Cauchy inequality gives the optimal membership values during optimization:

[formula omitted in source]

The optimal memberships of all sample points are thus obtained as

[formula omitted in source]
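The closed form for the noisy samples is not reproduced in this text. If one assumes that, for $s_i = 0$, only a quadratic regularization term $\sum_j y_{ij}^2$ remains to be minimized subject to $\sum_j y_{ij} = 1$ (an assumption, not a statement of the patent), the Cauchy inequality gives the minimizer directly:

```latex
1 = \Bigl(\sum_{j=1}^{c} 1 \cdot y_{ij}\Bigr)^{2}
\le \Bigl(\sum_{j=1}^{c} 1^{2}\Bigr)\Bigl(\sum_{j=1}^{c} y_{ij}^{2}\Bigr)
= c \sum_{j=1}^{c} y_{ij}^{2}
\;\Longrightarrow\;
\sum_{j=1}^{c} y_{ij}^{2} \ge \frac{1}{c},
\qquad \text{with equality iff } y_{i1} = \dots = y_{ic} = \frac{1}{c}.
```

Under that assumption a noisy sample receives the uniform membership $1/c$ for every cluster, so it expresses no preference for any cluster.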

Step 3.3: Fix $s$ and $Y$ to obtain the subproblem of the RFCM framework in $m$:

[formula omitted in source]

Setting the partial derivative of the subproblem with respect to $m$ to zero gives:

[formula omitted in source]

In each round of optimization, $m$, $Y$ and $s$ are updated in turn, and the iteration is repeated until $m$ no longer changes. Each row of sample data corresponds to one picture. From the resulting membership matrix $Y$, the cluster with the largest membership in each row is taken as the class of that picture, which yields the predicted label vector and completes the partition of the images into clusters. The resulting vector $s$ contains $k$ ones and $n-k$ zeros; if $s_i = 0$ for the $i$-th image, that image is screened out as noise-contaminated.
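The center update and the final label assignment can be sketched as follows. The update formula is not reproduced in this text, so the weighted mean used here, $m_j = \sum_i s_i y_{ij} x_i / \sum_i s_i y_{ij}$, is an assumption: it is what a zero gradient gives when the distance term is linear in $y_{ij}$. The function names are hypothetical.

```python
import numpy as np

def update_centers(X, Y, s, m_old):
    """Weighted mean of the real samples; empty clusters keep their old center."""
    W = Y * s[:, None]                 # (n, c) weights, noisy rows zeroed out
    mass = W.sum(axis=0)               # total weight per cluster
    m_new = m_old.copy()
    nonempty = mass > 0
    m_new[nonempty] = (W.T @ X)[nonempty] / mass[nonempty, None]
    return m_new

def predict_labels(Y):
    """Cluster with the largest membership in each row."""
    return np.argmax(Y, axis=1)

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [50.0, 50.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
s = np.array([1, 1, 1, 0])             # the last sample is flagged as noisy
m = update_centers(X, Y, s, m_old=np.zeros((2, 2)))
labels = predict_labels(Y)
print(m)       # rows: [1. 0.] and [10. 10.]
print(labels)  # [0 0 1 0]
```

Because the noisy row is multiplied by $s_i = 0$, the point at (50, 50) does not pull either center toward it.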

Beneficial effects

The robust image clustering method based on fuzzy clustering proposed by the present invention screens out noise-contaminated images while clustering the image data, retains the clean image data, and is highly robust to changes in noise. The method is unsupervised: it needs no label data, which saves the considerable time otherwise spent acquiring labels. The graph matrix does not need to be updated during the solution process, which lowers the computational complexity and speeds up computation. Fast, effective and robust clustering of noise-contaminated images can therefore be achieved.

The invention is a noise-suppressing image clustering method built on an improved robust fuzzy clustering framework derived from FCM. In use it both robustly screens out noise-contaminated image data and clusters the images, improving robustness and accuracy at the same time; sparsifying the membership matrix also speeds up data processing. The objective function consists of a robust noise suppression term and a regularization term. Iteratively optimizing it assigns an adaptive weight to each sample point, on the basis of which clean samples are separated from noise-contaminated samples, strengthening the robustness of the algorithm. Optimizing the objective function thus screens out the noise-contaminated samples and yields noise-suppressed image clustering.

The beneficial effects of the method mainly include:

(1) A robust fuzzy clustering algorithm is proposed in which the robust noise suppression term of the objective function lets the algorithm, through iterative optimization, assign an adaptive weight to each sample point and thereby separate clean samples from noise-contaminated samples, strengthening the robustness of the algorithm.

(2) The proposed noise-suppressing image clustering method sparsifies the membership matrix while performing robust noise suppression and so obtains a more effective structure of sample and feature distribution. This avoids the effect of noise contamination on image clustering and also reduces data storage and computation, improving computational efficiency.

(3) By iteratively optimizing the objective function, the invention adaptively computes the regularization parameter for each sample, which greatly reduces the difficulty of tuning the regularization parameter in practice, saves labor, and improves clustering accuracy while robustly screening out the noise-contaminated image data.

Description of the Drawings

Figure 1: flow chart of the method

Figure 2: examples of noise-contaminated images

Figure 3: detection results of the method on a specific data set

Detailed Description

The present invention is further described below with reference to the embodiments and the accompanying drawings:

The invention is realized by the following technical solution, a noise-suppressing image clustering method based on robust fuzzy clustering, whose specific steps are as follows:

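Step 1: Acquire the image data and construct the data matrix

For $n$ images of resolution $u \times v$, flatten each image into a $1 \times d$ row vector, where $d = u \times v$; the $n$ images form the target data matrix $X = [x_1; x_2; \dots; x_n] \in \mathbb{R}^{n \times d}$, where each row $x_i \in \mathbb{R}^{1 \times d}$ represents one image, and each image is one data sample. Given the true number of classes $c$ contained in the target data, randomly initialize the centers of the $c$ clusters to obtain the initial $m = [m_1; m_2; \dots; m_c] \in \mathbb{R}^{c \times d}$, where $m_j$ is the centroid of the $j$-th cluster.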

Step 2: Establish a robust fuzzy clustering (RFCM) framework capable of noise suppression

The fuzzy c-means algorithm is abbreviated as FCM. To improve its robustness to noise, the framework introduces a noise suppression term and a regularization term:

[formula omitted in source]

where $Y \in \mathbb{R}^{n \times c}$ is the fuzzy membership matrix, each element $y_{ij}$ of which is the degree of membership of the $i$-th sample in the $j$-th cluster; the centroid matrix of the clusters is $m \in \mathbb{R}^{c \times d}$; and the vector $s$, whose $i$-th element is $s_i$, is used to screen out the $n-k$ noisy samples.

Step 3: Optimize the objective function by alternating iteration

The three variables $m$, $Y$ and $s$ of the objective function are solved by alternating iterative optimization. First initialize $m$ and $Y$ and compute $s$; then fix $s$ and $m$ and optimize $Y$, treating real samples and noisy data differently; then fix $s$ and $Y$ and solve for $m$. This cycle is repeated until convergence.

The solution steps are as follows:

Step 3.1: Randomly initialize the centers of the $c$ clusters to obtain the initial $m$, and initialize all elements of $Y$ as $y_{ij} = 1/c$, so that each sample is initially equally likely to belong to every class. Define $e_i$ as the sum of the distances from sample $x_i$ to all cluster centers, each weighted by the corresponding membership. Sorting the $e_i$ in ascending order as $e_1 \le e_2 \le \dots \le e_k \le \dots \le e_n$, the corresponding $s_i$ can be computed: $s_i = 1$ for the $k$ smallest $e_i$ and $s_i = 0$ otherwise.

Step 3.2: Optimize $Y$ separately for the real samples and the noisy data.

Step 3.1 assigns $s_i = 1$ to the $k$ samples closest to the cluster centers and $s_i = 0$ to the remaining samples. Sorting the $e_i$ in ascending order and reordering the corresponding samples $x_i$ accordingly gives the sorted data matrix and the correspondingly sorted membership matrix.

Step 3.2.1: For the real samples. When $s_i = 1$, only the $q$ clusters closest to the real sample are considered; this sparsifies the membership matrix by constraining the $\ell_0$ norm of $y_i$ to $q$. Define the squared distance from the $i$-th selected real sample point to the center of the $j$-th cluster [symbol omitted in source]; the problem is then equivalently transformed into

[formula omitted in source]

The parameter $\gamma$ in the objective function normally has to be tuned carefully to avoid trivial solutions; the present invention computes the optimal $\gamma$ adaptively.

Optimization yields the optimal memberships of the selected real samples, $i \in \{1, 2, \dots, k\}$:

[formula omitted in source]

Step 3.2.2: For the noisy data. When $s_i = 0$, the Cauchy inequality gives the optimal membership values of the noisy data during optimization:

[formula omitted in source]

The optimal memberships of all sample points are thus obtained as

[formula omitted in source]

Step 3.3: Fix $s$ and $Y$ and solve for $m$

Setting the partial derivative of the subproblem of the objective function with respect to $m$ to zero gives

[formula omitted in source]

At this point $m$, $Y$ and $s$ have all been updated, and the next iteration is carried out; this is repeated until $m$ no longer changes. Each row of sample data corresponds to one picture. From the resulting membership matrix $Y$, the cluster with the largest membership in each row is taken as the class of that picture, which yields the predicted label vector and completes the partition of the images into clusters. The resulting vector $s$ contains $k$ ones and $n-k$ zeros; if $s_i = 0$ for the $i$-th image, that image is screened out as noise-contaminated.
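The alternation of Steps 3.1 to 3.3 can be put together as in the sketch below. It is an illustration under stated assumptions rather than the patent's implementation: squared Euclidean distances, a closed-form sparse membership for real samples, uniform membership $1/c$ for noisy samples, a weighted-mean center update, and a stopping threshold `tol` are all assumed, and `rfcm_sketch` with its optional `m0` argument is a hypothetical name.

```python
import numpy as np

def rfcm_sketch(X, c, k, q, m0=None, max_iter=100, tol=1e-6, seed=0):
    """Alternate the s, Y and m updates until the centers stop moving."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    if not 1 <= q < c:
        raise ValueError("q must satisfy 1 <= q < c")
    if m0 is None:  # assumption: random rows of X as initial centers
        rng = np.random.default_rng(seed)
        m0 = X[rng.choice(n, size=c, replace=False)]
    m = np.array(m0, dtype=np.float64)
    Y = np.full((n, c), 1.0 / c)
    s = np.ones(n, dtype=int)
    for _ in range(max_iter):
        # Squared distances D[i, j] = ||x_i - m_j||^2.
        D = (X**2).sum(1)[:, None] + (m**2).sum(1)[None, :] - 2.0 * X @ m.T
        D = np.maximum(D, 0.0)
        # s-step: keep the k samples with the smallest weighted cost e_i.
        e = (Y * D).sum(axis=1)
        s = np.zeros(n, dtype=int)
        s[np.argsort(e, kind="stable")[:k]] = 1
        # Y-step: uniform rows for noisy samples, sparse rows for real ones.
        Y = np.full((n, c), 1.0 / c)
        for i in np.flatnonzero(s):
            order = np.argsort(D[i], kind="stable")
            near, d_next = order[:q], D[i, order[q]]
            denom = q * d_next - D[i, near].sum()
            Y[i] = 0.0
            Y[i, near] = (d_next - D[i, near]) / denom if denom > 0 else 1.0 / q
        # m-step: weighted mean of the real samples.
        W = Y * s[:, None]
        mass = W.sum(axis=0)
        m_new = m.copy()
        ok = mass > 0
        m_new[ok] = (W.T @ X)[ok] / mass[ok, None]
        shift = np.linalg.norm(m_new - m)
        m = m_new
        if shift < tol:
            break
    return Y.argmax(axis=1), s, m

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [50.0, -50.0], [-40.0, 60.0]])   # last two rows act as outliers
labels, s, m = rfcm_sketch(X, c=2, k=8, q=1, m0=X[[0, 4]])
print(s)  # [1 1 1 1 1 1 1 1 0 0]
```

With one starting center in each group, the two outliers are flagged with $s_i = 0$ and the centers settle at the means of the two groups. With a random start the result depends on the initial centers, as with other center-based clustering methods.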

Specific embodiment:

The overall solution process of the noise-suppressing image clustering method based on robust fuzzy clustering is shown in Figure 1. The ORL face image data set is used as a clustering example. It contains 400 face images of resolution 92×112, each image being one sample. The true labels of the 400 images are denoted $z_t$, and the predicted labels produced by the clustering algorithm are denoted $z_p$. The true labels $z_t$ are used only for the final verification of the clustering result and are not part of the clustering algorithm itself. To test the algorithm, noise is added to 40% of the images as an example, so the number of noise-contaminated image samples is $p = 160$. The embodiment comprises the following steps:

Step 1. Input the ORL data matrix $X \in \mathbb{R}^{n \times d}$, the true number of classes $c$, the number of noise-contaminated image samples $p = 160$ and the membership sparsification parameter $q$, where each row $x_i$ of the matrix is one sample, $n = 400$ is the number of samples, $d = 92 \times 112 = 10304$ is the dimension of the data, and $c = 40$ is the true number of classes in the face data.

Randomly initialize the centers of the $c$ clusters to obtain the initial $m$. Initialize all elements of $Y$ as $y_{ij} = 1/c = 1/40$, and set $k = n - p = 240$. This makes each sample initially equally likely to belong to every class. With $m$ and $Y$ fixed, $s$ can be computed; the objective function becomes

[formula omitted in source]

Define $e_i$ as the sum of the distances from sample $x_i$ to all cluster centers, each weighted by the corresponding membership.

Sorting the $e_i$ in ascending order as $e_1 \le e_2 \le \dots \le e_k \le \dots \le e_n$, the optimal solution of the objective function under the constraint $s^{\top}\mathbf{1} = k$ is $s_i = 1$ for the $k$ smallest $e_i$ and $s_i = 0$ otherwise.

On this basis the samples not contaminated by noise are separated from the noise-contaminated samples, and the selection is refined in subsequent iterations.
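The bookkeeping of this setup ($d$, $p$ and $k$) can be checked with a short sketch. The text does not say what kind of noise was added, so the zero-mean Gaussian noise, the `sigma` value and the function name `contaminate` below are hypothetical stand-ins chosen only to make the example concrete.

```python
import numpy as np

def contaminate(X, ratio, sigma=0.5, seed=0):
    """Add zero-mean Gaussian noise (an assumed noise model) to a random
    fraction `ratio` of the rows of X.

    Returns the noisy copy and a 0/1 vector marking the contaminated rows.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    p = int(round(ratio * n))
    noisy_idx = rng.choice(n, size=p, replace=False)
    X_noisy = X.astype(np.float64, copy=True)
    X_noisy[noisy_idx] += rng.normal(0.0, sigma, size=(p, X.shape[1]))
    is_noisy = np.zeros(n, dtype=int)
    is_noisy[noisy_idx] = 1
    return X_noisy, is_noisy

n, u, v, c = 400, 92, 112, 40      # ORL figures quoted in the text
d = u * v                          # dimension of one flattened image
X = np.zeros((n, 16))              # small stand-in; real rows would have d columns
X_noisy, is_noisy = contaminate(X, ratio=0.40)
p = int(is_noisy.sum())            # number of contaminated samples
k = n - p                          # number of samples to keep as real
print(d, p, k)  # 10304 160 240
```

The vector `is_noisy` plays the role of a ground truth against which the selector $s$ returned by the clustering could be compared.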

Step 2: Optimize $Y$ separately for the real samples and the noisy data.

Step 1 assigns $s_i = 1$ to the $k$ samples closest to the cluster centers and $s_i = 0$ to the remaining samples. Sorting the $e_i$ in ascending order and reordering the corresponding samples $x_i$ accordingly gives the sorted data matrix and the correspondingly sorted membership matrix.

Step 2.1: For the real samples. Define [symbol omitted in source] as the vector formed by the $i$-th row of the sorted data matrix, and [symbol omitted in source] as the $j$-th element of the $i$-th row of the sorted membership matrix. When $s_i = 1$, the subproblem of the objective function is

[formula omitted in source]

Only the $q$ clusters closest to each real sample are considered; this sparsifies the membership matrix by constraining the $\ell_0$ norm of $y_i$ to $q$. Define the squared distance from the $i$-th selected real sample point to the center of the $j$-th cluster [symbol omitted in source]; the problem is then equivalently transformed into

[formula omitted in source]

The second term of the original objective function is a regularization term. It avoids the trivial solutions of two extreme cases: only the nearest neighbor having similarity 1, and all samples having similarity $1/n$. The solution is obtained using the Lagrange multiplier method, the KKT conditions and the constraint

[formula omitted in source]

Because each term of the problem is independent in $i$, the problem can be solved separately for each $i$. Solving the equivalent subproblem of the objective function with the Lagrange multiplier method yields the optimal memberships of the selected real samples, $i \in \{1, 2, \dots, k\}$:

[formula omitted in source]
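The derivation is not reproduced in this text. The outline below shows how the Lagrange multiplier method and the KKT conditions lead to a closed form if the per-sample subproblem is assumed to be a linear distance term plus a quadratic regularizer on the simplex. The symbols $d_{ij}$ (squared distance to the $j$-th center, sorted so that $d_{i1} \le \dots \le d_{ic}$), $\eta_i$ (multiplier of the sum constraint) and $\gamma_i$ are introduced here for illustration and are not taken from the patent.

```latex
\begin{aligned}
&\min_{y_i}\ \sum_{j=1}^{c} d_{ij}\,y_{ij} + \gamma_i \sum_{j=1}^{c} y_{ij}^{2}
 \quad \text{s.t.}\ \ \sum_{j=1}^{c} y_{ij} = 1,\ \ y_{ij} \ge 0
 \ \ \Longleftrightarrow\ \
 \min_{y_i}\ \Bigl\lVert y_i + \tfrac{1}{2\gamma_i}\,d_i \Bigr\rVert_2^{2} \\[4pt]
&\text{KKT conditions:}\quad y_{ij} = \Bigl(\eta_i - \tfrac{d_{ij}}{2\gamma_i}\Bigr)_{+},
 \qquad
 \eta_i = \frac{1}{q} + \frac{1}{2q\gamma_i}\sum_{h=1}^{q} d_{ih}
 \quad \text{(exactly } q \text{ nonzero entries)} \\[4pt]
&y_{iq} > 0,\ \ y_{i,q+1} = 0
 \ \ \Longrightarrow\ \
 \frac{q}{2}\,d_{iq} - \frac{1}{2}\sum_{h=1}^{q} d_{ih} \;<\; \gamma_i \;\le\;
 \frac{q}{2}\,d_{i,q+1} - \frac{1}{2}\sum_{h=1}^{q} d_{ih} \\[4pt]
&\gamma_i = \frac{q}{2}\,d_{i,q+1} - \frac{1}{2}\sum_{h=1}^{q} d_{ih}
 \ \ \Longrightarrow\ \
 y_{ij} = \frac{d_{i,q+1} - d_{ij}}{q\,d_{i,q+1} - \sum_{h=1}^{q} d_{ih}}\ \ (j \le q),
 \qquad y_{ij} = 0\ \ (j > q)
\end{aligned}
```

Choosing $\gamma_i$ at the upper end of its admissible interval is what makes the parameter adaptive in this outline: it is fixed by the distances of sample $i$ and the sparsity level $q$ instead of being tuned by hand.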

Step 2.2: For the noisy data. When $s_i = 0$, the Cauchy inequality gives the optimal membership values of the noisy data during optimization:

[formula omitted in source]

The optimal memberships of all sample points are thus obtained as

[formula omitted in source]

Step 3: Fix $s$ and $Y$ and solve for $m$

The subproblem of the objective function is now rewritten as

[formula omitted in source]

At this point $s$, $Y$ and $m$ have all been updated. The next iteration is then carried out, and this is repeated until the cluster centers $m$ are no longer updated, that is, until their change falls below a threshold. Contaminated samples correspond to $s_i = 0$: after the solution ends, every sample $i$ with $s_i = 0$ is screened out as a noise-contaminated image sample, so that 160 noise-contaminated image samples are screened out in total. The predicted cluster labels $z_p$ of the 400 face images are then obtained directly, giving a good clustering result. Unlike classification, clustering only groups the data without supervision: the numbers in the predicted labels correspond one-to-one to the true class labels, but the specific correspondence is unknown. The unsupervised grouping can thus sort unlabeled face images into groups, which assists face retrieval and substantially improves retrieval accuracy and speed. The clustering accuracy is computed by comparing the true labels $z_t$ with the predicted labels $z_p$.
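Because the correspondence between predicted cluster numbers and true class labels is unknown, an accuracy figure has to be computed under the best one-to-one relabeling. The text does not say how this was done. The sketch below simply tries every relabeling, which is feasible only for a handful of clusters; for $c = 40$ an assignment solver such as the Hungarian algorithm would be needed instead. The function name is hypothetical.

```python
from itertools import permutations

def clustering_accuracy(z_true, z_pred):
    """Best fraction of matches over all one-to-one relabelings of the clusters.

    Brute force over permutations, so only suitable for a few clusters.
    """
    true_ids = sorted(set(z_true))
    pred_ids = sorted(set(z_pred))
    if len(pred_ids) > len(true_ids):
        raise ValueError("more predicted clusters than true classes")
    best = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))      # predicted id -> true id
        hits = sum(mapping[p] == t for t, p in zip(z_true, z_pred))
        best = max(best, hits)
    return best / len(z_true)

z_t = [0, 0, 1, 1, 2, 2]
z_p = [2, 2, 0, 0, 1, 0]     # same grouping up to renaming, with one mistake
acc = clustering_accuracy(z_t, z_p)
print(round(acc, 4))  # 0.8333
```

Here the relabeling 2→0, 0→1, 1→2 matches five of the six samples, so the accuracy is 5/6 even though no predicted number equals its true label.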

Take the ORL face image data set (400 images of 92×112 pixels each) as an example. With 40% noise added, FCM reaches a clustering accuracy of only 8.61% and a normalized mutual information of 14.13% on the ORL face images. Under the same 40% noise, the proposed noise-suppressing image clustering based on robust fuzzy clustering (RFCM) reaches a clustering accuracy of 64.83% and a normalized mutual information of 71.29%. These are improvements of 56.22 and 57.16 percentage points respectively, a significant gain in clustering accuracy on face image data.

Claims

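1. A robust image clustering method based on fuzzy clustering, characterized by the following steps:

Step 1: For $n$ images of resolution $u \times v$, flatten each image into a $1 \times d$ row vector, where $d = u \times v$; the $n$ images form the target data matrix $X = [x_1; x_2; \dots; x_n] \in \mathbb{R}^{n \times d}$, where each row $x_i \in \mathbb{R}^{1 \times d}$ represents one image, and each image is one data sample; given the true number of classes $c$ contained in the target data, randomly initialize the centers of the $c$ clusters to obtain the initial $m = [m_1; m_2; \dots; m_c] \in \mathbb{R}^{c \times d}$, where $m_j$ is the centroid of the $j$-th cluster;

Step 2: Establish the robust fuzzy clustering (RFCM) framework with noise suppression

[formula omitted in source]

where $Y \in \mathbb{R}^{n \times c}$ is the fuzzy membership matrix, each element $y_{ij}$ of which is the degree of membership of the $i$-th sample in the $j$-th cluster; the centroid matrix of the clusters is $m \in \mathbb{R}^{c \times d}$; and the vector $s$, whose $i$-th element is $s_i$, is used to screen out the $n-k$ noisy samples;

Step 3: Optimize the RFCM framework by alternating iteration

The solution steps are as follows:

Step 3.1: Randomly initialize the centers of the $c$ clusters to obtain the initial $m$, and initialize all elements of $Y$ as $y_{ij} = 1/c$; define $e_i$ as the sum of the distances from sample $x_i$ to all cluster centers, each weighted by the corresponding membership; sort the $e_i$ in ascending order as $e_1 \le e_2 \le \dots \le e_k \le \dots \le e_n$ and compute the corresponding $s_i$;

Step 3.2: Optimize $Y$ separately for the real samples and the noisy data

The $k$ samples closest to the cluster centers have $s_i = 1$, and the remaining samples have $s_i = 0$; reorder the samples $x_i$ to follow the sorted $e_i$, which gives the sorted data matrix and the correspondingly sorted membership matrix;

Step 3.2.1: For a sample with $s_i = 1$, consider only the $q$ clusters closest to the real sample, sparsifying the membership matrix by constraining the $\ell_0$ norm of $y_i$ to $q$; define the squared distance from the $i$-th selected real sample point to the center of the $j$-th cluster [symbol omitted in source]; the RFCM framework is then equivalently transformed into

[formula omitted in source]

The parameter $\gamma$ in the objective function is computed adaptively at its optimal value;

Optimization yields the optimal memberships of the selected real samples, $i \in \{1, 2, \dots, k\}$:

[formula omitted in source]

Step 3.2.2: For the noisy data, that is, samples with $s_i = 0$, the Cauchy inequality gives the optimal membership values during optimization:

[formula omitted in source]

The optimal memberships of all sample points are thus obtained as

[formula omitted in source]

Step 3.3: Fix $s$ and $Y$ to obtain the subproblem of the RFCM framework in $m$:

[formula omitted in source]

Setting the partial derivative of the subproblem with respect to $m$ to zero gives:

[formula omitted in source]

In each round of optimization, $m$, $Y$ and $s$ are updated in turn, and the iteration is repeated until $m$ no longer changes; each row of sample data corresponds to one picture; from the resulting membership matrix $Y$, the cluster with the largest membership in each row is taken as the class of that picture, which yields the predicted label vector and completes the partition of the images into clusters; the resulting vector $s$ contains $k$ ones and $n-k$ zeros, and if $s_i = 0$ for the $i$-th image, that image is screened out as noise-contaminated.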