CN115131588B

CN115131588B - Image robust clustering method based on fuzzy clustering

Info

Publication number: CN115131588B
Application number: CN202210665911.0A
Authority: CN
Inventors: 王靖宇; 张欣茹; 聂飞平; 李学龙
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2022-06-13
Filing date: 2022-06-13
Publication date: 2024-02-23
Anticipated expiration: 2042-06-13
Also published as: CN115131588A

Abstract

The present invention relates to a robust image clustering method based on fuzzy clustering. While clustering image data, it screens out images contaminated by noise, retains the clean, uncontaminated image data, and is highly robust to changes in noise. The invention is an unsupervised algorithm: it requires no label data, which saves the considerable time otherwise spent obtaining labels. The algorithm does not need to update a graph matrix during the solution process, which lowers its computational complexity and speeds up the operation, so noise-contaminated images can be clustered quickly, effectively and robustly. By iteratively optimizing the regularization parameter of the objective function, the invention adaptively computes the regularization parameter for each sample, which greatly reduces the difficulty of tuning that parameter in application and saves labor cost. It robustly screens out noise-contaminated image data while improving image clustering accuracy.

Description

A robust image clustering method based on fuzzy clustering

Technical field

The invention belongs to the fields of image recognition and classification and of pattern recognition, and relates to a robust image clustering method based on fuzzy clustering.

Background

With the development of computer technology and digital imaging systems, it has become increasingly convenient to convey information through images. In real environments, however, image information is easily contaminated by noise, which degrades image quality and makes effective image recognition difficult. Because the image information being processed is increasingly complex and labels are increasingly hard to obtain, unsupervised image clustering has received wide attention. Image clustering groups the pictures in an image database into clusters according to their similarity, so that pictures within the same cluster are as similar as possible and pictures in different clusters are as dissimilar as possible. Since image information is easily affected by noise, applying traditional unsupervised image clustering directly to noise-contaminated images greatly reduces the accuracy and reliability of image retrieval. If the noise-contaminated images are screened out first, a new image can be identified and classified quickly by comparing it one by one with the most similar clusters in the database. Clustering image data with noise suppression before image retrieval therefore enables effective, fast, high-quality retrieval.

Li Kang et al. ("A fuzzy spectral clustering algorithm for hyperspectral image classification", China Sciencepaper, 2021, 16(07): 743-747) proposed a fuzzy similarity measure spectral clustering (FSMSC) algorithm for hyperspectral remote sensing image classification. It introduces a fuzzy similarity measure and a robust anchor graph structure to construct an effective fuzzy similarity matrix and improve clustering performance. However, updating the graph matrix increases the time complexity of the fuzzy clustering algorithm and slows the computation. Although the method integrates graph learning and fuzzy clustering into a joint learning framework, it is limited by the traditional fuzzy clustering algorithm: its robustness is poor and it cannot effectively remove noisy data, which affects subsequent image data retrieval.

Summary of the invention

Technical problem to be solved

To overcome the shortcomings of the prior art, the present invention proposes a robust image clustering method based on fuzzy clustering. It addresses two problems: existing supervised image algorithms require a great deal of time to obtain data labels, and existing methods cannot effectively counter the impact of noise on image data retrieval, so their robustness is poor.

Technical solution

A robust image clustering method based on fuzzy clustering, characterized by the following steps:

Step 1: For n images of resolution u×v, flatten each image into a 1×d row vector, where d = u×v; convert the image data composed of the n images into a target data matrix X ∈ R^{n×d}, in which each row x_i ∈ R^{1×d} represents one image and each image is one data sample; given the number of true classes c contained in the target data, randomly initialize the cluster centers of the c clusters to obtain the initial centroids, where m_j ∈ R^{1×d} is the centroid of the j-th cluster;

Step 2: Establish a robust fuzzy clustering (RFCM) framework for noise suppression

In the RFCM objective function, Y ∈ R^{n×c} is the fuzzy membership matrix, each element y_ij of which is the membership degree of the i-th sample in the j-th cluster; m is the centroid matrix of the clusters; the vector s is used to screen out the n-k noisy samples, and s_i is the value of the i-th element of s;

Step 3: Optimize the RFCM framework by alternating iteration

The solution steps are as follows:

Step 3.1: Randomly initialize the cluster centers of the c clusters to obtain the initial m, and initialize all elements of Y to y_ij = 1/c; define e_i as the sum, weighted by the membership degrees, of the distances from sample x_i to all cluster centers; sort the e_i in ascending order as e₁ ≤ e₂ ≤ ... ≤ e_k ≤ ... ≤ e_n and compute the corresponding s_i;

Step 3.2: Optimize Y separately for real samples and for noise data

The first k samples closest to all cluster centers have s_i = 1, and the remaining samples have s_i = 0; sort the samples x_i in the same order as the sorted e_i to obtain the sorted data matrix and the corresponding sorted membership matrix;

Step 3.2.1: When a sample has s_i = 1, consider only the q clusters nearest to that real sample, which sparsifies the membership matrix by constraining the l₀ norm of y_i to be q; define the squared distance from the i-th selected real sample point to the center of the j-th cluster; the RFCM framework is then equivalently transformed into

The parameter γ in the objective function is computed adaptively to obtain its optimal value;

Optimization yields the optimal membership degrees of the selected real samples, i ∈ {1,2,...,k}

Step 3.2.2: For the noise data in the optimization process, when a sample has s_i = 0, the optimal membership value of that noise sample is obtained from the Cauchy inequality as

The optimal solution for the membership degrees of all sample points is then

Step 3.3: Fix s and Y to obtain the sub-problem of the RFCM framework

Setting the partial derivative of the sub-problem with respect to m to 0 gives:

In each round of optimization, m, Y and s are updated in turn, and the next iteration is carried out until m no longer changes. Each row of sample data corresponds to one image: from the resulting membership matrix Y, the label of the cluster with the largest value in each row is taken as the class of that image, which gives the predicted label vector and completes the cluster assignment. The resulting vector s contains k ones and n-k zeros; if s_i = 0 for the i-th image, that image is screened out as contaminated by noise.

Beneficial effects

The robust image clustering method based on fuzzy clustering proposed by the invention screens out noise-contaminated images while clustering the image data, retains the clean, uncontaminated image data, and is highly robust to changes in noise. The invention is an unsupervised algorithm that requires no label data, which saves the considerable time otherwise spent obtaining labels. The algorithm does not need to update a graph matrix during the solution process, which lowers its computational complexity and speeds up the operation. Noise-contaminated images can therefore be clustered quickly, effectively and robustly.

The invention is a noise-suppressing image clustering method built on an improved robust fuzzy clustering framework derived from FCM. In application it both robustly screens out noise-contaminated image data and clusters the images, improving the robustness and accuracy of the algorithm, while sparsification of the membership matrix speeds up data processing. In this algorithm the objective function consists of a robust noise suppression term and a regularization term. By iteratively optimizing the objective function, an adaptive weight is assigned to each sample point and used to separate clean samples from noise-contaminated samples, which enhances the robustness of the algorithm. Optimizing the objective function thus screens out the noise-contaminated samples and yields noise-suppressed image clustering.

The main beneficial effects of the method of the invention are:

(1) A robust fuzzy clustering algorithm is proposed in which the robust noise suppression term of the objective function lets the algorithm, by iteratively optimizing the objective function, assign an adaptive weight to each sample point and thereby separate clean samples from noise-contaminated samples, which enhances the robustness of the algorithm.

(2) The noise-suppressing image clustering method based on robust fuzzy clustering sparsifies the membership matrix while performing robust noise suppression, yielding a more effective sample and feature distribution structure. This avoids the impact of noise contamination on image clustering and also reduces data storage and computation, improving computational efficiency.

(3) By iteratively optimizing the regularization parameter of the objective function, the invention adaptively computes the regularization parameter for each sample, which greatly reduces the difficulty of tuning that parameter in application and saves labor cost. It robustly screens out noise-contaminated image data while improving image clustering accuracy.

Description of drawings

Figure 1 is a flow chart of the method.

Figure 2 shows examples of some noise-contaminated images.

Figure 3 shows the detection results of the method on a specific data set.

Detailed description

The invention is further described below with reference to the embodiments and the drawings.

The invention is realized through the following technical solution, a noise-suppressing image clustering method based on robust fuzzy clustering, whose specific steps are as follows:

Step 1: Obtain the image data and construct the data matrix

For n images of resolution u×v, flatten each image into a 1×d row vector, where d = u×v; convert the image data composed of the n images into a target data matrix X ∈ R^{n×d}, in which each row x_i ∈ R^{1×d} represents one image and each image is one data sample; given the number of true classes c contained in the target data, randomly initialize the cluster centers of the c clusters to obtain the initial centroids, where m_j ∈ R^{1×d} is the centroid of the j-th cluster.
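Step 1 can be illustrated with a minimal NumPy sketch. The function names `build_data_matrix` and `init_centroids` are hypothetical, and drawing the initial centroids from randomly chosen rows of X is only one possible reading of "randomly initialize"; the patent does not say how the random centers are drawn.

```python
import numpy as np

def build_data_matrix(images):
    """Flatten n images of shape (u, v) into an (n, u*v) matrix, one image per row."""
    images = np.asarray(images, dtype=np.float64)
    n = images.shape[0]
    return images.reshape(n, -1)

def init_centroids(X, c, seed=0):
    """Assumption: use c distinct, randomly chosen rows of X as initial centroids."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=c, replace=False)
    return X[idx].copy()

# Toy stand-in for an image set: 6 images of 4x3 pixels.
images = np.arange(6 * 4 * 3).reshape(6, 4, 3)
X = build_data_matrix(images)   # shape (6, 12): n = 6, d = 4 * 3
M = init_centroids(X, c=2)      # shape (2, 12): one centroid per row
```

For the ORL example described later, the same reshaping would give a 400 × 10304 matrix.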

Step 2: Establish a robust fuzzy clustering (RFCM) framework capable of noise suppression

The fuzzy C-means algorithm is abbreviated FCM. To make it robust to noise, the framework introduces a noise suppression term and a regularization term.

In the RFCM objective function, Y ∈ R^{n×c} is the fuzzy membership matrix, each element y_ij of which is the membership degree of the i-th sample in the j-th cluster; m is the centroid matrix of the clusters; the vector s is used to screen out the n-k noisy samples, and s_i is the value of the i-th element of s.
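The objective function itself is not reproduced in this text. One form that is consistent with the surrounding description (a selection vector s with k ones, memberships that sum to one per sample, a regularization term with parameter γ) is sketched below. This is an assumption offered for orientation only, not the formula stated in the patent.

```latex
\min_{Y,\,m,\,s}\ \sum_{i=1}^{n} s_i \sum_{j=1}^{c} y_{ij}\,\lVert x_i - m_j \rVert_2^2
\;+\; \gamma \sum_{i=1}^{n}\sum_{j=1}^{c} y_{ij}^{2}
\qquad \text{s.t.}\quad \sum_{j=1}^{c} y_{ij} = 1,\quad y_{ij} \ge 0,\quad
s \in \{0,1\}^{n},\quad s^{T}\mathbf{1} = k
```

Under this assumed form, the first term plays the role of the noise suppression term (samples with s_i = 0 contribute no fitting cost) and the second is the regularization term.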

Step 3: Optimize the objective function by alternating iteration

The three variables m, Y and s in the objective function are solved by alternating iterative optimization. First m and Y are initialized and s is computed from the formula; then s and m are fixed and Y is obtained by optimizing real samples and noise data separately; then s and Y are fixed and m is solved from the formula. These updates are repeated until convergence.

The solution steps are as follows:

Step 3.1: Randomly initialize the cluster centers of the c clusters to obtain the initial m, and initialize all elements of Y to y_ij = 1/c, so that each sample initially belongs to every class with equal probability. Define e_i as the sum, weighted by the membership degrees, of the distances from sample x_i to all cluster centers. Sort the e_i in ascending order as e₁ ≤ e₂ ≤ ... ≤ e_k ≤ ... ≤ e_n; the corresponding s_i can then be computed.
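The selection in Step 3.1 can be sketched as follows. The helper names are hypothetical, and the use of squared Euclidean distances in e_i is an assumption taken from the squared distance defined in Step 3.2.1.

```python
import numpy as np

def sample_costs(X, M, Y):
    """e_i = sum_j y_ij * ||x_i - m_j||^2 (squared distance is an assumption)."""
    D = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)  # (n, c) squared distances
    return (Y * D).sum(axis=1)

def select_clean(e, k):
    """s_i = 1 for the k samples with the smallest e_i, 0 for the rest."""
    s = np.zeros(e.shape[0], dtype=int)
    s[np.argsort(e, kind="stable")[:k]] = 1
    return s

# Three points near the centroids and one far outlier.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [9.0, 9.0]])
M = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.full((4, 2), 0.5)          # uniform initial memberships, y_ij = 1/c
e = sample_costs(X, M, Y)
s = select_clean(e, k=3)          # the outlier receives s_i = 0
```

Sorting and keeping the k smallest costs minimizes the sum of s_i e_i under the constraint that s contains exactly k ones.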

Step 3.2: Optimize Y separately for real samples and for noise data.

Step 3.1 gives s_i = 1 for the first k samples closest to all cluster centers and s_i = 0 for the remaining samples. Sorting the e_i in ascending order and sorting the corresponding samples x_i in the same order gives the sorted data matrix and the corresponding sorted membership matrix.

Step 3.2.1: For real samples. When a sample has s_i = 1, only the q clusters nearest to that real sample are considered, which sparsifies the membership matrix by constraining the l₀ norm of y_i to be q. With the squared distance from the i-th selected real sample point to the center of the j-th cluster defined, the problem can be equivalently transformed into

The parameter γ in the objective function usually needs careful tuning to avoid trivial solutions; the invention computes the optimal γ adaptively.
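A sketch of how such a sparse membership row and its adaptive γ could be computed is given below. It assumes the per-sample sub-problem is min Σ_j (d_ij y_ij + γ y_ij²) subject to Σ_j y_ij = 1 and y_ij ≥ 0, and uses the closed form familiar from adaptive-neighbor graph learning. The patent's own formula is not reproduced in this text, so the function `sparse_membership` and its fallback for ties are illustrative assumptions.

```python
import numpy as np

def sparse_membership(d, q):
    """Membership row with at most q non-zeros for squared distances d (needs q < len(d)).

    Returns (y, gamma), where gamma is the value that makes the (q+1)-th
    nearest cluster receive exactly zero membership.
    """
    d = np.asarray(d, dtype=np.float64)
    order = np.argsort(d, kind="stable")
    nearest, d_next = order[:q], d[order[q]]
    denom = q * d_next - d[nearest].sum()     # equals 2 * gamma
    y = np.zeros_like(d)
    if denom > 0:
        y[nearest] = (d_next - d[nearest]) / denom
    else:
        y[nearest] = 1.0 / q                  # all q+1 distances tie: uniform fallback
    return y, denom / 2.0

# Squared distances from one sample to c = 4 cluster centers, keep q = 2.
y, gamma = sparse_membership([1.0, 4.0, 2.0, 7.0], q=2)
```

Here the two nearest clusters (distances 1 and 2) share the membership in proportion to how much closer they are than the third-nearest cluster (distance 4).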

Step 3.2.2: For the noise data in the optimization process. When a sample has s_i = 0, the optimal membership value of that noise sample is obtained from the Cauchy inequality as
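The resulting formula is not reproduced in this text. As an illustration, assume that for s_i = 0 only a quadratic regularization term γ Σ_j y_ij² remains for that sample (an assumption about the objective, not a statement from the patent). The Cauchy inequality then gives a uniform membership:

```latex
1 = \Big(\sum_{j=1}^{c} y_{ij}\cdot 1\Big)^{2}
\;\le\; \Big(\sum_{j=1}^{c} y_{ij}^{2}\Big)\Big(\sum_{j=1}^{c} 1^{2}\Big)
= c \sum_{j=1}^{c} y_{ij}^{2}
\quad\Longrightarrow\quad
\sum_{j=1}^{c} y_{ij}^{2} \;\ge\; \frac{1}{c},
\qquad \text{with equality iff } y_{ij} = \frac{1}{c}\ \ \forall j .
```

Under that assumption a noise sample is assigned equally to all c clusters, which matches the uniform initialization y_ij = 1/c.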

The optimal solution for the membership degrees of all sample points is then

Step 3.3:

Setting the partial derivative of the sub-problem of the objective function with respect to m to 0 gives
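The closed-form centroid is not reproduced in this text. If the fitting term is Σ_i s_i Σ_j y_ij ‖x_i − m_j‖² (an assumed form), setting its derivative with respect to m_j to zero gives a weighted mean of the selected samples, m_j = Σ_i s_i y_ij x_i / Σ_i s_i y_ij. A minimal sketch with the hypothetical helper `update_centroids`:

```python
import numpy as np

def update_centroids(X, Y, s, M_prev):
    """Weighted mean of the selected samples; empty clusters keep their old centroid."""
    W = Y * s[:, None]                 # zero the rows flagged as noise (s_i = 0)
    mass = W.sum(axis=0)               # total selected membership per cluster, shape (c,)
    M = M_prev.copy()
    nonempty = mass > 0
    M[nonempty] = (W.T @ X)[nonempty] / mass[nonempty, None]
    return M

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
s = np.array([1, 1, 0])                # the third sample is treated as noise
M_prev = np.array([[0.0, 0.0], [5.0, 5.0]])
M = update_centroids(X, Y, s, M_prev)
```

Keeping the previous centroid for a cluster with zero selected membership is a design choice of this sketch that avoids division by zero; the patent does not discuss that case.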

At this point m, Y and s have all been updated, and the next iteration is carried out until m no longer changes. Each row of sample data corresponds to one image: from the resulting membership matrix Y, the label of the cluster with the largest value in each row is taken as the class of that image, which gives the predicted label vector and completes the cluster assignment. The resulting vector s contains k ones and n-k zeros; if s_i = 0 for the i-th image, that image is screened out as contaminated by noise.
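The read-out described here (stopping test, labels from Y, noise flags from s) can be sketched as follows. The tolerance `tol` and the use of the Frobenius norm for "m no longer changes" are assumptions; the patent only says the change falls below some threshold.

```python
import numpy as np

def has_converged(M_new, M_old, tol=1e-6):
    """Assumed stopping rule: centroid change (Frobenius norm) below tol."""
    return np.linalg.norm(M_new - M_old) < tol

# Example result of the alternating updates for n = 3 images and c = 3 clusters.
Y = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.4, 0.6],
              [1 / 3, 1 / 3, 1 / 3]])
s = np.array([1, 1, 0])

labels = Y.argmax(axis=1)        # cluster with the largest membership in each row
noisy = np.flatnonzero(s == 0)   # indices of images screened as noise-contaminated
```

A row with uniform memberships still receives a label from `argmax` (the first index), so the noise flag from s is what distinguishes such a sample.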

Specific embodiment:

The overall solution process of the noise-suppressing image clustering method based on robust fuzzy clustering is shown in Figure 1. The ORL face image data set is used as the clustering example. The ORL data set contains 400 face images with a resolution of 92×112, and each face image is one sample. The true labels of the 400 images are z_t, and the predicted labels of the 400 images obtained by the clustering algorithm are z_p. The true labels z_t are used only for the final evaluation of the clustering result and are not part of the clustering algorithm itself. To test the algorithm, noise is added to 40% of the images as an example, so the number of noise-contaminated image samples is p = 160. The specific implementation includes the following steps:

Step 1: Input the ORL data matrix X ∈ R^{n×d}, the number of true classes c, the number of noise-contaminated image samples p = 160 and the membership sparsification parameter q, where each row x_i of the matrix is one sample, n = 400 is the number of samples, d = 92×112 = 10304 is the dimension of the data matrix, and c = 40 is the number of true classes in the face data.

Randomly initialize the cluster centers of the c clusters to obtain the initial m. Initialize all elements of Y to y_ij = 1/c = 1/40, and set k = n-p = 240, so that each sample initially belongs to every class with equal probability. With m and Y fixed, s can be computed. Fixing m and Y, the objective function becomes

Define e_i as the sum, weighted by the membership degrees, of the distances from sample x_i to all cluster centers.

Sort the e_i in ascending order as e₁ ≤ e₂ ≤ ... ≤ e_k ≤ ... ≤ e_n; the optimal solution of the objective function under the constraint s^T 1 = k then gives the corresponding s_i

In this way the samples not contaminated by noise and the noise-contaminated samples are separated, and the separation is refined in the subsequent iterations.

Step 2: Optimize Y separately for real samples and for noise data.

Step 1 gives s_i = 1 for the first k samples closest to all cluster centers and s_i = 0 for the remaining samples. Sorting the e_i in ascending order and sorting the corresponding samples x_i in the same order gives the sorted data matrix and the corresponding sorted membership matrix.

Step 2.1: For real samples. Consider the vector formed by the i-th row of the sorted data matrix and the j-th element of the i-th row of the sorted membership matrix. When the sample has s_i = 1, the sub-problem of the objective function is

Only the q clusters nearest to the real sample are considered, which sparsifies the membership matrix by constraining the l₀ norm of y_i to be q. With the squared distance from the i-th selected real sample point to the center of the j-th cluster defined, the problem can be equivalently transformed into

The second term of the original objective function is a regularization term that avoids the trivial solutions of two extreme cases: only the nearest neighbor having a similarity of 1, and all samples having a similarity of 1/n. Through the Lagrange multiplier method, the KKT conditions and the constraints

Since the terms of the problem are independent across i, each i can be solved separately. The equivalent sub-problem of the objective function is solved with the Lagrange multiplier method, and the optimization yields the optimal membership degrees of the selected real samples, i ∈ {1,2,...,k}
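The intermediate formulas are not reproduced in this text. The following derivation shows how the Lagrange multiplier method and the KKT conditions would produce a q-sparse membership row, assuming the per-sample sub-problem has the form below with squared distances d_ij sorted so that d_i1 ≤ d_i2 ≤ ... ≤ d_ic. Both the sub-problem and the symbols η and γ_i are assumptions for illustration.

```latex
\begin{aligned}
&\min_{y_i}\ \sum_{j=1}^{c}\big(d_{ij}\,y_{ij} + \gamma_i\,y_{ij}^{2}\big)
\quad \text{s.t.}\ \sum_{j=1}^{c} y_{ij} = 1,\ y_{ij}\ge 0
&&\text{(assumed sub-problem)}\\
&y_{ij} = \max\!\Big(\eta - \frac{d_{ij}}{2\gamma_i},\,0\Big)
&&\text{(KKT, multiplier } \eta\text{)}\\
&\eta - \frac{d_{iq}}{2\gamma_i} > 0 \ \ge\ \eta - \frac{d_{i,q+1}}{2\gamma_i},
\qquad q\,\eta - \frac{1}{2\gamma_i}\sum_{h=1}^{q} d_{ih} = 1
&&\text{(exactly } q \text{ non-zeros)}\\
&\gamma_i = \tfrac{1}{2}\Big(q\,d_{i,q+1} - \sum_{h=1}^{q} d_{ih}\Big),
\qquad
y_{ij} = \frac{d_{i,q+1} - d_{ij}}{q\,d_{i,q+1} - \sum_{h=1}^{q} d_{ih}}\ \ (j \le q),\quad y_{ij} = 0\ \ (j > q)
\end{aligned}
```

Choosing γ_i at the boundary where the (q+1)-th membership just reaches zero is what removes the need to tune γ by hand in this assumed setting.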

Step 2.2: For the noise data in the optimization process. When a sample has s_i = 0, the optimal membership value of that noise sample is obtained from the Cauchy inequality as

Step 3: Fix s and Y and solve for m

The sub-problem of the objective function is now rewritten as

At this point s, Y and m have all been updated, and the next iteration is carried out until the cluster centers m are no longer updated, that is, until their change is smaller than a given threshold. Contaminated samples correspond to s_i = 0: after the solution is complete, the i-th sample with an optimized s_i = 0 is screened out as a noise-contaminated image sample, and 160 such samples are screened out in total. The predicted cluster labels z_p of the 400 face images are obtained directly, with a good clustering result. Unlike classification, a clustering method only groups the data without supervision. The numbers in the predicted labels therefore correspond one-to-one with the true class labels, but the specific correspondence is unknown. The unsupervised grouping can nevertheless organize unlabeled face images into groups, which assists face retrieval and substantially improves retrieval accuracy and speed. The accuracy of the image clustering is computed by comparing the true labels z_t with the predicted labels z_p obtained by the clustering algorithm.
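Because the correspondence between predicted and true label numbers is unknown, accuracy has to be computed after matching clusters to classes. The patent does not state how this matching is done; the sketch below uses a common definition of clustering accuracy (the best one-to-one relabeling) and finds it by brute force over permutations. The function name is hypothetical.

```python
from itertools import permutations

def clustering_accuracy(z_t, z_p, c):
    """Best fraction of matches over all one-to-one relabelings of c clusters."""
    best = 0
    for perm in permutations(range(c)):
        # perm[p] is the true class assigned to predicted cluster p.
        hits = sum(1 for t, p in zip(z_t, z_p) if perm[p] == t)
        best = max(best, hits)
    return best / len(z_t)

z_t = [0, 0, 1, 1, 2, 2]     # true labels, used for evaluation only
z_p = [2, 2, 0, 0, 1, 0]     # predicted cluster indices
acc = clustering_accuracy(z_t, z_p, c=3)   # best matching gets 5 of 6 right
```

The brute-force search grows as c! and is only practical for a handful of clusters; with c = 40 as in the ORL example, an assignment solver such as the Hungarian algorithm would be needed instead.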

Take the ORL face image data set (400 images of 92×112 pixels each) as an example. With 40% noise added, FCM achieves a clustering accuracy of only 8.61% and a normalized mutual information of 14.13% on the ORL face image data. With the same 40% noise, the proposed noise-suppressing image clustering based on robust fuzzy clustering (RFCM) achieves a clustering accuracy of 64.83% and a normalized mutual information of 71.29% on the ORL data set, improvements of 56.22 and 57.16 percentage points respectively, which significantly improves the clustering accuracy on face image data.

Claims

1. A robust image clustering method based on fuzzy clustering, characterized by comprising the following steps:

step 1: for n images of resolution u×v, flattening each image into a 1×d row vector, wherein d = u×v; converting the image data composed of the n images into a target data matrix X ∈ R^{n×d}, wherein each row x_i ∈ R^{1×d} of the matrix represents one image and each image represents one data sample; given the number of true classes c contained in the target data, randomly initializing the cluster centers of the c clusters to obtain the initial centroids, wherein m_j ∈ R^{1×d} is the centroid of the j-th cluster;

step 2: establishing a robust fuzzy clustering (RFCM) framework for noise suppression

wherein Y ∈ R^{n×c} is the fuzzy membership matrix, each element y_ij of which represents the membership degree of the i-th sample in the j-th cluster; m is the centroid matrix of the clusters; the vector s is used to screen out the n-k noisy samples, and s_i is the value of the i-th element of s;

step 3: optimizing the RFCM framework by alternating iteration

The solving steps are as follows:

step 3.1: randomly initializing the cluster centers of the c clusters to obtain the initial m, and initializing all elements of Y to y_ij = 1/c; defining e_i as the sum, weighted by the membership degrees, of the distances from sample x_i to all cluster centers, sorting the e_i in ascending order as e₁ ≤ e₂ ≤ ... ≤ e_k ≤ ... ≤ e_n, and computing the corresponding s_i;

step 3.2: optimizing Y separately for real samples and for noise data

the first k samples closest to all cluster centers have s_i = 1, and the remaining samples have s_i = 0; sorting the samples x_i in the same order as the sorted e_i to obtain the sorted data matrix and the corresponding sorted membership matrix;

step 3.2.1: when a sample has s_i = 1, considering only the q clusters nearest to that real sample, which sparsifies the membership matrix by constraining the l₀ norm of y_i to be q; defining the squared distance from the i-th real sample point to the center of the j-th cluster, the RFCM framework being equivalently transformed into

the parameter γ in the objective function being computed adaptively to obtain its optimal value;

optimizing to obtain the optimal membership degrees of the selected real samples

step 3.2.2: for the noise data in the optimization process, when a sample has s_i = 0, obtaining the optimal membership value of that noise sample from the Cauchy inequality

the optimal solution for the membership degrees of all sample points being

step 3.3: fixing s and Y to obtain the sub-problem of the RFCM framework

setting the partial derivative of the sub-problem with respect to m to 0 and solving:

in each round of optimization, m, Y and s are updated in turn, and the next iteration is carried out until m no longer changes; each row of sample data corresponds to one image, and according to the obtained membership matrix Y, the label of the cluster with the largest value in each row is selected as the class of that image, giving the predicted label vector and completing the cluster assignment; the obtained vector s contains k ones and n-k zeros, and if s_i = 0 for the i-th image, that image is screened out as an image contaminated by noise.