CN113610139A - Multi-view-angle intensified image clustering method - Google Patents

Multi-view-angle intensified image clustering method

Info

Publication number
CN113610139A
Authority
CN
China
Prior art keywords
clustering
view
feature
encoder
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110879412.7A
Other languages
Chinese (zh)
Inventor
高静
刘晨欣
金珊
陈志奎
李朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202110879412.7A priority Critical patent/CN113610139A/en
Publication of CN113610139A publication Critical patent/CN113610139A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-view reinforced image clustering method, which belongs to the field of image clustering and reinforcement learning and comprises the following steps: 1) pre-training an independent feature extraction network for each view and initializing the latent feature space of each view; 2) pre-training a multi-view feature fusion network and initializing the fused feature space of all views; 3) initializing the clustering environment with the K-means method and allocating a Bernoulli unit to each clustering prototype in the environment; 4) assigning stochastic rewards in real time with an online reward strategy and dynamically updating the Bernoulli units in the environment; 5) updating parameters and iteratively optimizing the clustering prototypes until the convergence condition is met, thereby completing the multi-view reinforced clustering process. By jointly learning representation and cluster assignment with the online reward strategy, the complementary information between views and the interaction information between samples and clustering prototypes are fully brought to bear on the cluster analysis process, effectively improving clustering performance.

Description

Multi-view reinforced image clustering method
Technical Field
The invention belongs to the field of image clustering and reinforcement learning, and relates to a multi-view reinforced image clustering method.
Background
With the wide application of technologies such as networked information systems and electronic commerce, the ways in which people acquire data are increasingly rich, the amount of collectable data keeps growing, and data structures become more complex and higher-dimensional. Multi-view image data typically come from different domains of a data object or from measurements taken at multiple angles, and they contain rich complementary information that can effectively enhance data analysis. However, because of the multi-source heterogeneity and high dimensionality of multi-view image data, this complementary information is difficult to exploit fully. Therefore, new methods are urgently needed to deeply mine the complementary information in massive multi-view image data.
Clustering is an important data analysis and processing technique in machine learning and data mining; its aim is to place homogeneous data in the same subset and heterogeneous data in different subsets. Multi-view clustering breaks through the limitation that the information in a single view is insufficient for good clustering: it considers the consistency and complementarity among multiple views and combines the feature information of multiple views to improve the final clustering result. Early multi-view clustering methods fused multi-view information and associated features across views, and they already showed better results than single-view clustering. However, these early methods generally assumed that the data exist in only two views; they struggle with data spanning three or more views and are prone to problems when views are missing. Inspired by the ability of deep generative models to infer missing data in single-view clustering, researchers proposed the multi-view variational autoencoder (MVAE) model, which combines an independent variational autoencoder (VAE) for each view through an expert network, learns the joint distribution over multiple views, obtains deeper and more effective multi-view features, and improves multi-view clustering performance.
Although current multi-view clustering methods use deep generative models to capture the complementary information among multiple views and obtain good clustering results, they only consider the intrinsic properties of the multi-view data and neglect the relation between the multi-view data and the cluster centers, which tends to blur the cluster boundaries gradually. Reinforcement learning learns a policy through interaction with the environment so as to maximize the return. Therefore, on the premise of obtaining effective information from multi-view image data, associating the information of data points and cluster points in the multi-view clustering environment based on the idea of reinforcement learning, and thereby improving the accuracy of multi-view clustering, is a topic worth studying.
Disclosure of Invention
In order to solve the above problems, the invention provides a multi-view reinforced image clustering method, which, on the basis of using the consistency and complementarity of multi-view data, also considers how the interaction information between the cluster centers and the multi-view data is used during the clustering iterations, and thereby improves the clustering effect.
First, the invention uses deep autoencoders to reduce the dimensionality of the original multi-view high-dimensional image data and to capture the latent feature representation of each view of the data. Second, the invention designs a multi-view feature fusion strategy, which fuses the latent feature representations of the multiple views and acquires high-order complementary information of the multi-view data. Finally, the invention proposes an online reward strategy based on reinforcement learning, which realizes real-time interaction between data points and cluster points in the clustering environment, makes full use of the fused feature representation, and obtains more accurate clustering results. In summary, the invention provides a multi-view reinforced image clustering method that adopts an online reward learning mode to learn fused representation information from the multiple views of large-scale image data and to adjust the cluster assignment on the fly, so as to improve multi-view image clustering performance; three clustering evaluation indices, the adjusted Rand index (ARI), normalized mutual information (NMI) and accuracy (ACC), are adopted to verify the effectiveness of the model.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a multi-view intensified image clustering method comprises the following steps:
step 1, pre-training a feature extraction network independent of each visual angle, and acquiring potential feature representation of each visual angle;
step 2, pre-training a multi-view feature fusion network to obtain fusion feature representation of each view;
step 3, initializing a clustering environment by adopting a K-means method, and distributing a Bernoulli unit for a clustering prototype in the environment;
step 4, distributing random rewards in real time by using an online reward strategy, and dynamically updating the Bernoulli unit in the environment;
and 5, updating parameters, and iteratively optimizing the clustering prototype until a convergence condition is met, thereby completing the multi-view reinforced clustering process.
The invention has the following beneficial effects: it designs a multi-view reinforced image clustering method for image data, which mainly uses the complementary information in multi-view image data to learn an efficient fused feature representation and improve image clustering and feature learning; to this end, a reinforcement learning framework based on Bernoulli units is designed, which makes full use of the information of the whole clustering environment and improves the performance of the clustering algorithm. Measured by the clustering evaluation indices adjusted Rand index (ARI), normalized mutual information (NMI) and accuracy (ACC), the method effectively improves image clustering performance.
Drawings
FIG. 1 is a framework diagram of the multi-view reinforced image clustering method of the present invention;
FIG. 2 is a flowchart of the overall process of the multi-view reinforced image clustering method of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
FIG. 1 is a framework diagram of the multi-view reinforced image clustering method of the present invention. First, the high-dimensional features of the V views of the original data are reduced in dimensionality by deep autoencoders, and the latent low-dimensional features of each view are acquired. Second, a multi-view feature fusion network fuses the latent low-dimensional features of the V views to generate the multi-view fused features, combining the consistent and complementary information of the views. Then, the K-means method is used to mine the clustering centroids of the multi-view data as clustering prototypes, and a corresponding Bernoulli unit is constructed for each centroid to store the clustering information during iterative optimization, completing the initialization of the clustering environment. Next, the online reward strategy is used to learn the interaction information between the multi-view fused features and the clustering prototypes, and the Bernoulli units are changed in real time through reward and punishment signals, realizing the dynamic update of the clustering environment. Finally, the reinforcement learning algorithm iteratively optimizes the clustering environment by jointly learning the Bernoulli units and the reward and punishment signals until the convergence condition is met.
The method comprises the following steps:
step 1, pre-training a feature extraction network with independent view angles, and acquiring potential feature representation of each view angle
The original multi-view image data have a complex structure and high dimensionality, which reduces the comprehensibility and usability of the data and easily leads to mode collapse. To this end, the invention adopts a feature extraction network to reduce the dimensionality of the V view features of the same object. Specifically, the feature extraction network is formed by stacking n autoencoder networks, each consisting of an encoding layer and a decoding layer with a symmetric network structure. The encoding layer compresses the input high-dimensional data layer by layer into a low-dimensional feature space, and the decoding layer reconstructs the data from the low-dimensional feature space. During training, the last encoding layer serves as the hidden layer, the error between the input and its reconstruction is minimized, and a low-dimensional feature representation containing the latent structure of the data is obtained.
For the v-th view, the feature extraction network is constructed from multiple autoencoder networks by taking the output of the hidden layer of one autoencoder as the input of the next. Suppose x_j^v is the j-th input image of this view; the calculation process of the first autoencoder network of the current view is:

H_j^(v,1) = G_e^(v,1)(x_j^v; θ_e^(v,1))    (1)
x̂_j^(v,1) = G_d^(v,1)(H_j^(v,1); θ_d^(v,1))    (2)

where H_j^(v,1) is the latent feature representation output by the hidden layer, and x̂_j^(v,1) is the data reconstructed by the autoencoder. G_e^(v,1)(·) and G_d^(v,1)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,1), θ_d^(v,1) are the encoding-layer and decoding-layer parameters. The autoencoder is trained by minimizing the reconstruction loss between x_j^v and x̂_j^(v,1).
For the n-th autoencoder of the network: when n = 1, the input of the autoencoder is the original image data x_j^v and the calculation process follows formulas (1) and (2). When n is greater than 1, the first autoencoder still takes x_j^v as input and is calculated by (1) and (2), while each of the remaining autoencoders takes the hidden-layer features H_j^(v,n-1) of the (n-1)-th encoding layer as input; its calculation process can be expressed as:

H_j^(v,n) = G_e^(v,n)(H_j^(v,n-1); θ_e^(v,n))    (3)
x̂_j^(v,n) = G_d^(v,n)(H_j^(v,n); θ_d^(v,n))    (4)

where, as for the first autoencoder, H_j^(v,n) is the latent feature representation output by the hidden layer, and x̂_j^(v,n) is the reconstructed representation of the autoencoder. G_e^(v,n)(·) and G_d^(v,n)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,n), θ_d^(v,n) are the parameters of the encoding layer and the decoding layer. The autoencoder is trained by minimizing the reconstruction loss between its input and its reconstruction.
Finally, the n autoencoders of the v-th view are split into their n encoding layers and n decoding layers, which are rearranged in a symmetric manner (all encoding layers followed by all decoding layers in reverse order) to construct the feature extraction network of this view; the stacked encoding layers form the encoder and the stacked decoding layers form the decoder. The feature extraction network is then trained by minimizing the reconstruction loss with a stochastic gradient descent algorithm.
Following these steps, a feature extraction network is constructed for each view, generating the latent feature representation of each view for the multi-view feature fusion network.
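For illustration only, the following is a minimal Python sketch of this per-view pre-training with stacked autoencoders. It uses PyTorch; the layer sizes, ReLU activations, optimizer and training schedule (the dims, epochs and lr arguments of the hypothetical pretrain_view_network helper) are assumptions of the sketch and are not details fixed by the invention.

```python
# Sketch of step 1 (assumptions: layer sizes, ReLU activations, Adam optimizer).
import torch
import torch.nn as nn

def pretrain_view_network(x, dims=(500, 200, 10), epochs=50, lr=1e-3):
    """Greedy layer-wise pretraining of a stacked autoencoder for one view.

    x: (N, D) tensor holding one view's image features.
    Returns the fine-tuned encoder and the latent representation of this view.
    """
    encoders, decoders = [], []
    h = x
    for d_out in dims:                      # train the n single-layer autoencoders one by one
        d_in = h.shape[1]
        enc = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
        dec = nn.Sequential(nn.Linear(d_out, d_in), nn.ReLU())
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(epochs):             # minimise this autoencoder's reconstruction loss
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(enc(h)), h)
            loss.backward()
            opt.step()
        encoders.append(enc)
        decoders.append(dec)
        h = enc(h).detach()                 # hidden output becomes the next autoencoder's input

    # Rearrange symmetrically: all encoding layers, then all decoding layers in reverse order.
    net = nn.Sequential(*encoders, *reversed(decoders))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):                 # fine-tune the whole feature extraction network
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), x)
        loss.backward()
        opt.step()

    encoder = nn.Sequential(*encoders)
    with torch.no_grad():
        H_v = encoder(x)                    # latent feature representation of this view
    return encoder, H_v
```

In this sketch, H_v would be computed once per view and passed on to the fusion network of step 2.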
Step 2, pre-training a multi-view feature fusion network to obtain the fused feature representation of all views;
information in image data can be represented from different perspectives, and clustering using a single perspective for cluster analysis limits the clustering effect. In contrast, the invention designs a multi-view feature fusion network, and learns the fusion feature representation of all views in an end-to-end manner by combining feature extraction networks of different views, thereby enhancing the clustering effect.
First, according to the complementary characteristics of the multiple views, the latent feature representations of the individual views of the same sample obtained in step 1 are concatenated in series:

H_f = cat(H^1, ..., H^V)    (5)

where V is the number of views, H^1, ..., H^V are the latent feature representations of the individual views, cat(·) denotes the concatenation operation, and H_f is the concatenated feature representation.
The multi-view feature fusion network is formed by stacking n autoencoder networks; during pre-training it takes the concatenated feature representation H_f as input and learns a globally associated fused feature representation. Specifically, the calculation process of the outermost autoencoder of the network is:

H_ef = G_e,f(H_f; θ_e,f)    (6)
Ĥ_f = G_d,f(H_of; θ_d,f)    (7)

where H_ef is the latent feature output by the outer encoding layer, whose dimension is smaller than that of H_f; H_of is the reconstruction feature output by the inner decoder, whose dimension equals that of H_ef; and Ĥ_f is the reconstructed representation output by the outer decoding layer, whose dimension equals that of H_f. When the network consists of only one autoencoder, H_of is H_ef. G_e,f(·) and G_d,f(·) are the activation functions of the encoding layer and the decoding layer, respectively, and θ_e,f, θ_d,f are the model parameters.
The constructed multi-view fusion network is pre-trained end to end with a stochastic gradient descent algorithm by minimizing the reconstruction loss between H_f and Ĥ_f. Then, the decoder input dimension of the feature extraction network of each view is set to the dimension of the concatenated latent feature H_f of the input data. Finally, the trained multi-view fusion network takes the J multi-view image samples as input and generates the globally associated multi-view fused features H = {h_j | j = 1, ..., J}, i.e., the latent features output by the innermost encoding layer of the stacked autoencoder.
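For illustration, a minimal sketch of this fusion step under the same assumptions follows. The fusion-layer sizes and the hypothetical pretrain_fusion_network helper are illustrative choices of the sketch, not part of the invention.

```python
# Sketch of step 2 (assumptions: fusion-layer sizes, ReLU activations, Adam optimizer).
import torch
import torch.nn as nn

def pretrain_fusion_network(H_views, fused_dim=10, hidden_dim=64, epochs=100, lr=1e-3):
    """H_views: list of (N, d_v) latent-feature tensors, one per view."""
    H_f = torch.cat(H_views, dim=1)              # formula (5): series concatenation of all views
    d_in = H_f.shape[1]
    encoder = nn.Sequential(nn.Linear(d_in, hidden_dim), nn.ReLU(),
                            nn.Linear(hidden_dim, fused_dim))   # outer and inner encoding layers
    decoder = nn.Sequential(nn.Linear(fused_dim, hidden_dim), nn.ReLU(),
                            nn.Linear(hidden_dim, d_in))        # inner and outer decoding layers
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):                      # end-to-end pre-training with reconstruction loss
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(encoder(H_f)), H_f)
        loss.backward()
        opt.step()
    with torch.no_grad():
        H = encoder(H_f)                         # globally associated multi-view fused features
    return H
```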
Step 3, initializing the clustering environment and allocating Bernoulli units;
The multi-view fused features H = {h_j | j = 1, ..., J} of the J multi-view image samples are obtained from step 2, and K points are randomly selected from them as the initial clustering centroid set C = {c_k | k = 1, ..., K}. The clustering centroid set C is then updated on the fused features H with the K-means method until the convergence condition is met, yielding K initial virtual prototypes and completing the initialization of the clustering environment.
Specifically, the K-means method divides the data points into K clusters under the criterion that a smaller distance means a greater similarity and a larger distance means a smaller similarity. The distance between data points is calculated with the Euclidean distance:

Dist(h_i, h_j) = ||h_i - h_j||_2    (8)

where h_i and h_j are the multi-view fused features of two different data points, and Dist(·) denotes the distance between them.
During cluster assignment, the mean of the multi-view fused features of the data points in the same cluster is recalculated in every iteration and used as the centroid set C = {c_k | k = 1, 2, ..., K}:

c_k = (1/|S_k|) Σ_(h_i ∈ S_k) h_i    (9)

where c_k is the centroid of the k-th cluster, h_i is the multi-view fused feature of a data point belonging to that cluster, and S_k denotes the set of such features.
First, K points are randomly selected from the sample set as the initial clustering centroid set C = {c_k | k = 1, ..., K}; the distance between each data point and each clustering centroid is calculated with formula (8), and each data point is assigned to the nearest clustering centroid. Finally, the clustering centroids are updated by this heuristic iteration until convergence, giving K optimized clustering centroids as the clustering centroid set C of the data points.
The optimized centroid set C is taken as the initial virtual prototypes of the reinforced clustering process, a corresponding Bernoulli unit is constructed for each prototype, and the relevant information under the current clustering environment is stored in it:

B_unit = {w, p, dist, F}    (10)

where w is the weight of the virtual prototype, p is the indicator variable of the virtual prototype, dist is the distance from the current data point to the virtual prototype, and F is the state parameter. Initially, w is set to the centroid obtained after the K-means update, while p, dist and F are set to 0 and change as the reinforced clustering process iterates.
Through the above process, the initialization of the clustering environment is completed.
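A minimal sketch of this clustering-environment initialization is given below; it follows formulas (8) to (10) with plain NumPy, and the helper name init_cluster_environment, the iteration cap and the random seed are assumptions of the sketch.

```python
# Sketch of step 3 (a minimal K-means and Bernoulli-unit initialisation on NumPy arrays).
import numpy as np

def init_cluster_environment(H, K, n_iter=100, seed=0):
    """H: (J, d) multi-view fused features. Returns the K Bernoulli units of formula (10)."""
    rng = np.random.default_rng(seed)
    C = H[rng.choice(len(H), size=K, replace=False)]            # random initial centroid set
    for _ in range(n_iter):
        dist = np.linalg.norm(H[:, None, :] - C[None, :, :], axis=2)   # formula (8), all pairs
        assign = dist.argmin(axis=1)                             # nearest-centroid assignment
        C_new = np.stack([H[assign == k].mean(axis=0) if np.any(assign == k) else C[k]
                          for k in range(K)])                    # formula (9): in-cluster mean
        if np.allclose(C_new, C):
            break
        C = C_new
    # Formula (10): one Bernoulli unit per virtual prototype.
    units = [{"w": C[k].copy(), "p": 0.0, "dist": 0.0, "F": 0.0} for k in range(K)]
    return units
```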
Step 4, multi-view reinforced clustering;
and after the multi-view fusion features of the input image data are obtained and the initialization of the clustering environment is completed, executing the online interaction of the multi-view fusion features and the virtual prototype in the clustering environment, and completing the multi-view intensified clustering process. Specifically, the multi-view enhanced clustering process includes the following two steps. Step 4-1, calculating state parameters between multi-view fusion features of input image data and virtual prototypes, measuring indication degrees of the input image data to the virtual prototypes in the current clustering environment, and selecting adjacent virtual prototypes as reinforcement objects; step 4-2, an online reward strategy is implemented, assigning a reward signal r, i.e. a reward or penalty, to the adjacent virtual prototype. And updating the clustering environment through the step 4-1 and the step 4-2, and iteratively converging to obtain an optimal virtual prototype as a clustering center.
Step 4-1 selecting the reinforcement object
The method adopts the Bernoulli distribution as an auxiliary distribution for activating the target unit: the indicativeness of the input image to each virtual prototype is measured through the Bernoulli distribution, an accurate reinforcement object is selected, and the confidence of the clustering result is ensured.
First, a feature vector h_j is randomly selected from the fused feature set H of the input images, and the Euclidean distance d_k between h_j and the virtual prototype c_k is calculated; then the state coefficient between the two is calculated through a Sigmoid function:

d_k = Dist(h_j, c_k)    (11)
F(d_k) = 1 / (1 + e^(-d_k))    (12)

After the state coefficient is obtained, a cost function is used to measure the indication degree of the latent fused feature of the multi-view image to the virtual prototype:

p_k = J(d_k) = 2 × (1 - F(d_k))    (13)

where p_k is the indicator variable representing the indication degree between the input image and the virtual prototype c_k. Ideally, a larger p_k means a smaller distance between the fused feature and the clustering prototype, a higher similarity between them, and a stronger indication of the input image to the virtual prototype; otherwise, the indication is weaker. The virtual prototype with the stronger indication is selected as the reinforcement object.
Because the Bernoulli distribution is discrete and carries uncertainty, in order to ensure the validity of the selected reinforcement object, the invention compares a random seed p_s with the indicator variable p_k of the selected virtual prototype to obtain a calibration variable y_j, which corrects the negative influence of invalid virtual prototypes on the overall clustering result:

y_j = 1 if p_s ≤ p_k, and y_j = 0 otherwise    (14)

While step 4-1 is executed, the relevant information of the corresponding unit in the clustering environment is updated in real time.
Step 4-2, online reward strategy
In order to efficiently utilize the consistent and complementary information among the multiple views of the image data and to bring out the interaction information between the image data and the clustering environment, the invention adopts the idea of reinforcement learning: for the Bernoulli units under the current setting, their effect on the clustering environment is determined through a suitable online reward strategy so as to obtain the optimal virtual prototypes. To this end, after the reinforcement object has been selected according to the strength of the indication, a decision signal is applied with the online reward strategy to the Bernoulli unit corresponding to the reinforcement object, so as to feed back in real time the behaviour produced by the interaction between the input image and the clustering prototype; valid reinforcement objects are rewarded while invalid ones are punished, which addresses the problem that the relevance between data and clusters is not fully considered in multi-view image clustering.
First, a reward-and-punishment factor r_j is assigned to the selected reinforcement object according to the calibration variable y_j:

r_j = reward factor, if y_j = 1;  r_j = penalty factor, otherwise    (15)

When the calibration variable y_j = 1, the selected reinforcement object is a valid object: the virtual prototype corresponding to the unit is close to the input image and matches the ideal situation, so a forward decision should be applied, i.e. a reward factor is assigned to the unit. Otherwise, the selected reinforcement object is an invalid object: the virtual prototype corresponding to the unit is far from the input image and violates the ideal situation, so a reverse decision should be applied, i.e. a penalty factor is assigned to the unit.
In the course of executing the above strategy, the weight of the virtual prototype does not change.
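As a small illustration of step 4-2, the sketch below assigns the reward-and-punishment factor. The concrete values +1 and -1 are assumptions of the sketch, since the text only specifies a reward for a valid object and a penalty for an invalid one.

```python
# Sketch of step 4-2 (the +1 / -1 magnitudes of the reward-and-punishment factor are assumptions).
def online_reward(y):
    """Formula (15): forward decision (reward) for a valid object, reverse decision (penalty) otherwise."""
    return 1.0 if y == 1 else -1.0
```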
Step 5, updating parameters and optimizing clustering results;
after the multi-view reinforced clustering step is executed, the method adopts a strategy gradient algorithm to update the weight parameters of the selected reinforced objects, and the following formula is shown.
Figure BDA0003191527810000074
Wherein α represents the learning rate, which should be greater than 0; r is a reward and punishment factor obtained in the multi-view angle reinforced clustering step, bj,kTo strengthen the baseline. gj=gj(yj;wj,hj) As a function of probability density, the value is fused by the multi-view of the input imagejAnd weighted by wjThe selected reinforcement object of (a) calibrates the influence of the variable under the current clustering environment.
Figure BDA0003191527810000075
The method is used for measuring the feature transformation degree in the strategy gradient updating process and is along with the probability density function gjThe value of (c) is changed.
According to the indication degree of the multi-view intensified clustering, combining the calibration variables and the reward factors, setting an intensified baseline bj,kAnd (5) under the condition of 0, obtaining a weight updating formula of the final virtual prototype:
Figure BDA0003191527810000076
during iterative optimization, virtual prototypes
Figure BDA0003191527810000077
And (4) updating through a formula (16), and completing the multi-view intensified image clustering task when the clustering result reaches the preset training times.
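The following sketch illustrates the step-5 update. The expanded form of the eligibility term ∂ln g_j / ∂w, written here as (y - p)(h_j - w) in the style of REINFORCE for Bernoulli units, is an assumption of the sketch, as the text does not show the expanded derivative.

```python
# Sketch of step 5 (the explicit form of d ln g_j / d w is an assumption; a Bernoulli-unit
# REINFORCE-style eligibility (y - p) * (h_j - w) is used for illustration, with baseline b = 0).
import numpy as np

def update_prototype(unit, h_j, y, r, alpha=0.01):
    """Policy-gradient weight update of the selected virtual prototype (formula (16) with b = 0)."""
    eligibility = (y - unit["p"]) * (h_j - unit["w"])   # stands in for d ln g_j / d w
    unit["w"] = unit["w"] + alpha * r * eligibility
    return unit
```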
The whole process of the invention is divided into four parts: preprocessing of the features of each view, multi-view feature fusion, initialization of the clustering environment, and multi-view reinforced clustering. First, a deep autoencoder network is constructed for each view of the image data; stacked autoencoders reduce the dimensionality of the high-dimensional image data of each view and obtain the latent feature representation of each view. Second, the multi-view fusion network is constructed to fuse the latent feature representations of all views and obtain the fused feature representation. Then, based on the fused feature representation extracted during multi-view feature fusion, the clustering centroids are mined with the K-means method and stored as virtual prototypes in the corresponding Bernoulli units, completing the initialization of the clustering environment. Finally, with the online reward strategy, the parameters in the clustering environment are updated until convergence by combining the real-time interaction between the multi-view fused features and the virtual prototypes with the interaction information fed back by the Bernoulli units. The specific flow is shown in Figure 2.
Verification results:
in the experiments of the present invention, two general image data sets were selected to verify the effectiveness of the present invention, wherein the detailed information of the data sets is shown in table 1.
MNIST handwritten digit dataset: consists of 70000 handwritten digit images; each data sample is a 28 × 28 pixel grayscale image. The invention reshapes each image into a 784-dimensional vector.
Fashion-MNIST dataset: consists of 70000 garment images; each data sample is a 28 × 28 pixel grayscale image. The invention reshapes each image into a 784-dimensional vector.
Table 1 details of the data set
Data set Number of samples Sample dimension Number of categories
MNIST 70000 784 10
Fashion-MNIST 70000 784 10
The evaluation criteria of the invention are clustering accuracy (ACC), adjusted Rand index (ARI) and normalized mutual information (NMI).
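For reference, the three indices can be computed as in the sketch below; ARI and NMI are taken from scikit-learn, and ACC uses the usual Hungarian matching between predicted clusters and ground-truth labels. This evaluation code is illustrative and not part of the invention.

```python
# Sketch of the three clustering metrics (scikit-learn for ARI/NMI, scipy for Hungarian matching).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best one-to-one matching between predicted cluster ids and integer ground-truth labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)          # maximise the number of matched samples
    return cost[rows, cols].sum() / len(y_true)

def evaluate(y_true, y_pred):
    return {"ACC": clustering_accuracy(y_true, y_pred),
            "ARI": adjusted_rand_score(y_true, y_pred),
            "NMI": normalized_mutual_info_score(y_true, y_pred)}
```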
To verify the performance of the invention, a deep multi-view clustering method (MAE + K-means) was chosen as the comparison baseline.
The ACC, ARI and NMI results of the proposed method and the comparison method on the MNIST and Fashion-MNIST datasets are shown in Tables 2 and 3.
Table 2 comparison of results on MNIST dataset for each experiment
Experiments ACC ARI NMI
MAE+K-means 0.99985 0.99967 0.99951
The invention 0.9999 0.9999 0.9998
From Tables 2 and 3, it can be observed that the proposed method outperforms the comparison baseline on all three evaluation indices ACC, ARI and NMI on both the MNIST and Fashion-MNIST datasets, demonstrating its effectiveness. Specifically, compared with the MAE + K-means method, the proposed method adopts Bernoulli units and the online reward strategy, feeds back correct decision behaviour through the interaction between the input image data and the clustering prototypes, makes full use of the multi-view complementary information, gives due weight to the clustering environment, and thereby improves clustering performance.
Table 3 comparison of results on the Fashion-MNIST dataset for each experiment
Experiments ACC ARI NMI
MAE+K-means 0.49443 0.36653 0.54398
The invention 0.5646 0.4393 0.5754

Claims (2)

1. A multi-view reinforced image clustering method, characterized by comprising the following steps:
step 1, pre-training an independent feature extraction network for each view, and acquiring the latent feature representation of each view;
the feature extraction network is formed by stacking n autoencoder networks, each autoencoder network consisting of an encoding layer and a decoding layer with a symmetric network structure; during training, the last encoding layer serves as the hidden layer, the error between the input and its reconstruction is minimized, and a low-dimensional feature representation of the latent structure of the data is obtained;
suppose x_j^v is the j-th input image of the v-th view; when n = 1, the input of the autoencoder is x_j^v, and the calculation process is:

H_j^(v,1) = G_e^(v,1)(x_j^v; θ_e^(v,1))    (1)
x̂_j^(v,1) = G_d^(v,1)(H_j^(v,1); θ_d^(v,1))    (2)

wherein H_j^(v,1) is the latent feature representation output by the hidden layer, and x̂_j^(v,1) is the data reconstructed by the autoencoder; G_e^(v,1)(·) and G_d^(v,1)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,1), θ_d^(v,1) are the encoding-layer and decoding-layer parameters; the autoencoder is trained by minimizing the reconstruction loss between x_j^v and x̂_j^(v,1);
when n is greater than 1, the first autoencoder still takes x_j^v as input and is calculated by (1) and (2), while each of the remaining autoencoders takes the hidden-layer features H_j^(v,n-1) of the (n-1)-th encoding layer as input; the calculation process is:

H_j^(v,n) = G_e^(v,n)(H_j^(v,n-1); θ_e^(v,n))    (3)
x̂_j^(v,n) = G_d^(v,n)(H_j^(v,n); θ_d^(v,n))    (4)

wherein, as for the first autoencoder, H_j^(v,n) is the latent feature representation output by the hidden layer, and x̂_j^(v,n) is the reconstructed feature representation of the autoencoder; G_e^(v,n)(·) and G_d^(v,n)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,n), θ_d^(v,n) are the parameters of the encoding layer and the decoding layer; the autoencoder is trained by minimizing the reconstruction loss between its input and its reconstruction;
finally, the n autoencoders of the v-th view are split into their n encoding layers and n decoding layers, which are rearranged in a symmetric manner (all encoding layers followed by all decoding layers in reverse order) to construct the feature extraction network of this view, the stacked encoding layers forming the encoder and the stacked decoding layers forming the decoder; the feature extraction network is then trained by minimizing the reconstruction loss with a stochastic gradient descent algorithm;
step 2, pre-training a multi-view feature fusion network to obtain the fused feature representation;
the feature extraction networks trained for the individual views in step 1 generate, for an input sample x_j, the latent feature representations H^1, ..., H^V of the respective views, which are concatenated in series according to the complementary characteristics of the multiple views:

H_f = cat(H^1, ..., H^V)    (5)

wherein V is the number of views, H^1, ..., H^V are the latent feature representations of the individual views, cat(·) denotes the concatenation operation, and H_f is the concatenated feature representation;
the multi-view feature fusion network is formed by stacking n autoencoder networks; during training, the network takes the concatenated feature representation H_f as input, and the calculation process of the outermost autoencoder is:

H_ef = G_e,f(H_f; θ_e,f)    (6)
Ĥ_f = G_d,f(H_of; θ_d,f)    (7)

wherein H_ef is the latent feature output by the outer encoding layer, whose dimension is smaller than that of H_f; H_of is the reconstruction feature output by the inner decoder, whose dimension equals that of H_ef; Ĥ_f is the reconstructed representation output by the outer decoding layer, whose dimension equals that of H_f; when the network consists of only one autoencoder, H_of is H_ef; G_e,f(·) and G_d,f(·) are the activation functions of the encoding layer and the decoding layer, respectively, and θ_e,f, θ_d,f are the model parameters;
the constructed multi-view fusion network is pre-trained end to end with a stochastic gradient descent algorithm by minimizing the reconstruction loss between H_f and Ĥ_f; then, the decoder input dimension of the feature extraction network of each view is set to the dimension of the concatenated latent feature H_f of the input data;
step 3, initializing the clustering environment and allocating Bernoulli units;
the multi-view fused features H = {h_j | j = 1, ..., J} of the J multi-view image samples are generated with the multi-view feature fusion network trained in step 2, and K points are randomly selected from them as the initial clustering centroid set C = {c_k | k = 1, ..., K}; the clustering centroid set C is updated on the fused feature representation H with the K-means algorithm to obtain K optimized clustering centroids;
the clustering centroid set C optimized by the K-means algorithm is used as the virtual prototypes to initialize the clustering environment, a corresponding Bernoulli unit (w, p, dist, F) is constructed for each prototype, and the information of the current clustering environment is stored in it, wherein w is the weight of the virtual prototype, p is the indicator variable of the virtual prototype, dist is the distance from the current data point to the virtual prototype, and F is the state parameter; initially, w is set to the centroid obtained after the K-means update, and p, dist and F are set to 0;
step 4, multi-view reinforced clustering;
step 4-1, calculating the state parameter between the multi-view fused feature of the input image data and each virtual prototype, measuring the indication degree of the input image data to the virtual prototypes in the current clustering environment, and selecting the adjacent virtual prototype as the reinforcement object;
first, a feature vector h_j is randomly selected from the fused feature set H of the input images, and the Euclidean distance d_k = Dist(h_j, c_k) between h_j and the virtual prototype c_k is calculated; then the state coefficient between the two is calculated through a Sigmoid function:

F(d_k) = 1 / (1 + e^(-d_k))    (11)

after the state coefficient is obtained, a cost function is used to measure the indication degree of the latent feature of the multi-view image to the virtual prototype:

p_k = J(d_k) = 2 × (1 - F(d_k))    (12)

wherein p_k is the indicator variable representing the indication degree between the input image and the virtual prototype c_k; ideally, a larger p_k means a smaller distance between the fused feature and the clustering prototype, a higher similarity between them, and a stronger indication of the input image to the virtual prototype; otherwise, the indication is weaker; the virtual prototype with the strong indication is selected as the reinforcement object;
at the same time, a random seed p_s is compared with the indicator variable p_k of the selected virtual prototype to obtain a calibration variable y_j, correcting the negative influence of invalid virtual prototypes on the overall clustering result:

y_j = 1 if p_s ≤ p_k, and y_j = 0 otherwise    (13)

while step 4-1 is executed, the relevant information of the corresponding unit in the clustering environment is updated in real time;
step 4-2, executing the online reward strategy, and assigning a reward signal r to the adjacent virtual prototype;
first, a reward-and-punishment factor r_j is assigned to the selected reinforcement object according to the calibration variable y_j:

r_j = reward factor, if y_j = 1;  r_j = penalty factor, otherwise    (14)

wherein, when the calibration variable y_j = 1, the selected reinforcement object is a valid object, the virtual prototype corresponding to the unit is close to the input image and matches the ideal situation, and a forward decision should be applied, i.e. a reward factor is assigned to the unit; otherwise, the selected reinforcement object is an invalid object, the virtual prototype corresponding to the unit is far from the input image and violates the ideal situation, and a reverse decision should be applied, i.e. a penalty factor is assigned to the unit;
the clustering environment is updated through step 4-1 and step 4-2, and the optimal virtual prototypes obtained after iterative convergence serve as the cluster centers;
step 5, updating parameters and optimizing the clustering result;
the weight parameter of the selected reinforcement object is updated with a policy gradient algorithm:

Δw_(j,k) = α (r_j - b_(j,k)) ∂ln g_j / ∂w_(j,k)    (15)

wherein α is the learning rate, which should be greater than 0; r_j is the reward-and-punishment factor obtained in the multi-view reinforced clustering step, and b_(j,k) is the reinforcement baseline; g_j = g_j(y_j; w_j, h_j) is the probability density function, whose value reflects the influence, in the current clustering environment, of the calibration variable of the reinforcement object selected by the multi-view fused feature h_j of the input image and the weight w_j; the partial derivative ∂ln g_j / ∂w_j measures the degree of feature transformation during the policy gradient update and changes with the value of the probability density function g_j;
according to the indication degree of multi-view reinforced clustering, combining the calibration variable and the reward factor, and setting the reinforcement baseline b_(j,k) = 0, the final weight update formula of the virtual prototype is obtained:

Δw_(j,k) = α r_j ∂ln g_j / ∂w_(j,k)    (16)

during iterative optimization, the virtual prototypes are updated through formula (16); when the preset number of training iterations is reached, the multi-view reinforced image clustering task is completed.
2. The multi-view reinforced image clustering method according to claim 1, wherein the K-means clustering algorithm in step 3 divides the data points into K clusters under the criterion that a smaller distance means a greater similarity and a larger distance means a smaller similarity; the distance between data points is calculated with the Euclidean distance:

Dist(h_i, h_j) = ||h_i - h_j||_2    (8)

wherein h_i and h_j are the multi-view fused features of two different data points, and Dist(·) denotes the distance between them;
during cluster assignment, the mean of the features in the same cluster is recalculated in every iteration and used as the centroid set C = {c_k | k = 1, 2, ..., K}:

c_k = (1/|S_k|) Σ_(h_i ∈ S_k) h_i    (9)

wherein c_k is the centroid of the k-th cluster, h_i is the feature of a data point belonging to that cluster, and S_k denotes the set of such features;
first, K points are randomly selected from the sample set as the initial clustering centroid set C; the distance between each data point and each clustering centroid is calculated with formula (8), each data point is assigned to the nearest clustering centroid, and the clustering centroids are updated by this heuristic iteration until convergence, giving K optimized clustering centroids as the clustering centroid set C = {c_k | k = 1, ..., K} of the data points.
CN202110879412.7A 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method Pending CN113610139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110879412.7A CN113610139A (en) 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110879412.7A CN113610139A (en) 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method

Publications (1)

Publication Number Publication Date
CN113610139A true CN113610139A (en) 2021-11-05

Family

ID=78339034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110879412.7A Pending CN113610139A (en) 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method

Country Status (1)

Country Link
CN (1) CN113610139A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114936615A (en) * 2022-07-25 2022-08-23 南京大数据集团有限公司 Small sample log information anomaly detection method based on characterization consistency correction
CN116522143A (en) * 2023-05-08 2023-08-01 深圳市大数据研究院 Model training method, clustering method, equipment and medium
CN116522143B (en) * 2023-05-08 2024-04-05 深圳市大数据研究院 Model training method, clustering method, equipment and medium
CN117542057A (en) * 2024-01-09 2024-02-09 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views
CN117542057B (en) * 2024-01-09 2024-04-05 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views

Similar Documents

Publication Publication Date Title
CN113610139A (en) Multi-view-angle intensified image clustering method
CN110674850A (en) Image description generation method based on attention mechanism
CN112464005B (en) Depth-enhanced image clustering method
CN108921047B (en) Multi-model voting mean value action identification method based on cross-layer fusion
CN110544297A (en) Three-dimensional model reconstruction method for single image
CN112308961B (en) Robot rapid robust three-dimensional reconstruction method based on layered Gaussian mixture model
CN107330902B (en) Chaotic genetic BP neural network image segmentation method based on Arnold transformation
CN103942571B (en) Graphic image sorting method based on genetic programming algorithm
CN108897791B (en) Image retrieval method based on depth convolution characteristics and semantic similarity measurement
CN112488070A (en) Neural network compression method for remote sensing image target detection
CN106971197A (en) The Subspace clustering method of multi-view data based on otherness and consistency constraint
CN109214429A (en) Localized loss multiple view based on matrix guidance regularization clusters machine learning method
CN102930275A (en) Remote sensing image feature selection method based on Cramer's V index
CN112417289A (en) Information intelligent recommendation method based on deep clustering
CN106355210B (en) Insulator Infrared Image feature representation method based on depth neuron response modes
CN107146241A (en) A kind of point cloud registration method based on differential evolution algorithm and TrimmedICP algorithms
CN114580525A (en) Gesture action classification method for data containing missing
CN115546525A (en) Multi-view clustering method and device, electronic equipment and storage medium
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN115203631A (en) Multi-modal data analysis method and system based on improved genetic algorithm
Tan et al. Deep adaptive fuzzy clustering for evolutionary unsupervised representation learning
CN114743058A (en) Width learning image classification method and device based on mixed norm regular constraint
CN113239199B (en) Credit classification method based on multi-party data set
CN117078312B (en) Advertisement putting management method and system based on artificial intelligence
CN111353525A (en) Modeling and missing value filling method for unbalanced incomplete data set

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination