CN113610139A - Multi-view-angle intensified image clustering method - Google Patents

Multi-view-angle intensified image clustering method

Info

Publication number
CN113610139A
Authority
CN
China
Prior art keywords
clustering
view
feature
encoder
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110879412.7A
Other languages
Chinese (zh)
Inventor
高静
刘晨欣
金珊
陈志奎
李朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202110879412.7A priority Critical patent/CN113610139A/en
Publication of CN113610139A publication Critical patent/CN113610139A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-view reinforced image clustering method, which belongs to the field of image clustering and reinforcement learning and comprises the following steps: 1) pre-training an independent feature extraction network for each view and initializing the latent feature space of each view; 2) pre-training a multi-view feature fusion network and initializing the fused feature space of all views; 3) initializing the clustering environment with the K-means method and allocating a Bernoulli unit to each clustering prototype in the environment; 4) assigning stochastic rewards in real time with an online reward strategy and dynamically updating the Bernoulli units in the environment; 5) updating parameters and iteratively optimizing the clustering prototypes until the convergence condition is met, thereby completing the multi-view reinforced clustering process. By jointly learning representation and cluster assignment with the online reward strategy, the complementary information between views and the interaction information between samples and clustering prototypes are fully brought to bear on the cluster analysis process, effectively improving clustering performance.

Description

Multi-view reinforced image clustering method
Technical Field
The invention belongs to the field of image clustering and reinforcement learning, and relates to a multi-view reinforced image clustering method.
Background
With the wide application of technologies such as networked information systems and electronic commerce, the ways in which people acquire data are increasingly rich, the amount of collectable data keeps growing, and data structures become more complex and higher-dimensional. Multi-view image data typically come from different domains of a data object or from measurements taken at multiple angles, and they contain rich complementary information that can effectively enhance data analysis. However, because of the multi-source heterogeneity and high dimensionality of multi-view image data, this complementary information is difficult to exploit fully. Therefore, new methods are urgently needed to deeply mine the complementary information in massive multi-view image data.
Clustering is an important data analysis and processing technique in machine learning and data mining; its aim is to place homogeneous data in the same subset and heterogeneous data in different subsets. Multi-view clustering breaks through the limitation that the information in a single view is insufficient for good clustering: it considers the consistency and complementarity among multiple views and combines the feature information of multiple views to improve the final clustering result. Early multi-view clustering methods fused multi-view information and associated features across views, and they already showed better results than single-view clustering. However, these early methods generally assumed that the data exist in only two views; they struggle with data spanning three or more views and are prone to problems when views are missing. Inspired by the ability of deep generative models to infer missing data in single-view clustering, researchers proposed the multi-view variational autoencoder (MVAE) model, which combines an independent variational autoencoder (VAE) for each view through an expert network, learns the joint distribution over multiple views, obtains deeper and more effective multi-view features, and improves multi-view clustering performance.
Although current multi-view clustering methods use deep generative models to capture the complementary information among multiple views and obtain good clustering results, they only consider the intrinsic properties of the multi-view data and neglect the relation between the multi-view data and the cluster centers, which tends to blur the cluster boundaries gradually. Reinforcement learning learns a policy through interaction with the environment so as to maximize the return. Therefore, on the premise of obtaining effective information from multi-view image data, associating the information of data points and cluster points in the multi-view clustering environment based on the idea of reinforcement learning, and thereby improving the accuracy of multi-view clustering, is a topic worth studying.
Disclosure of Invention
In order to solve the above problems, the invention provides a multi-view reinforced image clustering method, which, on the basis of using the consistency and complementarity of multi-view data, also considers how the interaction information between the cluster centers and the multi-view data is used during the clustering iterations, and thereby improves the clustering effect.
First, the invention uses deep autoencoders to reduce the dimensionality of the original multi-view high-dimensional image data and to capture the latent feature representation of each view of the data. Second, the invention designs a multi-view feature fusion strategy, which fuses the latent feature representations of the multiple views and acquires high-order complementary information of the multi-view data. Finally, the invention proposes an online reward strategy based on reinforcement learning, which realizes real-time interaction between data points and cluster points in the clustering environment, makes full use of the fused feature representation, and obtains more accurate clustering results. In summary, the invention provides a multi-view reinforced image clustering method that adopts an online reward learning mode to learn fused representation information from the multiple views of large-scale image data and to adjust the cluster assignment on the fly, so as to improve multi-view image clustering performance; three clustering evaluation indices, the adjusted Rand index (ARI), normalized mutual information (NMI) and accuracy (ACC), are adopted to verify the effectiveness of the model.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a multi-view intensified image clustering method comprises the following steps:
step 1, pre-training a feature extraction network independent of each visual angle, and acquiring potential feature representation of each visual angle;
step 2, pre-training a multi-view feature fusion network to obtain fusion feature representation of each view;
step 3, initializing a clustering environment by adopting a K-means method, and distributing a Bernoulli unit for a clustering prototype in the environment;
step 4, distributing random rewards in real time by using an online reward strategy, and dynamically updating the Bernoulli unit in the environment;
and 5, updating parameters, and iteratively optimizing the clustering prototype until a convergence condition is met, thereby completing the multi-view reinforced clustering process.
The invention has the following beneficial effects: it designs a multi-view reinforced image clustering method for image data, which mainly uses the complementary information in multi-view image data to learn an efficient fused feature representation and improve image clustering and feature learning; to this end, a reinforcement learning framework based on Bernoulli units is designed, which makes full use of the information of the whole clustering environment and improves the performance of the clustering algorithm. Measured by the clustering evaluation indices adjusted Rand index (ARI), normalized mutual information (NMI) and accuracy (ACC), the method effectively improves image clustering performance.
Drawings
FIG. 1 is a framework diagram of the multi-view reinforced image clustering method of the present invention;
FIG. 2 is a flowchart of the overall process of the multi-view reinforced image clustering method of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
FIG. 1 is a framework diagram of the multi-view reinforced image clustering method of the present invention. First, the high-dimensional features of the V views of the original data are reduced in dimensionality by deep autoencoders, and the latent low-dimensional features of each view are acquired. Second, a multi-view feature fusion network fuses the latent low-dimensional features of the V views to generate the multi-view fused features, combining the consistent and complementary information of the views. Then, the K-means method is used to mine the clustering centroids of the multi-view data as clustering prototypes, and a corresponding Bernoulli unit is constructed for each centroid to store the clustering information during iterative optimization, completing the initialization of the clustering environment. Next, the online reward strategy is used to learn the interaction information between the multi-view fused features and the clustering prototypes, and the Bernoulli units are changed in real time through reward and punishment signals, realizing the dynamic update of the clustering environment. Finally, the reinforcement learning algorithm iteratively optimizes the clustering environment by jointly learning the Bernoulli units and the reward and punishment signals until the convergence condition is met.
The method comprises the following steps:
step 1, pre-training a feature extraction network with independent view angles, and acquiring potential feature representation of each view angle
The original multi-view image data have a complex structure and high dimensionality, which reduces the comprehensibility and usability of the data and easily leads to mode collapse. To this end, the invention adopts a feature extraction network to reduce the dimensionality of the V view features of the same object. Specifically, the feature extraction network is formed by stacking n autoencoder networks, each consisting of an encoding layer and a decoding layer with a symmetric network structure. The encoding layer compresses the input high-dimensional data layer by layer into a low-dimensional feature space, and the decoding layer reconstructs the data from the low-dimensional feature space. During training, the last encoding layer serves as the hidden layer, the error between the input and its reconstruction is minimized, and a low-dimensional feature representation containing the latent structure of the data is obtained.
For the v-th view, the feature extraction network is constructed from multiple autoencoder networks by taking the output of the hidden layer of one autoencoder as the input of the next. Suppose x_j^v is the j-th input image of this view; the calculation process of the first autoencoder network of the current view is:

H_j^(v,1) = G_e^(v,1)(x_j^v; θ_e^(v,1))    (1)
x̂_j^(v,1) = G_d^(v,1)(H_j^(v,1); θ_d^(v,1))    (2)

where H_j^(v,1) is the latent feature representation output by the hidden layer, and x̂_j^(v,1) is the data reconstructed by the autoencoder. G_e^(v,1)(·) and G_d^(v,1)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,1), θ_d^(v,1) are the encoding-layer and decoding-layer parameters. The autoencoder is trained by minimizing the reconstruction loss between x_j^v and x̂_j^(v,1).
For the n-th autoencoder of the network: when n = 1, the input of the autoencoder is the original image data x_j^v and the calculation process follows formulas (1) and (2). When n is greater than 1, the first autoencoder still takes x_j^v as input and is calculated by (1) and (2), while each of the remaining autoencoders takes the hidden-layer features H_j^(v,n-1) of the (n-1)-th encoding layer as input; its calculation process can be expressed as:

H_j^(v,n) = G_e^(v,n)(H_j^(v,n-1); θ_e^(v,n))    (3)
x̂_j^(v,n) = G_d^(v,n)(H_j^(v,n); θ_d^(v,n))    (4)

where, as for the first autoencoder, H_j^(v,n) is the latent feature representation output by the hidden layer, and x̂_j^(v,n) is the reconstructed representation of the autoencoder. G_e^(v,n)(·) and G_d^(v,n)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,n), θ_d^(v,n) are the parameters of the encoding layer and the decoding layer. The autoencoder is trained by minimizing the reconstruction loss between its input and its reconstruction.
Finally, the n autoencoders of the v-th view are split into their n encoding layers and n decoding layers, which are rearranged in a symmetric manner (all encoding layers followed by all decoding layers in reverse order) to construct the feature extraction network of this view; the stacked encoding layers form the encoder and the stacked decoding layers form the decoder. The feature extraction network is then trained by minimizing the reconstruction loss with a stochastic gradient descent algorithm.
Following these steps, a feature extraction network is constructed for each view, generating the latent feature representation of each view for the multi-view feature fusion network.
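For illustration only, the following is a minimal Python sketch of this per-view pre-training with stacked autoencoders. It uses PyTorch; the layer sizes, ReLU activations, optimizer and training schedule (the dims, epochs and lr arguments of the hypothetical pretrain_view_network helper) are assumptions of the sketch and are not details fixed by the invention.

```python
# Sketch of step 1 (assumptions: layer sizes, ReLU activations, Adam optimizer).
import torch
import torch.nn as nn

def pretrain_view_network(x, dims=(500, 200, 10), epochs=50, lr=1e-3):
    """Greedy layer-wise pretraining of a stacked autoencoder for one view.

    x: (N, D) tensor holding one view's image features.
    Returns the fine-tuned encoder and the latent representation of this view.
    """
    encoders, decoders = [], []
    h = x
    for d_out in dims:                      # train the n single-layer autoencoders one by one
        d_in = h.shape[1]
        enc = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
        dec = nn.Sequential(nn.Linear(d_out, d_in), nn.ReLU())
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(epochs):             # minimise this autoencoder's reconstruction loss
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(enc(h)), h)
            loss.backward()
            opt.step()
        encoders.append(enc)
        decoders.append(dec)
        h = enc(h).detach()                 # hidden output becomes the next autoencoder's input

    # Rearrange symmetrically: all encoding layers, then all decoding layers in reverse order.
    net = nn.Sequential(*encoders, *reversed(decoders))
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):                 # fine-tune the whole feature extraction network
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), x)
        loss.backward()
        opt.step()

    encoder = nn.Sequential(*encoders)
    with torch.no_grad():
        H_v = encoder(x)                    # latent feature representation of this view
    return encoder, H_v
```

In this sketch, H_v would be computed once per view and passed on to the fusion network of step 2.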
Step 2, pre-training a multi-view feature fusion network to obtain the fused feature representation of all views;
information in image data can be represented from different perspectives, and clustering using a single perspective for cluster analysis limits the clustering effect. In contrast, the invention designs a multi-view feature fusion network, and learns the fusion feature representation of all views in an end-to-end manner by combining feature extraction networks of different views, thereby enhancing the clustering effect.
First, according to the complementary characteristics of the multiple views, the latent feature representations of the individual views of the same sample obtained in step 1 are concatenated in series:

H_f = cat(H^1, ..., H^V)    (5)

where V is the number of views, H^1, ..., H^V are the latent feature representations of the individual views, cat(·) denotes the concatenation operation, and H_f is the concatenated feature representation.
The multi-view feature fusion network is formed by stacking n autoencoder networks; during pre-training it takes the concatenated feature representation H_f as input and learns a globally associated fused feature representation. Specifically, the calculation process of the outermost autoencoder of the network is:

H_ef = G_e,f(H_f; θ_e,f)    (6)
Ĥ_f = G_d,f(H_of; θ_d,f)    (7)

where H_ef is the latent feature output by the outer encoding layer, whose dimension is smaller than that of H_f; H_of is the reconstruction feature output by the inner decoder, whose dimension equals that of H_ef; and Ĥ_f is the reconstructed representation output by the outer decoding layer, whose dimension equals that of H_f. When the network consists of only one autoencoder, H_of is H_ef. G_e,f(·) and G_d,f(·) are the activation functions of the encoding layer and the decoding layer, respectively, and θ_e,f, θ_d,f are the model parameters.
The constructed multi-view fusion network is pre-trained end to end with a stochastic gradient descent algorithm by minimizing the reconstruction loss between H_f and Ĥ_f. Then, the decoder input dimension of the feature extraction network of each view is set to the dimension of the concatenated latent feature H_f of the input data. Finally, the trained multi-view fusion network takes the J multi-view image samples as input and generates the globally associated multi-view fused features H = {h_j | j = 1, ..., J}, i.e., the latent features output by the innermost encoding layer of the stacked autoencoder.
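For illustration, a minimal sketch of this fusion step under the same assumptions follows. The fusion-layer sizes and the hypothetical pretrain_fusion_network helper are illustrative choices of the sketch, not part of the invention.

```python
# Sketch of step 2 (assumptions: fusion-layer sizes, ReLU activations, Adam optimizer).
import torch
import torch.nn as nn

def pretrain_fusion_network(H_views, fused_dim=10, hidden_dim=64, epochs=100, lr=1e-3):
    """H_views: list of (N, d_v) latent-feature tensors, one per view."""
    H_f = torch.cat(H_views, dim=1)              # formula (5): series concatenation of all views
    d_in = H_f.shape[1]
    encoder = nn.Sequential(nn.Linear(d_in, hidden_dim), nn.ReLU(),
                            nn.Linear(hidden_dim, fused_dim))   # outer and inner encoding layers
    decoder = nn.Sequential(nn.Linear(fused_dim, hidden_dim), nn.ReLU(),
                            nn.Linear(hidden_dim, d_in))        # inner and outer decoding layers
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):                      # end-to-end pre-training with reconstruction loss
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(encoder(H_f)), H_f)
        loss.backward()
        opt.step()
    with torch.no_grad():
        H = encoder(H_f)                         # globally associated multi-view fused features
    return H
```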
Step 3, initializing the clustering environment and allocating Bernoulli units;
The multi-view fused features H = {h_j | j = 1, ..., J} of the J multi-view image samples are obtained from step 2, and K points are randomly selected from them as the initial clustering centroid set C = {c_k | k = 1, ..., K}. The clustering centroid set C is then updated on the fused features H with the K-means method until the convergence condition is met, yielding K initial virtual prototypes and completing the initialization of the clustering environment.
Specifically, the K-means method divides the data points into K clusters under the criterion that a smaller distance means a greater similarity and a larger distance means a smaller similarity. The distance between data points is calculated with the Euclidean distance:

Dist(h_i, h_j) = ||h_i - h_j||_2    (8)

where h_i and h_j are the multi-view fused features of two different data points, and Dist(·) denotes the distance between them.
During cluster assignment, the mean of the multi-view fused features of the data points in the same cluster is recalculated in every iteration and used as the centroid set C = {c_k | k = 1, 2, ..., K}:

c_k = (1/|S_k|) Σ_(h_i ∈ S_k) h_i    (9)

where c_k is the centroid of the k-th cluster, h_i is the multi-view fused feature of a data point belonging to that cluster, and S_k denotes the set of such features.
First, K points are randomly selected from the sample set as the initial clustering centroid set C = {c_k | k = 1, ..., K}; the distance between each data point and each clustering centroid is calculated with formula (8), and each data point is assigned to the nearest clustering centroid. Finally, the clustering centroids are updated by this heuristic iteration until convergence, giving K optimized clustering centroids as the clustering centroid set C of the data points.
The optimized centroid set C is taken as the initial virtual prototypes of the reinforced clustering process, a corresponding Bernoulli unit is constructed for each prototype, and the relevant information under the current clustering environment is stored in it:

B_unit = {w, p, dist, F}    (10)

where w is the weight of the virtual prototype, p is the indicator variable of the virtual prototype, dist is the distance from the current data point to the virtual prototype, and F is the state parameter. Initially, w is set to the centroid obtained after the K-means update, while p, dist and F are set to 0 and change as the reinforced clustering process iterates.
Through the above process, the initialization of the clustering environment is completed.
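A minimal sketch of this clustering-environment initialization is given below; it follows formulas (8) to (10) with plain NumPy, and the helper name init_cluster_environment, the iteration cap and the random seed are assumptions of the sketch.

```python
# Sketch of step 3 (a minimal K-means and Bernoulli-unit initialisation on NumPy arrays).
import numpy as np

def init_cluster_environment(H, K, n_iter=100, seed=0):
    """H: (J, d) multi-view fused features. Returns the K Bernoulli units of formula (10)."""
    rng = np.random.default_rng(seed)
    C = H[rng.choice(len(H), size=K, replace=False)]            # random initial centroid set
    for _ in range(n_iter):
        dist = np.linalg.norm(H[:, None, :] - C[None, :, :], axis=2)   # formula (8), all pairs
        assign = dist.argmin(axis=1)                             # nearest-centroid assignment
        C_new = np.stack([H[assign == k].mean(axis=0) if np.any(assign == k) else C[k]
                          for k in range(K)])                    # formula (9): in-cluster mean
        if np.allclose(C_new, C):
            break
        C = C_new
    # Formula (10): one Bernoulli unit per virtual prototype.
    units = [{"w": C[k].copy(), "p": 0.0, "dist": 0.0, "F": 0.0} for k in range(K)]
    return units
```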
Step 4, multi-view reinforced clustering;
and after the multi-view fusion features of the input image data are obtained and the initialization of the clustering environment is completed, executing the online interaction of the multi-view fusion features and the virtual prototype in the clustering environment, and completing the multi-view intensified clustering process. Specifically, the multi-view enhanced clustering process includes the following two steps. Step 4-1, calculating state parameters between multi-view fusion features of input image data and virtual prototypes, measuring indication degrees of the input image data to the virtual prototypes in the current clustering environment, and selecting adjacent virtual prototypes as reinforcement objects; step 4-2, an online reward strategy is implemented, assigning a reward signal r, i.e. a reward or penalty, to the adjacent virtual prototype. And updating the clustering environment through the step 4-1 and the step 4-2, and iteratively converging to obtain an optimal virtual prototype as a clustering center.
Step 4-1 selecting the reinforcement object
The method adopts the Bernoulli distribution as an auxiliary distribution for activating the target unit: the indicativeness of the input image to each virtual prototype is measured through the Bernoulli distribution, an accurate reinforcement object is selected, and the confidence of the clustering result is ensured.
First, a feature vector h_j is randomly selected from the fused feature set H of the input images, and the Euclidean distance d_k between h_j and the virtual prototype c_k is calculated; then the state coefficient between the two is calculated through a Sigmoid function:

d_k = Dist(h_j, c_k)    (11)
F(d_k) = 1 / (1 + e^(-d_k))    (12)

After the state coefficient is obtained, a cost function is used to measure the indication degree of the latent fused feature of the multi-view image to the virtual prototype:

p_k = J(d_k) = 2 × (1 - F(d_k))    (13)

where p_k is the indicator variable representing the indication degree between the input image and the virtual prototype c_k. Ideally, a larger p_k means a smaller distance between the fused feature and the clustering prototype, a higher similarity between them, and a stronger indication of the input image to the virtual prototype; otherwise, the indication is weaker. The virtual prototype with the stronger indication is selected as the reinforcement object.
Because the Bernoulli distribution is discrete and carries uncertainty, in order to ensure the validity of the selected reinforcement object, the invention compares a random seed p_s with the indicator variable p_k of the selected virtual prototype to obtain a calibration variable y_j, which corrects the negative influence of invalid virtual prototypes on the overall clustering result:

y_j = 1 if p_s ≤ p_k, and y_j = 0 otherwise    (14)

While step 4-1 is executed, the relevant information of the corresponding unit in the clustering environment is updated in real time.
Step 4-2, online reward strategy
In order to efficiently utilize the consistent and complementary information among the multiple views of the image data and to bring out the interaction information between the image data and the clustering environment, the invention adopts the idea of reinforcement learning: for the Bernoulli units under the current setting, their effect on the clustering environment is determined through a suitable online reward strategy so as to obtain the optimal virtual prototypes. To this end, after the reinforcement object has been selected according to the strength of the indication, a decision signal is applied with the online reward strategy to the Bernoulli unit corresponding to the reinforcement object, so as to feed back in real time the behaviour produced by the interaction between the input image and the clustering prototype; valid reinforcement objects are rewarded while invalid ones are punished, which addresses the problem that the relevance between data and clusters is not fully considered in multi-view image clustering.
First, a reward-and-punishment factor r_j is assigned to the selected reinforcement object according to the calibration variable y_j:

r_j = reward factor, if y_j = 1;  r_j = penalty factor, otherwise    (15)

When the calibration variable y_j = 1, the selected reinforcement object is a valid object: the virtual prototype corresponding to the unit is close to the input image and matches the ideal situation, so a forward decision should be applied, i.e. a reward factor is assigned to the unit. Otherwise, the selected reinforcement object is an invalid object: the virtual prototype corresponding to the unit is far from the input image and violates the ideal situation, so a reverse decision should be applied, i.e. a penalty factor is assigned to the unit.
In the course of executing the above strategy, the weight of the virtual prototype does not change.
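As a small illustration of step 4-2, the sketch below assigns the reward-and-punishment factor. The concrete values +1 and -1 are assumptions of the sketch, since the text only specifies a reward for a valid object and a penalty for an invalid one.

```python
# Sketch of step 4-2 (the +1 / -1 magnitudes of the reward-and-punishment factor are assumptions).
def online_reward(y):
    """Formula (15): forward decision (reward) for a valid object, reverse decision (penalty) otherwise."""
    return 1.0 if y == 1 else -1.0
```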
Step 5, updating parameters and optimizing clustering results;
after the multi-view reinforced clustering step is executed, the method adopts a strategy gradient algorithm to update the weight parameters of the selected reinforced objects, and the following formula is shown.
Figure BDA0003191527810000074
Wherein α represents the learning rate, which should be greater than 0; r is a reward and punishment factor obtained in the multi-view angle reinforced clustering step, bj,kTo strengthen the baseline. gj=gj(yj;wj,hj) As a function of probability density, the value is fused by the multi-view of the input imagejAnd weighted by wjThe selected reinforcement object of (a) calibrates the influence of the variable under the current clustering environment.
Figure BDA0003191527810000075
The method is used for measuring the feature transformation degree in the strategy gradient updating process and is along with the probability density function gjThe value of (c) is changed.
According to the indication degree of the multi-view intensified clustering, combining the calibration variables and the reward factors, setting an intensified baseline bj,kAnd (5) under the condition of 0, obtaining a weight updating formula of the final virtual prototype:
Figure BDA0003191527810000076
during iterative optimization, virtual prototypes
Figure BDA0003191527810000077
And (4) updating through a formula (16), and completing the multi-view intensified image clustering task when the clustering result reaches the preset training times.
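The following sketch illustrates the step-5 update. The expanded form of the eligibility term ∂ln g_j / ∂w, written here as (y - p)(h_j - w) in the style of REINFORCE for Bernoulli units, is an assumption of the sketch, as the text does not show the expanded derivative.

```python
# Sketch of step 5 (the explicit form of d ln g_j / d w is an assumption; a Bernoulli-unit
# REINFORCE-style eligibility (y - p) * (h_j - w) is used for illustration, with baseline b = 0).
import numpy as np

def update_prototype(unit, h_j, y, r, alpha=0.01):
    """Policy-gradient weight update of the selected virtual prototype (formula (16) with b = 0)."""
    eligibility = (y - unit["p"]) * (h_j - unit["w"])   # stands in for d ln g_j / d w
    unit["w"] = unit["w"] + alpha * r * eligibility
    return unit
```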
The whole process of the invention is divided into four parts: preprocessing of the features of each view, multi-view feature fusion, initialization of the clustering environment, and multi-view reinforced clustering. First, a deep autoencoder network is constructed for each view of the image data; stacked autoencoders reduce the dimensionality of the high-dimensional image data of each view and obtain the latent feature representation of each view. Second, the multi-view fusion network is constructed to fuse the latent feature representations of all views and obtain the fused feature representation. Then, based on the fused feature representation extracted during multi-view feature fusion, the clustering centroids are mined with the K-means method and stored as virtual prototypes in the corresponding Bernoulli units, completing the initialization of the clustering environment. Finally, with the online reward strategy, the parameters in the clustering environment are updated until convergence by combining the real-time interaction between the multi-view fused features and the virtual prototypes with the interaction information fed back by the Bernoulli units. The specific flow is shown in Figure 2.
Verification results:
in the experiments of the present invention, two general image data sets were selected to verify the effectiveness of the present invention, wherein the detailed information of the data sets is shown in table 1.
MNIST handwritten digit dataset: consists of 70000 handwritten digit images; each data sample is a 28 × 28 pixel grayscale image. The invention reshapes each image into a 784-dimensional vector.
Fashion-MNIST dataset: consists of 70000 garment images; each data sample is a 28 × 28 pixel grayscale image. The invention reshapes each image into a 784-dimensional vector.
Table 1 details of the data set
Data set Number of samples Sample dimension Number of categories
MNIST 70000 784 10
Fashion-MNIST 70000 784 10
The evaluation criteria of the invention are clustering accuracy (ACC), adjusted Rand index (ARI) and normalized mutual information (NMI).
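For reference, the three indices can be computed as in the sketch below; ARI and NMI are taken from scikit-learn, and ACC uses the usual Hungarian matching between predicted clusters and ground-truth labels. This evaluation code is illustrative and not part of the invention.

```python
# Sketch of the three clustering metrics (scikit-learn for ARI/NMI, scipy for Hungarian matching).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best one-to-one matching between predicted cluster ids and integer ground-truth labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)          # maximise the number of matched samples
    return cost[rows, cols].sum() / len(y_true)

def evaluate(y_true, y_pred):
    return {"ACC": clustering_accuracy(y_true, y_pred),
            "ARI": adjusted_rand_score(y_true, y_pred),
            "NMI": normalized_mutual_info_score(y_true, y_pred)}
```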
To verify the performance of the invention, a deep multi-view clustering method (MAE + K-means) was chosen as the comparison baseline.
The ACC, ARI and NMI results of the proposed method and the comparison method on the MNIST and Fashion-MNIST datasets are shown in Tables 2 and 3.
Table 2 comparison of results on MNIST dataset for each experiment
Experiments ACC ARI NMI
MAE+K-means 0.99985 0.99967 0.99951
The invention 0.9999 0.9999 0.9998
From Tables 2 and 3, it can be observed that the proposed method outperforms the comparison baseline on all three evaluation indices ACC, ARI and NMI on both the MNIST and Fashion-MNIST datasets, demonstrating its effectiveness. Specifically, compared with the MAE + K-means method, the proposed method adopts Bernoulli units and the online reward strategy, feeds back correct decision behaviour through the interaction between the input image data and the clustering prototypes, makes full use of the multi-view complementary information, gives due weight to the clustering environment, and thereby improves clustering performance.
Table 3 comparison of results on the Fashion-MNIST dataset for each experiment
Experiments ACC ARI NMI
MAE+K-means 0.49443 0.36653 0.54398
The invention 0.5646 0.4393 0.5754

Claims (2)

1. A multi-view reinforced image clustering method, characterized by comprising the following steps:
step 1, pre-training an independent feature extraction network for each view, and acquiring the latent feature representation of each view;
the feature extraction network is formed by stacking n autoencoder networks, each autoencoder network consisting of an encoding layer and a decoding layer with a symmetric network structure; during training, the last encoding layer serves as the hidden layer, the error between the input and its reconstruction is minimized, and a low-dimensional feature representation of the latent structure of the data is obtained;
suppose x_j^v is the j-th input image of the v-th view; when n = 1, the input of the autoencoder is x_j^v, and the calculation process is:

H_j^(v,1) = G_e^(v,1)(x_j^v; θ_e^(v,1))    (1)
x̂_j^(v,1) = G_d^(v,1)(H_j^(v,1); θ_d^(v,1))    (2)

wherein H_j^(v,1) is the latent feature representation output by the hidden layer, and x̂_j^(v,1) is the data reconstructed by the autoencoder; G_e^(v,1)(·) and G_d^(v,1)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,1), θ_d^(v,1) are the encoding-layer and decoding-layer parameters; the autoencoder is trained by minimizing the reconstruction loss between x_j^v and x̂_j^(v,1);
when n is greater than 1, the first autoencoder still takes x_j^v as input and is calculated by (1) and (2), while each of the remaining autoencoders takes the hidden-layer features H_j^(v,n-1) of the (n-1)-th encoding layer as input; the calculation process is:

H_j^(v,n) = G_e^(v,n)(H_j^(v,n-1); θ_e^(v,n))    (3)
x̂_j^(v,n) = G_d^(v,n)(H_j^(v,n); θ_d^(v,n))    (4)

wherein, as for the first autoencoder, H_j^(v,n) is the latent feature representation output by the hidden layer, and x̂_j^(v,n) is the reconstructed feature representation of the autoencoder; G_e^(v,n)(·) and G_d^(v,n)(·) are the activation functions of the encoding layer and the decoding layer of the autoencoder, respectively, and θ_e^(v,n), θ_d^(v,n) are the parameters of the encoding layer and the decoding layer; the autoencoder is trained by minimizing the reconstruction loss between its input and its reconstruction;
finally, the n autoencoders of the v-th view are split into their n encoding layers and n decoding layers, which are rearranged in a symmetric manner (all encoding layers followed by all decoding layers in reverse order) to construct the feature extraction network of this view, the stacked encoding layers forming the encoder and the stacked decoding layers forming the decoder; the feature extraction network is then trained by minimizing the reconstruction loss with a stochastic gradient descent algorithm;
step 2, pre-training a multi-view feature fusion network to obtain the fused feature representation;
the feature extraction networks trained for the individual views in step 1 generate, for an input sample x_j, the latent feature representations H^1, ..., H^V of the respective views, which are concatenated in series according to the complementary characteristics of the multiple views:

H_f = cat(H^1, ..., H^V)    (5)

wherein V is the number of views, H^1, ..., H^V are the latent feature representations of the individual views, cat(·) denotes the concatenation operation, and H_f is the concatenated feature representation;
the multi-view feature fusion network is formed by stacking n autoencoder networks; during training, the network takes the concatenated feature representation H_f as input, and the calculation process of the outermost autoencoder is:

H_ef = G_e,f(H_f; θ_e,f)    (6)
Ĥ_f = G_d,f(H_of; θ_d,f)    (7)

wherein H_ef is the latent feature output by the outer encoding layer, whose dimension is smaller than that of H_f; H_of is the reconstruction feature output by the inner decoder, whose dimension equals that of H_ef; Ĥ_f is the reconstructed representation output by the outer decoding layer, whose dimension equals that of H_f; when the network consists of only one autoencoder, H_of is H_ef; G_e,f(·) and G_d,f(·) are the activation functions of the encoding layer and the decoding layer, respectively, and θ_e,f, θ_d,f are the model parameters;
the constructed multi-view fusion network is pre-trained end to end with a stochastic gradient descent algorithm by minimizing the reconstruction loss between H_f and Ĥ_f; then, the decoder input dimension of the feature extraction network of each view is set to the dimension of the concatenated latent feature H_f of the input data;
step 3, initializing the clustering environment and allocating Bernoulli units;
the multi-view fused features H = {h_j | j = 1, ..., J} of the J multi-view image samples are generated with the multi-view feature fusion network trained in step 2, and K points are randomly selected from them as the initial clustering centroid set C = {c_k | k = 1, ..., K}; the clustering centroid set C is updated on the fused feature representation H with the K-means algorithm to obtain K optimized clustering centroids;
the clustering centroid set C optimized by the K-means algorithm is used as the virtual prototypes to initialize the clustering environment, a corresponding Bernoulli unit (w, p, dist, F) is constructed for each prototype, and the information of the current clustering environment is stored in it, wherein w is the weight of the virtual prototype, p is the indicator variable of the virtual prototype, dist is the distance from the current data point to the virtual prototype, and F is the state parameter; initially, w is set to the centroid obtained after the K-means update, and p, dist and F are set to 0;
step 4, multi-view reinforced clustering;
step 4-1, calculating the state parameter between the multi-view fused feature of the input image data and each virtual prototype, measuring the indication degree of the input image data to the virtual prototypes in the current clustering environment, and selecting the adjacent virtual prototype as the reinforcement object;
first, a feature vector h_j is randomly selected from the fused feature set H of the input images, and the Euclidean distance d_k = Dist(h_j, c_k) between h_j and the virtual prototype c_k is calculated; then the state coefficient between the two is calculated through a Sigmoid function:

F(d_k) = 1 / (1 + e^(-d_k))    (11)

after the state coefficient is obtained, a cost function is used to measure the indication degree of the latent feature of the multi-view image to the virtual prototype:

p_k = J(d_k) = 2 × (1 - F(d_k))    (12)

wherein p_k is the indicator variable representing the indication degree between the input image and the virtual prototype c_k; ideally, a larger p_k means a smaller distance between the fused feature and the clustering prototype, a higher similarity between them, and a stronger indication of the input image to the virtual prototype; otherwise, the indication is weaker; the virtual prototype with the strong indication is selected as the reinforcement object;
at the same time, a random seed p_s is compared with the indicator variable p_k of the selected virtual prototype to obtain a calibration variable y_j, correcting the negative influence of invalid virtual prototypes on the overall clustering result:

y_j = 1 if p_s ≤ p_k, and y_j = 0 otherwise    (13)

while step 4-1 is executed, the relevant information of the corresponding unit in the clustering environment is updated in real time;
step 4-2, executing the online reward strategy, and assigning a reward signal r to the adjacent virtual prototype;
first, a reward-and-punishment factor r_j is assigned to the selected reinforcement object according to the calibration variable y_j:

r_j = reward factor, if y_j = 1;  r_j = penalty factor, otherwise    (14)

wherein, when the calibration variable y_j = 1, the selected reinforcement object is a valid object, the virtual prototype corresponding to the unit is close to the input image and matches the ideal situation, and a forward decision should be applied, i.e. a reward factor is assigned to the unit; otherwise, the selected reinforcement object is an invalid object, the virtual prototype corresponding to the unit is far from the input image and violates the ideal situation, and a reverse decision should be applied, i.e. a penalty factor is assigned to the unit;
the clustering environment is updated through step 4-1 and step 4-2, and the optimal virtual prototypes obtained after iterative convergence serve as the cluster centers;
step 5, updating parameters and optimizing the clustering result;
the weight parameter of the selected reinforcement object is updated with a policy gradient algorithm:

Δw_(j,k) = α (r_j - b_(j,k)) ∂ln g_j / ∂w_(j,k)    (15)

wherein α is the learning rate, which should be greater than 0; r_j is the reward-and-punishment factor obtained in the multi-view reinforced clustering step, and b_(j,k) is the reinforcement baseline; g_j = g_j(y_j; w_j, h_j) is the probability density function, whose value reflects the influence, in the current clustering environment, of the calibration variable of the reinforcement object selected by the multi-view fused feature h_j of the input image and the weight w_j; the partial derivative ∂ln g_j / ∂w_j measures the degree of feature transformation during the policy gradient update and changes with the value of the probability density function g_j;
according to the indication degree of multi-view reinforced clustering, combining the calibration variable and the reward factor, and setting the reinforcement baseline b_(j,k) = 0, the final weight update formula of the virtual prototype is obtained:

Δw_(j,k) = α r_j ∂ln g_j / ∂w_(j,k)    (16)

during iterative optimization, the virtual prototypes are updated through formula (16); when the preset number of training iterations is reached, the multi-view reinforced image clustering task is completed.
2. The multi-view reinforced image clustering method according to claim 1, wherein the K-means clustering algorithm in step 3 divides the data points into K clusters under the criterion that a smaller distance means a greater similarity and a larger distance means a smaller similarity; the distance between data points is calculated with the Euclidean distance:

Dist(h_i, h_j) = ||h_i - h_j||_2    (8)

wherein h_i and h_j are the multi-view fused features of two different data points, and Dist(·) denotes the distance between them;
during cluster assignment, the mean of the features in the same cluster is recalculated in every iteration and used as the centroid set C = {c_k | k = 1, 2, ..., K}:

c_k = (1/|S_k|) Σ_(h_i ∈ S_k) h_i    (9)

wherein c_k is the centroid of the k-th cluster, h_i is the feature of a data point belonging to that cluster, and S_k denotes the set of such features;
first, K points are randomly selected from the sample set as the initial clustering centroid set C; the distance between each data point and each clustering centroid is calculated with formula (8), each data point is assigned to the nearest clustering centroid, and the clustering centroids are updated by this heuristic iteration until convergence, giving K optimized clustering centroids as the clustering centroid set C = {c_k | k = 1, ..., K} of the data points.
CN202110879412.7A 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method Pending CN113610139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110879412.7A CN113610139A (en) 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110879412.7A CN113610139A (en) 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method

Publications (1)

Publication Number Publication Date
CN113610139A true CN113610139A (en) 2021-11-05

Family

ID=78339034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110879412.7A Pending CN113610139A (en) 2021-08-02 2021-08-02 Multi-view-angle intensified image clustering method

Country Status (1)

Country Link
CN (1) CN113610139A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114936615A (en) * 2022-07-25 2022-08-23 南京大数据集团有限公司 Small sample log information anomaly detection method based on characterization consistency correction
CN116522143A (en) * 2023-05-08 2023-08-01 深圳市大数据研究院 Model training method, clustering method, equipment and medium
CN116522143B (en) * 2023-05-08 2024-04-05 深圳市大数据研究院 Model training method, clustering method, equipment and medium
CN117542057A (en) * 2024-01-09 2024-02-09 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views
CN117542057B (en) * 2024-01-09 2024-04-05 南京信息工程大学 Multi-view clustering method based on relationship among modular network modeling views

Similar Documents

Publication Publication Date Title
CN113610139A (en) Multi-view-angle intensified image clustering method
CN110674850A (en) Image description generation method based on attention mechanism
CN112464005B (en) Depth-enhanced image clustering method
CN108921047B (en) Multi-model voting mean value action identification method based on cross-layer fusion
CN110544297A (en) Three-dimensional model reconstruction method for single image
CN112308961B (en) Robot rapid robust three-dimensional reconstruction method based on layered Gaussian mixture model
CN107330902B (en) Chaotic genetic BP neural network image segmentation method based on Arnold transformation
CN103942571B (en) Graphic image sorting method based on genetic programming algorithm
CN108897791B (en) Image retrieval method based on depth convolution characteristics and semantic similarity measurement
CN112488070A (en) Neural network compression method for remote sensing image target detection
CN106971197A (en) The Subspace clustering method of multi-view data based on otherness and consistency constraint
CN109214429A (en) Localized loss multiple view based on matrix guidance regularization clusters machine learning method
CN102930275A (en) Remote sensing image feature selection method based on Cramer's V index
CN112417289A (en) Information intelligent recommendation method based on deep clustering
CN106355210B (en) Insulator Infrared Image feature representation method based on depth neuron response modes
CN107146241A (en) A kind of point cloud registration method based on differential evolution algorithm and TrimmedICP algorithms
CN114580525A (en) Gesture action classification method for data containing missing
CN115546525A (en) Multi-view clustering method and device, electronic equipment and storage medium
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN115203631A (en) Multi-modal data analysis method and system based on improved genetic algorithm
Tan et al. Deep adaptive fuzzy clustering for evolutionary unsupervised representation learning
CN114743058A (en) Width learning image classification method and device based on mixed norm regular constraint
CN113239199B (en) Credit classification method based on multi-party data set
CN117078312B (en) Advertisement putting management method and system based on artificial intelligence
CN111353525A (en) Modeling and missing value filling method for unbalanced incomplete data set

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination