CN114519086A

CN114519086A - Incremental interactive clustering visualization method and system for trusted cloud data sharing

Info

Publication number: CN114519086A
Application number: CN202210145820.4A
Authority: CN
Inventors: 金福生; 韩华旭; 黄罡; 陈朔鹰; 张舒汇
Original assignee: Peking University Shenzhen Graduate School; Beijing Institute of Technology BIT
Current assignee: Peking University Shenzhen Graduate School; Beijing Institute of Technology BIT
Priority date: 2022-02-17
Filing date: 2022-02-17
Publication date: 2022-05-20

Abstract

The invention discloses an incremental interactive clustering visualization method and system for trusted cloud data sharing. The method comprises: writing a data-sharing smart contract and running it; extracting target data from a data set according to the smart contract; clustering the extracted target data with a clustering algorithm and outputting a clustering result; reducing the dimensionality of the clustering result by multidimensional scaling so that it is projected into a two-dimensional space, and displaying it visually; and performing the corresponding projection interaction, cluster analysis and visualization operations when the user changes the target data. Because the data are analyzed and mined incrementally and interactively on top of trusted blockchain-based data sharing, data security is protected, data processing efficiency is improved, and users can analyze and mine the data visually and conveniently.

Description

Incremental interactive clustering visualization method and system for credible cloud data sharing

Technical Field

The invention relates to the technical field of systems engineering, and in particular to an incremental interactive clustering visualization method and system for trusted cloud data sharing.

Background

Data visualization is one of the main ways in which data are put to use: data are transformed and presented as charts or graphs so that people can better understand and analyze the information hidden in them. Clustering is a common data-mining method, widely used for mining multivariate relational data, for sequence mining, for anomaly detection and similar tasks. Many current data analysis and mining platforms provide clustering algorithms and display the results visually. In most cases the workflow is to collect all the data to be analyzed from every system, process it together in a single batch, and finally display the clustering result. This workflow has several problems. First, all data must be gathered before any subsequent step can run, and gathering takes a long time; in a distributed data-sharing system in particular, every processing run must wait until all the data have been aggregated. Collecting and processing all the raw data centrally also creates a risk of data leakage, so data security is not well protected. Second, clustering-based analysis in this workflow is a static process: after the parameters are set, the algorithm runs and the result is shown to the user, who cannot adjust the clustering parameters intuitively.

Therefore, building on existing data analysis and mining technology, a problem to be solved by those skilled in the art is how to provide an incremental interactive clustering visualization method and system for trusted cloud data sharing that greatly improves data processing efficiency, guarantees the security and trustworthiness of the data, and makes the data processing visual and operable.

Disclosure of Invention

In view of the above problems, the present invention provides an incremental interactive clustering visualization method and system for trusted cloud data sharing that solves at least some of these technical problems, protecting data security, improving data processing efficiency and allowing users to analyze and mine data more intuitively.

An embodiment of the invention provides an incremental interactive clustering visualization method for trusted cloud data sharing, characterized by comprising the following steps:

S1, writing a data-sharing smart contract and running the smart contract;

S2, extracting target data from the data set according to the written data-sharing smart contract;

S3, clustering the extracted target data with a clustering algorithm and outputting a clustering result;

S4, reducing the dimensionality of the clustering result by multidimensional scaling (MDS) so that the clustering result is projected into a two-dimensional space, and displaying it visually;

S5, performing the corresponding projection interaction, cluster analysis and visualization operations according to the user's changes to the target data.

Further, in step S4, reducing the dimensionality of the clustering result by multidimensional scaling comprises:

S41, calculating a distance matrix from the clustering result, given the chosen dimension of the low-dimensional space;

S42, calculating an inner product matrix from the distance matrix;

S43, performing eigenvalue decomposition of the inner product matrix, taking the n largest eigenvalues and their eigenvectors, and forming the diagonal matrix of these n eigenvalues and the corresponding eigenvector matrix;

S44, calculating the dimension-reduced matrix from this diagonal matrix and eigenvector matrix, and outputting the low-dimensional representation of the clustering result.
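Steps S41 to S44 match classical (Torgerson) multidimensional scaling. Assuming that variant, and using the symbols introduced in the detailed description (D, B, U_n, V_n, Z) with m samples, D^{(2)} the matrix of squared distances and λ₁ ≥ λ₂ ≥ … the eigenvalues of B, the computation can be written as below. This is the standard textbook formulation offered for illustration, not a formula stated in this patent.

```latex
\begin{aligned}
J &= I_m - \tfrac{1}{m}\mathbf{1}\mathbf{1}^{\top}, \qquad B = -\tfrac{1}{2}\, J\, D^{(2)}\, J \\
b_{ij} &= -\tfrac{1}{2}\Bigl(D_{ij}^{2} - \tfrac{1}{m}\sum_{q} D_{iq}^{2} - \tfrac{1}{m}\sum_{p} D_{pj}^{2} + \tfrac{1}{m^{2}}\sum_{p,q} D_{pq}^{2}\Bigr) \\
B &= V \Lambda V^{\top}, \qquad U_n = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \qquad Z = V_n\, U_n^{1/2} \in \mathbb{R}^{m \times n}
\end{aligned}
```

When D holds Euclidean distances, Z coincides (up to rotation and reflection) with projecting the centred data onto their first n principal axes, which is why a PCA relation can later be used to map changes in the layout back to the data attributes.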

Further, step S5 comprises:

S51, if the user changes an attribute of the target data, constructing a forward projection and displaying the result of the attribute change on the visual interface, realizing forward projection interaction;

S52, if the user drags and drops a visualized data point of the target data, constructing a backward projection and calculating the adjusted attributes of the data point, realizing backward projection interaction;

S53, if the user adds incremental data to the target data, performing cluster analysis and visualization operations on the incremental data.

Further, S51 comprises:

S511, obtaining the data point whose attributes were changed and the centroid of each cluster in the current clustering model;

S512, calculating the Euclidean distance from the data point to the centroid of each cluster;

S513, finding the shortest of these Euclidean distances and assigning the data point to the corresponding cluster;

S514, projecting the data point into the scatter plot and color-coding it with the color of its cluster, realizing forward projection interaction.

Further, in S511, the centroid of each cluster in the current clustering model is obtained by the mean method.

Further, S52 comprises:

S521, obtaining the low-dimensional coordinates of the projection point after the visualized data point has been dragged and dropped;

S522, calculating, from these low-dimensional coordinates, the change vector from the original projection point to the new projection point;

S523, calculating Δx using the PCA dimension-reduction relation:

$$\Delta x\,[e_0\ e_1] = \Delta y$$

where $\Delta x$ is the feature change vector of the original data point, $[e_0\ e_1]$ is the eigenvector matrix of the original data points, and $\Delta y$ is the position change vector;

S524, solving for the optimal Δx by regularized least squares to obtain the adjusted attributes of the data point, realizing backward projection interaction.
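The relation in S523 gives only two equations (one per screen axis) for d unknown attribute changes, so Δx is underdetermined whenever d > 2; the regularization in S524 is what selects a single solution. Assuming L2 (ridge/Tikhonov) regularization with a weight λ > 0, a symbol introduced here for illustration, and writing Δx ∈ ℝ^{1×d}, Δy ∈ ℝ^{1×2} and E = [e₀ e₁] ∈ ℝ^{d×2}, the closed-form solution is:

```latex
\Delta x^{*} = \arg\min_{\Delta x}\; \lVert \Delta x\, E - \Delta y \rVert_2^{2} + \lambda \lVert \Delta x \rVert_2^{2}
= \Delta y\, E^{\top} \bigl(E E^{\top} + \lambda I_d\bigr)^{-1}
= \Delta y\, \bigl(E^{\top} E + \lambda I_2\bigr)^{-1} E^{\top}
```

If e₀ and e₁ are orthonormal principal axes, EᵀE = I₂ and the solution reduces to Δx* = Δy Eᵀ / (1 + λ): the drag is mapped back along the two principal directions and shrunk by the factor 1/(1 + λ). The second form only needs a 2 × 2 matrix inverse, whatever the number of attributes d.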

Further, S53 comprises:

S531, obtaining the data points of the newly added incremental data, reading them into the processing system, and setting a K-L divergence threshold;

S532, calculating the K-L divergence between the data set with the added data points and the original data set;

S533, comparing the K-L divergence with the K-L divergence threshold and, depending on the result, performing cluster analysis and visualization of the incremental data either by out-of-sample extension or by re-clustering.
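The patent does not say how the K-L divergence between two data sets is estimated. The sketch below assumes one simple option: fit a multivariate Gaussian to each set and use the closed-form Gaussian K-L divergence, taking KL(augmented ‖ original). The function names `gaussian_fit`, `kl_gaussian` and `choose_update_mode`, the direction of the divergence and the small `ridge` term are illustrative assumptions, not details from the source.

```python
import numpy as np

def gaussian_fit(X, ridge=1e-6):
    """Mean and covariance of the rows of X; the ridge keeps the covariance invertible."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False)) + ridge * np.eye(X.shape[1])
    return mu, cov

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1))."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d + logdet1 - logdet0)

def choose_update_mode(X_orig, X_added, threshold):
    """S532/S533 (assumed form): compare KL(augmented || original) with the threshold."""
    X_aug = np.vstack([X_orig, X_added])
    kl = kl_gaussian(*gaussian_fit(X_aug), *gaussian_fit(X_orig))
    # Below the threshold: out-of-sample extension; otherwise re-cluster.
    # The patent leaves the equality case open; this sketch re-clusters on ties.
    mode = "out_of_sample" if kl < threshold else "re_cluster"
    return mode, float(kl)
```

A Gaussian fit only sees shifts in mean and covariance; a histogram or kernel-density estimate would also react to changes in shape, at higher cost.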

Further, in S533, cluster analysis and visualization of the incremental data by out-of-sample extension comprises: when the K-L divergence is smaller than the K-L divergence threshold, assigning each newly added data point to a cluster according to its distance to the cluster centers and updating the cluster centers; and projecting the newly added data points into the scatter plot by out-of-sample extension.
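A minimal sketch of this out-of-sample branch, assuming that each cluster center is kept as the running mean of its members together with a member count, and that the 2-D layout comes from classical MDS on Euclidean distances. Under that assumption a new point can be placed by projecting it onto the top two principal axes of the original data, one common out-of-sample extension for classical MDS; the patent does not name a specific formula. `assign_and_update` and `project_out_of_sample` are hypothetical helper names.

```python
import numpy as np

def assign_and_update(x_new, centroids, counts):
    """Assign x_new to the nearest center and update that center as a running mean."""
    x_new = np.asarray(x_new, dtype=float)
    centroids = np.array(centroids, dtype=float)   # copies, so the caller's arrays are untouched
    counts = np.array(counts, dtype=int)
    j = int(np.argmin(np.linalg.norm(centroids - x_new, axis=1)))
    counts[j] += 1
    centroids[j] += (x_new - centroids[j]) / counts[j]
    return j, centroids, counts

def project_out_of_sample(x_new, mean, axes):
    """Place x_new in the existing 2-D layout as (x_new - mean) @ axes.

    mean: column means of the original data; axes: (d, 2) top principal axes,
    with signs matched to the displayed layout.
    """
    return (np.asarray(x_new, dtype=float) - mean) @ axes
```

Neither helper touches the existing points, which is what keeps this branch cheap compared with re-clustering.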

Further, in S533, cluster analysis and visualization of the incremental data by re-clustering comprises: when the K-L divergence is larger than the K-L divergence threshold, re-clustering the data set, finding the position at which the new dimension-reduction result of the target data best overlaps the previous one, and projecting; and color-coding the dimension-reduced projection points according to the clustering result.

An embodiment of the invention further provides an incremental interactive clustering visualization system for trusted cloud data sharing, comprising:

a writing and running module, configured to write the data-sharing smart contract and run the smart contract;

an extraction module, configured to extract target data from the data set according to the written data-sharing smart contract;

a clustering module, configured to cluster the extracted target data with a clustering algorithm and output a clustering result;

a visualization module, configured to reduce the dimensionality of the clustering result by multidimensional scaling, project the clustering result into a two-dimensional space and display it visually, and to perform the corresponding projection interaction, cluster analysis and visualization operations according to the user's changes to the target data.

The technical solution provided by the embodiments of the invention has at least the following beneficial effects:

the embodiment of the invention provides an incremental interactive clustering visualization method for trusted cloud data sharing, comprising: writing a data-sharing smart contract and running it; extracting target data from the data set according to the smart contract; clustering the extracted target data with a clustering algorithm and outputting a clustering result; reducing the dimensionality of the clustering result by multidimensional scaling so that it is projected into a two-dimensional space, and displaying it visually; and performing the corresponding projection interaction, cluster analysis and visualization operations according to the user's changes to the target data. Because incremental, interactive analysis and mining are carried out on top of trusted blockchain-based data sharing, data security is protected, data processing efficiency is improved, and users can analyze and mine the data visually and conveniently.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a flowchart of the incremental interactive clustering visualization method for trusted cloud data sharing according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

An embodiment of the invention provides an incremental interactive clustering visualization method for trusted cloud data sharing which, as shown in Fig. 1, comprises steps S1 to S5 described above.

In this embodiment, incremental interactive analysis and mining of data are performed on top of trusted blockchain-based data sharing, so that users can analyze and mine the data more intuitively while data security is protected and data processing efficiency is improved.

Each of the steps is described in detail below:

Specifically, steps S1 and S2 are executed: the data-sharing smart contract is written and run, and target data of the data set are extracted according to the written contract. Using blockchain technology enables trusted, distributed data sharing and guarantees the security and trustworthiness of the shared data.

Further, in step S3, the extracted target data are clustered with a clustering algorithm and a clustering result is output. Clustering analysis and mining are performed incrementally, which greatly improves data processing efficiency and shortens the response time of analysis and mining. Step S3 specifically comprises:

S31, selecting a clustering algorithm;

S32, setting the clustering parameters;

S33, clustering the extracted target data and outputting a clustering result X.

For example:

1. select the k-means algorithm;

2. set the inputs of k-means, namely the data set and the value of k;

3. run the algorithm to divide the data set into k clusters.
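The k-means example above can be sketched as plain Lloyd's k-means with random initialization, written in Python/NumPy for illustration. The function names `kmeans` and `_assign` and the parameters `n_iter` and `seed` are assumptions; the patent does not specify initialization or stopping rules.

```python
import numpy as np

def _assign(X, centroids):
    """Index of the nearest centroid (Euclidean) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Each centroid moves to the mean of its members; an empty cluster keeps its old centroid
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final labels are computed against the final centroids
    return _assign(X, centroids), centroids
```

The returned labels and centroids are what S4 visualizes and what the forward projection in S51 reuses as cluster centroids.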

Further, step S4 reduces the dimensionality of the clustering result by multidimensional scaling, projecting the clustering result of the target data (high-dimensional data) into a two-dimensional space (i.e., into low-dimensional data), and displays it visually. Displaying the result on the interface allows the clustering parameters to be adjusted more intuitively. Step S4 specifically comprises:

S41, calculating a distance matrix D from the clustering result X and the chosen low-dimensional space dimension n, where the element D_ij in row i and column j is the distance between samples i and j;

S42, calculating an inner product matrix B from the distance matrix D, on the principle that the pairwise distances after dimension reduction should match those before dimension reduction as closely as possible;

S43, performing eigenvalue decomposition of the inner product matrix B, taking the n largest eigenvalues and their eigenvectors, and forming the diagonal matrix U_n of the n largest eigenvalues and the eigenvector matrix V_n;

S44, calculating the dimension-reduced matrix Z from the diagonal matrix U_n and the eigenvector matrix V_n, and outputting the low-dimensional representation of the clustering result, i.e., the low-dimensional representation of the initial target data.
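A minimal NumPy sketch of S41 to S44, assuming classical MDS on Euclidean distances between the rows of X (the patent does not fix the distance measure). `classical_mds` is a hypothetical name; eigenvalues are clipped at zero before the square root because B can have tiny negative eigenvalues from rounding.

```python
import numpy as np

def classical_mds(X, n=2):
    """Classical MDS: returns Z (m x n) whose pairwise distances approximate those of X."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    # S41: squared Euclidean distance matrix D^(2)
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    # S42: double centering gives the inner product matrix B = -1/2 J D^(2) J
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ D2 @ J
    # S43: eigendecomposition (eigh returns ascending order); keep the n largest
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n]
    U_n = np.diag(np.maximum(eigvals[idx], 0.0))
    V_n = eigvecs[:, idx]
    # S44: Z = V_n U_n^(1/2)
    return V_n @ np.sqrt(U_n)
```

With n = 2 the rows of Z are the abscissa/ordinate pairs plotted in the scatter plot. Z is determined only up to rotation and reflection, which matters later when a re-clustered layout has to be aligned with the previous one.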

Further, step S5 performs the corresponding projection interaction, cluster analysis and visualization operations according to the user's changes to the target data. After the data are reduced from many dimensions to two, each data point has an abscissa and an ordinate and is plotted directly on the chart. When the user adds new data or otherwise changes the target data set, the corresponding operation is performed on the visual interface. Step S5 specifically comprises:

S51, if the user changes a data attribute, constructing a forward projection and displaying the result of the attribute change on the visual interface, realizing forward projection interaction;

S52, if the user drags and drops a visualized data point, constructing a backward projection and calculating the adjusted attributes of the data point, realizing backward projection interaction;

S53, if the user adds incremental data, performing cluster analysis and visualization operations on the incremental data.

Specifically, step S51 comprises:

S511, the user interacts with the interface and changes an attribute value of a data point; the data point x with the changed attribute is obtained, and the centroids m_j (1 ≤ j ≤ k) of the k clusters in the current clustering model are obtained by the mean method;

S512, calculating the Euclidean distance l_j from the data point x to each centroid m_j;

S513, finding the shortest distance min_j l_j and assigning the data point x to the corresponding cluster;

S514, projecting the data point x into the scatter plot and color-coding it with the color of its cluster, realizing forward projection interaction.
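Steps S511 to S514 amount to a nearest-centroid rule. The sketch below is illustrative: `cluster_centroids`, `forward_assign` and the color table `PALETTE` are hypothetical names, cluster indices are 0-based rather than 1 to k, and drawing the scatter plot itself is left out.

```python
import numpy as np

# Hypothetical color table for S514; the patent only requires that a point takes its cluster's color.
PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]

def cluster_centroids(X, labels, k):
    """S511 (mean method): m_j is the mean of the data points labelled j."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

def forward_assign(x, centroids):
    """S512-S514: distances l_j to each centroid, the nearest cluster j*, and its color."""
    l = np.linalg.norm(np.asarray(centroids, dtype=float) - np.asarray(x, dtype=float), axis=1)
    j = int(np.argmin(l))
    return j, l, PALETTE[j % len(PALETTE)]
```

Only k distances are computed per edited point, so the forward projection stays interactive even for large data sets.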

Step S52 comprises:

S521, the user drags a node to a new position in the projection space on the front-end interface, and the low-dimensional coordinates y of the projection point are obtained;

S522, calculating, from y, the change vector Δy from the original projection point to the new projection point;

S523, calculating Δx using the PCA dimension-reduction relation:

$$\Delta x\,[e_0\ e_1] = \Delta y$$

where $\Delta x$ is the feature change vector of the original data point, $[e_0\ e_1]$ is the eigenvector matrix of the original data points, and $\Delta y$ is the position change vector;

S524, solving for the optimal Δx by regularized least squares to obtain the adjusted attributes of the data point, realizing backward projection interaction.
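A minimal sketch of S523 and S524, assuming E holds the top two principal axes of the centred original data (columns e₀ and e₁) and assuming ridge (L2) regularization with a hypothetical weight `lam`; the patent says only "regularized least squares". For classical MDS on Euclidean distances the layout equals the PCA projection up to sign, so these axes are consistent with the displayed scatter plot once their signs are matched to it.

```python
import numpy as np

def backward_projection(delta_y, E, lam=1e-2):
    """Ridge (L2-regularized) least-squares solution of delta_x @ E = delta_y.

    delta_y: (2,) drag vector in the scatter plot (S522).
    E: (d, 2) matrix whose columns are e0 and e1.
    Returns delta_x (d,) minimizing ||delta_x @ E - delta_y||^2 + lam * ||delta_x||^2.
    """
    E = np.asarray(E, dtype=float)
    dy = np.asarray(delta_y, dtype=float)
    # delta_x = delta_y (E^T E + lam I)^-1 E^T; the 2x2 Gram matrix is symmetric,
    # so solving G z = delta_y gives the row vector delta_y G^-1
    G = E.T @ E + lam * np.eye(E.shape[1])
    return np.linalg.solve(G, dy) @ E.T
```

The adjusted attribute vector is then x + Δx. A larger `lam` yields smaller, more conservative attribute changes, at the cost of the point not landing exactly where it was dropped.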

Here, in step S522, the original projection point is the point before the drag-and-drop and the new projection point is the point after it; that is, the drag-and-drop moves a point to another location. The original data point in step S523 is the data point before the drag-and-drop.

Further, to respond accurately and quickly to incremental data changes during visualization, two different modes are used. When the newly added incremental data have little influence on the current clustering result, the new data are projected by out-of-sample extension. When their influence is large, the visualization is rebuilt by re-clustering, and the Procrustes transformation is used to find the best overlap with the projection positions of the data before re-clustering. Step S53 specifically comprises:

S531, obtaining a data point x of the newly added incremental data, reading it (e.g., from a database) into the processing system, and setting a K-L divergence threshold K (distinct from the number of clusters k);

S532, calculating the K-L divergence between the data set with the added data point and the original data set (the data set of step S2);

S533, comparing the K-L divergence with the threshold K and, depending on the result, performing cluster analysis and visualization of the incremental data by out-of-sample extension or by re-clustering, specifically:

when the K-L divergence is smaller than the threshold K, assigning the newly added data point to a cluster according to its distance to the cluster centers and updating the cluster centers, and projecting the newly added data point x into the scatter plot by out-of-sample extension;

when the K-L divergence is larger than the threshold K, re-clustering the data set, applying the Procrustes geometric transformation to find the position that best overlaps the previous dimension-reduction result of the target data, and projecting; and color-coding the dimension-reduced projection points according to the clustering result.
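The Procrustes step can be sketched as an ordinary similarity Procrustes fit: find the rotation or reflection R, uniform scale s and translation that best map the re-clustered layout of the previously displayed points onto their old positions, then apply the same map to every point, including the new ones. The function name `procrustes_align`, the row ordering (the first m rows of the new layout are the previously shown points, in the same order) and the choice to allow reflections and scaling are assumptions; MDS layouts are only defined up to rotation and reflection, which is why allowing both is natural here.

```python
import numpy as np

def procrustes_align(Z_new, Z_old):
    """Align Z_new (m' x 2) to Z_old (m x 2) using the first m rows as correspondences."""
    Z_new = np.asarray(Z_new, dtype=float)
    Z_old = np.asarray(Z_old, dtype=float)
    m = Z_old.shape[0]
    A = Z_new[:m]
    mu_a, mu_b = A.mean(axis=0), Z_old.mean(axis=0)
    A0, B0 = A - mu_a, Z_old - mu_b
    # Orthogonal Procrustes: the SVD of A0^T B0 gives the optimal rotation/reflection
    U, S, Vt = np.linalg.svd(A0.T @ B0)
    R = U @ Vt
    s = S.sum() / np.sum(A0**2)            # optimal uniform scale
    # Apply the same similarity transform to all rows, old and newly added
    return s * (Z_new - mu_a) @ R + mu_b
```

After alignment, previously displayed points stay close to where the user last saw them, and the new points appear in the same frame of reference.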

The above steps are illustrated below by a specific application example:

Task: cluster data held in different systems and operate on the result interactively.

Process:

1. write the contract;

2. extract the data;

3. cluster the data;

4. reduce the dimensionality to two dimensions;

5. display the result on the visual interface;

6. establish forward interaction from data to interface, i.e., manually change an attribute of the data and show the effect on the interface;

7. establish backward interaction from interface to data, i.e., drag and drop a point on the interface and obtain the resulting attribute changes;

8. establish a visualization scheme for newly added data, i.e., decide how to merge a newly added data point into the existing result.

The incremental interactive clustering visualization method for trusted cloud data sharing improves the clustering-based data analysis and mining process in three respects. First, computation over the full data set is replaced by incremental computation: clustering analysis proceeds in the order in which the data arrive, without changing the result. Second, during data sharing, centralized sharing of raw data is avoided; a distributed strategy is adopted in which the data are transformed by encryption, dimension reduction and similar means, and the transformed data are shared without affecting the data processing. Third, the processing results are not merely displayed: analysis and processing are performed directly on the interface and their results are shown, so that the value of the data can be mined more intuitively. As a result, data processing efficiency is improved, data leakage caused by centralized data sharing and processing is avoided, and interactive visual clustering analysis and mining are achieved.

In another aspect, an embodiment of the invention further provides an incremental interactive clustering visualization system for trusted cloud data sharing, suitable for performing the above incremental interactive clustering visualization method for trusted cloud data sharing, comprising:

the visualization module, configured to reduce the dimensionality of the clustering result by multidimensional scaling, project the clustering result into a two-dimensional space, and display it visually;

Because the system is suited to the incremental interactive clustering visualization method for trusted cloud data sharing described above, its implementation can follow the implementation of the method, and repeated details are omitted.

The embodiments in this description are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Because the system disclosed in the embodiments corresponds to the disclosed method, it is described briefly; for the relevant details, refer to the description of the method.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An incremental interactive clustering visualization method for trusted cloud data sharing, characterized by comprising the following steps:

2. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 1, wherein in step S4, reducing the dimensionality of the clustering result by multidimensional scaling comprises:

S42, calculating an inner product matrix from the distance matrix;

3. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 1, wherein step S5 comprises:

4. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 3, wherein S51 comprises:

S511, obtaining the data point whose attributes were changed and the centroid of each cluster in the current clustering model;

S513, finding the shortest of the Euclidean distances and assigning the data point to the corresponding cluster;

S514, projecting the data point into the scatter plot and color-coding it with the color of its cluster, realizing forward projection interaction.

5. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 4, wherein in S511, the centroid of each cluster in the current clustering model is obtained by the mean method.

6. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 3, wherein S52 comprises:

S523, calculating Δx using the PCA dimension-reduction relation:

$$\Delta x\,[e_0\ e_1] = \Delta y$$

where $\Delta x$ is the feature change vector of the original data point, $[e_0\ e_1]$ is the eigenvector matrix of the original data points, and $\Delta y$ is the position change vector;

7. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 3, wherein S53 comprises:

8. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 7, wherein in S533, cluster analysis and visualization of the incremental data by out-of-sample extension comprises: when the K-L divergence is smaller than the K-L divergence threshold, assigning the newly added data points to clusters according to their distances to the cluster centers and updating the cluster centers; and projecting the newly added data points into the scatter plot by out-of-sample extension.

9. The incremental interactive clustering visualization method for trusted cloud data sharing according to claim 7, wherein in S533, cluster analysis and visualization of the incremental data by re-clustering comprises: when the K-L divergence is larger than the K-L divergence threshold, re-clustering the data set, finding the position that best overlaps the previous dimension-reduction result of the target data, and projecting; and color-coding the dimension-reduced projection points according to the clustering result.

10. An incremental interactive clustering visualization system for trusted cloud data sharing, characterized by comprising: