CN107451238B - Visual analysis method and system for exploring inherent low-dimensional structure of high-dimensional data - Google Patents


Publication number
CN107451238B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710619472.9A
Other languages
Chinese (zh)
Other versions
CN107451238A (en)
Inventor
夏佳志
李强
叶奋进
王建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201710619472.9A
Publication of CN107451238A
Application granted
Publication of CN107451238B
Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/248: Presentation of query results

Abstract

The invention discloses a visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data, comprising the following steps: selecting data to be analyzed and adjusting analysis parameters; constructing a t-SNE view; constructing a two-dimensional projection view of the high-dimensional data; constructing an intrinsic dimension histogram view; constructing a point scree plot view; constructing a scree plot view of the low-dimensional structure; and constructing a structure list view. The invention also provides a system for implementing this visual analysis method. By projecting the high-dimensional data and building the corresponding views, the method and system construct a set of global and local feature descriptions for the potential low-dimensional structure, thereby helping a user explore that structure.

Description

Visual analysis method and system for exploring inherent low-dimensional structure of high-dimensional data
Technical Field
The invention relates to a visual analysis method and a visual analysis system for exploring the intrinsic low-dimensional structure of high-dimensional data.
Background
Today, data of all kinds pervade every aspect of daily life, and the analysis and processing of large-scale data is increasingly important in scientific research. The high dimensionality and structural complexity of such data make analysis and processing difficult. How to effectively extract the characteristic information of high-dimensional data is a fundamental problem in information science and statistics, and is also the main challenge faced by high-dimensional data analysis. The first step in addressing this challenge is effective dimensionality reduction. Dimensionality reduction means projecting data from a high-dimensional space into a low-dimensional space through a linear or nonlinear mapping, so as to find a meaningful low-dimensional structure in the high-dimensional observations that reveals the essence of the data. It mitigates the curse of dimensionality and facilitates the classification, compression and visualization of high-dimensional data.
For data dimensionality reduction, conventional methods assume that the data follow a low-dimensional linear distribution. Representative methods are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which have a complete theoretical foundation and perform well in applications. However, because the relationship between the representation dimensions and the intrinsic feature dimensions of real data is nonlinear, the manifold learning methods proposed in recent years by S. T. Roweis and J. B. Tenenbaum have gradually become a research focus in data feature extraction.
Disclosure of Invention
The first objective of the present invention is to provide a visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data, which constructs a set of global and local feature descriptions for that structure so that it can be explored.
The second objective of the present invention is to provide a system for implementing the visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data.
The invention provides a visual analysis method for exploring the inherent low-dimensional structure of high-dimensional data, which comprises the following steps:
S1. selecting data to be analyzed, and adjusting related parameters according to the selected data, wherein the parameters comprise the number of neighbor points, a threshold parameter α and the size of a point in each scatter plot;
S2. constructing a t-SNE view;
S3. constructing a two-dimensional projection view of the high-dimensional data, projecting the high-dimensional data onto a two-dimensional plane;
S4. constructing an intrinsic dimension histogram view for assisting in analyzing the intrinsic dimension of the data;
S5. constructing a point scree plot view for assisting in analyzing the intrinsic dimension of the data;
S6. constructing a scree plot view of the low-dimensional structure for assisting in analyzing the intrinsic dimension of the data as a whole;
S7. constructing a structure list view for assisting in analysis and report generation.
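The parameters adjusted in step S1 and the views built in steps S2 to S7 can be captured in a small configuration object. The following is a minimal sketch only: the class name, field names, the default neighbor count and the default point size are illustrative assumptions, and only the default α of 0.9 comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisParams:
    """Parameters adjusted in step S1 (all names here are hypothetical)."""
    n_neighbors: int = 10     # number of neighbor points k (default is an assumption)
    alpha: float = 0.9        # threshold parameter alpha (0.9 is the typical value in the text)
    point_size: float = 3.0   # size of a point in each scatter plot (assumed unit: pixels)

    def __post_init__(self):
        if self.n_neighbors < 1:
            raise ValueError("n_neighbors must be at least 1")
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        if self.point_size <= 0:
            raise ValueError("point_size must be positive")


# Views constructed in steps S2..S7, in the order the method lists them.
VIEW_STEPS = (
    ("S2", "t-SNE view"),
    ("S3", "two-dimensional projection view"),
    ("S4", "intrinsic dimension histogram view"),
    ("S5", "point scree plot view"),
    ("S6", "low-dimensional structure scree plot view"),
    ("S7", "structure list view"),
)

params = AnalysisParams()
```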
In step S2, the t-SNE view is constructed by using the t-SNE algorithm to reduce the data to a two-dimensional plane; when the kNN setting is changed at a later stage, the t-SNE view changes accordingly.
In step S3, the two-dimensional projection view of the high-dimensional data is constructed by the following steps:
A. constructing a two-dimensional projection of the high-dimensional data;
B. projecting the local tangent space distance between any two points obtained in step A to a two-dimensional space;
C. calculating the k-proximity distance $L_p$ using the following equation:
$$L_p = \mathrm{dis}_p / \mathrm{dis}_n$$
where $\mathrm{dis}_p$ is the distance from point p to its local tangent space, and $\mathrm{dis}_n$ is the average distance from point p to its k nearest neighbors.
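The ratio in step C can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the tangent space is taken to be the affine subspace through the centroid of the neighbors, spanned by orthonormal row vectors passed in as `tangent_basis`; the function and argument names are hypothetical.

```python
import numpy as np


def k_proximity_ratio(p, neighbors, tangent_basis):
    """Return L_p = dis_p / dis_n for a single point p.

    p             : (d,) coordinates of the point
    neighbors     : (k, d) coordinates of its k nearest neighbors
    tangent_basis : (d_p, d) orthonormal rows spanning the local tangent space
    """
    p = np.asarray(p, dtype=float)
    nb = np.asarray(neighbors, dtype=float)
    basis = np.asarray(tangent_basis, dtype=float)

    # Assumption: the tangent space passes through the neighbor centroid.
    offset = p - nb.mean(axis=0)
    # Remove the component of the offset that lies inside the tangent space.
    residual = offset - basis.T @ (basis @ offset)
    dis_p = np.linalg.norm(residual)

    # Average Euclidean distance from p to its k nearest neighbors.
    dis_n = np.linalg.norm(nb - p, axis=1).mean()
    return dis_p / dis_n


# A point one unit above a neighborhood lying in the xy-plane.
neighbors = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
basis = [[1, 0, 0], [0, 1, 0]]
ratio = k_proximity_ratio([0, 0, 1], neighbors, basis)
```

A ratio near 0 indicates a point lying almost inside its local tangent space, while larger values indicate a point that sits off the local structure relative to the neighborhood scale.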
In step A, the two-dimensional projection of the high-dimensional data is constructed by the following steps:
I. establishing a data point correlation metric based on hierarchical geodesic distance;
II. establishing a local tangent space divergence metric;
III. establishing a local tangent space divergence-hierarchical geodesic distance projection according to the metrics established in step I and step II, thereby completing the two-dimensional projection of the high-dimensional data.
In step I, the data point correlation metric based on hierarchical geodesic distance is established by the following steps:
a. constructing an sNN graph having several connected components on the data set;
b. obtaining several connected subgraphs from the sNN graph obtained in step a, and obtaining several geodesic distance matrices from those connected subgraphs, thereby obtaining the data point correlation metric based on hierarchical geodesic distance.
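Steps a and b can be sketched as below. The text does not expand "sNN", so this sketch assumes a symmetric k-nearest-neighbor graph in which two points are linked only when each is among the other's k nearest neighbors, with Euclidean edge weights; geodesic distances inside each connected subgraph are then shortest-path lengths (computed here with Floyd-Warshall). All function names are hypothetical.

```python
import numpy as np


def mutual_knn_graph(points, k):
    """Weighted adjacency matrix of an assumed mutual-kNN graph (inf = no edge)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)          # a point is not its own neighbor
    order = np.argsort(masked, axis=1)[:, :k]  # k nearest neighbors of each point
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], order] = True
    adj = np.where(is_nn & is_nn.T, dist, np.inf)  # keep mutual links only
    np.fill_diagonal(adj, 0.0)
    return adj


def connected_components(adj):
    """Lists of point indices, one list per connected subgraph."""
    n = len(adj)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            u = stack.pop()
            members.append(u)
            for v in np.flatnonzero(np.isfinite(adj[u])):
                if not seen[int(v)]:
                    seen[int(v)] = True
                    stack.append(int(v))
        components.append(sorted(members))
    return components


def geodesic_matrix(adj, members):
    """All-pairs shortest-path distances inside one connected subgraph."""
    g = adj[np.ix_(members, members)].copy()
    for m in range(len(members)):  # Floyd-Warshall relaxation through vertex m
        g = np.minimum(g, g[:, [m]] + g[[m], :])
    return g


points = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]]
adj = mutual_knn_graph(points, k=2)
components = connected_components(adj)
geodesics = [geodesic_matrix(adj, c) for c in components]
```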
In step II, the local tangent space divergence metric is established by the following steps:
1) acquiring a neighbor matrix X, wherein X has k rows and d columns and each row represents one of the k neighbors of point p;
2) performing singular value decomposition on the neighbor matrix X obtained in step 1) to obtain $X = U \Sigma V^T$, and sorting the diagonal values of the diagonal matrix $\Sigma$ in descending order to obtain $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$;
3) calculating the value of $d_p$ according to the following equation:
[Equation image BDA0001361339820000041 in the original publication: definition of $d_p$ in terms of the singular values $\sigma_i$ and the threshold $\alpha$]
wherein α is a threshold parameter, typically 0.9;
4) taking the first $d_p$ rows of the matrix V obtained in step 2) as the local tangent space $S_p$ of point p;
5) calculating the local tangent space divergence $\mathrm{div}(S_p, S_q)$ using the following equation:
[Equation image BDA0001361339820000042 in the original publication: definition of $\mathrm{div}(S_p, S_q)$ in terms of the angles $\theta_i$]
where $\cos\theta_i$ is defined as the singular value $\tau_i$: letting U denote the local tangent space of point p and V the local tangent space of point q, the $\tau_i$ are the singular values obtained from the singular value decomposition of $U^T V$.
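Steps 1) to 5) can be sketched with NumPy as follows. Several points are assumptions rather than statements of the source: the neighbor matrix is centered before decomposition, $d_p$ is taken as the smallest number of singular values whose cumulative share reaches α (the defining equation is an image that is not reproduced here), and the tangent basis is returned as the first $d_p$ rows of $V^T$. Because the divergence formula itself is also an image, the sketch stops at the cosines $\tau_i$ and the angles $\theta_i$ and does not combine them into a single value.

```python
import numpy as np


def local_tangent_space(neighbors, alpha=0.9):
    """Return (singular values, d_p, tangent basis) for one neighborhood."""
    X = np.asarray(neighbors, dtype=float)
    X = X - X.mean(axis=0)  # assumption: centering is not stated in the text
    _, sigma, vt = np.linalg.svd(X, full_matrices=False)  # sigma is descending
    total = sigma.sum()
    if total == 0:
        raise ValueError("all neighbors coincide; tangent space is undefined")
    share = np.cumsum(sigma) / total
    # Assumed rule: smallest d whose cumulative share is at least alpha.
    d_p = min(int(np.searchsorted(share, alpha)) + 1, len(sigma))
    return sigma, d_p, vt[:d_p]  # rows of vt are right singular vectors


def principal_cosines(basis_p, basis_q):
    """Singular values tau_i of U^T V, i.e. cos(theta_i) between two subspaces."""
    tau = np.linalg.svd(basis_p @ basis_q.T, compute_uv=False)
    return np.clip(tau, 0.0, 1.0)


nb_xy = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]  # neighborhood in the xy-plane
nb_xz = [[1, 0, 0], [-1, 0, 0], [0, 0, 1], [0, 0, -1]]  # neighborhood in the xz-plane
_, d_p, S_p = local_tangent_space(nb_xy)
_, d_q, S_q = local_tangent_space(nb_xz)
tau = principal_cosines(S_p, S_q)
theta = np.arccos(tau)  # principal angles in radians
```

In this example the two planes share the x direction, so one principal angle is 0 and the other is a right angle.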
In step III, the local tangent space divergence-hierarchical geodesic distance projection is established by the following steps:
(1) taking each connected subgraph obtained in step I as a point, and calculating the shortest distance between every two subgraphs;
(2) projecting the subgraphs onto the y axis using the MDS method;
(3) determining the range of each connected subgraph on the y axis according to the maximum distance within the subgraph and the shortest distance between the subgraph and its nearest connected subgraph;
(4) within each connected subgraph, projecting each point of the subgraph to the corresponding position on the y axis using the MDS method;
(5) according to the local tangent space divergence between any two points obtained in step II, mapping the data points to the x axis using the MDS algorithm, thereby completing the local tangent space divergence-hierarchical geodesic distance projection.
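Each MDS projection in steps (2), (4) and (5) reduces a pairwise distance (or divergence) matrix to one coordinate per item. The text does not say which MDS variant is used; the sketch below assumes classical (Torgerson) MDS, and the function name is hypothetical.

```python
import numpy as np


def classical_mds(dist, dim=1):
    """Embed items with pairwise distances `dist` into `dim` coordinates."""
    D = np.asarray(dist, dtype=float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered squared distances
    w, v = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]        # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(w[top], 0.0, None))
    return v[:, top] * scale               # shape (n, dim)


# Three items whose distances are exactly realizable on a line (0, 1, 3).
positions = np.array([0.0, 1.0, 3.0])
D = np.abs(positions[:, None] - positions[None, :])
y = classical_mds(D, dim=1)[:, 0]
```

Under this reading, the x coordinate of a data point would come from applying such a projection to the matrix of local tangent space divergences, and the y coordinate from applying it to geodesic distances and rescaling the result into the range reserved for the point's connected subgraph.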
In step S4, the intrinsic dimension histogram view is constructed by the following steps:
(A) calculating the variables X and $d_p$ as follows:
$$X = U \Sigma V^T$$
[Equation image BDA0001361339820000051 in the original publication: definition of $d_p$ in terms of the singular values $\sigma_i$ and the threshold $\alpha$]
where X is the neighbor matrix of point p with k rows and d columns, each row representing one of the k neighbors of point p; singular value decomposition of X yields U, $\Sigma$ and $V^T$, where $\Sigma$ is a diagonal matrix whose diagonal values are arranged in descending order as $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$; α is a threshold parameter, generally 0.9; and $d_p$ is the intrinsic dimension estimated for the local tangent space of point p;
(B) plotting $d_p$ as a bar chart, thereby completing the intrinsic dimension histogram view.
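Step (B) can be illustrated by counting how many points share each estimated dimension. This is a sketch under the assumption that the bar chart shows one bar per distinct $d_p$ value; the function names and the text rendering are illustrative only.

```python
from collections import Counter


def dimension_histogram(d_values):
    """Return (dimension, number of points) pairs sorted by dimension."""
    counts = Counter(d_values)
    return [(d, counts[d]) for d in sorted(counts)]


def render_bars(hist):
    """Plain-text rendering of the histogram, one bar per dimension."""
    return "\n".join(f"d_p={d}: " + "#" * c for d, c in hist)


hist = dimension_histogram([2, 1, 2, 3, 2])
bars = render_bars(hist)
```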
In step S5, the point scree plot view is constructed by the following steps:
(a) for each data point p, obtaining the set of singular values $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$ of point p;
(b) for each singular value in (a), calculating the maximum of the i-th singular value over all data points to obtain
[Equation image BDA0001361339820000052 in the original publication: the set of per-index maximum singular values]
(c) computing
[Equation image BDA0001361339820000053 in the original publication]
where α is the threshold parameter adjusted in step S1;
(d) plotting
[Equation image BDA0001361339820000054 in the original publication]
in a parallel coordinates plot, wherein the spacing between the first $n_s$ axes is 3 times the spacing between the remaining axes, and any axis whose maximum value $\sigma^{max}$ is 0 is cut off and not drawn, thereby completing the point scree plot view.
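Step (b) and the axis cut-off in step (d) can be sketched as below. The equations for the plotted quantity are images that are not reproduced here, so the scaling of each singular value by its per-axis maximum is an assumption made for illustration, as are the function and variable names.

```python
import numpy as np


def point_scree_axes(singular_values):
    """Prepare per-point singular values for a parallel coordinates plot.

    singular_values : (num_points, n) array, row p holding sigma_1..sigma_n of point p
    Returns the per-axis maxima, a mask of axes to draw, and the scaled values.
    """
    S = np.asarray(singular_values, dtype=float)
    sigma_max = S.max(axis=0)   # maximum of the i-th singular value over all points
    keep = sigma_max > 0        # axes whose maximum is 0 are not drawn
    scaled = S[:, keep] / sigma_max[keep]  # assumed normalization to [0, 1]
    return sigma_max, keep, scaled


S = [[2.0, 1.0, 0.0],
     [4.0, 0.5, 0.0]]
sigma_max, keep, scaled = point_scree_axes(S)
```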
In step S6, the scree plot view of the low-dimensional structure is constructed by the following steps:
i. constructing an sNN graph having several connected components on the high-dimensional data set;
ii. obtaining several connected subgraphs from the sNN graph obtained in step i, and obtaining the geodesic distance between any two points in each connected subgraph through a shortest path algorithm, thereby forming a geodesic distance matrix G;
iii. calculating the matrix B using the following equation:
[Equation image BDA0001361339820000061 in the original publication: definition of B in terms of the elements $g_{ij}$]
where $g_{ij}$ is the element in the i-th row and j-th column of the geodesic distance matrix G;
iv. performing singular value decomposition on the matrix B:
$$B = U_1 \Lambda V_1^T$$
where $\Lambda$ is a diagonal matrix whose diagonal values are $\{\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_n\}$, thereby obtaining a set of singular values for each connected subgraph in linear mode;
v. drawing the singular values obtained in step iv in a parallel coordinates plot;
vi. using an MDS algorithm to obtain the eigenvalues of the geodesic distance matrix G obtained in step ii, and drawing them in a parallel coordinates plot in nonlinear mode, thereby obtaining the scree plot view of the low-dimensional structure.
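Steps iii and iv can be sketched as follows. The equation defining B is an image that is not reproduced here; this sketch assumes the standard double-centering of squared distances used in classical MDS, $B = -\tfrac{1}{2} J G^{(2)} J$ with $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$, which may differ from the original formula. Function names are hypothetical.

```python
import numpy as np


def centered_gram(G):
    """Assumed form of B: double-centered matrix of squared geodesic distances."""
    G = np.asarray(G, dtype=float)
    n = len(G)
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (G ** 2) @ J


def structure_scree_values(G):
    """Singular values of B in descending order, one scree profile per subgraph."""
    return np.linalg.svd(centered_gram(G), compute_uv=False)


# Geodesic distances of three points lying along a single curve (positions 0, 1, 3).
pos = np.array([0.0, 1.0, 3.0])
G = np.abs(pos[:, None] - pos[None, :])
lam = structure_scree_values(G)
```

A profile with one dominant value followed by values near zero, as in this example, suggests that the connected subgraph is essentially one-dimensional.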
The invention also provides a system for implementing the visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data, comprising a data selection module, a t-SNE view construction module, a two-dimensional projection view construction module for high-dimensional data, an intrinsic dimension histogram view construction module, a point scree plot view construction module, a low-dimensional structure scree plot view construction module and a structure list view construction module. The data selection module is used for selecting data to be analyzed and adjusting analysis parameters; the t-SNE view construction module is used for constructing the t-SNE view; the two-dimensional projection view construction module is used for constructing the two-dimensional projection view of the high-dimensional data; the intrinsic dimension histogram view construction module is used for constructing the intrinsic dimension histogram view, which assists in analyzing the intrinsic dimension of the data; the point scree plot view construction module is used for constructing the point scree plot view, which assists in analyzing the intrinsic dimension of the data; the low-dimensional structure scree plot view construction module is used for constructing the scree plot view of the low-dimensional structure, which assists in analyzing the intrinsic dimension of the data as a whole; and the structure list view construction module is used for constructing the structure list view, which assists in analysis and report generation.
By projecting the high-dimensional data and building the corresponding views, the visual analysis method and system provided by the invention construct a set of global and local feature descriptions for the intrinsic low-dimensional structure, thereby helping a user explore that structure.
Drawings
FIG. 1 is a process flow diagram of the process of the present invention.
FIG. 2 is a functional block diagram of the system of the present invention.
Detailed Description
FIG. 1 shows a flow chart of the method of the present invention. The invention provides a visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data, comprising the following steps:
S1. selecting data to be analyzed, and adjusting related parameters according to the selected data, wherein the parameters comprise the number of neighbor points, a threshold parameter α and the size of a point in each scatter plot;
S2. using the t-SNE algorithm to reduce the data to a two-dimensional plane, thereby constructing a t-SNE view;
S3. constructing a two-dimensional projection view of the high-dimensional data, projecting the high-dimensional data onto a two-dimensional plane; specifically, the two-dimensional projection view is constructed by the following steps:
A. constructing a two-dimensional projection of the high-dimensional data, specifically by the following steps:
I. establishing a data point correlation metric based on hierarchical geodesic distance, specifically by the following steps:
a. constructing an sNN graph having several connected components on the data set;
b. obtaining several connected subgraphs from the sNN graph obtained in step a, and obtaining several geodesic distance matrices from those connected subgraphs, thereby obtaining the data point correlation metric based on hierarchical geodesic distance;
II. establishing a local tangent space divergence metric, specifically by the following steps:
1) acquiring a neighbor matrix X, wherein X has k rows and d columns and each row represents one of the k neighbors of point p;
2) performing singular value decomposition on the neighbor matrix X obtained in step 1) to obtain $X = U \Sigma V^T$, and sorting the diagonal values of the diagonal matrix $\Sigma$ in descending order to obtain $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$;
3) calculating the value of $d_p$ according to the following equation:
[Equation image BDA0001361339820000081 in the original publication: definition of $d_p$ in terms of the singular values $\sigma_i$ and the threshold $\alpha$]
wherein α is a threshold parameter, typically 0.9;
4) taking the first $d_p$ rows of the matrix V obtained in step 2) as the local tangent space $S_p$ of point p;
5) calculating the local tangent space divergence $\mathrm{div}(S_p, S_q)$ using the following equation:
[Equation image BDA0001361339820000082 in the original publication: definition of $\mathrm{div}(S_p, S_q)$ in terms of the angles $\theta_i$]
where $\cos\theta_i$ is defined as the singular value $\tau_i$: letting U denote the local tangent space of point p and V the local tangent space of point q, the $\tau_i$ are the singular values obtained from the singular value decomposition of $U^T V$;
III. establishing a local tangent space divergence-hierarchical geodesic distance projection according to the metrics established in step I and step II, thereby completing the two-dimensional projection of the high-dimensional data; specifically, the projection is established by the following steps:
(1) taking each connected subgraph obtained in step I as a point, and calculating the shortest distance between every two subgraphs;
(2) projecting the subgraphs onto the y axis using the MDS method;
(3) determining the range of each connected subgraph on the y axis according to the maximum distance within the subgraph and the shortest distance between the subgraph and its nearest connected subgraph;
(4) within each connected subgraph, projecting each point of the subgraph to the corresponding position on the y axis using the MDS method;
(5) according to the local tangent space divergence between any two points obtained in step II, mapping the data points to the x axis using the MDS algorithm, thereby completing the local tangent space divergence-hierarchical geodesic distance projection;
B. projecting the local tangent space distance between any two points obtained in step A to a two-dimensional space;
C. calculating the k-proximity distance $L_p$ using the following equation:
$$L_p = \mathrm{dis}_p / \mathrm{dis}_n$$
where $\mathrm{dis}_p$ is the distance from point p to its local tangent space, and $\mathrm{dis}_n$ is the average distance from point p to its k nearest neighbors;
S4. constructing an intrinsic dimension histogram view for assisting in analyzing the intrinsic dimension of the data; specifically, the view is constructed by the following steps:
(A) calculating the variables X and $d_p$ as follows:
$$X = U \Sigma V^T$$
[Equation image BDA0001361339820000091 in the original publication: definition of $d_p$ in terms of the singular values $\sigma_i$ and the threshold $\alpha$]
where X is the neighbor matrix of point p with k rows and d columns, each row representing one of the k neighbors of point p; singular value decomposition of X yields U, $\Sigma$ and $V^T$, where $\Sigma$ is a diagonal matrix whose diagonal values are arranged in descending order as $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$; α is a threshold parameter, generally 0.9; and $d_p$ is the intrinsic dimension estimated for the local tangent space of point p;
(B) plotting $d_p$ as a bar chart, thereby completing the intrinsic dimension histogram view;
S5. constructing a point scree plot view for assisting in analyzing the intrinsic dimension of the data; specifically, the view is constructed by the following steps:
(a) for each data point p, obtaining the set of singular values $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$ of point p;
(b) for each singular value in (a), calculating the maximum of the i-th singular value over all data points to obtain
[Equation image BDA0001361339820000101 in the original publication: the set of per-index maximum singular values]
(c) computing
[Equation image BDA0001361339820000102 in the original publication]
where α is the threshold parameter adjusted in step S1;
(d) plotting
[Equation image BDA0001361339820000103 in the original publication]
in a parallel coordinates plot, wherein the spacing between the first $n_s$ axes is 3 times the spacing between the remaining axes, and any axis whose maximum value $\sigma^{max}$ is 0 is cut off and not drawn, thereby completing the point scree plot view;
S6. constructing a scree plot view of the low-dimensional structure for assisting in analyzing the intrinsic dimension of the data as a whole; specifically, the view is constructed by the following steps:
i. constructing an sNN graph having several connected components on the high-dimensional data set;
ii. obtaining several connected subgraphs from the sNN graph obtained in step i, and obtaining the geodesic distance between any two points in each connected subgraph through a shortest path algorithm, thereby forming a geodesic distance matrix G;
iii. calculating the matrix B using the following equation:
[Equation image BDA0001361339820000104 in the original publication: definition of B in terms of the elements $g_{ij}$]
where $g_{ij}$ is the element in the i-th row and j-th column of the geodesic distance matrix G;
iv. performing singular value decomposition on the matrix B:
$$B = U_1 \Lambda V_1^T$$
where $\Lambda$ is a diagonal matrix whose diagonal values are $\{\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_n\}$, thereby obtaining a set of singular values for each connected subgraph in linear mode;
v. drawing the singular values obtained in step iv in a parallel coordinates plot;
vi. using an MDS algorithm to obtain the eigenvalues of the geodesic distance matrix G obtained in step ii, and drawing them in a parallel coordinates plot in nonlinear mode, thereby obtaining the scree plot view of the low-dimensional structure;
S7. constructing a structure list view for assisting in analysis and report generation.
FIG. 2 shows a functional block diagram of the system of the present invention. The system for implementing the visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data comprises a data selection module, a t-SNE view construction module, a two-dimensional projection view construction module for high-dimensional data, an intrinsic dimension histogram view construction module, a point scree plot view construction module, a low-dimensional structure scree plot view construction module and a structure list view construction module. The data selection module is used for selecting data to be analyzed and adjusting analysis parameters; the t-SNE view construction module is used for constructing the t-SNE view; the two-dimensional projection view construction module is used for constructing the two-dimensional projection view of the high-dimensional data; the intrinsic dimension histogram view construction module is used for constructing the intrinsic dimension histogram view, which assists in analyzing the intrinsic dimension of the data; the point scree plot view construction module is used for constructing the point scree plot view, which assists in analyzing the intrinsic dimension of the data; the low-dimensional structure scree plot view construction module is used for constructing the scree plot view of the low-dimensional structure, which assists in analyzing the intrinsic dimension of the data as a whole; and the structure list view construction module is used for constructing the structure list view, which assists in analysis and report generation.

Claims (9)

1. A visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data, comprising the steps of:
S1. selecting data to be analyzed, and adjusting related parameters according to the selected data, wherein the parameters comprise the number of neighbor points, a threshold parameter α and the size of a point in each scatter plot;
S2. constructing a t-SNE view;
S3. constructing a two-dimensional projection view of the high-dimensional data, projecting the high-dimensional data onto a two-dimensional plane; specifically, the two-dimensional projection view is constructed by the following steps:
A. constructing a two-dimensional projection of the high-dimensional data;
B. projecting the local tangent space distance between any two points obtained in step A to a two-dimensional space;
C. calculating the k-proximity distance $L_p$ using the following equation:
$$L_p = \mathrm{dis}_p / \mathrm{dis}_n$$
where $\mathrm{dis}_p$ is the distance from point p to its local tangent space, and $\mathrm{dis}_n$ is the average distance from point p to its k nearest neighbors;
S4. constructing an intrinsic dimension histogram view for assisting in analyzing the intrinsic dimension of the data;
S5. constructing a point scree plot view for assisting in analyzing the intrinsic dimension of the data;
S6. constructing a scree plot view of the low-dimensional structure for assisting in analyzing the intrinsic dimension of the data as a whole;
S7. constructing a structure list view for assisting in analysis and report generation.
2. The method according to claim 1, wherein constructing the two-dimensional projection of the high-dimensional data in step A comprises the following steps:
I. establishing a data point correlation metric based on hierarchical geodesic distance;
II. establishing a local tangent space divergence metric;
III. establishing a local tangent space divergence-hierarchical geodesic distance projection according to the metrics established in step I and step II, thereby completing the two-dimensional projection of the high-dimensional data.
3. The method according to claim 2, wherein establishing the data point correlation metric based on hierarchical geodesic distance in step I comprises the following steps:
a. constructing an sNN graph having several connected components on the data set;
b. obtaining several connected subgraphs from the sNN graph obtained in step a, and obtaining several geodesic distance matrices from those connected subgraphs, thereby obtaining the data point correlation metric based on hierarchical geodesic distance.
4. The method according to claim 3, wherein establishing the local tangent space divergence metric in step II comprises the following steps:
1) acquiring a neighbor matrix X, wherein X has k rows and d columns and each row represents one of the k neighbors of point p;
2) performing singular value decomposition on the neighbor matrix X obtained in step 1) to obtain $X = U \Sigma V^T$, and sorting the diagonal values of the diagonal matrix $\Sigma$ in descending order to obtain $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$;
3) calculating the value of $d_p$ according to the following equation:
[Equation image FDA0002525490300000021 in the original publication: definition of $d_p$ in terms of the singular values $\sigma_i$ and the threshold $\alpha$]
wherein α is a threshold parameter;
4) taking the first $d_p$ rows of the matrix V obtained in step 2) as the local tangent space $S_p$ of point p;
5) calculating the local tangent space divergence $\mathrm{div}(S_p, S_q)$ using the following equation:
[Equation image FDA0002525490300000022 in the original publication: definition of $\mathrm{div}(S_p, S_q)$ in terms of the angles $\theta_i$]
where $\cos\theta_i$ is defined as the singular value $\tau_i$: letting U denote the local tangent space of point p and V the local tangent space of point q, the $\tau_i$ are the singular values obtained from the singular value decomposition of $U^T V$.
5. The method according to claim 4, wherein establishing the local tangent space divergence-hierarchical geodesic distance projection in step III comprises the following steps:
(1) taking each connected subgraph obtained in step I as a point, and calculating the shortest distance between every two subgraphs;
(2) projecting the subgraphs onto the y axis using the MDS method;
(3) determining the range of each connected subgraph on the y axis according to the maximum distance within the subgraph and the shortest distance between the subgraph and its nearest connected subgraph;
(4) within each connected subgraph, projecting each point of the subgraph to the corresponding position on the y axis using the MDS method;
(5) according to the local tangent space divergence between any two points obtained in step II, mapping the data points to the x axis using the MDS algorithm, thereby completing the local tangent space divergence-hierarchical geodesic distance projection.
6. The method according to claim 5, wherein constructing the intrinsic dimension histogram view in step S4 comprises the following steps:
(A) calculating the variables X and $d_p$ as follows:
$$X = U \Sigma V^T$$
[Equation image FDA0002525490300000031 in the original publication: definition of $d_p$ in terms of the singular values $\sigma_i$ and the threshold $\alpha$]
where X is the neighbor matrix of point p with k rows and d columns, each row representing one of the k neighbors of point p; singular value decomposition of X yields U, $\Sigma$ and $V^T$, where $\Sigma$ is a diagonal matrix whose diagonal values are arranged in descending order as $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$; α is a threshold parameter; and $d_p$ is the intrinsic dimension estimated for the local tangent space of point p;
(B) plotting $d_p$ as a bar chart, thereby completing the intrinsic dimension histogram view.
7. The method according to claim 6, wherein constructing the point scree plot view in step S5 comprises the following steps:
(a) for each data point p, obtaining the set of singular values $\{\sigma_1, \sigma_2, \ldots, \sigma_i, \ldots, \sigma_n\}$ of point p;
(b) for each singular value in (a), calculating the maximum of the i-th singular value over all data points to obtain
[Equation image FDA0002525490300000041 in the original publication: the set of per-index maximum singular values]
(c) computing
[Equation image FDA0002525490300000042 in the original publication]
where α is the threshold parameter adjusted in step S1;
(d) plotting
[Equation image FDA0002525490300000043 in the original publication]
as the characteristic values of the point scree plot in a parallel coordinates plot, thereby completing the point scree plot view.
8. The visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data according to claim 7, wherein the low-dimensional structure scree plot view in step S6 is constructed by the following steps:
i. constructing an sNN graph with several connected components on the basis of the high-dimensional dataset;
ii. obtaining a plurality of connected subgraphs from the sNN graph obtained in step i, and obtaining the geodesic distance between any two points in each connected subgraph through a shortest path algorithm, so as to form a geodesic distance matrix G;
iii. calculating the matrix B using the following equation:
Figure FDA0002525490300000044
where $g_{ij}$ is the element in the i-th row and j-th column of the geodesic distance matrix G;
iv. performing singular value decomposition on the matrix B:
$B = U_1 \Lambda V_1^T$
where $\Lambda$ is a diagonal matrix whose diagonal values are $\{\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_n\}$, thereby obtaining a set of singular values for each connected subgraph in linear mode;
v. plotting the singular values obtained in step iv in a parallel coordinates plot;
vi. using an MDS algorithm to obtain the eigenvalues of the geodesic distance matrix G obtained in step ii, and plotting them in a parallel coordinates plot in nonlinear mode, thereby obtaining the low-dimensional structure scree plot view.
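The geodesic-distance and decomposition steps of claim 8 can be sketched as follows, under several stated assumptions. A symmetric k-nearest-neighbor graph stands in for the sNN graph. The data form a single connected component, whereas the claim processes each connected subgraph separately. The formula for B is an image in the source, so B is assumed to be the double-centered squared geodesic matrix used in classical MDS. The function names are hypothetical.

```python
import numpy as np

def geodesic_matrix(points, k=2):
    """All-pairs shortest-path distances over a symmetric kNN graph (sNN stand-in)."""
    P = np.asarray(points, dtype=float)
    n = len(P)
    euclid = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nearest = np.argsort(euclid[i])[1:k + 1]   # skip the point itself
        G[i, nearest] = euclid[i, nearest]
        G[nearest, i] = euclid[i, nearest]
    for m in range(n):                             # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def centered_spectrum(G):
    """Singular values of B = -1/2 * J * G^2 * J (assumed form of matrix B)."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    return np.linalg.svd(B, compute_uv=False)      # descending order

# Hypothetical connected component: five evenly spaced points on a line
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
G = geodesic_matrix(points, k=2)
spectrum = centered_spectrum(G)
```

For this one-dimensional example only the first entry of `spectrum` is significantly non-zero, which is the kind of sharp drop a scree plot makes visible.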
9. A system for implementing the visual analysis method for exploring the intrinsic low-dimensional structure of high-dimensional data according to any one of claims 1 to 8, characterized by comprising a data selection module, a t-SNE view construction module, a two-dimensional projection view construction module for high-dimensional data, an intrinsic dimension histogram view construction module, a point scree plot view construction module, a low-dimensional structure scree plot view construction module and a structure list view construction module; the data selection module is used for selecting the data to be analyzed and adjusting the analysis parameters; the t-SNE view construction module is used for constructing the t-SNE view; the two-dimensional projection view construction module is used for constructing the two-dimensional projection view of the high-dimensional data; the intrinsic dimension histogram view construction module is used for constructing the intrinsic dimension histogram view and assisting in analyzing the intrinsic dimension of the data; the point scree plot view construction module is used for constructing the point scree plot view and assisting in analyzing the intrinsic dimension of the data; the low-dimensional structure scree plot view construction module is used for constructing the low-dimensional structure scree plot view and assisting in analyzing the intrinsic dimension of the data as a whole; and the structure list view construction module is used for constructing the structure list view and assisting in analysis and report generation.
CN201710619472.9A 2017-07-26 2017-07-26 Visual analysis method and system for exploring inherent low-dimensional structure of high-dimensional data Expired - Fee Related CN107451238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710619472.9A CN107451238B (en) 2017-07-26 2017-07-26 Visual analysis method and system for exploring inherent low-dimensional structure of high-dimensional data

Publications (2)

Publication Number | Publication Date
CN107451238A (en) | 2017-12-08
CN107451238B (en) | 2020-08-04

Family

ID=60489021

Country Status (1)

Country Link
CN (1) CN107451238B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095427B (en) * 2021-04-23 2022-09-13 Central South University High-dimensional data analysis method and face data analysis method based on user guidance

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6675103B1 (en) * 2000-03-22 2004-01-06 Tripos, Inc. Visualizing high dimensional descriptors of molecular structures
CN102682089A (en) * 2012-04-24 2012-09-19 Zhejiang University of Technology Method for data dimensionality reduction by identifying random neighbourhood embedding analyses
CN106096640A (en) * 2016-05-31 2016-11-09 Hefei University of Technology A kind of feature dimension reduction method of multi-mode system
CN106203516A (en) * 2016-07-13 2016-12-07 Central South University A kind of subspace clustering visual analysis method based on dimension dependency

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7756907B2 (en) * 2003-09-16 2010-07-13 The Board Of Trustees Of The Leland Stanford Jr. University Computer systems and methods for visualizing data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
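DimStiller: Workflows for dimensional analysis and reduction; Tamara Munzner et al.; 2010 IEEE Symposium on Visual Analytics Science and Technology; 2010-10-26; pp. 3-10 *
A local correlation visual analysis method based on subspace clustering; Xia Jiazhi et al.; Journal of Computer-Aided Design & Computer Graphics; 2016-11-30; Vol. 28, No. 11; pp. 1857-1858 *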

Also Published As

Publication number Publication date
CN107451238A (en) 2017-12-08

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2020-08-04; termination date: 2021-07-26