CN107423763A - Two-dimensional projection method for high-dimensional data and projection system thereof - Google Patents
- Publication number
- CN107423763A (application number CN201710619475.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a two-dimensional projection method for high-dimensional data, comprising: establishing a data point correlation metric based on geodesic distance; establishing a local subspace difference metric; and establishing a local subspace difference-geodesic distance projection, thereby completing the two-dimensional projection of the high-dimensional data. The invention also provides a projection system that implements the method. When reducing the dimensionality of nonlinear data, the invention preserves the intrinsic structure of the data while supporting subspace exploration and analysis and intuitive, convenient interaction. It helps users quickly find the cluster structure of the data when exploring and analyzing high-dimensional data, provides a reliable technical basis for subspace clustering and analysis, significantly reduces the number of trial-and-error steps and the redundancy of analysis results in high-dimensional data processing, and improves the reliability of data mining.
Description
Technical field
The present invention relates in particular to a two-dimensional projection method for high-dimensional data and a projection system thereof.
Background art
High dimensionality is one of the key characteristics of big data. From the perspective of visual analysis, high-dimensional data is data whose dimensionality is so high that it is difficult to extract meaningful relationships from the set of dimensions. From a geometric point of view, dimensionality reduction can be regarded as uncovering a linear or nonlinear low-dimensional manifold embedded in the high-dimensional space. Subspace cluster analysis is an effective method for exploring and analyzing high-dimensional data.
The key to cluster analysis is defining a suitable metric. Geodesic distance on a K-NN graph is an appropriate measure of the similarity between data points. Existing dimensionality reduction methods mainly include PCA (Principal Component Analysis), which preserves maximum-variance information; MDS (multidimensional scaling), which preserves the dissimilarities between data points; and t-SNE, which preserves the probability distribution of sample neighborhoods. However, each of these mainstream methods has certain shortcomings:
(1) PCA and MDS are global linear dimensionality reduction methods; they have difficulty preserving the nonlinear structure and subspace cluster structure that may exist in high-dimensional data.
(2) t-SNE is a nonlinear dimensionality reduction method, but its computational complexity is high, so its efficiency drops significantly on large-scale high-dimensional data, and its parameters are highly sensitive to the data. In addition, t-SNE has difficulty detecting subspace cluster structure.
Some existing automated subspace clustering methods only automate the process of identifying subspace clusters, and they often produce highly redundant results that cannot be interpreted accurately. The dual-space exploration method uses the full-dimensional distance as the initial distance metric between data points, but this often leads to a large number of trial-and-error steps during analysis.
Summary of the invention
One object of the present invention is to provide a two-dimensional projection method for high-dimensional data that helps users quickly find the subspace cluster structure of the data when exploring and analyzing high-dimensional data.
A second object of the present invention is to provide a projection system that implements the two-dimensional projection method for high-dimensional data.
The two-dimensional projection method for high-dimensional data provided by the present invention comprises the following steps:
S1. for the high-dimensional data to be projected, establish a data point correlation metric based on geodesic distance;
S2. establish a local subspace difference metric according to the metric established in step S1;
S3. according to the metrics established in steps S1 and S2, establish a local subspace difference-geodesic distance projection, thereby projecting the high-dimensional data into two dimensions.
Establishing the data point correlation metric based on geodesic distance in step S1 specifically uses the following steps:
A. construct an S-NN graph with several connected components on the high-dimensional data set to be projected;
B. for the connected components obtained in step A, connect any two separate connected components;
C. compute the shortest distance between any two points, thereby obtaining the geodesic distance.
Connecting any two separate connected components in step B specifically means connecting the two closest data points in the two connected components.
Computing the shortest distance in step C specifically uses a shortest-path algorithm.
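Steps A to C above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: a plain k-nearest-neighbour graph under Euclidean distance stands in for the S-NN graph, components are bridged one at a time starting from the component that contains the first point, and Dijkstra's algorithm serves as the shortest-path algorithm. All function names are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Step A (assumed form): undirected k-NN graph as {node: {neighbour: weight}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        ranked = sorted((euclidean(points[i], points[j]), j)
                        for j in range(n) if j != i)
        for dist, j in ranked[:k]:
            graph[i][j] = dist
            graph[j][i] = dist
    return graph


def connected_components(graph):
    """Return the connected components of the graph as lists of node indices."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def bridge_components(points, graph):
    """Step B: join separate components through their two closest data points."""
    components = connected_components(graph)
    while len(components) > 1:
        # Assumption: always grow the first component towards its nearest outside point.
        dist, u, v = min((euclidean(points[u], points[v]), u, v)
                         for u in components[0]
                         for other in components[1:]
                         for v in other)
        graph[u][v] = dist
        graph[v][u] = dist
        components = connected_components(graph)
    return graph


def geodesic_distances(graph):
    """Step C: all-pairs shortest-path lengths, one Dijkstra run per source node."""
    n = len(graph)
    result = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        result.append(dist)
    return result


def geodesic_matrix(points, k):
    """Run steps A, B and C in sequence."""
    return geodesic_distances(bridge_components(points, knn_graph(points, k)))
```

For points laid out along a bent curve, the geodesic distance returned by `geodesic_matrix` follows the curve through the graph, so it can be noticeably larger than the straight-line Euclidean distance between the same two points.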
Establishing the local subspace difference metric in step S2 specifically uses the following steps:
1) compute the weight of each dimension using the following formula:
$$\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_d], \qquad \omega_i = \frac{1/\sigma_i}{\sum_{j=1}^{d} 1/\sigma_j}$$
where $\omega$ is the dimension weight matrix, $\omega_i$ is the weight of the i-th dimension, $\sigma_i$ is the variance of the data points in the i-th dimension, and $d$ is the number of dimensions;
2) compute the weighted distance between any two points in the SNN graph using the following formula:
$$d_{pq}[W] = \max\{\, d_{pq}[\omega_p],\ d_{pq}[\omega_q] \,\}$$
where $d_{pq}[W]$ is the weighted distance between point p and point q (an entry of the weighted distance matrix), $\omega_p = [\omega_{p1}, \omega_{p2}, \ldots, \omega_{pi}, \ldots, \omega_{pd}]$ is the local subspace feature vector of p, $\omega_q = [\omega_{q1}, \omega_{q2}, \ldots, \omega_{qi}, \ldots, \omega_{qd}]$ is the local subspace feature vector of q, $d_i$ is the Euclidean distance between point p and point q in the i-th dimension, $d_{pq}[\omega_p]$ is the weighted distance of point q relative to point p, and $d_{pq}[\omega_q]$ is the weighted distance of point p relative to point q;
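3) based on cosine similarity, build the difference matrix according to the following formula:
$$d^{s}_{(p_i, p_j)} = 1 - \frac{\sum_{k=1}^{d} \omega_{ik}\,\omega_{jk}}{\sqrt{\sum_{k=1}^{d} \omega_{ik}^{2}}\,\sqrt{\sum_{k=1}^{d} \omega_{jk}^{2}}}$$
where $d^{s}_{(p_i, p_j)}$ is the cosine-similarity-based difference between points $p_i$ and $p_j$; i and j are data point indices in the range [0, n), and n is the size of the data set.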
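The three quantities in steps 1) to 3) can be illustrated with a short sketch. It is hedged in two places. First, the text does not give the per-weight-vector form of $d_{pq}[\omega]$, so the sketch assumes a weighted sum of the per-dimension distances $d_i = |p_i - q_i|$. Second, it assumes every dimension has non-zero variance. The function names are hypothetical.

```python
import math
from statistics import pvariance


def dimension_weights(points):
    """Step 1: omega_i = (1/sigma_i) / sum_j (1/sigma_j), sigma_i = variance of dimension i.

    Assumes every dimension has non-zero variance.
    """
    d = len(points[0])
    inverse = [1.0 / pvariance([p[i] for p in points]) for i in range(d)]
    total = sum(inverse)
    return [value / total for value in inverse]


def weighted_distance(p, q, omega):
    """ASSUMPTION: d_pq[omega] = sum_i omega_i * |p_i - q_i| (form not given in the text)."""
    return sum(w * abs(a - b) for w, a, b in zip(omega, p, q))


def symmetric_weighted_distance(p, q, omega_p, omega_q):
    """Step 2: d_pq[W] = max{ d_pq[omega_p], d_pq[omega_q] }."""
    return max(weighted_distance(p, q, omega_p),
               weighted_distance(p, q, omega_q))


def cosine_difference(omega_i, omega_j):
    """Step 3: 1 minus the cosine similarity of two local weight vectors."""
    dot = sum(a * b for a, b in zip(omega_i, omega_j))
    norm_i = math.sqrt(sum(a * a for a in omega_i))
    norm_j = math.sqrt(sum(b * b for b in omega_j))
    return 1.0 - dot / (norm_i * norm_j)
```

Because the weights are normalised by their sum, using the sample variance instead of the population variance would give the same weights: the constant factor n/(n-1) cancels. Two points whose local weight vectors emphasise the same dimensions have a cosine difference near 0, while points whose weight vectors emphasise disjoint dimensions have a difference of 1.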
Establishing the local subspace difference-geodesic distance projection in step S3 specifically uses the MDS algorithm to map the data points in the space to the x-axis by the local subspace difference metric and to the y-axis by the geodesic-distance-based data point correlation metric.
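One way to read step S3 is that each of the two metrics is embedded into a single coordinate by MDS, and the two coordinates are then paired. The sketch below follows that reading. It assumes classical (Torgerson) MDS restricted to one dimension, computed by power iteration on the double-centred squared-distance matrix; the text does not say which MDS variant is used, and the function names are hypothetical. The sketch is only meaningful when the dominant eigenvalue of the centred matrix is positive, which holds for Euclidean-like distances.

```python
import math


def classical_mds_1d(dist, iterations=500):
    """One-dimensional classical MDS coordinates from a symmetric distance matrix."""
    n = len(dist)
    squared = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in squared]
    total_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J, written out element by element.
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # all points coincide
        v = [x / norm for x in w]
        eigenvalue = norm  # |B v| for unit v converges to the dominant eigenvalue
    return [math.sqrt(eigenvalue) * x for x in v]


def project_2d(subspace_difference, geodesic):
    """x from the local subspace difference matrix, y from the geodesic distance matrix."""
    xs = classical_mds_1d(subspace_difference)
    ys = classical_mds_1d(geodesic)
    return list(zip(xs, ys))
```

The sign of each axis is arbitrary in MDS, so only the distances between projected points carry meaning, not their absolute positions.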
The present invention also provides a projection system that implements the two-dimensional projection method for high-dimensional data, comprising, connected in series in this order, a geodesic-distance-based data point correlation metric establishing module, a local subspace difference metric establishing module, and a local subspace difference-geodesic distance projection establishing module. The geodesic-distance-based data point correlation metric establishing module establishes the corresponding data point correlation metric based on geodesic distance from the data to be projected and passes the resulting metric to the local subspace difference metric establishing module. The local subspace difference metric establishing module establishes the local subspace difference metric from the correlation metric and passes it to the local subspace difference-geodesic distance projection establishing module. The local subspace difference-geodesic distance projection establishing module establishes the local subspace difference-geodesic distance projection from the geodesic-distance-based data point correlation metric and the local subspace difference metric, thereby projecting the high-dimensional data into two dimensions.
The two-dimensional projection method for high-dimensional data and the projection system provided by the present invention are a visual analysis method and system that preserve the intrinsic structure of high-dimensional data. When reducing the dimensionality of nonlinear data, they preserve the intrinsic structure of the data while supporting subspace exploration and analysis and intuitive, convenient interaction. They help users quickly find the cluster structure of the data when exploring and analyzing high-dimensional data, provide a reliable technical basis for subspace clustering and analysis, significantly reduce the number of trial-and-error steps and the redundancy of analysis results in high-dimensional data processing, and improve the reliability of data mining.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Fig. 2 is a functional block diagram of the system of the present invention.
Detailed description of the embodiments
Fig. 1 shows the flowchart of the method of the present invention. The two-dimensional projection method for high-dimensional data provided by the present invention comprises the following steps:
S1. for the high-dimensional data to be projected, establish a data point correlation metric based on geodesic distance, specifically using the following steps:
A. construct an S-NN graph with several connected components on the high-dimensional data set to be projected; in the SNN graph, an edge exists between points p and q if and only if p and q are k-nearest neighbors;
B. for the connected components obtained in step A, connect any two separate connected components, specifically by connecting the two closest data points in the two connected components;
C. compute the shortest distance between any two points using a shortest-path algorithm, thereby obtaining the geodesic distance;
S2. establish a local subspace difference metric according to the metric established in step S1, specifically using the following steps:
1) compute the weight of each dimension using the following formula:
$$\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_d], \qquad \omega_i = \frac{1/\sigma_i}{\sum_{j=1}^{d} 1/\sigma_j}$$
where $\omega$ is the dimension weight matrix, $\omega_i$ is the weight of the i-th dimension, $\sigma_i$ is the variance of the data points in the i-th dimension, and $d$ is the number of dimensions;
2) compute the weighted distance between any two points in the SNN graph using the following formula:
$$d_{pq}[W] = \max\{\, d_{pq}[\omega_p],\ d_{pq}[\omega_q] \,\}$$
where $d_{pq}[W]$ is the weighted distance between point p and point q (an entry of the weighted distance matrix), $\omega_p = [\omega_{p1}, \omega_{p2}, \ldots, \omega_{pi}, \ldots, \omega_{pd}]$ is the local subspace feature vector of p, $\omega_q = [\omega_{q1}, \omega_{q2}, \ldots, \omega_{qi}, \ldots, \omega_{qd}]$ is the local subspace feature vector of q, $d_i$ is the Euclidean distance between point p and point q in the i-th dimension, $d_{pq}[\omega_p]$ is the weighted distance of point q relative to point p, and $d_{pq}[\omega_q]$ is the weighted distance of point p relative to point q;
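3) based on cosine similarity, build the difference matrix according to the following formula:
$$d^{s}_{(p_i, p_j)} = 1 - \frac{\sum_{k=1}^{d} \omega_{ik}\,\omega_{jk}}{\sqrt{\sum_{k=1}^{d} \omega_{ik}^{2}}\,\sqrt{\sum_{k=1}^{d} \omega_{jk}^{2}}}$$
where $d^{s}_{(p_i, p_j)}$ is the cosine-similarity-based difference between points $p_i$ and $p_j$; i and j are data point indices in the range [0, n), and n is the size of the data set.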
In a specific implementation, an iterative algorithm similar to COSA is used to estimate the dimension weight vectors of all subspaces. The algorithm proceeds as follows:
Step 1: initialize the structure of the KNN graph in two-dimensional space: initialize the weight matrix W = {1/d}, where d is the number of dimensions, and initialize the iteration value E to 1×10^7;
Step 2: while E > 1×10^-3, perform the following iteration:
1) update the weighted distance matrix D[W], whose entries are $d_{pq}[W] = \max\{ d_{pq}[\omega_p], d_{pq}[\omega_q] \}$;
2) update the structure of the KNN graph: the KNN graph is updated as the weighted distance matrix is updated;
3) update the KNN graph and the dimension weights: compute the new weight matrix W' on the basis of the updated KNN graph;
4) compute the new value of E, where $W'_{ij}$ is the value of $W_{ij}$ after the latest iteration; that is, E is the amount by which $W_{ij}$ has been updated;
S3. according to the metrics established in steps S1 and S2, use the MDS algorithm to map the data points in the space to the x-axis by the local subspace difference metric and to the y-axis by the geodesic-distance-based data point correlation metric, thereby establishing the local subspace difference-geodesic distance projection and projecting the high-dimensional data into two dimensions.
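The COSA-like iteration described under step S2 above (Step 1 and Step 2) can be sketched as a loop. Several details are not given in the text and are assumptions here: the new weights W' of a point are taken as normalised inverse variances over the point and its k nearest neighbours, E is taken as the sum of absolute changes in the weights, the weighted distance is a weighted sum of per-dimension differences, a small `eps` guards against zero variance, and `max_iter` caps the loop because convergence is not guaranteed. All names are hypothetical.

```python
from statistics import pvariance


def weighted_distance(p, q, omega):
    # ASSUMPTION: weighted sum of per-dimension absolute differences.
    return sum(w * abs(a - b) for w, a, b in zip(omega, p, q))


def estimate_local_weights(points, k, tol=1e-3, max_iter=100, eps=1e-9):
    """Iteratively estimate one dimension-weight vector per data point."""
    n, d = len(points), len(points[0])
    weights = [[1.0 / d] * d for _ in range(n)]  # Step 1: W = {1/d}
    change = 1e7                                  # Step 1: E = 1*10^7
    iteration = 0
    while change > tol and iteration < max_iter:  # Step 2: loop while E > 1*10^-3
        # 1) weighted distance matrix: d_pq[W] = max{d_pq[w_p], d_pq[w_q]}
        dist = [[max(weighted_distance(points[p], points[q], weights[p]),
                     weighted_distance(points[p], points[q], weights[q]))
                 for q in range(n)] for p in range(n)]
        # 2) KNN structure under the updated distances
        neighbours = [sorted((j for j in range(n) if j != i),
                             key=lambda j: dist[i][j])[:k]
                      for i in range(n)]
        # 3) new weights W' (ASSUMED rule: inverse variance within each neighbourhood)
        new_weights = []
        for i in range(n):
            group = [points[i]] + [points[j] for j in neighbours[i]]
            inverse = [1.0 / (pvariance([g[c] for g in group]) + eps)
                       for c in range(d)]
            total = sum(inverse)
            new_weights.append([value / total for value in inverse])
        # 4) E (ASSUMED form: total absolute change between W' and W)
        change = sum(abs(a - b)
                     for new, old in zip(new_weights, weights)
                     for a, b in zip(new, old))
        weights = new_weights
        iteration += 1
    return weights
```

Each returned row sums to 1, and a dimension along which a point's neighbourhood is tightly packed receives a larger weight than a dimension along which the neighbourhood is spread out.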
Fig. 2 shows the functional block diagram of the system of the present invention. The present invention also provides a projection system that implements the two-dimensional projection method for high-dimensional data, comprising, connected in series in this order, a geodesic-distance-based data point correlation metric establishing module, a local subspace difference metric establishing module, and a local subspace difference-geodesic distance projection establishing module. The geodesic-distance-based data point correlation metric establishing module establishes the corresponding data point correlation metric based on geodesic distance from the data to be projected and passes the resulting metric to the local subspace difference metric establishing module. The local subspace difference metric establishing module establishes the local subspace difference metric from the correlation metric and passes it to the local subspace difference-geodesic distance projection establishing module. The local subspace difference-geodesic distance projection establishing module establishes the local subspace difference-geodesic distance projection from the geodesic-distance-based data point correlation metric and the local subspace difference metric, thereby projecting the high-dimensional data into two dimensions.
Claims (7)
1. A two-dimensional projection method for high-dimensional data, comprising the following steps:
S1. for the high-dimensional data to be projected, establishing a data point correlation metric based on geodesic distance;
S2. establishing a local subspace difference metric according to the metric established in step S1;
S3. according to the metrics established in steps S1 and S2, establishing a local subspace difference-geodesic distance projection, thereby projecting the high-dimensional data into two dimensions.
2. The two-dimensional projection method for high-dimensional data according to claim 1, characterized in that establishing the data point correlation metric based on geodesic distance in step S1 specifically uses the following steps:
A. constructing an S-NN graph with several connected components on the high-dimensional data set to be projected;
B. for the connected components obtained in step A, connecting any two separate connected components;
C. computing the shortest distance between any two points, thereby obtaining the geodesic distance.
3. The two-dimensional projection method for high-dimensional data according to claim 2, characterized in that connecting any two separate connected components in step B specifically means connecting the two closest data points in the two connected components.
4. The two-dimensional projection method for high-dimensional data according to claim 3, characterized in that computing the shortest distance in step C specifically uses a shortest-path algorithm.
5. The two-dimensional projection method for high-dimensional data according to claim 4, characterized in that establishing the local subspace difference metric in step S2 specifically uses the following steps:
1) computing the weight of each dimension using the following formula:
$$\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_d], \qquad \omega_i = \frac{1/\sigma_i}{\sum_{j=1}^{d} 1/\sigma_j}$$
where $\omega$ is the dimension weight matrix, $\omega_i$ is the weight of the i-th dimension, $\sigma_i$ is the variance of the data points in the i-th dimension, and $d$ is the number of dimensions;
2) computing the weighted distance between any two points in the SNN graph using the following formula:
$$d_{pq}[W] = \max\{\, d_{pq}[\omega_p],\ d_{pq}[\omega_q] \,\}$$
where $d_{pq}[W]$ is the weighted distance between point p and point q (an entry of the weighted distance matrix), $\omega_p = [\omega_{p1}, \omega_{p2}, \ldots, \omega_{pi}, \ldots, \omega_{pd}]$ is the local subspace feature vector of p, $\omega_q = [\omega_{q1}, \omega_{q2}, \ldots, \omega_{qi}, \ldots, \omega_{qd}]$ is the local subspace feature vector of q, $d_i$ is the Euclidean distance between point p and point q in the i-th dimension, $d_{pq}[\omega_p]$ is the weighted distance of point q relative to point p, and $d_{pq}[\omega_q]$ is the weighted distance of point p relative to point q;
3) based on cosine similarity, building the difference matrix according to the following formula:
$$d^{s}_{(p_i, p_j)} = 1 - \frac{\sum_{k=1}^{d} \omega_{ik}\,\omega_{jk}}{\sqrt{\sum_{k=1}^{d} \omega_{ik}^{2}}\,\sqrt{\sum_{k=1}^{d} \omega_{jk}^{2}}}$$
where $d^{s}_{(p_i, p_j)}$ is the cosine-similarity-based difference between points $p_i$ and $p_j$; i and j are data point indices in the range [0, n), and n is the size of the data set.
6. The two-dimensional projection method for high-dimensional data according to claim 5, characterized in that establishing the local subspace difference-geodesic distance projection in step S3 specifically uses the MDS algorithm to map the data points in the space to the x-axis by the local subspace difference metric and to the y-axis by the geodesic-distance-based data point correlation metric.
7. A projection system that implements the two-dimensional projection method for high-dimensional data according to any one of claims 1 to 6, characterized in that it comprises, connected in series in this order, a geodesic-distance-based data point correlation metric establishing module, a local subspace difference metric establishing module, and a local subspace difference-geodesic distance projection establishing module; the geodesic-distance-based data point correlation metric establishing module is configured to establish the corresponding data point correlation metric based on geodesic distance from the data to be projected and to pass the resulting metric to the local subspace difference metric establishing module; the local subspace difference metric establishing module is configured to establish the local subspace difference metric from the correlation metric and to pass it to the local subspace difference-geodesic distance projection establishing module; the local subspace difference-geodesic distance projection establishing module is configured to establish the local subspace difference-geodesic distance projection from the geodesic-distance-based data point correlation metric and the local subspace difference metric, thereby projecting the high-dimensional data into two dimensions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710619475.2A CN107423763A (en) | 2017-07-26 | 2017-07-26 | Two-dimensional projection method for high-dimensional data and projection system thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710619475.2A CN107423763A (en) | 2017-07-26 | 2017-07-26 | Two-dimensional projection method for high-dimensional data and projection system thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107423763A (en) | 2017-12-01 |
Family
ID=60431265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710619475.2A Pending CN107423763A (en) | Two-dimensional projection method for high-dimensional data and projection system thereof | 2017-07-26 | 2017-07-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107423763A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188098A (en) * | 2019-04-26 | 2019-08-30 | 浙江大学 | A kind of high dimension vector data visualization method and system based on the double-deck anchor point figure projection optimization |
CN110188098B (en) * | 2019-04-26 | 2021-02-19 | 浙江大学 | High-dimensional vector data visualization method and system based on double-layer anchor point map projection optimization |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Lpd-net: 3d point cloud learning for large-scale place recognition and environment analysis | |
Subrahmonia et al. | Practical reliable Bayesian recognition of 2D and 3D objects using implicit polynomials and algebraic invariants | |
CN104199842A (en) | Similar image retrieval method based on local feature neighborhood information | |
Zhu et al. | Learning compact visual representation with canonical views for robust mobile landmark search | |
Yan et al. | Hpnet: Deep primitive segmentation using hybrid representations | |
CN104615676A (en) | Picture searching method based on maximum similarity matching | |
Park et al. | Recognition of partially occluded objects using probabilistic ARG (attributed relational graph)-based matching | |
Gonzalez-Diaz et al. | Neighborhood matching for image retrieval | |
CN105139031A (en) | Data processing method based on subspace clustering | |
CN115311730B (en) | Face key point detection method and system and electronic equipment | |
Xue et al. | A fast visual map building method using video stream for visual-based indoor localization | |
CN107368599A (en) | The visual analysis method and its analysis system of high dimensional data | |
Zhao et al. | CentroidReg: A global-to-local framework for partial point cloud registration | |
CN102902864B (en) | Fast solution to approximate minimum volume bounding box of three-dimensional object | |
Jiang et al. | Extracting 3-D structural lines of building from ALS point clouds using graph neural network embedded with corner information | |
CN107423763A (en) | The two-dimensional projection's method and its optical projection system of high dimensional data | |
Zhang et al. | Large-scale clustering with structured optimal bipartite graph | |
CN111241326B (en) | Image visual relationship indication positioning method based on attention pyramid graph network | |
Zhang et al. | Learning a probabilistic topology discovering model for scene categorization | |
Li et al. | A non-rigid 3D model retrieval method based on scale-invariant heat kernel signature features | |
CN112949576A (en) | Attitude estimation method, attitude estimation device, attitude estimation equipment and storage medium | |
CN115661255B (en) | Laser SLAM loop detection and correction method | |
Zhang et al. | Sparse and low-overlapping point cloud registration network for indoor building environments | |
Yang et al. | Mixed attention hourglass network for robust face alignment | |
Zhang et al. | Hierarchical Image Retrieval Method Based on Bag-of-Visual-Word and Eight-point Algorithm with Feature Clouds for Visual Indoor Positioning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20171201 |