CN107423763A - Two-dimensional projection method for high-dimensional data and projection system thereof - Google Patents

Two-dimensional projection method for high-dimensional data and projection system thereof

Info

Publication number
CN107423763A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710619475.2A
Other languages
Chinese (zh)
Inventor
夏佳志
李强
奎晓燕
王建新
廖胜辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201710619475.2A
Publication of CN107423763A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a two-dimensional projection method for high-dimensional data, comprising: establishing a data point correlation metric based on geodesic distance; establishing a local subspace difference metric; and establishing a local subspace difference-geodesic distance projection, thereby completing the two-dimensional projection of the high-dimensional data. The invention also provides a projection system that implements this method. When reducing the dimensionality of nonlinear data, the invention preserves the intrinsic structure of the data and combines subspace exploratory analysis with intuitive, convenient interaction. It helps users quickly find the cluster structure of the data when exploring and analyzing high-dimensional data, provides a reliable technical basis for subspace clustering and analysis, significantly reduces the number of trial-and-error attempts and the redundancy of analysis results in high-dimensional data processing, and improves the reliability of data mining.

Description

Two-dimensional projection method for high-dimensional data and projection system thereof
Technical field
The present invention relates to a two-dimensional projection method for high-dimensional data and a projection system thereof.
Background
High dimensionality is one of the key characteristics of big data. From the perspective of visual analysis, high-dimensional data are data whose dimensionality is so high that it is difficult to extract meaningful relational information from the set of dimensions. From a geometric point of view, dimensionality reduction can be regarded as discovering the linear or nonlinear low-dimensional manifold embedded in the high-dimensional space. Subspace cluster analysis is an effective way to explore and analyze high-dimensional data.
The key to cluster analysis is defining a suitable metric. The geodesic distance on a K-NN graph is an appropriate measure of the similarity between data points. Existing dimensionality reduction methods mainly include PCA (Principal Component Analysis), which preserves maximum-variance information; MDS (Multidimensional Scaling), which preserves the differences between data points; and t-SNE, which preserves the probability distribution of sample neighborhoods. However, each of these mainstream methods has certain shortcomings:
(1) PCA and MDS are global linear dimensionality reduction methods and have difficulty preserving the nonlinear structure and subspace cluster structure that may exist in high-dimensional data.
(2) t-SNE is a nonlinear dimensionality reduction method, but its computational complexity is high, so its efficiency drops significantly when reducing large-scale high-dimensional data, and its parameters are highly sensitive to the data. t-SNE also has difficulty detecting subspace cluster structure.
Some existing automated subspace clustering methods only automate the process of identifying subspace clusters, and they often produce highly redundant results that cannot be interpreted accurately. The dual-space detection method uses the distance in the full-dimensional space as the initial distance metric between data points, but this often leads to a large number of trial-and-error steps during analysis.
Summary of the invention
One object of the present invention is to provide a two-dimensional projection method for high-dimensional data that helps users quickly find the subspace cluster structure of the data when exploring and analyzing high-dimensional data.
A second object of the present invention is to provide a projection system that implements the two-dimensional projection method for high-dimensional data.
The two-dimensional projection method for high-dimensional data provided by the invention comprises the following steps:
S1. For the high-dimensional data to be projected, establish a data point correlation metric based on geodesic distance;
S2. Establish a local subspace difference metric according to the metric established in step S1;
S3. According to the metrics established in steps S1 and S2, establish a local subspace difference-geodesic distance projection, thereby projecting the high-dimensional data into two dimensions.
Establishing the data point correlation metric based on geodesic distance in step S1 specifically comprises the following steps:
A. Construct an SNN graph with several connected components on the basis of the high-dimensional data set to be projected;
B. For the connected components obtained in step A, connect every two independent connected components;
C. Calculate the shortest distance between any two points, thereby obtaining the geodesic distance.
Connecting two independent connected components in step B specifically means connecting the two closest data points of the two connected components.
Calculating the shortest distance in step C is specifically done using a shortest-path algorithm.
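Steps A to C can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the neighbor graph is built by linking each point to its k nearest neighbors and symmetrizing the edges, Euclidean distance is used as the edge weight, and Dijkstra's algorithm is chosen as the shortest-path algorithm. All function names are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_neighbor_graph(points, k):
    """Step A (assumed variant): undirected edge p-q when q is among the k nearest neighbors of p."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((euclidean(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for dist, j in nearest:
            graph[i][j] = dist
            graph[j][i] = dist
    return graph


def connected_components(graph):
    """Return the connected components of the graph as lists of node indices."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


def connect_components(graph, points):
    """Step B: link every pair of components through their two closest data points."""
    comps = connected_components(graph)
    for a in range(len(comps)):
        for b in range(a + 1, len(comps)):
            dist, i, j = min((euclidean(points[i], points[j]), i, j)
                             for i in comps[a] for j in comps[b])
            graph[i][j] = dist
            graph[j][i] = dist


def geodesic_distances(graph):
    """Step C: all-pairs shortest paths, running Dijkstra's algorithm from every node."""
    n = len(graph)
    result = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:
                continue  # stale heap entry
            for nb, w in graph[node].items():
                if d + w < dist[nb]:
                    dist[nb] = d + w
                    heapq.heappush(heap, (d + w, nb))
        result.append(dist)
    return result


# Two well-separated groups on a line: k=1 leaves two components until step B joins them.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
graph = build_neighbor_graph(points, k=1)
connect_components(graph, points)
geo = geodesic_distances(graph)
```

Because step B guarantees a single connected graph, every entry of `geo` is finite, which matters later when the geodesic distances are fed to MDS.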
Establishing the local subspace difference metric in step S2 specifically comprises the following steps:
1) The weight of each dimension is calculated using the following formula:
$$\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_d], \qquad \omega_i = \frac{1/\sigma_i}{\sum_{k=1}^{d} 1/\sigma_k}$$
where $\omega$ is the dimension weight matrix, $\omega_i$ is the weight of the i-th dimension, $\sigma_i$ is the variance of the data points in the i-th dimension, and $d$ is the number of dimensions;
2) The weighted distance between any two points in the SNN graph is calculated using the following formula:
$$d_{pq}[W] = \max\{d_{pq}[\omega_p],\ d_{pq}[\omega_q]\}$$
where $d_{pq}[W]$ is the weighted distance matrix of points p and q, $\omega_p = [\omega_{p1}, \omega_{p2}, \ldots, \omega_{pi}, \ldots, \omega_{pd}]$ is the local subspace feature vector of p, $\omega_q = [\omega_{q1}, \omega_{q2}, \ldots, \omega_{qi}, \ldots, \omega_{qd}]$ is the local subspace feature vector of q, $d_i$ is the Euclidean distance between points p and q in the i-th dimension of the local subspace, $d_{pq}[\omega_p]$ is the weighted distance of point q relative to point p, and $d_{pq}[\omega_q]$ is the weighted distance of point p relative to point q;
3) Based on cosine similarity, the difference matrix is established according to the following formula:
$$d^{s}_{(p_i,p_j)} = 1 - \frac{\sum_{k=1}^{d} \omega_{ik}\cdot\omega_{jk}}{\sqrt{\sum_{k=1}^{d}\omega_{ik}^{2}}\cdot\sqrt{\sum_{k=1}^{d}\omega_{jk}^{2}}}$$
where $d^{s}_{(p_i,p_j)}$ is the cosine-similarity-based difference value of points $p_i$ and $p_j$; i and j are data point indices in the range [0, n), and n is the size of the data set.
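The three quantities in steps 1) to 3) can be sketched as follows. This is an illustrative sketch rather than the patented implementation. Two points are assumptions: the text does not reproduce the formula for $d_{pq}[\omega_p]$, so the code assumes a weighted sum of per-dimension distances, and the variance is taken as the population variance of whatever set of rows is passed in (for example a point's neighborhood). All function names are hypothetical.

```python
import math
from statistics import pvariance


def dimension_weights(rows):
    """Step 1: omega_i = (1/sigma_i) / sum_k (1/sigma_k), sigma_i = variance in dimension i.

    ASSUMPTION: population variance; a constant dimension (zero variance) would
    raise ZeroDivisionError and needs separate handling.
    """
    inverse = [1.0 / pvariance(column) for column in zip(*rows)]
    total = sum(inverse)
    return [value / total for value in inverse]


def weighted_distance(p, q, w_p, w_q):
    """Step 2: d_pq[W] = max(d_pq[w_p], d_pq[w_q]).

    ASSUMPTION: d_pq[w] = sum_i w_i * d_i with d_i = |p_i - q_i|; the source text
    does not show this formula, so the weighted sum is only one plausible reading.
    """
    d = [abs(a - b) for a, b in zip(p, q)]
    d_wp = sum(w * di for w, di in zip(w_p, d))
    d_wq = sum(w * di for w, di in zip(w_q, d))
    return max(d_wp, d_wq)


def cosine_difference(w_i, w_j):
    """Step 3: one minus the cosine similarity of two local weight vectors."""
    dot = sum(a * b for a, b in zip(w_i, w_j))
    norms = math.sqrt(sum(a * a for a in w_i)) * math.sqrt(sum(b * b for b in w_j))
    return 1.0 - dot / norms


def difference_matrix(weight_vectors):
    """n x n matrix of cosine-based differences between all pairs of weight vectors."""
    n = len(weight_vectors)
    return [[cosine_difference(weight_vectors[i], weight_vectors[j]) for j in range(n)]
            for i in range(n)]


# The first dimension varies four times as much as the second, so it gets the smaller weight.
w = dimension_weights([(0.0, 0.0), (2.0, 1.0), (4.0, 2.0)])
```

Points whose local weight vectors point in the same direction get a difference of zero, so the difference matrix separates points by which dimensions matter locally rather than by where the points lie.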
Establishing the local subspace difference-geodesic distance projection in step S3 specifically means using the MDS algorithm to map the data points in space to the x-axis according to the local subspace difference metric and to the y-axis according to the geodesic-distance-based data point correlation metric.
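One possible reading of step S3 is sketched below: each axis is obtained by a separate one-dimensional classical MDS, the x-axis from the local subspace difference matrix and the y-axis from the geodesic distance matrix. This reading is an assumption, as is the choice of classical MDS solved by power iteration; the text only names "the MDS algorithm". The function names are hypothetical.

```python
import math


def classical_mds_1d(dist, iterations=500):
    """One-dimensional classical MDS of a symmetric distance matrix.

    Uses power iteration on the double-centered matrix B = -1/2 * J * D^2 * J.
    ASSUMPTION: the dominant eigenvalue of B is positive and the fixed start
    vector is not orthogonal to its eigenvector.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in bv))
        if norm == 0.0:
            return [0.0] * n  # all distances equal: every point maps to 0
        v = [x / norm for x in bv]
        eigenvalue = norm
    return [math.sqrt(eigenvalue) * x for x in v]


def project_2d(subspace_difference, geodesic):
    """x from the subspace difference metric, y from the geodesic distance metric."""
    xs = classical_mds_1d(subspace_difference)
    ys = classical_mds_1d(geodesic)
    return list(zip(xs, ys))


# Four points: two subspace groups (difference 0 inside, 1 across) on a geodesic chain.
subspace_difference = [[0.0, 0.0, 1.0, 1.0],
                       [0.0, 0.0, 1.0, 1.0],
                       [1.0, 1.0, 0.0, 0.0],
                       [1.0, 1.0, 0.0, 0.0]]
geodesic = [[float(abs(i - j)) for j in range(4)] for i in range(4)]
coords = project_2d(subspace_difference, geodesic)
```

In this toy input, points 0 and 1 share an x coordinate because their local subspaces agree, while the y coordinates keep the four points in chain order.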
The invention also provides a projection system for implementing the two-dimensional projection method for high-dimensional data. The system comprises, connected in sequence, a module for establishing the geodesic-distance-based data point correlation metric, a module for establishing the local subspace difference metric, and a module for establishing the local subspace difference-geodesic distance projection. The first module establishes the geodesic-distance-based data point correlation metric for the data to be projected and passes the resulting metric to the local subspace difference metric module. That module establishes the local subspace difference metric according to the correlation metric and passes it to the projection module. The projection module establishes the local subspace difference-geodesic distance projection according to the geodesic-distance-based data point correlation metric and the local subspace difference metric, thereby projecting the high-dimensional data into two dimensions.
The two-dimensional projection method and projection system provided by the invention are a visual analysis method and system that preserve the intrinsic structure of high-dimensional data. When reducing the dimensionality of nonlinear data, they preserve the intrinsic structure of the data and combine subspace exploratory analysis with intuitive, convenient interaction. They help users quickly find the cluster structure of the data when exploring and analyzing high-dimensional data, provide a reliable technical basis for subspace clustering and analysis, significantly reduce the number of trial-and-error attempts and the redundancy of analysis results in high-dimensional data processing, and improve the reliability of data mining.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a functional block diagram of the system of the invention.
Detailed description of the embodiments
Fig. 1 shows the flow chart of the method of the invention. The two-dimensional projection method for high-dimensional data provided by the invention comprises the following steps:
S1. For the high-dimensional data to be projected, establish a data point correlation metric based on geodesic distance, specifically using the following steps:
A. Construct an SNN graph with several connected components on the basis of the high-dimensional data set to be projected; in the SNN graph, an edge exists between points p and q if and only if p and q are k-nearest neighbors;
B. For the connected components obtained in step A, connect every two independent connected components, specifically by connecting the two closest data points of the two connected components;
C. Calculate the shortest distance between any two points using a shortest-path algorithm, thereby obtaining the geodesic distance;
S2. Establish a local subspace difference metric according to the metric established in step S1, specifically using the following steps:
1) The weight of each dimension is calculated using the following formula:
$$\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_d], \qquad \omega_i = \frac{1/\sigma_i}{\sum_{k=1}^{d} 1/\sigma_k}$$
where $\omega$ is the dimension weight matrix, $\omega_i$ is the weight of the i-th dimension, $\sigma_i$ is the variance of the data points in the i-th dimension, and $d$ is the number of dimensions;
2) The weighted distance between any two points in the SNN graph is calculated using the following formula:
$$d_{pq}[W] = \max\{d_{pq}[\omega_p],\ d_{pq}[\omega_q]\}$$
where $d_{pq}[W]$ is the weighted distance matrix of points p and q, $\omega_p = [\omega_{p1}, \omega_{p2}, \ldots, \omega_{pi}, \ldots, \omega_{pd}]$ is the local subspace feature vector of p, $\omega_q = [\omega_{q1}, \omega_{q2}, \ldots, \omega_{qi}, \ldots, \omega_{qd}]$ is the local subspace feature vector of q, $d_i$ is the Euclidean distance between points p and q in the i-th dimension of the local subspace, $d_{pq}[\omega_p]$ is the weighted distance of point q relative to point p, and $d_{pq}[\omega_q]$ is the weighted distance of point p relative to point q;
3) Based on cosine similarity, the difference matrix is established according to the following formula:
$$d^{s}_{(p_i,p_j)} = 1 - \frac{\sum_{k=1}^{d} \omega_{ik}\cdot\omega_{jk}}{\sqrt{\sum_{k=1}^{d}\omega_{ik}^{2}}\cdot\sqrt{\sum_{k=1}^{d}\omega_{jk}^{2}}}$$
where $d^{s}_{(p_i,p_j)}$ is the cosine-similarity-based difference value of points $p_i$ and $p_j$; i and j are data point indices in the range [0, n), and n is the size of the data set.
In a specific implementation, an iterative algorithm similar to COSA is used to estimate the dimension weight vectors of all subspaces. The algorithm proceeds as follows:
Step 1: Initialize the structure of the KNN graph in two-dimensional space: initialize the weight matrix W = {1/d}, where d is the number of dimensions, and initialize the iteration value E to $1\times10^{7}$;
Step 2: While E > $1\times10^{-3}$, perform the following iteration:
1) Update the weighted distance matrix D[W], where the weighted distance is $d_{pq}[W] = \max\{d_{pq}[\omega_p],\ d_{pq}[\omega_q]\}$;
2) Update the structure of the KNN graph: the structure of the KNN graph is updated whenever the weighted distance matrix is updated;
3) Update the KNN and the dimension weights: a new weight matrix W' is calculated on the basis of the updated KNN graph;
4) Calculate the new value of E, where $W_{ij}'$ is the value of $W_{ij}$ from the previous iteration; that is, E is the amount by which $W_{ij}$ was updated;
S3. According to the metrics established in steps S1 and S2, use the MDS algorithm to map the data points in space to the x-axis according to the local subspace difference metric and to the y-axis according to the geodesic-distance-based data point correlation metric, thereby establishing the local subspace difference-geodesic distance projection and projecting the high-dimensional data into two dimensions.
Fig. 2 shows the functional block diagram of the system of the invention. The invention also provides a projection system for implementing the two-dimensional projection method for high-dimensional data. The system comprises, connected in sequence, a module for establishing the geodesic-distance-based data point correlation metric, a module for establishing the local subspace difference metric, and a module for establishing the local subspace difference-geodesic distance projection. The first module establishes the geodesic-distance-based data point correlation metric for the data to be projected and passes the resulting metric to the local subspace difference metric module. That module establishes the local subspace difference metric according to the correlation metric and passes it to the projection module. The projection module establishes the local subspace difference-geodesic distance projection according to the geodesic-distance-based data point correlation metric and the local subspace difference metric, thereby projecting the high-dimensional data into two dimensions.

Claims (7)

1. A two-dimensional projection method for high-dimensional data, comprising the following steps:
S1. For the high-dimensional data to be projected, establishing a data point correlation metric based on geodesic distance;
S2. Establishing a local subspace difference metric according to the metric established in step S1;
S3. According to the metrics established in steps S1 and S2, establishing a local subspace difference-geodesic distance projection, thereby projecting the high-dimensional data into two dimensions.
2. The two-dimensional projection method for high-dimensional data according to claim 1, characterized in that establishing the data point correlation metric based on geodesic distance in step S1 specifically comprises the following steps:
A. Constructing an SNN graph with several connected components on the basis of the high-dimensional data set to be projected;
B. For the connected components obtained in step A, connecting every two independent connected components;
C. Calculating the shortest distance between any two points, thereby obtaining the geodesic distance.
3. The two-dimensional projection method for high-dimensional data according to claim 2, characterized in that connecting two independent connected components in step B specifically means connecting the two closest data points of the two connected components.
4. The two-dimensional projection method for high-dimensional data according to claim 3, characterized in that calculating the shortest distance in step C is specifically done using a shortest-path algorithm.
5. The two-dimensional projection method for high-dimensional data according to claim 4, characterized in that establishing the local subspace difference metric in step S2 specifically comprises the following steps:
1) The weight of each dimension is calculated using the following formula:
$$\omega = [\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_d], \qquad \omega_i = \frac{1/\sigma_i}{\sum_{k=1}^{d} 1/\sigma_k}$$
where $\omega$ is the dimension weight matrix, $\omega_i$ is the weight of the i-th dimension, $\sigma_i$ is the variance of the data points in the i-th dimension, and $d$ is the number of dimensions;
2) The weighted distance between any two points in the SNN graph is calculated using the following formula:
$$d_{pq}[W] = \max\{d_{pq}[\omega_p],\ d_{pq}[\omega_q]\}$$
where $d_{pq}[W]$ is the weighted distance matrix of points p and q, $\omega_p = [\omega_{p1}, \omega_{p2}, \ldots, \omega_{pi}, \ldots, \omega_{pd}]$ is the local subspace feature vector of p, $\omega_q = [\omega_{q1}, \omega_{q2}, \ldots, \omega_{qi}, \ldots, \omega_{qd}]$ is the local subspace feature vector of q, $d_i$ is the Euclidean distance between points p and q in the i-th dimension of the local subspace, $d_{pq}[\omega_p]$ is the weighted distance of point q relative to point p, and $d_{pq}[\omega_q]$ is the weighted distance of point p relative to point q;
3) Based on cosine similarity, the difference matrix is established according to the following formula:
$$d^{s}_{(p_i,p_j)} = 1 - \frac{\sum_{k=1}^{d} \omega_{ik}\cdot\omega_{jk}}{\sqrt{\sum_{k=1}^{d}\omega_{ik}^{2}}\cdot\sqrt{\sum_{k=1}^{d}\omega_{jk}^{2}}}$$
where $d^{s}_{(p_i,p_j)}$ is the cosine-similarity-based difference value of points $p_i$ and $p_j$; i and j are data point indices in the range [0, n), and n is the size of the data set.
6. The two-dimensional projection method for high-dimensional data according to claim 5, characterized in that establishing the local subspace difference-geodesic distance projection in step S3 specifically means using the MDS algorithm to map the data points in space to the x-axis according to the local subspace difference metric and to the y-axis according to the geodesic-distance-based data point correlation metric.
7. A projection system for implementing the two-dimensional projection method for high-dimensional data according to any one of claims 1 to 6, characterized in that it comprises, connected in sequence, a module for establishing the geodesic-distance-based data point correlation metric, a module for establishing the local subspace difference metric, and a module for establishing the local subspace difference-geodesic distance projection; the module for establishing the geodesic-distance-based data point correlation metric establishes that metric for the data to be projected and passes the resulting metric to the module for establishing the local subspace difference metric; the module for establishing the local subspace difference metric establishes the local subspace difference metric according to the established correlation metric and passes it to the module for establishing the local subspace difference-geodesic distance projection; the module for establishing the local subspace difference-geodesic distance projection establishes that projection according to the established geodesic-distance-based data point correlation metric and local subspace difference metric, thereby projecting the high-dimensional data into two dimensions.
CN201710619475.2A 2017-07-26 2017-07-26 Two-dimensional projection method for high-dimensional data and projection system thereof Pending CN107423763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710619475.2A CN107423763A (en) 2017-07-26 2017-07-26 Two-dimensional projection method for high-dimensional data and projection system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710619475.2A CN107423763A (en) 2017-07-26 2017-07-26 Two-dimensional projection method for high-dimensional data and projection system thereof

Publications (1)

Publication Number Publication Date
CN107423763A true CN107423763A (en) 2017-12-01

Family

ID=60431265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710619475.2A Pending CN107423763A (en) 2017-07-26 2017-07-26 Two-dimensional projection method for high-dimensional data and projection system thereof

Country Status (1)

Country Link
CN (1) CN107423763A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188098A (en) * 2019-04-26 2019-08-30 浙江大学 A kind of high dimension vector data visualization method and system based on the double-deck anchor point figure projection optimization
CN110188098B (en) * 2019-04-26 2021-02-19 浙江大学 High-dimensional vector data visualization method and system based on double-layer anchor point map projection optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171201