CN106339354A - Visualization method of high-dimensional data in cloud computing network based on improved PCA - Google Patents


Info

Publication number
CN106339354A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610682672.4A
Other languages
Chinese (zh)
Other versions
CN106339354B (en)
Inventor
顾爱华
李树军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Jialu Hengxin Intelligent Technology Co.,Ltd.
Original Assignee
Yancheng Teachers University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Teachers University
Priority to CN201610682672.4A
Publication of CN106339354A
Application granted
Publication of CN106339354B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a visualization method of high-dimensional data in a cloud computing network based on improved PCA for standardized processing of the high-dimensional data in the cloud computing network. The visualization method of high-dimensional data in the cloud computing network based on the improved PCA comprises two parts of establishment of a high-dimensional data characteristic matrix and data standardization optimization based on high-dimensional data visualization; in the process of the high-dimensional data visualization, variables in the original high-dimensional data matrix are standardized to give a new high-dimensional data characteristic matrix, characteristic values in the matrix are arranged in sequence, and principal component data of which the variances are maximum are selected; principal component contribution ratio factors and similarities of the columns are comprehensively considered in the data standardization optimization based on the high-dimensional data visualization to put forward a new data column sorting method. Simulation results show that the improved method has better visualization and classification effects, so that high-dimensional data standardization in the cloud computing network can be better realized.

Description

Visualization method of high-dimensional data in cloud computing network based on improved PCA
Technical field
The present invention relates to a method for visualizing high-dimensional data in a cloud computing network based on improved PCA, and belongs to the technical field of high-dimensional data standardization.
Background
At present, with the rapid development of computing technology, the volume of high-dimensional data is growing massively. In cloud computing networks, big data is the foundation and a core technology of cloud computing, and this big data contains a large amount of high-dimensional data. Human cognitive ability, however, is limited and cannot grasp the deep information contained in complex and changing high-dimensional data. How to process such high-dimensional data effectively has therefore become an urgent problem in the related fields. Visualizing high-dimensional data is a prerequisite for standardizing it and determines how effective the standardization is. A high-dimensional data visualization method for cloud computing networks standardizes the variables in the original high-dimensional data matrix, re-establishes the transformed high-dimensional data by column, and thereby completes the visual presentation of high-dimensional data in the cloud computing network. This is the fundamental way to solve the above problem and has attracted the attention of many experts and scholars.
Document [1] proposes a method for visualizing high-dimensional data in cloud computing networks based on radial coordinate visualization. The method first estimates the intrinsic dimension of the high-dimensional data using the maximum likelihood principle, then combines a small number of variables with the radial coordinate principle and, on this basis, performs dimensionality reduction and visualization of the high-dimensional data. The method is relatively simple, but its applicability is limited. Document [2] proposes a method for visualizing high-dimensional data in cloud computing networks based on random forests. The method first performs supervised learning with a random forest (RF) to measure the similarity between high-dimensional data samples, then visualizes the data in a low-dimensional space using scatter plots, thereby completing the visual presentation of the high-dimensional data. The method is fairly robust, but it cannot eliminate the large amount of irrelevant and redundant information contained in the high-dimensional data set, so the presented data has large errors. Document [3] uses a SOM-based method for visualizing high-dimensional data in cloud computing networks. The method first maps the high-dimensional data into three-dimensional space, then uses TDSOM to map the three coordinate variables of the point set in three-dimensional coordinates to the attribute categories of the data set, and on this basis completes the visual presentation of the high-dimensional data. The method is highly extensible, but has difficulty representing high-dimensional data clearly and accurately.
References:
[1] Xie Yonghua, Wang Chang, Yuan Fuxing. Application of a linear-octree-based ray casting algorithm in cloud visualization [J]. Science Technology and Engineering, 2014, 14(30): 191-195.
[2] Shi Gang. Large-scale terrain rendering algorithm based on mipmap and its simulation [J]. Computer Simulation, 2015, 32(2): 270-274.
[3] Wang Jing, Xu Zhijie. Real-time crowd behavior detection based on spatio-temporal texture [J]. Journal of Xi'an University of Posts and Telecommunications, 2015, 20(2): 64-76.
Summary of the invention
Purpose of the invention: In view of the problems and shortcomings of the prior art, the present invention proposes a method for visualizing high-dimensional data in a cloud computing network based on improved PCA, which standardizes the high-dimensional data in the cloud computing network. Simulation results show that the improved method achieves good visualization and classification results and can readily standardize high-dimensional data in cloud computing networks. Principal component analysis (PCA) is a mathematical dimensionality-reduction method: it finds a few composite variables to replace the many original variables, such that these composite variables reflect as much of the information in the original variables as possible and are independent of one another.
Technical solution: A method for visualizing high-dimensional data in a cloud computing network based on improved PCA standardizes and optimizes the high-dimensional data in the cloud computing network. It comprises two parts: establishment of a high-dimensional data feature matrix, and optimization of data standardization based on high-dimensional data visualization.
Establishment of the high-dimensional data feature matrix
In the high-dimensional data visualization process, the variables in the original high-dimensional data matrix are standardized to give a new high-dimensional data feature matrix; the eigenvalues of the matrix are arranged in order, and the principal component data with the largest variance are selected. The specific steps are as follows:
Let $X$ denote the raw data matrix in the cloud computing network. Each variable in $X$ is standardized as a preprocessing step, and the standardized data matrix $Z$ is obtained using formula (5):

$$z_{i,j} = \frac{x_{i,j} - \bar{x}_i}{s_i^{2}} \qquad (5)$$

where $x_{i,j}$ is the j-th category attribute of the i-th high-dimensional data item, and $\bar{x}_i$ and $s_i^{2}$, described as the covariance matrix and the low-dimensional embedding space of the i-th high-dimensional data item, are calculated using formulas (6) and (7) as its sample mean and sample variance:

$$\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{i,j} \qquad (6)$$

$$s_i^{2} = \frac{1}{n-1} \sum_{j=1}^{n} \left| x_{i,j} - \bar{x}_i \right|^{2} \qquad (7)$$

Let $C$ denote the covariance matrix; $C$ is calculated using formula (8):

$$C = \frac{1}{n} Z^{T} Z \qquad (8)$$
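The standardization and covariance steps of formulas (5) to (8) can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes rows are samples and columns are variables, and it divides by the standard deviation (the usual z-score) where formula (5) as printed shows $s_i^{2}$ in the denominator. The function names and the toy data are hypothetical.

```python
import math

def standardize(X):
    """Column-wise z-score of X, given as a list of n rows with d values each."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Sample variance with the 1/(n-1) factor, as in formula (7).
    variances = [sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
                 for j in range(d)]
    # Assumption: divide by the standard deviation sqrt(s^2); a constant
    # column (zero variance) would raise ZeroDivisionError here.
    return [[(row[j] - means[j]) / math.sqrt(variances[j]) for j in range(d)]
            for row in X]

def covariance(Z):
    """C = (1/n) Z^T Z, as in formula (8); the result is d x d."""
    n, d = len(Z), len(Z[0])
    return [[sum(Z[r][a] * Z[r][b] for r in range(n)) / n for b in range(d)]
            for a in range(d)]

# Hypothetical toy data: 4 samples, 2 variables.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 9.0]]
Z = standardize(X)
C = covariance(Z)
```

Because the variance uses 1/(n-1) while formula (8) uses 1/n, the diagonal entries of C in this sketch equal (n-1)/n rather than 1.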
The eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ of $C$ and the eigenvector matrix $W$ are obtained using the Jacobi method.
The eigenvalues are arranged in descending order, $\lambda_1 > \lambda_2 > \cdots > \lambda_m$, and the columns of the eigenvector matrix are reordered accordingly, so that the first principal component has the largest variance, the second principal component has the second-largest variance, and the smallest variance corresponds to the d-th principal component. The k principal components with the largest variance are selected such that they retain most of the original information; k is usually chosen so that the cumulative variance contribution of the selected components exceeds 85% of the total variance. Let $w_i$ denote the eigenvectors of the k selected principal components; k independent new variables, each a linear combination, are then obtained using formula (11):

$$\xi_k = w_i \times w_d \oplus (\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m) \qquad (11)$$
In summary, in the high-dimensional data visualization process the variables in the original high-dimensional data matrix are standardized to give a new high-dimensional data feature matrix, the eigenvalues of the matrix are arranged in order, and the principal component data with the largest variance are selected, which lays the foundation for high-dimensional data visualization.
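The eigen-decomposition and component selection described above can be sketched as follows. This is a minimal sketch assuming a cyclic Jacobi rotation scheme for a symmetric matrix and a plain cumulative-ratio reading of the 85% rule; the function names, tolerances, and the example matrix are hypothetical and not taken from the patent.

```python
import math

def jacobi_eigen(C, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix C."""
    m = len(C)
    A = [row[:] for row in C]
    V = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        off = sum(A[p][q] ** 2 for p in range(m) for q in range(p + 1, m))
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if abs(A[p][q]) < 1e-300:
                    continue
                # Rotation angle that zeroes the off-diagonal entry A[p][q].
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(m):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(m)], V

def select_components(eigvals, V, threshold=0.85):
    """Sort eigenvalues in descending order and keep the leading k components."""
    order = sorted(range(len(eigvals)), key=lambda i: eigvals[i], reverse=True)
    total, cumulative, k = sum(eigvals), 0.0, 0
    for idx in order:
        cumulative += eigvals[idx]
        k += 1
        if cumulative / total > threshold:
            break
    kept = order[:k]
    W = [[V[r][idx] for idx in kept] for r in range(len(V))]
    return [eigvals[i] for i in kept], W

# Hypothetical 3 x 3 covariance matrix with eigenvalues 1.9, 0.2 and 0.1.
C_example = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 0.2]]
eigvals, V = jacobi_eigen(C_example)
top_values, W = select_components(eigvals, V)
```

Projecting the standardized data onto the columns of W then gives the reduced representation; that step is omitted here to keep the sketch short.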
Optimization of data standardization based on high-dimensional data visualization
Taking into account both the principal component contribution rate factor and the similarity between columns, a new data column sorting method is proposed. The main process is as follows:
Let $Y$ denote the data matrix after the principal component transformation. Based on the obtained $\xi_k$, $Y$ is calculated using formula (12):

$$Y = \frac{\xi_k}{fc \otimes g\,\omega^{*}} \qquad (12)$$

where $fc$ is the between-class separation of the different classes of data, $g$ is the clustered data in the high-dimensional space, and $\omega^{*}$ is the within-class aggregation.
1. Calculation of the contribution factor
First, the similarity matrix between columns is calculated as

$$S = \begin{pmatrix} s_{11} & \cdots & s_{1d} \\ \vdots & s_{ij} & \vdots \\ s_{n1} & \cdots & s_{nd} \end{pmatrix}$$

where $s_{ij}$ is the similarity between column i and column j. The average similarity between column i and all other columns is then

$$t_i = \frac{\sum_{k=1}^{d} s_{ik}}{d}$$

Since $t_i$ reflects how similar column i is to the other columns, a new contribution factor can be defined as

$$g_i = \frac{a_i}{\sum_{k=1}^{d} a_k}\, t_i \qquad (13)$$

where $a_i$ is the contribution factor weight. This contribution factor is the product of the principal component contribution rate factor and the similarity between columns, and better reflects the importance of each column.
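The contribution factor of formula (13) can be illustrated with a short sketch. The patent does not say how the similarity $s_{ij}$ is measured, so absolute cosine similarity between columns is used here purely as an assumption, and the weights `a` are hypothetical values standing in for the principal component contributions.

```python
import math

def column_similarity(Y):
    """Assumed similarity: absolute cosine similarity between the columns of Y."""
    n, d = len(Y), len(Y[0])
    cols = [[Y[r][c] for r in range(n)] for c in range(d)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    return [[abs(sum(u * v for u, v in zip(cols[i], cols[j]))) / (norms[i] * norms[j])
             for j in range(d)] for i in range(d)]

def contribution_factors(S, a):
    """g_i = a_i / sum(a) * t_i, where t_i is the mean of row i of S (formula (13))."""
    d = len(S)
    t = [sum(S[i]) / d for i in range(d)]
    total = sum(a)
    return [a[i] / total * t[i] for i in range(d)]

# Hypothetical example: two orthogonal columns with weights 3 and 1.
Y = [[1.0, 0.0], [0.0, 1.0]]
S = column_similarity(Y)
g = contribution_factors(S, [3.0, 1.0])
```

Here each column is fully similar to itself and not at all to the other, so $t_i = 0.5$ for both and the factors differ only through the weights.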
2. Data sorting
The contribution factors $g_i$ are arranged in descending order, and the corresponding columns of $Y$ are reordered accordingly. Let $Y'$ denote the reordered matrix, expressed by formula (14):

$$Y' = \begin{pmatrix} y'_{11} & \cdots & y'_{1d} \\ \vdots & & \vdots \\ y'_{n1} & \cdots & y'_{nd} \end{pmatrix} \qquad (14)$$

Formula (14) shows that the larger the contribution rate, the earlier the corresponding data column of $Y$ is placed in $Y'$, and the earlier it is displayed in the visual presentation.
3. Data column weighting
The weight of each column of $Y'$ is defined as its contribution rate, and each column of $Y'$ is multiplied by its corresponding contribution rate, as expressed by formula (15):

$$Y'_{new} = \begin{pmatrix} g_1 y'_{11} & \cdots & g_1 y'_{1d} \\ \vdots & & \vdots \\ g_n y'_{n1} & \cdots & g_n y'_{nd} \end{pmatrix} \qquad (15)$$

Let $\lambda_{new}$ denote the new data contribution rate. The distance between any two columns i and j in $\lambda_{new}$ is calculated using formula (16):

$$\lambda_{new}(i,j) = d(i,j) \times Y'_{new} \qquad (16)$$

where $d(i,j)$ is the distance between columns i and j before the contribution rate factor is applied.
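The sorting and weighting steps can be sketched together. This sketch follows the wording of the text (each column is scaled by its own contribution factor) and reads the column distance after weighting as a Euclidean distance between weighted columns; both the metric and the function names are assumptions, and the numbers are made up for illustration.

```python
import math

def sort_and_weight(Y, g):
    """Reorder the columns of Y by descending g, then scale each column by its g."""
    order = sorted(range(len(g)), key=lambda c: g[c], reverse=True)
    Y_sorted = [[row[c] for c in order] for row in Y]       # as in formula (14)
    g_sorted = [g[c] for c in order]
    Y_new = [[g_sorted[c] * row[c] for c in range(len(order))]
             for row in Y_sorted]                            # column weighting
    return order, Y_new

def column_distance(M, i, j):
    """Euclidean distance between columns i and j of M (assumed metric)."""
    return math.sqrt(sum((row[i] - row[j]) ** 2 for row in M))

# Hypothetical data: column 1 has the larger contribution factor.
Y = [[1.0, 2.0], [3.0, 4.0]]
g = [0.1, 0.4]
order, Y_new = sort_and_weight(Y, g)
dist_after = column_distance(Y_new, 0, 1)
```

In this example the second column moves to the front, so it would be displayed first in the visual presentation.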
Brief description of the drawings
Fig. 1 shows the data visualization effect of the method of the invention;
Fig. 2 shows the data visualization effect of the algorithm of document [2].
Detailed description of the embodiments
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention and not to limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the claims appended to this application.
Optimizing the standardization of high-dimensional data can only be done on the basis of visualizing the high-dimensional data in the cloud computing network. First, the dimensionality of all high-dimensional data in the cloud computing network is reduced and the data are projected onto a two-dimensional data space. The relationship between the classes in the high-dimensional data set and their features is extracted, the arrangement order and optimal mapping of the different data features are searched for, and the data set is classified on this basis. The result is used to complete the visualization of the high-dimensional data and thus to standardize it. The specific steps are as follows:
Let $x_i$ denote the high-dimensional data input in the cloud computing environment, and let $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})^{T}$ be the d-dimensional feature vector of $x_i$. Formula (1) is used to reduce the dimensionality of all high-dimensional data in the cloud computing network and project the data onto a two-dimensional data space:

$$\| x - w_c \| = \sum_{j} \min \frac{w_j \times x_j}{x - \varepsilon(t)}\, \lambda(k) \qquad (1)$$

where $w_j$ is a vector with the same dimension as the whole high-dimensional data, $\varepsilon(t)$ is the sample data set, $\lambda(k)$ is the average similarity between features, $j = 1, \ldots, d$, $x$ is the high-dimensional data matrix, and $w_c$ is the data matrix to be reduced in dimension.
Let $cv(i,j)$ denote the similarity between high-dimensional data samples and $v_k(i)$ a random high-dimensional data feature vector. The relationship between the classes in the high-dimensional data set and their features is extracted using formula (2):

$$prox = \frac{cv(i,j) \otimes v_k(i)}{\| x - w_c \|} \cdot cv_{n \times n} \qquad (2)$$

where $cv_{n \times n}$ is the transformation matrix of the high-dimensional data.
With $cv_{n \times n}$ the transformation matrix of the high-dimensional data and the eigenvalues of the different types of high-dimensional data satisfying the condition for k = 1, 2, 3, ..., the arrangement order and optimal mapping of the different data features are searched for, and the visualization model of the high-dimensional data is established using formula (3), in which $prox(i,j)$ is the similarity between samples i and j and $\lambda(k)$ is the average similarity between features.
Based on formula (3), the whole high-dimensional data set is standardized using formula (4):

$$\hat{m}(k) = \frac{prox \cdot cv(i,j)}{w(\xi)} \qquad (4)$$
Traditional methods, however, cannot effectively eliminate redundant data and information, so their visualization is poor and the effect of data standardization is reduced. The present invention therefore proposes a method for visualizing high-dimensional data in a cloud computing network based on improved PCA, which standardizes and optimizes the high-dimensional data in the cloud computing network. It comprises two parts: establishment of a high-dimensional data feature matrix, and optimization of data standardization based on high-dimensional data visualization.
Establishment of the high-dimensional data feature matrix
In the high-dimensional data visualization process, the variables in the original high-dimensional data matrix are standardized to give a new high-dimensional data feature matrix; the eigenvalues of the matrix are arranged in order, and the principal component data with the largest variance are selected. The specific steps are as follows:
Let $X$ denote the raw data matrix in the cloud computing network. Each variable in $X$ is standardized as a preprocessing step, and the standardized data matrix is obtained using formula (5):

$$z_{i,j} = \frac{x_{i,j} - \bar{x}_i}{s_i^{2}} \qquad (5)$$

where $x_{i,j}$ is the j-th category attribute of the i-th high-dimensional data item, and $\bar{x}_i$ and $s_i^{2}$, described as the covariance matrix and the low-dimensional embedding space of the i-th high-dimensional data item, are calculated using formulas (6) and (7) as its sample mean and sample variance:

$$\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{i,j} \qquad (6)$$

$$s_i^{2} = \frac{1}{n-1} \sum_{j=1}^{n} \left| x_{i,j} - \bar{x}_i \right|^{2} \qquad (7)$$

Let $C$ denote the covariance matrix; $C$ is calculated using formula (8):

$$C = \frac{1}{n} Z^{T} Z \qquad (8)$$
The eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ of $C$ and the eigenvector matrix $W$ are obtained using the Jacobi method.
As described above, the eigenvalues are arranged in descending order, $\lambda_1 > \lambda_2 > \cdots > \lambda_m$, and the columns of the eigenvector matrix are reordered accordingly, so that the first principal component has the largest variance, the second principal component has the second-largest variance, and the smallest variance corresponds to the d-th principal component. The k principal components with the largest variance are selected such that they retain most of the original information; k is usually chosen so that the cumulative variance contribution of the selected components exceeds 85% of the total variance. Let $w_i$ denote the eigenvectors of the k selected principal components; k independent new variables, each a linear combination, are then obtained using formula (11):

$$\xi_k = w_i \times w_d \oplus (\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m) \qquad (11)$$
In summary, in the high-dimensional data visualization process the variables in the original high-dimensional data matrix are standardized to give a new high-dimensional data feature matrix, the eigenvalues of the matrix are arranged in order, and the principal component data with the largest variance are selected, which lays the foundation for high-dimensional data visualization.
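The 85% selection rule is stated in words only, so it may help to write it out. The block below uses the commonly used cumulative contribution ratio as an assumed form; the eigenvalues in the worked lines are hypothetical illustration values, not results from the experiments.

```latex
% Assumed form of the cumulative variance contribution criterion:
% choose the smallest k for which
\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} > 85\%

% Illustration with hypothetical eigenvalues (2.0, 1.0, 0.6, 0.4), total 4.0:
% k = 1: 2.0 / 4.0 = 0.50
% k = 2: 3.0 / 4.0 = 0.75
% k = 3: 3.6 / 4.0 = 0.90 > 0.85 \quad\Rightarrow\quad k = 3
```

With these values, three of the four components would be kept.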
Optimization of data standardization based on high-dimensional data visualization
Taking into account both the principal component contribution rate factor and the similarity between columns, a new data column sorting method is proposed. The main process is as follows: let $Y$ denote the data matrix after the principal component transformation. Based on the obtained $\xi_k$, $Y$ is calculated using formula (12):

$$Y = \frac{\xi_k}{fc \otimes g\,\omega^{*}} \qquad (12)$$

where $fc$ is the between-class separation of the different classes of data, $g$ is the clustered data in the high-dimensional space, and $\omega^{*}$ is the within-class aggregation.
1. Calculation of the contribution factor
First, the similarity matrix between columns is calculated as

$$S = \begin{pmatrix} s_{11} & \cdots & s_{1d} \\ \vdots & s_{ij} & \vdots \\ s_{n1} & \cdots & s_{nd} \end{pmatrix}$$

where $s_{ij}$ is the similarity between column i and column j. The average similarity between column i and all other columns is then

$$t_i = \frac{\sum_{k=1}^{d} s_{ik}}{d}$$

Since $t_i$ reflects how similar column i is to the other columns, a new contribution factor can be defined as

$$g_i = \frac{a_i}{\sum_{k=1}^{d} a_k}\, t_i \qquad (13)$$

where $a_i$ is the contribution factor weight. This contribution factor is the product of the principal component contribution rate factor and the similarity between columns, and better reflects the importance of each column.
2. Data sorting
The contribution factors $g_i$ are arranged in descending order, and the corresponding columns of $Y$ are reordered accordingly. Let $Y'$ denote the reordered matrix, expressed by formula (14):

$$Y' = \begin{pmatrix} y'_{11} & \cdots & y'_{1d} \\ \vdots & & \vdots \\ y'_{n1} & \cdots & y'_{nd} \end{pmatrix} \qquad (14)$$

Formula (14) shows that the larger the contribution rate, the earlier the corresponding data column of $Y$ is placed in $Y'$, and the earlier it is displayed in the visual presentation.
3. Data column weighting
The weight of each column of $Y'$ is defined as its contribution rate, and each column of $Y'$ is multiplied by its corresponding contribution rate, as expressed by formula (15):

$$Y'_{new} = \begin{pmatrix} g_1 y'_{11} & \cdots & g_1 y'_{1d} \\ \vdots & & \vdots \\ g_n y'_{n1} & \cdots & g_n y'_{nd} \end{pmatrix} \qquad (15)$$

Let $\lambda_{new}$ denote the new data contribution rate. The distance between any two columns i and j in $\lambda_{new}$ is calculated using formula (16):

$$\lambda_{new}(i,j) = d(i,j) \times Y'_{new} \qquad (16)$$

where $d(i,j)$ is the distance between columns i and j before the contribution rate factor is applied.
Simulation verification
To demonstrate the effectiveness of the proposed improved-PCA-based visualization method in standardizing high-dimensional data, an experiment is required. The experimental hardware is a computer with a 2.8 GHz CPU and 1 GB of memory, and the data sets were downloaded from http://dbgroup.cs.tsinghua.edu. The selected data sets have been used in previous literature to compare the performance of various pattern recognition tasks. Table 1 gives the number of samples, number of features, and number of classes of each experimental data set.
Table 1. Experimental data set information
where nd is the data set name, ns the number of samples, cs the number of features, and nc the number of classes.
To ensure the fairness of the high-dimensional data visualization experiment, the classifier error rate is estimated by 6-fold cross-validation (6CV): the data set samples are divided into 6 parts and the results of 6 independent experiments are averaged. Because the quality of the visualization directly affects the optimization of data standardization, the invention is verified by the effect of the high-dimensional data visualization.
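The error-rate estimate over 6 parts can be sketched as a plain k-fold cross-validation loop. The patent does not name the classifier, so a 1-nearest-neighbor rule is used below purely as a stand-in, and the data are made up; all function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=6, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validated_error(X, labels, train_and_predict, k=6, seed=0):
    """Average test error over k folds (requires at least k samples)."""
    folds = k_fold_indices(len(X), k, seed)
    errors = []
    for f in range(k):
        test = folds[f]
        train = [i for h, fold in enumerate(folds) if h != f for i in fold]
        predicted = train_and_predict([X[i] for i in train],
                                      [labels[i] for i in train],
                                      [X[i] for i in test])
        wrong = sum(1 for p, i in zip(predicted, test) if p != labels[i])
        errors.append(wrong / len(test))
    return sum(errors) / k

def nearest_neighbor(train_X, train_y, test_X):
    """Stand-in classifier: label of the closest training sample."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [train_y[min(range(len(train_X)), key=lambda i: dist2(train_X[i], x))]
            for x in test_X]

# Hypothetical, well-separated two-class data in one dimension.
X = [[0.1 * i] for i in range(6)] + [[10.0 + 0.1 * i] for i in range(6)]
labels = [0] * 6 + [1] * 6
folds = k_fold_indices(len(X))
error = cross_validated_error(X, labels, nearest_neighbor)
```

Each sample is used for testing exactly once, which is what makes the averaged error comparable across algorithms.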
Classification error rates of the different algorithms
The algorithm of the invention and the algorithms of documents [2] and [1] were each used in high-dimensional data visualization experiments in the cloud computing network. The high-dimensional data classification error rates of the three algorithms are compared; the results are shown in Table 2.
Table 2. Comparison of classification error rates of the different algorithms
where nd is the data set name, pa is the error rate of the method of the invention, la [9] is the error rate of the algorithm of document [2], and la [8] is the error rate of the algorithm of document [1].
Table 2 shows that the error rate of high-dimensional data visualization classification in the cloud computing network is far lower with the method of the invention than with the algorithms of documents [2] and [1]. This is mainly because, when visualizing high-dimensional data, the algorithm of the invention uses the column distances of the high-dimensional data after adjustment by the principal component contribution rate and re-establishes the transformed high-dimensional data by column using a hierarchical clustering algorithm, which ensures the accuracy of data classification in the visualization.
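Re-establishing the columns with hierarchical clustering can be illustrated with a small sketch. The text names only "a hierarchical clustering algorithm", so the single-linkage merge rule and the use of the final leaf order as the column order are assumptions, and the distance matrix is hypothetical.

```python
def single_linkage_order(D):
    """Agglomerate columns by smallest inter-cluster distance; return leaf order."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                dist = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters[0]

# Hypothetical column distances: columns 0 and 2 are close, as are 1 and 3.
D = [[0.0, 1.0, 0.1, 1.0],
     [1.0, 0.0, 1.0, 0.2],
     [0.1, 1.0, 0.0, 1.0],
     [1.0, 0.2, 1.0, 0.0]]
column_order = single_linkage_order(D)
```

The resulting order places similar columns next to each other, which is the property a column-based display would benefit from.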
Comparison of the high-dimensional data visualization effects of the different algorithms
The method of the invention and the algorithm of document [2] were each used in high-dimensional data visualization experiments in the cloud computing network, and the visualization effects of the two algorithms are compared. The results are shown in Fig. 1 and Fig. 2.
Fig. 1 and Fig. 2 show that high-dimensional data visualization in the cloud computing network is better with the method of the invention than with the algorithm of document [2]. This is mainly because, when visualizing high-dimensional data, the method of the invention first standardizes the variables in the original high-dimensional data matrix to give a new high-dimensional data feature matrix, arranges the eigenvalues of the matrix in order, and selects the principal component data with the largest variance, which ensures the superiority of the method of the invention in high-dimensional data visualization.
The simulation results show that the proposed method achieves good visualization and classification results.

Claims (3)

1. A method for visualizing high-dimensional data in a cloud computing network based on improved PCA, characterized in that: high-dimensional data in the cloud computing network is standardized and optimized; the method comprises two parts, namely establishment of a high-dimensional data feature matrix and optimization of data standardization based on high-dimensional data visualization;
Establishment of the high-dimensional data feature matrix
In the high-dimensional data visualization process, the variables in the original high-dimensional data matrix are standardized to give a new high-dimensional data feature matrix, the eigenvalues of the matrix are arranged in order, and the principal component data with the largest variance are selected;
Optimization of data standardization based on high-dimensional data visualization
Taking into account both the principal component contribution rate factor and the similarity between columns, a new data column sorting method is proposed.
2. The method for visualizing high-dimensional data in a cloud computing network based on improved PCA as claimed in claim 1, characterized in that:
the high-dimensional data feature matrix is established by the following specific steps:
Let $X$ denote the raw data matrix in the cloud computing network. Each variable in $X$ is standardized as a preprocessing step, and the standardized data matrix $Z$ is obtained using formula (5):

$$z_{i,j} = \frac{x_{i,j} - \bar{x}_i}{s_i^{2}} \qquad (5)$$

where $x_{i,j}$ is the j-th category attribute of the i-th high-dimensional data item, and $\bar{x}_i$ and $s_i^{2}$, described as the covariance matrix and the low-dimensional embedding space of the i-th high-dimensional data item, are calculated using formulas (6) and (7) as its sample mean and sample variance:

$$\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{i,j} \qquad (6)$$

$$s_i^{2} = \frac{1}{n-1} \sum_{j=1}^{n} \left| x_{i,j} - \bar{x}_i \right|^{2} \qquad (7)$$

Let $C$ denote the covariance matrix; $C$ is calculated using formula (8):

$$C = \frac{1}{n} Z^{T} Z \qquad (8)$$

The eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ of $C$ and the eigenvector matrix $W$ are obtained using the Jacobi method;
The eigenvalues are arranged in descending order, $\lambda_1 > \lambda_2 > \cdots > \lambda_m$, and the columns of the eigenvector matrix are reordered accordingly, so that the first principal component has the largest variance, the second principal component has the second-largest variance, and the smallest variance corresponds to the d-th principal component; the k principal components with the largest variance are selected such that they retain most of the original information; let $w_i$ denote the eigenvectors of the k selected principal components; k independent new variables, each a linear combination, are then obtained using formula (11):

$$\xi_k = w_i \times w_d \oplus (\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m) \qquad (11).$$
3. Methods of High-dimensional Data Visualization in the system for cloud computing based on improvement pca as claimed in claim 2, its feature exists In: processed based on the visual data normalization of high dimensional data and optimize, main process is as follows:
It is assumed that the data matrix after main constituent conversion is represented by y, with the ξ obtainingkIt is foundation, calculate y using formula (12)
y = ξ k f c &circletimes; gω * - - - ( 12 )
In formula, fc represents separating degree between the class of different classes of data, and g represents high dimensional spatial clustering data, ω*Represent in class and assemble Degree.
(1) the contribution degree factor calculates
First calculated row between similarity matrix be
s = s 11 ... s 1 d . . . s i j s n 1 s n d
Wherein sijRepresent the similarity of the i-th row and jth row.Then for the i-th row, and the average similarity of other all row is
t i = σ k = 1 d s i k d
tiThe i-th row and the similarity degree of other row can be reflected, therefore can define the new contribution degree factor is
g i = a i σ k = 1 d a k t i - - - ( 13 )
aiRepresent contribution degree factor weights, this contribution degree factor is obtained by the product of the similarity between principal component contributor rate factor and row Arrive, can preferably reflect the importance degree of each row;
(2) Data sorting
The contribution degree factors $g_i$ are sorted in descending order, and the order of the corresponding columns in y is adjusted accordingly; assuming the matrix after reordering is denoted by y′, it is expressed by formula (14)
$$y' = \begin{pmatrix} y'_{11} & \cdots & y'_{1d} \\ \vdots & & \vdots \\ y'_{n1} & \cdots & y'_{nd} \end{pmatrix} \tag{14}$$
As formula (14) shows, the larger the contribution rate of a data column in y, the earlier that column is placed in y′, and hence the earlier it is displayed in the visualization;
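The reordering in step (2) amounts to sorting the columns by descending contribution factor. A minimal sketch, assuming the factors are already available as a one-dimensional array; the name `reorder_columns` is hypothetical.

```python
import numpy as np

def reorder_columns(Y, g):
    """Reorder the columns of Y so that larger factors g come first.

    Returns the reordered matrix, the reordered factors and the permutation.
    A stable sort keeps the original order among columns with equal factors.
    """
    g = np.asarray(g, dtype=float)
    order = np.argsort(-g, kind="stable")  # indices for descending g
    return Y[:, order], g[order], order
```

Returning the permutation alongside the matrix lets the caller relabel the axes of the visualization consistently.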
(3) Data column weights
The weight of each column of y′ is defined as its contribution rate, and each column of y′ is multiplied by the corresponding contribution rate, which is expressed by formula (15)
$$y'_{new} = \begin{pmatrix} g_1 y'_{11} & \cdots & g_1 y'_{1d} \\ \vdots & & \vdots \\ g_n y'_{n1} & \cdots & g_n y'_{nd} \end{pmatrix} \tag{15}$$
Assume $\lambda_{new}$ denotes the new data contribution rate; the distance between any two columns i, j in $\lambda_{new}$ is calculated using formula (16)
$$\lambda_{new}(i, j) = d(i, j) \times y'_{new} \tag{16}$$
In the formula, d(i, j) denotes the distance between i and j before the contribution rate factor is added.
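Step (3) can be illustrated by scaling each column by its factor and then measuring distances between the weighted columns. This sketch follows the wording of the text (one weight per column) and uses the Euclidean distance, which is an assumption; it does not reproduce formulas (15) and (16) literally, and the names `weight_columns` and `column_distance` are hypothetical.

```python
import numpy as np

def weight_columns(Y_sorted, g_sorted):
    """Multiply every column of the reordered matrix by its contribution factor."""
    g_sorted = np.asarray(g_sorted, dtype=float)
    return Y_sorted * g_sorted[np.newaxis, :]  # broadcast one weight per column

def column_distance(M, i, j):
    """Euclidean distance between columns i and j of M (assumed metric)."""
    return float(np.linalg.norm(M[:, i] - M[:, j]))
```

Comparing `column_distance` on the matrix before and after weighting shows how the contribution factors stretch or shrink the separation between columns in the final display.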
CN201610682672.4A 2016-08-17 2016-08-17 Visualization method of high-dimensional data in cloud computing network based on improved PCA Active CN106339354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610682672.4A CN106339354B (en) 2016-08-17 2016-08-17 Visualization method of high-dimensional data in cloud computing network based on improved PCA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610682672.4A CN106339354B (en) 2016-08-17 2016-08-17 Visualization method of high-dimensional data in cloud computing network based on improved PCA

Publications (2)

Publication Number Publication Date
CN106339354A true CN106339354A (en) 2017-01-18
CN106339354B CN106339354B (en) 2018-11-20

Family

ID=57824246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610682672.4A Active CN106339354B (en) 2016-08-17 2016-08-17 Visualization method of high-dimensional data in cloud computing network based on improved PCA

Country Status (1)

Country Link
CN (1) CN106339354B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940803A (en) * 2017-02-17 2017-07-11 平安科技(深圳)有限公司 Correlated variables recognition methods and device
CN106960213A (en) * 2017-02-14 2017-07-18 广东广业开元科技有限公司 A kind of Key Unit of Fire Safety grade sequence system analyzed based on big data
CN108428209A (en) * 2018-03-28 2018-08-21 深圳大学 Methods of High-dimensional Data Visualization, apparatus and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902706A (en) * 2014-03-31 2014-07-02 东华大学 Method for classifying and predicting big data on basis of SVM (support vector machine)
CN103942568A (en) * 2014-04-22 2014-07-23 浙江大学 Sorting method based on non-supervision feature selection
CN104063511A (en) * 2014-07-09 2014-09-24 哈尔滨工业大学 Complex system supervision graph embedding structural data visualized monitoring method based on relevant measurement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘广等: "高维多目标优化的可视化技术研究", 《中国优秀硕士学位论文全文数据库》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960213A (en) * 2017-02-14 2017-07-18 广东广业开元科技有限公司 A kind of Key Unit of Fire Safety grade sequence system analyzed based on big data
CN106960213B (en) * 2017-02-14 2018-08-31 广东广业开元科技有限公司 A kind of Key Unit of Fire Safety grade sequence system based on big data analysis
CN106940803A (en) * 2017-02-17 2017-07-11 平安科技(深圳)有限公司 Correlated variables recognition methods and device
CN106940803B (en) * 2017-02-17 2018-04-17 平安科技(深圳)有限公司 Correlated variables recognition methods and device
CN108428209A (en) * 2018-03-28 2018-08-21 深圳大学 Methods of High-dimensional Data Visualization, apparatus and system
CN108428209B (en) * 2018-03-28 2022-02-15 深圳大学 High-dimensional data visualization method, device and system

Also Published As

Publication number Publication date
CN106339354B (en) 2018-11-20

Similar Documents

Publication Publication Date Title
CN110443281B (en) Text classification self-adaptive oversampling method based on HDBSCAN (high-density binary-coded decimal) clustering
CN106971205A (en) A kind of embedded dynamic feature selection method based on k nearest neighbor Mutual Information Estimation
CN100557626C (en) Image partition method based on immune spectrum clustering
CN106845717A (en) A kind of energy efficiency evaluation method based on multi-model convergence strategy
CN103810704B (en) Based on support vector machine and the SAR image change detection of discriminative random fields
CN106557579A (en) A kind of vehicle model searching system and method based on convolutional neural networks
CN101877007A (en) Remote sensing image retrieval method with integration of spatial direction relation semanteme
CN106257498A (en) Zinc flotation work condition state division methods based on isomery textural characteristics
CN109657721A (en) A kind of multi-class decision-making technique of combination fuzzy set and random forest tree
CN108257154A (en) Polarimetric SAR Image change detecting method based on area information and CNN
Guo et al. Urban impervious surface extraction based on multi-features and random forest
Ma et al. A supervised progressive growing generative adversarial network for remote sensing image scene classification
CN106228027A (en) A kind of semi-supervised feature selection approach of various visual angles data
CN106339354A (en) Visualization method of high-dimensional data in cloud computing network based on improved PCA
CN112613536A (en) Near infrared spectrum diesel grade identification method based on SMOTE and deep learning
CN108595558A (en) A kind of image labeling method of data balancing strategy and multiple features fusion
CN113902861A (en) Three-dimensional geological modeling method based on machine learning
Wang et al. R2-trans: Fine-grained visual categorization with redundancy reduction
CN109902731A (en) A kind of detection method and device of the performance fault based on support vector machines
CN108537266A (en) A kind of cloth textured fault sorting technique of depth convolutional network
CN112819208A (en) Spatial similarity geological disaster prediction method based on feature subset coupling model
Hou et al. Imbalanced fault identification via embedding-augmented Gaussian prototype network with meta-learning perspective
Dahal Effect of different distance measures in result of cluster analysis
CN106815320A (en) Based on the investigation big data visual modeling method and system of expanding stereogram
Zhang et al. Multi-hierarchical spatial clustering for characteristic towns in China: An Orange-based framework to integrate GIS and Geodetector

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210714

Address after: 224400 No.9 Kunlun Road, Funing modern service industry park, Yancheng City, Jiangsu Province (d)

Patentee after: Jiangsu Huilian Zhitong Communication Technology Co.,Ltd.

Address before: 224002 No. 50, open avenue, Jiangsu, Yancheng City

Patentee before: YANCHENG TEACHERS University

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20211223

Address after: 224400 No. 18-26, Jinan Road, Jinsha Lake Street, Funing County, Yancheng City, Jiangsu Province

Patentee after: Jiangsu Jialu Hengxin Intelligent Technology Co.,Ltd.

Address before: 224400 No.9 Kunlun Road, Funing modern service industry park, Yancheng City, Jiangsu Province (d)

Patentee before: Jiangsu Huilian Zhitong Communication Technology Co.,Ltd.

TR01 Transfer of patent right