CN109271441B - High-dimensional data visual clustering analysis method and system - Google Patents


Info

Publication number
CN109271441B
Authority
CN
China
Prior art keywords
dimensional data
dimension
data
expansion
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811517242.2A
Other languages
Chinese (zh)
Other versions
CN109271441A (en)
Inventor
黎明
黄珊
陈昊
陈震
李军华
张聪炫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Hangkong University
Original Assignee
Nanchang Hangkong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Hangkong University filed Critical Nanchang Hangkong University
Priority to CN201811517242.2A priority Critical patent/CN109271441B/en
Publication of CN109271441A publication Critical patent/CN109271441A/en
Application granted granted Critical
Publication of CN109271441B publication Critical patent/CN109271441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a high-dimensional data visual clustering analysis method and system. The method comprises the following steps: carrying out normalization preprocessing on the high-dimensional data; performing dimension expansion on the high-dimensional data subjected to the normalization processing through a multi-target genetic algorithm to obtain the high-dimensional data subjected to the dimension expansion; and mapping each group of the high-dimensional data after the dimensionality expansion to a circle-like space by using a circle-like mapping visualization method to realize the visualization clustering of the high-dimensional data. The method or the system can effectively realize the visual clustering of high-dimensional data, particularly the high-dimensional data containing the nonlinear structure.

Description

High-dimensional data visual clustering analysis method and system
Technical Field
The invention relates to the field of high-dimensional data visual clustering, in particular to a high-dimensional data visual clustering analysis method and system.
Background
Visualization technology is an important data analysis tool. It expresses the internal structure, information, and knowledge of data mainly through computer graphics, image processing, signal processing, and similar methods, which benefits research such as pattern recognition and outlier detection. With the rapid development of computers and sensing equipment, multi-dimensional and even high-dimensional data exist widely in fields such as economics, medicine, the military, and industry, for example high-dimensional functional magnetic resonance imaging data and three-layer defense systems with multi-dimensional structures. The growth in data dimension and scale brings new opportunities for data visualization. However, traditional rectangular coordinates can express at most three-dimensional data and are not suitable for visualizing high-dimensional data.
At present, high-dimensional visualization techniques fall mainly into two types. The first type is dimension reduction, which maps high-dimensional data to a low-dimensional space and represents the reduced data by scatter points or other symbols; it mainly includes principal component analysis, self-organizing mapping, the neuron measurement method, and the like. Although dimension-reduction visualization can, to some extent, overcome the curse of dimensionality in visualization, it can lose potentially important information, which limits the accuracy of high-dimensional data analysis. The second type obtains visualization results without dimension reduction, for example scatter plot matrices, parallel coordinates, and heat maps, and can represent high-dimensional data information intact. However, as the dimension and scale of the data increase, the limited screen space causes large numbers of curves or color blocks to interlace, which greatly restricts the effectiveness of the visualization.
Compared with the above methods, radial layout visualization methods, represented by Radial Visualization (RadViz) and Star Coordinates (SC), have a significant advantage in expressing high-dimensional data. A radial layout visualization method represents the data dimensions by the radii of a circle and maps each individual to a point in a low-dimensional space. It can not only express data of any dimension efficiently in a low-dimensional space but also project data with similar characteristics to nearby positions, thereby producing a good visual clustering effect. However, RadViz is defined as a general nonlinear mapping that does not take the shape and distribution of the data into account, and SC is itself a linear visualization method. Therefore, when the data have a nonlinear manifold structure, traditional radial layout visualization methods are limited in capturing that structure.
Therefore, how to efficiently realize the visual clustering of high-dimensional data, especially high-dimensional data containing nonlinear structures, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a high-dimensional data visual clustering analysis method and system, which are used for efficiently realizing visual clustering of high-dimensional data, particularly high-dimensional data containing nonlinear structures.
In order to achieve the purpose, the invention provides the following scheme:
a method of high dimensional data visualization cluster analysis, the method comprising:
carrying out normalization preprocessing on the high-dimensional data;
performing dimension expansion on the high-dimensional data subjected to the normalization processing through a multi-target genetic algorithm to obtain the high-dimensional data subjected to the dimension expansion;
and mapping each group of the high-dimensional data after the dimensionality expansion to a circle-like space by using a circle-like mapping visualization method to realize the visualization clustering of the high-dimensional data.
Optionally, the performing normalization preprocessing on the high-dimensional data specifically includes:
according to the formula
$$F'_{km} = \frac{F_{km} - \min(F_m)}{\max(F_m) - \min(F_m)}$$
carrying out normalization preprocessing on the high-dimensional data, wherein $F_{km}$ and $F'_{km}$ respectively represent the original attribute value and the normalized attribute value of the kth group of high-dimensional data on the mth dimension; $\max(F_m)$ and $\min(F_m)$ respectively represent the maximum attribute value and the minimum attribute value of the high-dimensional data F on the mth dimension; k = 1, 2, ..., K and m = 1, 2, ..., M, where K and M represent the scale and the dimension of the high-dimensional data F, respectively.
Optionally, the performing, by the multi-target genetic algorithm, dimension expansion on the high-dimensional data after the normalization processing to obtain the high-dimensional data after the dimension expansion specifically includes:
initializing a population of the multi-target genetic algorithm; the population comprises a plurality of individuals; the individual represents an expanded state of the high-dimensional data;
constructing a multi-target evaluation index; the multi-target evaluation index comprises an expansion dimension, a topology maintenance index and a Dunn index of the high-dimensional data;
screening out an optimal individual through the multi-target evaluation index, wherein the optimal individual represents an optimal expansion state;
and performing dimension expansion on the high-dimensional data after the normalization processing according to the optimal expansion state to obtain the high-dimensional data after the dimension expansion.
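The encoding and screening steps above can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes each individual is a list of 0/1 flags (one per original dimension), that screening is done by Pareto dominance, and that a smaller expansion dimension and larger topology maintenance and Dunn index values are preferred. The function names and the index values in the example are hypothetical.

```python
def expansion_dimension(individual):
    """Count the 1s in a binary-coded individual (number of expanded dimensions)."""
    return sum(individual)


def dominates(a, b):
    """Return True if score tuple a Pareto-dominates score tuple b.

    Each tuple is (expansion_dimension, topology_index, dunn_index).
    Assumption: a smaller expansion dimension and larger index values are better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[0] < b[0] or a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(scores):
    """Return the indices of score tuples that no other tuple dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(o, s) for j, o in enumerate(scores) if j != i)
    ]


population = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
# Hypothetical (topology index, Dunn index) values, for illustration only
index_values = [(0.80, 0.30), (0.70, 0.20), (0.75, 0.25)]
scores = [
    (expansion_dimension(ind), tp, di)
    for ind, (tp, di) in zip(population, index_values)
]
front = pareto_front(scores)  # indices of the non-dominated individuals
```

In this toy population the third individual is dominated by the first (more expanded dimensions and lower index values), so only the first two remain candidates for the optimal expansion state.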
Optionally, the constructing a multi-target evaluation index specifically includes:
determining the expansion dimension of the high-dimensional data by counting the number of 1 in each individual binary code in the population;
according to the formula
Figure BDA0001902312340000031
determining a topology maintenance index for each of the individuals, wherein TP represents the topology maintenance index, K represents the scale of the high-dimensional data F, and $t_k$ represents the rank ordering of the kth group of data, which is determined according to the formula
Figure BDA0001902312340000032
wherein u and s both represent numbers of nearest-neighbor data points (typically u = 4 and s = 10), $NN_{ky}$ and $nn_{ky}$ respectively represent the yth nearest data point of the kth group of data in the original space and in the mapping space, and $nn_{kl}$ and $nn_{kt}$ respectively represent the lth and tth nearest data points of the kth group of data in the mapping space;
according to the formula
$$DI = \min_{1 \le i \le nc} \left\{ \min_{1 \le j \le nc,\ j \ne i} \left\{ \frac{d(C_i, C_j)}{\max_{1 \le k \le nc} \operatorname{diam}(C_k)} \right\} \right\}$$
determining the Dunn index for each of the individuals, wherein DI represents the Dunn index, d(x, y) represents the Euclidean distance between mapping points x and y, $C_i$, $C_j$, and $C_k$ represent the ith, jth, and kth clusters of mapping points, nc represents the number of clusters of mapping points,
$$d(C_i, C_j) = \min_{x \in C_i,\ y \in C_j} d(x, y)$$
represents the distance between cluster $C_i$ and cluster $C_j$, and
$$\operatorname{diam}(C_k) = \max_{x,\ y \in C_k} d(x, y)$$
represents the diameter of cluster $C_k$.
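As a rough illustration of the Dunn index used as an evaluation objective, the sketch below uses the textbook form of the index. It assumes that the distance between two clusters is their closest-pair Euclidean distance and that a cluster's diameter is its largest internal pairwise distance; the function name `dunn_index` and the toy clusters are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn index of a list of clusters, each a list of 2-D mapping points.

    Assumptions: inter-cluster distance is the closest-pair distance and
    cluster diameter is the largest pairwise distance inside a cluster.
    """
    min_between = min(
        dist(x, y)
        for a, b in combinations(clusters, 2)
        for x in a
        for y in b
    )
    max_diameter = max(
        (dist(x, y) for c in clusters for x, y in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster has zero diameter; index is undefined")
    return min_between / max_diameter


# Two well-separated toy clusters of mapping points
clusters = [[(0.0, 0.0), (0.0, 1.0)], [(3.0, 0.0), (3.0, 1.0)]]
di = dunn_index(clusters)
```

A larger value indicates compact clusters that lie far apart, which is why the index is suited to scoring candidate expansion states.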
Optionally, performing dimension expansion on the high-dimensional data after the normalization processing according to the optimal expansion state to obtain the high-dimensional data after the dimension expansion, specifically including:
dividing the value range [0, 1] of each dimension of the normalized high-dimensional data into r equal parts, counting the probability that the data fall into each part, and determining a probability histogram of each dimension;
dividing each probability histogram by using a neighbor propagation clustering algorithm, and determining the division result of each dimension;
and performing dimension expansion according to the division results and the optimal expansion state to obtain the dimension-expanded high-dimensional data, wherein the number of dimensions obtained by expanding each dimension is equal to the number of clusters of the probability histogram of that dimension, and, among the new dimensions of each expanded datum, exactly one takes the data value of the corresponding original dimension.
Optionally, the mapping each group of the dimensionality-expanded high-dimensional data to a circle-like space by using a circle-like mapping visualization method to implement the visual clustering of the high-dimensional data specifically includes:
constructing a circle-like space $C_O$, wherein the circle-like space is the unit circle, centered at the origin, of a two-dimensional rectangular coordinate system;
according to
Figure BDA0001902312340000041
determining the correlation among the dimensions of each group of dimension-expanded high-dimensional data to obtain a similarity matrix, wherein $S_{ij}$ is the element in the ith row and jth column of the similarity matrix, K represents the scale of the high-dimensional data F, and $t_{ki}$ is the ordering value of the kth group of data in the ith dimension, the ordering values being numerical values obtained by ordering each group of the dimension-expanded high-dimensional data according to the size of its attribute value in each dimension, using the integers 1 to M;
determining a Fiedler vector by solving for the eigenvector corresponding to the maximum eigenvalue of the Laplace matrix of the similarity matrix;
sorting the dimensions of each group of dimension-expanded high-dimensional data according to the sizes of the elements in the Fiedler vector to obtain sorted high-dimensional data;
according to the formula
Figure BDA0001902312340000042
determining the coordinate point $V_{\lambda(i)}$ on the arc of $C_O$ for each dimension of the sorted high-dimensional data, wherein,
Figure BDA0001902312340000043
the vector λ represents the standard sequence vector of the sizes of the elements of the Fiedler vector, λ(i) represents the ith element value of the vector λ, and i = 1, 2, ...;
in the circle-like space, for any high-dimensional data
Figure BDA0001902312340000044
the point that lies on the straight line connecting the origin of coordinates and the coordinate point $V_{\lambda(i)}$ and whose distance to the origin of coordinates is
Figure BDA0001902312340000045
is recorded as a two-dimensional mapping point, wherein
Figure BDA0001902312340000046
is the attribute value of the kth group of data in the λ(i)th dimension, and any individual
Figure BDA0001902312340000047
corresponds to N two-dimensional mapping points;
forming, from the two-dimensional point set corresponding to each group of data, a polygon in one-to-one correspondence with that group, and determining the geometric center of each polygon;
and, through a t-distribution neighborhood embedding algorithm, reducing the spacing between polygon geometric centers of the same cluster and increasing the spacing between polygon geometric centers of different clusters to determine the positions of the mapping points, thereby realizing the visual clustering of the high-dimensional data.
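The dimension-ordering step can be sketched as follows. The sketch follows the wording above literally and takes the eigenvector of the maximum eigenvalue of the Laplace matrix, computed by plain power iteration; in spectral graph theory the name Fiedler vector conventionally refers to the eigenvector of the second-smallest eigenvalue, so which eigenvector is used here is an assumption. The Laplace matrix is assumed to be the degree matrix minus the similarity matrix, and the similarity values, the function names, and the starting vector are hypothetical.

```python
def laplacian(similarity):
    """Graph Laplacian L = D - S, ignoring the diagonal of the similarity matrix."""
    n = len(similarity)
    lap = [[-similarity[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(similarity[i][j] for j in range(n) if j != i)
    return lap


def dominant_eigenvector(matrix, iterations=500):
    """Power iteration: eigenpair of the largest eigenvalue of a symmetric PSD matrix."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return value, v


# Hypothetical similarity matrix between three expanded dimensions
S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
lap = laplacian(S)
eigenvalue, vector = dominant_eigenvector(lap)
# Order the dimensions by the size of the corresponding vector element
order = sorted(range(len(vector)), key=lambda i: vector[i])
```

Because an eigenvector is only defined up to sign, the resulting order may come out reversed; for placing dimensions around a circle, only the adjacency of dimensions matters.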
A high dimensional data visualization cluster analysis system, the system comprising:
the preprocessing module is used for carrying out normalization preprocessing on the high-dimensional data;
the dimensionality extension module is used for carrying out dimensionality extension on the high-dimensional data after the normalization processing through a multi-target genetic algorithm to obtain the high-dimensional data after the dimensionality extension;
and the mapping module is used for mapping each group of the dimensionality-expanded high-dimensional data to a circle-like space by using a circle-like mapping visualization method so as to realize the visualization clustering of the high-dimensional data.
Optionally, the dimension extension module specifically includes:
the initialization unit is used for initializing the population of the multi-target genetic algorithm; the population comprises a plurality of individuals; the individual represents an expanded state of the high-dimensional data;
the index construction unit is used for constructing a multi-target evaluation index; the multi-target evaluation index comprises an expansion dimension, a topology maintenance index and a Dunn index of the high-dimensional data;
the screening unit is used for screening out the optimal individual through the multi-target evaluation index, and the optimal individual represents the optimal expansion state;
and the dimension expansion unit is used for performing dimension expansion on the high-dimensional data after the normalization processing according to the optimal expansion state to obtain the high-dimensional data after the dimension expansion.
Optionally, the dimension extension unit specifically includes:
the statistical subunit is used for dividing the value range [0, 1] of each dimension of the normalized high-dimensional data into r equal parts, counting the probability that the data fall into each part, and determining a probability histogram of each dimension;
the dividing subunit is used for dividing each probability histogram by using a neighbor propagation clustering algorithm and determining the division result of each dimension;
and the expansion subunit is used for performing dimension expansion according to the division results and the optimal expansion state to obtain the dimension-expanded high-dimensional data, wherein the number of dimensions obtained by expanding each dimension is equal to the number of clusters of the probability histogram of that dimension, and, among the new dimensions of each expanded datum, exactly one takes the data value of the corresponding original dimension.
Optionally, the mapping module specifically includes:
a circle-like space construction unit for constructing a circle-like space $C_O$, wherein the circle-like space is the unit circle, centered at the origin, of a two-dimensional rectangular coordinate system;
a similarity matrix determination unit for, according to
Figure BDA0001902312340000061
determining the correlation among the dimensions of each group of dimension-expanded high-dimensional data to obtain a similarity matrix, wherein $S_{ij}$ is the element in the ith row and jth column of the similarity matrix, K represents the scale of the high-dimensional data F, and $t_{ki}$ is the ordering value of the kth group of data in the ith dimension, the ordering values being numerical values obtained by ordering each group of the dimension-expanded high-dimensional data according to the size of its attribute value in each dimension, using the integers 1 to M;
the Fiedler vector determining unit is used for determining a Fiedler vector by solving for the eigenvector corresponding to the maximum eigenvalue of the Laplace matrix of the similarity matrix;
the sorting unit is used for sorting the dimensions of each group of dimension-expanded high-dimensional data according to the sizes of the elements in the Fiedler vector to obtain sorted high-dimensional data;
a coordinate point determination unit for, according to the formula
Figure BDA0001902312340000062
determining the coordinate point $V_{\lambda(i)}$ on the arc of $C_O$ for each dimension of the sorted high-dimensional data, wherein,
Figure BDA0001902312340000063
the vector λ represents the standard sequence vector of the sizes of the elements of the Fiedler vector, λ(i) represents the ith element value of the vector λ, and i = 1, 2, ...;
a two-dimensional mapping point determining unit for, in the circle-like space and for any high-dimensional data
Figure BDA0001902312340000064
recording as a two-dimensional mapping point the point that lies on the straight line connecting the origin of coordinates and the coordinate point $V_{\lambda(i)}$ and whose distance to the origin of coordinates is
Figure BDA0001902312340000065
wherein
Figure BDA0001902312340000066
is the attribute value of the kth group of data in the λ(i)th dimension, and any individual
Figure BDA0001902312340000067
corresponds to N two-dimensional mapping points;
the geometric center determining unit is used for forming, from the two-dimensional point set corresponding to each group of data, a polygon in one-to-one correspondence with that group, and determining the geometric center of each polygon;
and the visual clustering realization unit is used for, through a t-distribution neighborhood embedding algorithm, reducing the spacing between polygon geometric centers of the same cluster and increasing the spacing between polygon geometric centers of different clusters to determine the positions of the mapping points, thereby realizing the visual clustering of the high-dimensional data.
Compared with the prior art, the invention has the following technical effects: the invention carries out normalization preprocessing on the high-dimensional data, performs dimension expansion on the normalized high-dimensional data through a multi-target genetic algorithm to obtain the dimension-expanded high-dimensional data, and maps each group of the dimension-expanded high-dimensional data to a circle-like space by using a circle-like mapping visualization method. The high-dimensional data visual clustering analysis method and system provided by the invention thereby ensure the soundness and effectiveness of the visual clustering analysis and realize the visual clustering of high-dimensional data, particularly high-dimensional data containing nonlinear structures, more efficiently.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a high dimensional data visualization clustering analysis method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a high dimensional data visualization cluster analysis system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the probability histogram and the partitioning result of each dimension of the iris data set when r is 20, according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a high-dimensional data visual clustering analysis method and system, which are used for efficiently realizing visual clustering of high-dimensional data, particularly high-dimensional data containing nonlinear structures.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in FIG. 1, the high-dimensional data visualization cluster analysis method includes the following steps:
Step 101: carrying out normalization preprocessing on the high-dimensional data.
According to the formula
$$F'_{km} = \frac{F_{km} - \min(F_m)}{\max(F_m) - \min(F_m)}$$
carrying out normalization preprocessing on the high-dimensional data, wherein $F_{km}$ and $F'_{km}$ respectively represent the original attribute value and the normalized attribute value of the kth group of high-dimensional data on the mth dimension; $\max(F_m)$ and $\min(F_m)$ respectively represent the maximum attribute value and the minimum attribute value of the high-dimensional data F on the mth dimension; k = 1, 2, ..., K and m = 1, 2, ..., M, where K and M represent the scale and the dimension of the high-dimensional data F, respectively.
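A minimal sketch of this normalization step, assuming the usual min-max scaling of each dimension to [0, 1]. The function name `min_max_normalize`, the sample records, and the treatment of a constant dimension (mapped to 0.0) are choices made for this illustration only.

```python
def min_max_normalize(data):
    """Scale every column (dimension) of `data` to the range [0, 1].

    `data` is a list of records, each a list of attribute values.
    """
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in data:
        new_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant dimension has span 0; mapping it to 0.0 is an assumption
            new_row.append((value - low) / span if span else 0.0)
        normalized.append(new_row)
    return normalized


# Three hypothetical two-dimensional records
F = [[5.1, 3.5], [4.9, 3.0], [6.3, 3.3]]
F_norm = min_max_normalize(F)
```

After scaling, the smallest value of each dimension becomes 0 and the largest becomes 1, so all dimensions share the value range [0, 1] that the later histogram step relies on.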
Step 102: performing dimension expansion on the normalized high-dimensional data through a multi-target genetic algorithm to obtain the dimension-expanded high-dimensional data. This step specifically comprises:
1) Initializing a population of the multi-target genetic algorithm; the population comprises a plurality of individuals; each individual represents an expansion state of the high-dimensional data.
2) Constructing a multi-target evaluation index; the multi-target evaluation index comprises the expansion dimension, the topology maintenance index, and the Dunn index of the high-dimensional data. Specifically:
Determining the expansion dimension of the high-dimensional data by counting the number of 1s in the binary code of each individual in the population.
According to the formula
Figure BDA0001902312340000083
determining a topology maintenance index for each of the individuals, wherein TP represents the topology maintenance index, K represents the scale of the high-dimensional data F, and $t_k$ represents the rank ordering of the kth group of data, which is determined according to the formula
Figure BDA0001902312340000084
wherein u and s both represent numbers of nearest-neighbor data points (typically u = 4 and s = 10), $NN_{ky}$ and $nn_{ky}$ respectively represent the yth nearest data point of the kth group of data in the original space and in the mapping space, and $nn_{kl}$ and $nn_{kt}$ respectively represent the lth and tth nearest data points of the kth group of data in the mapping space;
according to the formula
$$DI = \min_{1 \le i \le nc} \left\{ \min_{1 \le j \le nc,\ j \ne i} \left\{ \frac{d(C_i, C_j)}{\max_{1 \le k \le nc} \operatorname{diam}(C_k)} \right\} \right\}$$
determining the Dunn index for each of the individuals, wherein DI represents the Dunn index, d(x, y) represents the Euclidean distance between mapping points x and y, $C_i$, $C_j$, and $C_k$ represent the ith, jth, and kth clusters of mapping points, nc represents the number of clusters of mapping points,
$$d(C_i, C_j) = \min_{x \in C_i,\ y \in C_j} d(x, y)$$
represents the distance between cluster $C_i$ and cluster $C_j$, and
$$\operatorname{diam}(C_k) = \max_{x,\ y \in C_k} d(x, y)$$
represents the diameter of cluster $C_k$.
3) Screening out the optimal individual through the multi-target evaluation index, wherein the optimal individual represents the optimal expansion state.
4) Performing dimension expansion on the normalized high-dimensional data according to the optimal expansion state to obtain the dimension-expanded high-dimensional data. Specifically: dividing the value range [0, 1] of each dimension of the normalized high-dimensional data into r equal parts, counting the probability that the data fall into each part, and determining a probability histogram of each dimension; dividing each probability histogram by using a neighbor propagation clustering algorithm, and determining the division result of each dimension; and performing dimension expansion according to the division results and the optimal expansion state to obtain the dimension-expanded high-dimensional data, wherein the number of dimensions obtained by expanding each dimension is equal to the number of clusters of the probability histogram of that dimension, and, among the new dimensions of each expanded datum, exactly one, namely the dimension whose division contains the original data value, takes the data value of the corresponding original dimension, while the data values on the remaining new dimensions are 0.
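The probability histogram described in step 4) can be sketched as below. This illustration assumes that each normalized value falls into exactly one of r half-open parts of [0, 1] and that the value 1.0 is placed in the last part; the function name `probability_histogram` and the sample values are hypothetical. The subsequent division of the histogram by the neighbor propagation clustering algorithm is not shown.

```python
def probability_histogram(values, r):
    """Probability that a normalized value falls into each of r equal parts of [0, 1]."""
    counts = [0] * r
    for v in values:
        # Parts are [i/r, (i+1)/r); the value 1.0 is placed in the last part
        index = min(int(v * r), r - 1)
        counts[index] += 1
    total = len(values)
    return [c / total for c in counts]


# Five hypothetical normalized values of one dimension, split into r = 4 parts
values = [0.0, 0.1, 0.12, 0.5, 1.0]
hist = probability_histogram(values, 4)
# Points (left edge of the part, probability), a possible input for the clustering step
points = [(i / 4, p) for i, p in enumerate(hist)]
```

Each histogram bar then becomes one two-dimensional point (position on the value axis, probability), which is the form in which the histogram is handed to the clustering step.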
Step 103: mapping each group of the dimension-expanded high-dimensional data to a circle-like space by using a circle-like mapping visualization method to realize the visual clustering of the high-dimensional data. This step specifically comprises:
constructing a circle-like space $C_O$, wherein the circle-like space is the unit circle, centered at the origin, of a two-dimensional rectangular coordinate system;
according to
Figure BDA0001902312340000093
determining the correlation among the dimensions of each group of dimension-expanded high-dimensional data to obtain a similarity matrix, wherein $S_{ij}$ is the element in the ith row and jth column of the similarity matrix, K represents the scale of the high-dimensional data F, and $t_{ki}$ is the ordering value of the kth group of data in the ith dimension, the ordering values being numerical values obtained by ordering each group of the dimension-expanded high-dimensional data according to the size of its attribute value in each dimension, using the integers 1 to M;
determining a Fiedler vector by solving for the eigenvector corresponding to the maximum eigenvalue of the Laplace matrix of the similarity matrix;
sorting the dimensions of each group of dimension-expanded high-dimensional data according to the sizes of the elements in the Fiedler vector to obtain sorted high-dimensional data;
according to the formula
Figure BDA0001902312340000101
determining the coordinate point $V_{\lambda(i)}$ on the arc of $C_O$ for each dimension of the sorted high-dimensional data, wherein,
Figure BDA0001902312340000102
the vector λ represents the standard sequence vector of the sizes of the elements of the Fiedler vector, λ(i) represents the ith element value of the vector λ, and i = 1, 2, ...;
in the circle-like space, for any high-dimensional data
Figure BDA0001902312340000103
the point that lies on the straight line connecting the origin of coordinates and the coordinate point $V_{\lambda(i)}$ and whose distance to the origin of coordinates is
Figure BDA0001902312340000104
is recorded as a two-dimensional mapping point, wherein
Figure BDA0001902312340000105
is the attribute value of the kth group of data in the λ(i)th dimension, and any individual
Figure BDA0001902312340000106
corresponds to N two-dimensional mapping points;
forming, from the two-dimensional point set corresponding to each group of data, a polygon in one-to-one correspondence with that group, and determining the geometric center of each polygon;
and, through a t-distribution neighborhood embedding algorithm, reducing the spacing between polygon geometric centers of the same cluster and increasing the spacing between polygon geometric centers of different clusters to determine the positions of the mapping points, thereby realizing the visual clustering of the high-dimensional data.
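The geometric part of Step 103 can be sketched as follows, under two labeled assumptions: the N coordinate points are spaced evenly around the unit circle (the exact angle formula is not reproduced in the text), and the geometric center of a polygon is taken as the mean of its vertices. The function names and the sample record are hypothetical, and the final refinement by the t-distribution neighborhood embedding algorithm is not shown.

```python
from math import cos, pi, sin


def anchor_points(n):
    """n coordinate points on the unit circle (assumption: evenly spaced angles)."""
    return [(cos(2 * pi * i / n), sin(2 * pi * i / n)) for i in range(n)]


def map_record(values, order):
    """Map one expanded record to its polygon vertices and their geometric center.

    `values` holds attribute values in [0, 1]; `order` lists the dimension
    indices in their sorted order around the circle.
    """
    anchors = anchor_points(len(order))
    # Each vertex lies on the ray from the origin to an anchor point,
    # at a distance from the origin equal to the attribute value
    vertices = [
        (values[d] * ax, values[d] * ay)
        for d, (ax, ay) in zip(order, anchors)
    ]
    # Assumption: the geometric center is the mean of the polygon vertices
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    return vertices, (cx, cy)


record = [1.0, 0.5, 0.0, 0.5]  # one hypothetical expanded record
vertices, center = map_record(record, order=[0, 1, 2, 3])
```

Records with similar attribute profiles yield similar polygons and therefore nearby centers, which is what makes the centers a reasonable starting point for the cluster-spacing refinement.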
According to the specific embodiment provided by the invention, the invention discloses the following technical effects: the invention carries out normalization preprocessing on the high-dimensional data, performs dimension expansion on the normalized high-dimensional data through a multi-target genetic algorithm to obtain the dimension-expanded high-dimensional data, and maps each group of the dimension-expanded high-dimensional data to a circle-like space by using a circle-like mapping visualization method. The high-dimensional data visual clustering analysis method and system provided by the invention thereby ensure the soundness and effectiveness of the visual clustering analysis and realize the visual clustering of high-dimensional data, particularly high-dimensional data containing nonlinear structures, more efficiently.
The visualized cluster analysis method proposed in this patent is described below by taking the iris data set, which contains 150 groups of 4-dimensional data, as an example.
Step A: carrying out normalization preprocessing on the iris data set, specifically comprising:
according to the formula
$$F'_{km} = \frac{F_{km} - \min(F_m)}{\max(F_m) - \min(F_m)}$$
carrying out normalization preprocessing on the iris data set F, wherein $F_{km}$ and $F'_{km}$ respectively represent the original attribute value and the normalized attribute value of the kth group of the iris data set in the mth dimension; $\max(F_m)$ and $\min(F_m)$ respectively represent the maximum and minimum attribute values of the iris data set in the mth dimension; k = 1, 2, ..., 150 and m = 1, 2, 3, 4;
and B, performing dimension expansion on the iris data set subjected to the normalization treatment through an NSGAII multi-target genetic algorithm to obtain an iris data set subjected to the dimension expansion, wherein the method specifically comprises the following steps:
initializing a population of the NSGAII multi-target genetic algorithm; the population comprises a plurality of individuals; the individual represents the expansion state of the high-dimensional data binary code, the length of the high-dimensional data binary code is Iris florida dataset dimension 4, wherein 1 and 0 in the binary code respectively represent the corresponding Iris florida dataset dimension and do not carry out dimension expansion;
constructing a multi-target evaluation index, wherein the multi-target evaluation index comprises the expansion dimension, the topology maintenance index and the Dunn index of the iris data set;
screening out an optimal individual through the multi-target evaluation index, wherein the optimal individual represents an optimal expansion state of the iris data set;
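The binary encoding described above can be sketched as follows. This fragment only illustrates the individual representation and the first evaluation index (the expansion dimension, counted as the number of 1s in the binary code); it does not implement NSGA-II itself. The uniform random initialization, the population size of 6, and the names `init_population` and `expansion_dimension` are assumptions made for illustration.

```python
import random

N_DIMS = 4  # length of the binary code = number of original dimensions


def init_population(size, n_dims=N_DIMS, seed=0):
    """Create `size` random binary individuals (1 = expand, 0 = do not expand)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_dims)] for _ in range(size)]


def expansion_dimension(individual):
    """First evaluation index: the number of 1s in the binary code."""
    return sum(individual)


population = init_population(size=6)
objective_1 = [expansion_dimension(ind) for ind in population]
```

In a full implementation each individual would additionally be scored by the topology maintenance index and the Dunn index before non-dominated sorting selects the optimal individual.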
performing dimension expansion on the normalized iris data set according to the optimal expansion state to obtain the dimension-expanded iris data set, specifically comprising:
dividing the value range [0, 1] of each dimension of the normalized iris data set into 20 equal intervals, counting the probability that the data fall into each interval, and determining 4 dimension probability histograms;
dividing each of the 4 probability histograms by using the affinity propagation (neighbor propagation) clustering algorithm to determine 4 dimension division results, wherein dividing a probability distribution can be regarded as clustering two-dimensional data whose coordinates are the x-axis (i.e., the attribute value) and the y-axis (i.e., the probability value) of each dimension's probability histogram. Fig. 3 shows the probability histograms of the 4 dimensions of the iris data set and the division results, in which the two-dimensional data coordinates are represented by scatter points, and scatter points of the same division class are connected by a broken line of the same type.
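The histogram construction can be sketched as below: the value range [0, 1] is split into 20 equal intervals, the fraction of data falling into each interval is computed, and each interval is turned into a two-dimensional point (value, probability) ready for clustering. Using the interval centre as the x-coordinate is an assumption, the sample column is hypothetical, and the clustering step itself is not shown.

```python
def probability_histogram(values, n_bins=20):
    """Fraction of `values` (each in [0, 1]) falling into each of n_bins equal intervals."""
    counts = [0] * n_bins
    for v in values:
        # The value 1.0 is placed in the last interval.
        index = min(int(v * n_bins), n_bins - 1)
        counts[index] += 1
    total = len(values)
    return [c / total for c in counts]


def histogram_points(probabilities):
    """Pair each interval's centre (x) with its probability (y)."""
    n_bins = len(probabilities)
    return [((b + 0.5) / n_bins, p) for b, p in enumerate(probabilities)]


# Hypothetical normalized values of one dimension.
column = [0.0, 0.02, 0.26, 0.5, 0.51, 0.99, 1.0, 0.74]
probs = probability_histogram(column)
points = histogram_points(probs)
```

The resulting 20 points per dimension are the two-dimensional data that the clustering algorithm divides into groups.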
performing dimension expansion according to the division results and the optimal expansion state to obtain a dimension-expanded high-dimensional iris data set, wherein the number of dimensions obtained by expanding each of the 4 dimensions equals the number of clusters of the corresponding probability histogram; each expanded datum has one and only one new dimension whose data value equals the data value on the corresponding original dimension, namely the new dimension corresponding to the division interval in which the original data value falls, and the data values on the other new dimensions are 0. For example, Fig. 3 shows that the first dimension of the iris data set is divided into 3 parts, containing 6, 7, and 7 data points respectively. That is, the first dimension of the iris data set is expanded into three new dimensions, with the division at 0.3 and 0.65 on the [0, 1] value range (after the 6th and the 13th of the 20 equal intervals). It follows that if 3 groups of data have the values 0.2, 0.5, and 0.8 in the first dimension of the iris data set, their values in the new dimensions are [0.2, 0, 0], [0, 0.5, 0], and [0, 0, 0.8], respectively.
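The worked example above can be reproduced with a short sketch. Here 0.3 and 0.65 are treated as split points on the normalized value axis; a value that lies exactly on a split point is assigned to the upper interval, which is an assumption. The names `expand_dimension` and `expand_sample` are hypothetical.

```python
from bisect import bisect_right


def expand_dimension(value, boundaries):
    """Spread one normalized value over len(boundaries) + 1 new dimensions.

    Only the new dimension whose interval contains `value` keeps the value;
    all other new dimensions are 0.
    """
    expanded = [0.0] * (len(boundaries) + 1)
    expanded[bisect_right(boundaries, value)] = value
    return expanded


def expand_sample(sample, boundaries_per_dim, individual):
    """Expand the dimensions flagged 1 in `individual`; copy the others unchanged."""
    result = []
    for value, boundaries, flag in zip(sample, boundaries_per_dim, individual):
        result.extend(expand_dimension(value, boundaries) if flag else [value])
    return result


first_dim_boundaries = [0.3, 0.65]
rows = [expand_dimension(v, first_dim_boundaries) for v in (0.2, 0.5, 0.8)]
# rows == [[0.2, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.8]]
```

`expand_sample` applies the same rule to a whole group of data, expanding only the dimensions that the optimal individual marks with 1.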
Step C: mapping each group of the dimension-expanded high-dimensional iris data to a circle-like space by using a circle-like mapping visualization method, specifically comprising:
constructing a circle-like space $C_0$, wherein the circle-like space is the unit circle of a two-dimensional rectangular coordinate system, centered at the origin;
according to
Figure BDA0001902312340000121
determining the correlation between the dimensions of each group of the dimension-expanded iris data to obtain a similarity matrix, wherein $S_{ij}$ is the element in the i-th row and j-th column of the similarity matrix, K represents the scale of the high-dimensional data F, and $t_{ki}$ is the ranking value of the k-th group of data in the i-th dimension; the ranking value is the number obtained by ranking the groups of the dimension-expanded iris data according to the size of their attribute values in each dimension, using the integers 1 to N, wherein N is the dimension of the dimension-expanded iris data set;
determining a Fiedler vector by solving the eigenvector corresponding to the maximum eigenvalue of the Laplace matrix of the similarity matrix;
sorting the dimensions of the high-dimensional iris data sets after the dimension expansion according to the sizes of elements in the Fiedler vector to obtain sorted high-dimensional data;
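The eigenvector step and the subsequent sorting of dimensions can be sketched as follows. The sketch follows the text in taking the eigenvector of the maximum eigenvalue (in other literature the term Fiedler vector usually denotes the eigenvector of the second-smallest eigenvalue). It assumes an unnormalized Laplacian L = D - S, a symmetric similarity matrix with non-negative entries, and power iteration as the solver; the 4 x 4 similarity matrix is hypothetical.

```python
import math


def laplacian(similarity):
    """Unnormalized graph Laplacian L = D - S, with D the diagonal of row sums."""
    n = len(similarity)
    return [[(sum(similarity[i]) if i == j else 0.0) - similarity[i][j]
             for j in range(n)] for i in range(n)]


def dominant_eigenvector(matrix, iterations=500):
    """Power iteration: unit eigenvector of the largest-magnitude eigenvalue."""
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
    return vector


# Hypothetical similarity matrix between 4 dimensions.
similarity = [[0.0, 0.9, 0.1, 0.2],
              [0.9, 0.0, 0.3, 0.1],
              [0.1, 0.3, 0.0, 0.8],
              [0.2, 0.1, 0.8, 0.0]]
lap = laplacian(similarity)
fiedler = dominant_eigenvector(lap)
# Dimension order obtained by sorting the elements of the vector by size.
order = sorted(range(len(fiedler)), key=lambda i: fiedler[i])
```

Because an eigenvector is only defined up to sign, the resulting order may come out reversed; a fixed sign convention would be needed if a reproducible layout is required.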
according to the formula
Figure BDA0001902312340000122
determining the coordinate point $V_{\lambda(i)}$ of each dimension of the sorted high-dimensional iris data set on the arc of $C_0$, wherein
Figure BDA0001902312340000123
the vector λ is the order vector of the elements of the Fiedler vector sorted by size, λ(i) represents the i-th element value of the vector λ, and i = 1, 2, ..., N;
in the circle-like space, for any dimension-expanded iris data
Figure BDA0001902312340000124
on the straight line connecting the coordinate origin and the coordinate point $V_{\lambda(i)}$, the point whose distance to the coordinate origin is
Figure BDA0001902312340000125
is recorded as a two-dimensional mapping point, wherein
Figure BDA0001902312340000126
is the attribute value of the k-th group of data in the λ(i)-th dimension, and any individual
Figure BDA0001902312340000127
corresponds to N two-dimensional mapping points;
forming, for each group of iris data, a corresponding polygon from its set of two-dimensional mapping points, and determining the geometric center of each polygon;
reducing, through the t-SNE algorithm, the spacing between the geometric centers of polygons in the same cluster and increasing the spacing between the geometric centers of polygons in different clusters, so as to determine the positions of the mapping points and realize visual clustering of the iris data set.
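The mapping of one group of data to a polygon and its geometric center can be sketched as follows. Since the coordinate formula is only available as a figure, the sketch makes two loud assumptions: the N dimension anchors are spaced evenly on the unit circle, and the geometric center is taken as the arithmetic mean of the polygon's vertices. The sample and the dimension order are hypothetical.

```python
import math


def anchor_points(n_dims):
    """Assumed layout: n_dims anchors evenly spaced on the unit circle."""
    return [(math.cos(2 * math.pi * j / n_dims), math.sin(2 * math.pi * j / n_dims))
            for j in range(n_dims)]


def mapping_points(sample, order, anchors):
    """Place the value of dimension order[j] on the ray from the origin to anchors[j]."""
    return [(sample[dim] * ax, sample[dim] * ay)
            for dim, (ax, ay) in zip(order, anchors)]


def geometric_center(points):
    """Assumed definition: arithmetic mean of the polygon's vertices."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


sample = [1.0, 0.5, 0.0, 0.5]   # hypothetical normalized, dimension-expanded data
order = [0, 1, 2, 3]            # hypothetical dimension order from the sorting step
polygon = mapping_points(sample, order, anchor_points(4))
center = geometric_center(polygon)
```

Each mapping point lies at a distance from the origin equal to the attribute value of the corresponding dimension, so a group of data with large values in neighbouring dimensions yields a polygon whose center is pulled toward those anchors.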
As shown in fig. 2, the present invention further provides a high dimensional data visualization cluster analysis system, which includes:
a preprocessing module 201, configured to perform normalization preprocessing on the high-dimensional data, specifically: performing normalization preprocessing on the high-dimensional data according to the formula
$$\tilde{F}_{km} = \frac{F_{km} - \min(F_m)}{\max(F_m) - \min(F_m)}$$
wherein $F_{km}$ and $\tilde{F}_{km}$ respectively represent the original attribute value and the normalized attribute value of the k-th group of high-dimensional data in the m-th dimension; $\max(F_m)$ and $\min(F_m)$ respectively represent the maximum attribute value and the minimum attribute value of the high-dimensional data F in the m-th dimension; k = 1, 2, ..., K; m = 1, 2, ..., M; K and M respectively represent the scale and the dimension of the high-dimensional data F.
a dimension expansion module 202, configured to perform dimension expansion on the normalized high-dimensional data through a multi-target genetic algorithm to obtain the dimension-expanded high-dimensional data.
The dimension expansion module 202 specifically includes:
the initialization unit is used for initializing the population of the multi-target genetic algorithm; the population comprises a plurality of individuals; the individual represents an expanded state of the high-dimensional data;
the index construction unit is used for constructing a multi-target evaluation index; the multi-target evaluation index comprises the expansion dimension, the topology maintenance index, and the Dunn index of the high-dimensional data; specifically, the number of 1s in the binary code of each individual in the population is counted to determine the expansion dimension of the high-dimensional data; according to the formula
Figure BDA0001902312340000133
the topology maintenance index of each individual is determined, wherein TP represents the topology maintenance index, K represents the scale of the high-dimensional data F, and $t_k$ represents the rank ordering of the k-th group of data, determined according to the formula
Figure BDA0001902312340000141
wherein u and s both represent numbers of nearest-neighbor data points, typically u = 4 and s = 10; $NN_{ky}$ and $nn_{ky}$ respectively represent the y-th nearest data point of the k-th group of data points in the original space and in the mapping space; $nn_{kl}$ and $nn_{kt}$ respectively represent the l-th and t-th nearest data points of the k-th group of data points in the mapping space;
according to the formula
Figure BDA0001902312340000142
the Dunn index of each individual is determined, wherein DI represents the Dunn index, d(x, y) represents the Euclidean distance between mapping points x and y, $C_i$, $C_j$, and $C_k$ represent the i-th, j-th, and k-th clusters of mapping points, nc represents the number of clusters of mapping points,
Figure BDA0001902312340000143
represents the distance between cluster $C_i$ and cluster $C_j$, and
Figure BDA0001902312340000144
represents the diameter of cluster $C_k$.
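As an illustration of the Dunn index, the sketch below uses the standard textbook definition: the smallest distance between any two clusters divided by the largest cluster diameter, with Euclidean distance between mapping points. Whether the patent's formula (available only as a figure) matches this definition term by term is an assumption; the two example clusters are hypothetical.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Standard Dunn index: minimum inter-cluster distance / maximum cluster diameter."""
    def min_distance(a, b):
        # Distance between two clusters: closest pair of points, one from each.
        return min(math.dist(x, y) for x in a for y in b)

    def diameter(c):
        # Diameter of a cluster: farthest pair of points inside it (0 for a single point).
        return max((math.dist(x, y) for x, y in combinations(c, 2)), default=0.0)

    separation = min(min_distance(a, b) for a, b in combinations(clusters, 2))
    max_diameter = max(diameter(c) for c in clusters)
    return separation / max_diameter


# Two hypothetical clusters of two-dimensional mapping points.
clusters = [[(0.0, 0.0), (0.0, 1.0)],
            [(3.0, 0.0), (3.0, 1.0)]]
di = dunn_index(clusters)
```

A larger value indicates compact, well-separated clusters. The sketch requires at least two clusters and at least one cluster with two or more points, since otherwise the maximum diameter is zero.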
The screening unit is used for screening out the optimal individual through the multi-target evaluation index, and the optimal individual represents the optimal expansion state;
and the dimension expansion unit is used for performing dimension expansion on the high-dimensional data after the normalization processing according to the optimal expansion state to obtain the high-dimensional data after the dimension expansion.
The dimension extension unit specifically includes:
the statistical subunit is used for dividing the value range [0, 1] of each dimension of the normalized high-dimensional data into r equal intervals, counting the probability that the data fall into each interval, and determining a probability histogram of each dimension;
the dividing unit is used for dividing each probability histogram by using the affinity propagation (neighbor propagation) clustering algorithm and determining the division result of each dimension;
and the expansion subunit is used for performing dimension expansion according to the division results and the optimal expansion state to obtain the dimension-expanded high-dimensional data, wherein the number of dimensions obtained by expanding each dimension equals the number of clusters of the probability histogram of that dimension, and each expanded datum has one and only one new dimension whose data value equals the data value on the corresponding original dimension.
a mapping module 203, configured to map each group of the dimension-expanded high-dimensional data to a circle-like space by using a circle-like mapping visualization method, so as to realize visual clustering of the high-dimensional data.
The mapping module 203 specifically includes:
a similarity matrix determination unit, configured to determine, according to
Figure BDA0001902312340000151
the correlation between the dimensions of each group of the dimension-expanded high-dimensional data to obtain a similarity matrix, wherein $S_{ij}$ is the element in the i-th row and j-th column of the similarity matrix, K represents the scale of the high-dimensional data F, and $t_{ki}$ is the ranking value of the k-th group of data in the i-th dimension; the ranking value is the number obtained by ranking the groups of the dimension-expanded high-dimensional data according to the size of their attribute values in each dimension, using the integers 1 to M;
the Fiedler vector determining unit is used for determining a Fiedler vector by solving the eigenvector corresponding to the maximum eigenvalue of the Laplace matrix of the similarity matrix;
the sorting unit is used for sorting the dimensionality of the high-dimensional data after each group of dimensionality expansion according to the size of the elements in the Fiedler vector to obtain the sorted high-dimensional data;
a coordinate point determination unit, configured to determine, according to the formula
Figure BDA0001902312340000152
the coordinate point $V_{\lambda(i)}$ of each dimension of the sorted high-dimensional data on the arc of $C_0$, wherein
Figure BDA0001902312340000153
the vector λ is the order vector of the elements of the Fiedler vector sorted by size, λ(i) represents the i-th element value of the vector λ, and i = 1, 2, ..., N;
a two-dimensional mapping point determining unit, configured to, in the circle-like space, for any high-dimensional data
Figure BDA0001902312340000154
on the straight line connecting the coordinate origin and the coordinate point $V_{\lambda(i)}$, record the point whose distance to the coordinate origin is
Figure BDA0001902312340000155
as a two-dimensional mapping point, wherein
Figure BDA0001902312340000156
is the attribute value of the k-th group of data in the λ(i)-th dimension, and any individual
Figure BDA0001902312340000157
corresponds to N two-dimensional mapping points;
the geometric center determining unit is used for forming, for each group of data, a corresponding polygon from its set of two-dimensional mapping points, and determining the geometric center of each polygon;
and the visual clustering realization unit is used for reducing, through the t-distributed stochastic neighbor embedding (t-SNE) algorithm, the spacing between the geometric centers of polygons in the same cluster and increasing the spacing between the geometric centers of polygons in different clusters, so as to determine the positions of the mapping points and realize visual clustering of the high-dimensional data.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (7)

1. A high-dimensional data visualization cluster analysis method is characterized by comprising the following steps:
carrying out normalization preprocessing on the high-dimensional data;
performing dimension expansion on the high-dimensional data subjected to the normalization processing through a multi-target genetic algorithm to obtain the high-dimensional data subjected to the dimension expansion; the method specifically comprises the following steps:
initializing a population of the multi-target genetic algorithm; the population comprises a plurality of individuals; the individual represents an expanded state of the high-dimensional data;
constructing a multi-target evaluation index; the multi-target evaluation index comprises an expansion dimension, a topology maintenance index and a Dunn index of the high-dimensional data; the method specifically comprises the following steps:
determining the expansion dimension of the high-dimensional data by counting the number of 1 in each individual binary code in the population;
according to the formula
Figure FDA0002597429810000011
determining a topology maintenance index of each individual, wherein TP represents the topology maintenance index, K represents the scale of the high-dimensional data F, and $t_k$ represents the rank ordering of the k-th group of data, determined according to the formula
Figure FDA0002597429810000012
wherein u and s both represent numbers of nearest-neighbor data points; $NN_{ky}$ and $nn_{ky}$ respectively represent the y-th nearest data point of the k-th group of data points in the original space and in the mapping space; $nn_{kl}$ and $nn_{kt}$ respectively represent the l-th and t-th nearest data points of the k-th group of data points in the mapping space; according to the formula
Figure FDA0002597429810000013
determining a Dunn index of each individual, wherein DI represents the Dunn index, d(x, y) represents the Euclidean distance between mapping points x and y, $C_i$, $C_j$, and $C_k$ represent the i-th, j-th, and k-th clusters of mapping points, nc represents the number of clusters of mapping points,
Figure FDA0002597429810000014
represents the distance between cluster $C_i$ and cluster $C_j$;
Figure FDA0002597429810000015
represents the diameter of cluster $C_k$;
screening out an optimal individual through the multi-target evaluation index, wherein the optimal individual represents an optimal expansion state;
performing dimension expansion on the high-dimensional data after the normalization processing according to the optimal expansion state to obtain the high-dimensional data after the dimension expansion;
and mapping each group of the high-dimensional data after the dimensionality expansion to a circle-like space by using a circle-like mapping visualization method to realize the visualization clustering of the high-dimensional data.
2. The high-dimensional data visualization cluster analysis method according to claim 1, wherein performing the normalization preprocessing on the high-dimensional data specifically comprises: performing normalization preprocessing on the high-dimensional data according to the formula
$$\tilde{F}_{km} = \frac{F_{km} - \min(F_m)}{\max(F_m) - \min(F_m)}$$
wherein $F_{km}$ and $\tilde{F}_{km}$ respectively represent the original attribute value and the normalized attribute value of the k-th group of high-dimensional data in the m-th dimension; $\max(F_m)$ and $\min(F_m)$ respectively represent the maximum attribute value and the minimum attribute value of the high-dimensional data F in the m-th dimension; k = 1, 2, ..., K; m = 1, 2, ..., M; K and M respectively represent the scale and the dimension of the high-dimensional data F.
3. The high-dimensional data visualization cluster analysis method according to claim 1, wherein performing dimension expansion on the high-dimensional data after the normalization processing according to the optimal expansion state to obtain the high-dimensional data after the dimension expansion specifically comprises:
dividing the value range [0, 1] of each dimension of the normalized high-dimensional data into r equal intervals, counting the probability that the data fall into each interval, and determining a probability histogram of each dimension;
dividing each probability histogram by using the affinity propagation (neighbor propagation) clustering algorithm, and determining the division result of each dimension;
and performing dimension expansion according to the division results and the optimal expansion state to obtain the dimension-expanded high-dimensional data, wherein the number of dimensions obtained by expanding each dimension equals the number of clusters of the probability histogram of that dimension, and each expanded datum has one and only one new dimension whose data value equals the data value on the corresponding original dimension.
4. The method for high-dimensional data visual cluster analysis according to claim 1, wherein the step of mapping each set of the dimensionality-expanded high-dimensional data to a circle-like space by using a circle-like mapping visualization method to realize visual clustering of the high-dimensional data specifically comprises the steps of:
constructing a circle-like space $C_0$, wherein the circle-like space is the unit circle of a two-dimensional rectangular coordinate system, centered at the origin;
according to
Figure FDA0002597429810000031
determining the correlation between the dimensions of each group of the dimension-expanded high-dimensional data to obtain a similarity matrix, wherein $S_{ij}$ is the element in the i-th row and j-th column of the similarity matrix, K represents the scale of the high-dimensional data F, and $t_{ki}$ is the ranking value of the k-th group of data in the i-th dimension; the ranking value is the number obtained by ranking the groups of the dimension-expanded high-dimensional data according to the size of their attribute values in each dimension, using the integers 1 to M;
determining a Fiedler vector by solving the eigenvector corresponding to the maximum eigenvalue of the Laplace matrix of the similarity matrix;
sorting the dimensions of each group of the dimension-expanded high-dimensional data according to the sizes of the elements in the Fiedler vector to obtain sorted high-dimensional data;
according to the formula
Figure FDA0002597429810000032
determining the coordinate point $V_{\lambda(i)}$ of each dimension of the sorted high-dimensional data on the arc of $C_0$, wherein
Figure FDA0002597429810000033
the vector λ is the order vector of the elements of the Fiedler vector sorted by size, λ(i) represents the i-th element value of the vector λ, and i = 1, 2, ..., N;
in the circle-like space, for any high-dimensional data
Figure FDA0002597429810000034
on the straight line connecting the coordinate origin and the coordinate point $V_{\lambda(i)}$, the point whose distance to the coordinate origin is
Figure FDA0002597429810000035
is recorded as a two-dimensional mapping point, wherein
Figure FDA0002597429810000036
is the attribute value of the k-th group of data in the λ(i)-th dimension, and any individual
Figure FDA0002597429810000037
corresponds to N two-dimensional mapping points;
forming, for each group of data, a corresponding polygon from its set of two-dimensional mapping points, and determining the geometric center of each polygon;
and reducing, through the t-distributed stochastic neighbor embedding (t-SNE) algorithm, the spacing between the geometric centers of polygons in the same cluster and increasing the spacing between the geometric centers of polygons in different clusters, so as to determine the positions of the mapping points and realize visual clustering of the high-dimensional data.
5. A high dimensional data visualization cluster analysis system, the system comprising:
the preprocessing module is used for carrying out normalization preprocessing on the high-dimensional data;
the dimensionality extension module is used for carrying out dimensionality extension on the high-dimensional data after the normalization processing through a multi-target genetic algorithm to obtain the high-dimensional data after the dimensionality extension;
the mapping module is used for mapping each group of the high-dimensional data after the dimensionality expansion to a circle-like space by using a circle-like mapping visualization method to realize the visualization clustering of the high-dimensional data;
the dimension extension module specifically includes:
the initialization unit is used for initializing the population of the multi-target genetic algorithm; the population comprises a plurality of individuals; the individual represents an expanded state of the high-dimensional data;
the index construction unit is used for constructing a multi-target evaluation index; the multi-target evaluation index comprises an expansion dimension, a topology maintenance index and a Dunn index of the high-dimensional data; the method specifically comprises the following steps:
determining the expansion dimension of the high-dimensional data by counting the number of 1 in each individual binary code in the population;
according to the formula
Figure FDA0002597429810000041
determining a topology maintenance index of each individual, wherein TP represents the topology maintenance index, K represents the scale of the high-dimensional data F, and $t_k$ represents the rank ordering of the k-th group of data, determined according to the formula
Figure FDA0002597429810000042
wherein u and s both represent numbers of nearest-neighbor data points; $NN_{ky}$ and $nn_{ky}$ respectively represent the y-th nearest data point of the k-th group of data points in the original space and in the mapping space; $nn_{kl}$ and $nn_{kt}$ respectively represent the l-th and t-th nearest data points of the k-th group of data points in the mapping space; according to the formula
Figure FDA0002597429810000043
determining a Dunn index of each individual, wherein DI represents the Dunn index, d(x, y) represents the Euclidean distance between mapping points x and y, $C_i$, $C_j$, and $C_k$ represent the i-th, j-th, and k-th clusters of mapping points, nc represents the number of clusters of mapping points,
Figure FDA0002597429810000051
represents the distance between cluster $C_i$ and cluster $C_j$;
Figure FDA0002597429810000052
represents the diameter of cluster $C_k$;
the screening unit is used for screening out the optimal individual through the multi-target evaluation index, and the optimal individual represents the optimal expansion state;
and the dimension expansion unit is used for performing dimension expansion on the high-dimensional data after the normalization processing according to the optimal expansion state to obtain the high-dimensional data after the dimension expansion.
6. The high-dimensional data visualization cluster analysis system according to claim 5, wherein the dimension extension unit specifically comprises:
the statistical subunit is used for dividing the value range [0, 1] of each dimension of the normalized high-dimensional data into r equal intervals, counting the probability that the data fall into each interval, and determining a probability histogram of each dimension;
the dividing unit is used for dividing each probability histogram by using the affinity propagation (neighbor propagation) clustering algorithm and determining the division result of each dimension;
and the expansion subunit is used for performing dimension expansion according to the division results and the optimal expansion state to obtain the dimension-expanded high-dimensional data, wherein the number of dimensions obtained by expanding each dimension equals the number of clusters of the probability histogram of that dimension, and each expanded datum has one and only one new dimension whose data value equals the data value on the corresponding original dimension.
7. The high-dimensional data visualization cluster analysis system according to claim 5, wherein the mapping module specifically comprises:
a circle-like space construction unit, configured to construct a circle-like space $C_0$, wherein the circle-like space is the unit circle of a two-dimensional rectangular coordinate system, centered at the origin;
a similarity matrix determination unit, configured to determine, according to
Figure FDA0002597429810000053
the correlation between the dimensions of each group of the dimension-expanded high-dimensional data to obtain a similarity matrix, wherein $S_{ij}$ is the element in the i-th row and j-th column of the similarity matrix, K represents the scale of the high-dimensional data F, and $t_{ki}$ is the ranking value of the k-th group of data in the i-th dimension; the ranking value is the number obtained by ranking the groups of the dimension-expanded high-dimensional data according to the size of their attribute values in each dimension, using the integers 1 to M;
the Fiedler vector determining unit is used for determining a Fiedler vector by solving the eigenvector corresponding to the maximum eigenvalue of the Laplace matrix of the similarity matrix;
the sorting unit is used for sorting the dimensions of each group of the dimension-expanded high-dimensional data according to the sizes of the elements in the Fiedler vector to obtain the sorted high-dimensional data;
a coordinate point determination unit, configured to determine, according to the formula
Figure FDA0002597429810000061
the coordinate point $V_{\lambda(i)}$ of each dimension of the sorted high-dimensional data on the arc of $C_0$, wherein
Figure FDA0002597429810000062
the vector λ is the order vector of the elements of the Fiedler vector sorted by size, λ(i) represents the i-th element value of the vector λ, and i = 1, 2, ..., N;
a two-dimensional mapping point determining unit, configured to, in the circle-like space, for any high-dimensional data
Figure FDA0002597429810000063
on the straight line connecting the coordinate origin and the coordinate point $V_{\lambda(i)}$, record the point whose distance to the coordinate origin is
Figure FDA0002597429810000064
as a two-dimensional mapping point, wherein
Figure FDA0002597429810000065
is the attribute value of the k-th group of data in the λ(i)-th dimension, and any individual
Figure FDA0002597429810000066
corresponds to N two-dimensional mapping points;
the geometric center determining unit is used for forming, for each group of data, a corresponding polygon from its set of two-dimensional mapping points, and determining the geometric center of each polygon;
and the visual clustering realization unit is used for reducing, through the t-distributed stochastic neighbor embedding (t-SNE) algorithm, the spacing between the geometric centers of polygons in the same cluster and increasing the spacing between the geometric centers of polygons in different clusters, so as to determine the positions of the mapping points and realize visual clustering of the high-dimensional data.
CN201811517242.2A 2018-12-12 2018-12-12 High-dimensional data visual clustering analysis method and system Active CN109271441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811517242.2A CN109271441B (en) 2018-12-12 2018-12-12 High-dimensional data visual clustering analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811517242.2A CN109271441B (en) 2018-12-12 2018-12-12 High-dimensional data visual clustering analysis method and system

Publications (2)

Publication Number Publication Date
CN109271441A CN109271441A (en) 2019-01-25
CN109271441B true CN109271441B (en) 2020-09-01

Family

ID=65187645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811517242.2A Active CN109271441B (en) 2018-12-12 2018-12-12 High-dimensional data visual clustering analysis method and system

Country Status (1)

Country Link
CN (1) CN109271441B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant