CN113537281B - Dimension reduction method for performing visual comparison on multiple high-dimension data - Google Patents

Info

Publication number
CN113537281B
Authority
CN
China
Prior art keywords
dimension reduction
data
dimension
similarity
distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110576652.XA
Other languages
Chinese (zh)
Other versions
CN113537281A (en
Inventor
汪云海
孙国霞
王银桥
陈路
卢金禹
华博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN202110576652.XA
Publication of CN113537281A
Application granted
Publication of CN113537281B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a dimension reduction method for visually comparing multiple high-dimensional data sets. The method receives two high-dimensional data sets to be processed; computes edge similarity for the two data sets; reduces the dimension of the first data set using t-distributed stochastic neighbor embedding (t-SNE); and, based on the edge similarity, introduces an edge vector constraint into the optimization equation of t-SNE and solves the resulting optimization to obtain the dimension reduction result of the second data set. The invention yields consistent dimension reduction results suited to comparison tasks.

Description

Dimension reduction method for performing visual comparison on multiple high-dimension data
Technical Field
The invention belongs to the technical field of data visualization, and particularly relates to a dimension reduction method for performing visual comparison on a plurality of high-dimension data.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Dimension reduction is the process of mapping high-dimensional data into a perceivable low-dimensional space while preserving, as far as possible, the relationships among data points in the original space. Dimension reduction can reveal the underlying distribution and topology of high-dimensional data, making human analysis and interpretation possible, and is therefore widely applied in fields such as data mining, machine learning, and bioinformatics. Common dimension reduction methods include t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and multidimensional scaling (MDS).
Comparable dimension reduction extends conventional dimension reduction to a series of dynamic high-dimensional data sets, for example to compare the outputs of different layers of a deep neural network. The simplest approach is to reduce the dimension of each data set individually. However, because many dimension reduction methods are randomized and their optimization is unpredictable, this approach often introduces undesirable variations, such as shifts in the location of the same data point between frames. The general goal of comparable dimension reduction is therefore to maintain the fidelity of each dimension reduction result while achieving visual consistency across the sequence of results.
Existing methods for comparable dimension reduction can be divided into the following two types according to the type of data change:
Incremental dimension reduction methods, in which data is added in each time frame and previously placed points are typically kept at static positions. One example is incremental principal component analysis (incremental PCA), which finds the optimal overlap of the common data points in two adjacent dimension reduction results and then uses a position estimation algorithm to support adding data points of non-uniform dimension to the data.
Time-varying dimension reduction methods, in which the features of the data points change between time frames while the number of data points stays the same. Dynamic t-distributed stochastic neighbor embedding (Dynamic t-SNE) adds a loss term to t-SNE that penalizes movement of each data point's position between dimension reduction results. Although this achieves visual consistency, the strict constraint on the absolute position of each point easily distorts the dimension reduction result. In addition, Dynamic t-SNE optimizes the whole series of data sets together, which requires receiving them all at once and imposes a heavy computational and hardware burden, making it unsuitable for dimension reduction of streaming data.
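To make the absolute-position penalty concrete, the following is a minimal Python sketch. The function name `movement_penalty`, the weight `lam`, and the `1 / (2n)` scaling are assumptions for illustration; the text only states that the extra term penalizes movement of each point between results.

```python
def movement_penalty(frames, lam=0.1):
    """Squared displacement of every point between consecutive frames.

    frames: list of embeddings, each a list of coordinate tuples in the
    same point order. lam and the 1/(2n) scaling are illustrative assumptions.
    """
    n = len(frames[0])
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        for p, c in zip(prev, curr):
            total += sum((a - b) ** 2 for a, b in zip(p, c))
    return lam / (2 * n) * total


frame0 = [(0.0, 0.0), (1.0, 0.0)]
frame1 = [(2.0, 0.0), (3.0, 0.0)]  # same layout, translated by +2 along x
print(movement_penalty([frame0, frame1]))  # non-zero although the layout is unchanged
```

A penalty of this kind is charged even when the whole layout merely translates, which illustrates why a strict constraint on absolute positions can conflict with a faithful embedding.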
Disclosure of Invention
In order to solve the problems, the invention provides a dimension reduction method for performing visual comparison on a plurality of high-dimensional data.
According to some embodiments, the present invention employs the following technical solutions:
a dimension reduction method for visually comparing a plurality of high-dimensional data, comprising the steps of:
receiving two high-dimensional data sets to be processed;
computing edge similarity for the two datasets;
performing dimension reduction on the first data set using t-distributed stochastic neighbor embedding (t-SNE);
based on the edge similarity, introducing an edge vector constraint into the optimization equation of t-SNE, and obtaining the dimension reduction result of the second data set by solving the optimization.
In an alternative embodiment, the process of computing edge similarity for two data sets includes:
receiving two input high-dimensional data sets, and constructing a k-nearest-neighbor graph for each using KD-trees;
for each node, finding all primitives in the k-nearest-neighbor graph that contain the node, and taking the normalized primitive frequency distribution as the feature vector of the node;
and calculating the similarity of the common edges in the k-nearest-neighbor graphs of two adjacent time frames based on the feature vector of each node.
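The graph construction and the notion of a common edge can be sketched as follows. This is an illustration only: it uses a brute-force neighbor search instead of the KD-tree named in the text (a KD-tree would return the same neighbors faster), and treating edges as undirected is an assumption. All function names are hypothetical.

```python
import math


def knn_graph(points, k):
    """For each point index, return the set of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours


def undirected_edges(neighbours):
    """Collapse directed neighbour sets into undirected edges (i < j)."""
    return {(min(i, j), max(i, j)) for i, nbrs in enumerate(neighbours) for j in nbrs}


def common_edges(neigh0, neigh1):
    """Edges present in the k-nearest-neighbour graphs of both frames."""
    return undirected_edges(neigh0) & undirected_edges(neigh1)


X0 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
X1 = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]  # point 2 moved next to point 3
G0, G1 = knn_graph(X0, k=1), knn_graph(X1, k=1)
print(sorted(common_edges(G0, G1)))
```

Only edges that survive in both frames take part in the later similarity computation, so the edge between points 1 and 2 drops out here once point 2 moves away.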
As a further limitation, taking the normalized primitive frequency distribution as the feature vector of a node specifically includes: for every node in the two k-nearest-neighbor graphs, counting the frequency distribution of all primitives that contain it.
As a further limitation, calculating the similarity of the common edges in the k-nearest-neighbor graphs of two adjacent time frames based on the feature vector of each node specifically includes: calculating the vertex similarity between corresponding nodes based on the frequency distribution of all primitives containing the nodes, and then calculating the similarity of corresponding edges in the two k-nearest-neighbor graphs based on the vertex similarity.
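A per-node primitive frequency vector might be computed as in the sketch below. The text does not say which primitives are counted; this example assumes "primitive" means a small connected subgraph pattern and, purely for illustration, counts the three ways a node can sit in a connected 3-node pattern (in a triangle, at the centre of a path, at the end of a path). Function names are hypothetical.

```python
from itertools import combinations


def adjacency(edges, n):
    """Undirected adjacency sets for n nodes."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def primitive_frequencies(adj, v):
    """Normalised counts of [triangle, path-centre, path-end] patterns containing v."""
    triangle = centre = end = 0
    for a, b in combinations(sorted(adj[v]), 2):
        if b in adj[a]:
            triangle += 1   # v, a, b are mutually connected
        else:
            centre += 1     # a - v - b with a and b not connected
    for a in adj[v]:
        for b in adj[a]:
            if b != v and b not in adj[v]:
                end += 1    # v - a - b with v and b not connected
    counts = [triangle, centre, end]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0, 0.0, 0.0]


adj = adjacency({(0, 1), (1, 2), (0, 2), (2, 3)}, n=4)
for v in range(4):
    print(v, primitive_frequencies(adj, v))
```

Normalising the counts makes the vectors of nodes with different degrees comparable, which is what a cosine-based comparison between frames relies on.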
As an alternative embodiment, introducing the edge vector constraint into the optimization equation of t-distributed stochastic neighbor embedding (t-SNE) specifically includes: establishing, based on the similarity, a unified energy optimization equation over the coordinates of the current frame in the dimension-reduced space, the coordinates of the current frame in the high-dimensional space, and the coordinates of the previous frame in the dimension-reduced space.
A dimension reduction system for visually comparing a plurality of high-dimensional data, comprising:
a first dimension reduction module configured to receive two high-dimensional data sets to be processed and to reduce the dimension of the first data set using t-distributed stochastic neighbor embedding (t-SNE);
a similarity calculation module configured to calculate edge similarity for the two data sets based on vertex similarity;
a second dimension reduction module configured to introduce an edge vector constraint into the optimization equation of t-SNE and to obtain the dimension reduction result of the second data set by solving the optimization.
As an alternative embodiment, the system further comprises a visualization module configured to obtain a visualization result from the optimized dimension reduction positions by mapping the class labels of the data points to colors using a color table selected in advance by the user.
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the steps of one of the above-described dimension reduction methods of visually comparing a plurality of high-dimensional data.
A terminal device comprising a processor and a computer readable storage medium, the processor configured to implement instructions; the computer readable storage medium is for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of one of the above-described dimension reduction methods for visually comparing a plurality of high-dimensional data.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a comparable dimension reduction method for high-dimensional dynamic data. By introducing a similarity metric based on primitive kernel functions together with a vector constraint, it generates consistent and faithful dimension reduction results for multiple data sets. The method overcomes the drawback of prior methods, whose global constraints cannot reflect the real local changes of high-dimensional data; its results are easier for users to analyze, and it has broad application prospects in the field of data visualization.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of the method for computing structural similarity based on primitive kernel functions according to an embodiment;
FIG. 2(a) shows the dimension reduction result of the high-dimensional time series data for λ=0.01;
FIG. 2(b) shows the dimension reduction result of the high-dimensional time series data for λ=0.05;
FIG. 2(c) shows the dimension reduction result of the high-dimensional time series data for λ=0.1;
FIG. 3(a) shows the dimension reduction result of the artificial high-dimensional time series data for the randomly initialized t-SNE;
FIG. 3(b) shows the dimension reduction result of the artificial high-dimensional time series data for the identically initialized t-SNE;
FIG. 3(c) shows the dimension reduction result of the artificial high-dimensional time series data for Dynamic t-SNE;
FIG. 3(d) shows the dimension reduction result of the artificial high-dimensional time series data for the present embodiment;
FIG. 4(a) shows the dimension reduction result of the real high-dimensional time series data for the randomly initialized t-SNE;
FIG. 4(b) shows the dimension reduction result of the real high-dimensional time series data for the identically initialized t-SNE;
FIG. 4(c) shows the dimension reduction result of the real high-dimensional time series data for Dynamic t-SNE;
FIG. 4(d) shows the dimension reduction result of the real high-dimensional time series data for the present embodiment;
FIG. 5 is a schematic flow chart of the first embodiment.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
A dimension reduction method for visually comparing a plurality of high-dimensional data, comprising the steps of:
receiving two high-dimensional data sets to be processed;
computing edge similarity for the two datasets;
performing dimension reduction on the first data set using t-distributed stochastic neighbor embedding (t-SNE);
based on the edge similarity, introducing an edge vector constraint into the optimization equation of t-SNE, and obtaining the dimension reduction result of the second data set by solving the optimization.
The edge similarity is computed with a primitive kernel function. As shown in FIG. 1, the method for computing the edge similarity based on the primitive kernel function includes:
Step 1: receiving two input high-dimensional data sets $X_0$ and $X_1$, and constructing the k-nearest-neighbor graphs $G_0$ and $G_1$ using KD-trees, respectively.
Step 2: for the i-th nodes $v_i^0$ and $v_i^1$ in $G_0$ and $G_1$, separately counting the frequency distribution of all primitives that contain them, denoted $fv_i^0$ and $fv_i^1$.
Step 3: calculating the vertex similarity between $v_i^0$ and $v_i^1$ [equation image not recovered], wherein one term represents the proportion of common k-nearest neighbors of $v_i^0$ and $v_i^1$, and the other term represents the cosine similarity of $fv_i^0$ and $fv_i^1$.
Step 4: calculating the similarity of edge $e_{ij}^0$ in $G_0$ and edge $e_{ij}^1$ in $G_1$ [equation image not recovered].
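The two ingredients named in Step 3 can be sketched in Python as follows. Because the equations themselves are not available in this text, the way the ingredients are combined is an assumption: the sketch multiplies the neighbor overlap by the cosine similarity to get a vertex similarity, and averages the two endpoint similarities to get an edge similarity. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbour_overlap(nbrs0, nbrs1):
    """Proportion of k-nearest neighbours shared by a node in both frames."""
    k = max(len(nbrs0), len(nbrs1))
    return len(nbrs0 & nbrs1) / k if k else 0.0


def vertex_similarity(nbrs0, nbrs1, fv0, fv1):
    # Assumption: the two terms are multiplied; the source equation is not available.
    return neighbour_overlap(nbrs0, nbrs1) * cosine(fv0, fv1)


def edge_similarity(sim_i, sim_j):
    # Assumption: an edge is as similar as the mean of its two endpoints.
    return 0.5 * (sim_i + sim_j)


# Node i keeps one of its two neighbours; node j keeps both.
s_i = vertex_similarity({1, 2}, {1, 3}, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
s_j = vertex_similarity({0, 2}, {0, 2}, [0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
print(s_i, s_j, edge_similarity(s_i, s_j))
```

Whatever the exact combination, the resulting weight lies in [0, 1] here and is small for edges whose neighborhoods changed between frames, so such edges contribute less to any constraint weighted by it.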
The edge-vector-constrained t-SNE proceeds as follows:
Step 1: receiving two input high-dimensional data sets $X_0$ and $X_1$.
Step 2: reducing the dimension of $X_0$ using t-SNE to obtain the dimension reduction result $Y_0$. The optimization equation of t-SNE (its standard Kullback-Leibler objective) is as follows:
$$C = \mathrm{KL}(P^0 \,\|\, Q^0) = \sum_{i \neq j} p_{ij}^0 \log \frac{p_{ij}^0}{q_{ij}^0}$$
wherein $p_{ij}^0$ is the symmetric joint probability of the i-th point $x_i^0$ and the j-th point $x_j^0$ of $X_0$, and $q_{ij}^0$ is the joint probability of the i-th point $y_i^0$ and the j-th point $y_j^0$ of $Y_0$.
Step 3: adding the vector constraint to the optimization equation of t-SNE and reducing the dimension of $X_1$ to obtain the dimension reduction result $Y_1$. The constrained equation [equation image not recovered] is defined in terms of the following quantities: the similarity of the common edges $e_{ij}^0$ and $e_{ij}^1$ in $G_0$ and $G_1$; $n$, the number of common edges; and $\lambda$, the manually set weight of the vector constraint. $y_i^0$ and $y_j^0$ denote the i-th and j-th data points of $Y_0$, and $y_i^1$ and $y_j^1$ denote the i-th and j-th data points of $Y_1$. $p_{ij}^1$ denotes the symmetric joint probability of $x_i^1$ and $x_j^1$, and $q_{ij}^1$ denotes the joint probability of $y_i^1$ and $y_j^1$.
This embodiment solves the equation with a gradient descent algorithm. Solving the optimization equation balances, as far as possible, the fidelity of each single-frame dimension reduction result against the consistency between the results of different frames.
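The structure of such a constrained objective can be sketched as below. The KL term and the Student-t affinities are standard t-SNE; the form of the vector constraint is an assumption, since the equation is not available in this text. Here it is taken to be the similarity-weighted squared difference between corresponding edge vectors in the two embeddings, averaged over the n common edges and scaled by λ. Function names and the input `edge_sim` (a mapping from common edge to similarity) are hypothetical.

```python
import math


def low_dim_affinities(Y):
    """Student-t joint probabilities q_ij used by t-SNE in the embedding."""
    n = len(Y)
    w = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / (1.0 + math.dist(Y[i], Y[j]) ** 2)
                total += w[i][j]
    return [[w[i][j] / total for j in range(n)] for i in range(n)]


def kl_cost(P, Q):
    """Standard t-SNE objective: sum over i != j of p_ij * log(p_ij / q_ij)."""
    n = len(P)
    return sum(
        P[i][j] * math.log(P[i][j] / Q[i][j])
        for i in range(n) for j in range(n)
        if i != j and P[i][j] > 0.0
    )


def edge_vector_penalty(Y0, Y1, edge_sim):
    """Assumed constraint: weighted squared change of each common edge vector."""
    total = 0.0
    for (i, j), s in edge_sim.items():
        diff = [(a1 - b1) - (a0 - b0)
                for a0, b0, a1, b1 in zip(Y0[i], Y0[j], Y1[i], Y1[j])]
        total += s * sum(d * d for d in diff)
    return total / len(edge_sim) if edge_sim else 0.0


def constrained_cost(P1, Y0, Y1, edge_sim, lam):
    """KL term for the current frame plus the weighted edge-vector constraint."""
    return kl_cost(P1, low_dim_affinities(Y1)) + lam * edge_vector_penalty(Y0, Y1, edge_sim)


Y0 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y1_shifted = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]   # rigid translation of Y0
Y1_bent = [(0.0, 0.0), (0.0, 1.0), (0.0, 1.0)]      # edge (0, 1) changes direction
edge_sim = {(0, 1): 1.0, (0, 2): 0.5}
print(edge_vector_penalty(Y0, Y1_shifted, edge_sim))
print(edge_vector_penalty(Y0, Y1_bent, edge_sim))
```

Under this assumed form, a rigid translation of the whole layout costs nothing because every edge vector is unchanged, while a change in an edge's direction is penalized in proportion to that edge's similarity. A gradient-based optimizer would minimize `constrained_cost` with respect to `Y1` while `Y0` stays fixed.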
The parameter λ is the weight of the vector constraint; FIG. 2(a), FIG. 2(b), and FIG. 2(c) illustrate the optimization results for different values of λ.
FIG. 3(a), 3(b), 3(c), and 3(d) show the application of the dimension reduction method to artificial data with four time frames in total. At t=0, 500 data points are generated from five isotropic Gaussian distributions in 100-dimensional space; the center of each distribution is chosen at random among the standard basis vectors, and the variance of each distribution is 0.05. The resulting data set is the frame at t=0. At t=1, all points in the first cluster are shifted by +0.15 in every dimension. At t=2, the second cluster is split in half, with one half moving by +0.15 in all dimensions and the other half moving by -0.15 in all dimensions. At t=3, the centers of the third and fourth clusters are made to coincide. The results of the randomly initialized t-SNE, the identically initialized t-SNE, Dynamic t-SNE, and the present embodiment are shown in FIG. 3(a), FIG. 3(b), FIG. 3(c), and FIG. 3(d), respectively. For ease of comparison, the dimension reduction result of Dynamic t-SNE at t=0 is used as the first frame for all four methods.
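The first two frames of this artificial data set could be generated roughly as follows. The sketch assumes equal cluster sizes (100 points each), distinct basis vectors as centers, and a fixed random seed; none of these details are stated in the text, and the function names are hypothetical.

```python
import random


def make_frame0(n_points=500, n_clusters=5, dim=100, variance=0.05, seed=0):
    """Sample points from isotropic Gaussians centred on standard basis vectors."""
    rng = random.Random(seed)
    std = variance ** 0.5
    axes = rng.sample(range(dim), n_clusters)  # one distinct basis vector per cluster
    points, labels = [], []
    for idx in range(n_points):
        c = idx % n_clusters                   # assumption: equal cluster sizes
        centre = [1.0 if d == axes[c] else 0.0 for d in range(dim)]
        points.append([rng.gauss(m, std) for m in centre])
        labels.append(c)
    return points, labels


def shift_cluster(points, labels, cluster, offset):
    """Return a new frame in which one cluster is moved by offset in every dimension."""
    return [
        [x + offset for x in p] if label == cluster else list(p)
        for p, label in zip(points, labels)
    ]


X0, labels = make_frame0()
X1 = shift_cluster(X0, labels, cluster=0, offset=0.15)  # frame at t=1
print(len(X0), len(X0[0]), labels.count(0))
```

The later frames follow the same pattern: the split at t=2 applies `shift_cluster`-style offsets of +0.15 and -0.15 to the two halves of the second cluster, and t=3 moves one cluster center onto another.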
It can be seen that the results of the randomly initialized t-SNE in FIG. 3(a) are not aligned across time frames. The identically initialized t-SNE in FIG. 3(b) provides better visual consistency than the randomly initialized t-SNE, but some clusters still shift unnecessarily. In the Dynamic t-SNE result in FIG. 3(c), the third and fourth clusters at t=3 should coincide completely, but are merely adjacent because of the absolute position constraint. In contrast, the present embodiment in FIG. 3(d) generates a dimension reduction result that is both consistent and faithful.
FIG. 4(a), 4(b), 4(c), and 4(d) show the application of the present embodiment to data from the convolutional neural network VGG-16. The raw data are 700 images from the ImageNet data set covering ten categories: tiger cat, tabby cat, giant schnauzer, standard schnauzer, great white shark, tiger shark, canary, sparrow, station wagon, and military uniform. The images are input into a pretrained VGG-16 network, and the output feature vectors of the last four layers of the network are taken as the high-dimensional time series data of four time frames.
The four dimension reduction methods are compared on this data. As before, the dimension reduction result of Dynamic t-SNE at t=0 is used as the first frame for all four methods. The randomly initialized t-SNE in FIG. 4(a) and the identically initialized t-SNE in FIG. 4(b) produce the most faithful results but fail to maintain consistency. The Dynamic t-SNE result in FIG. 4(c) is too rigid to reflect drastic changes in topology. The present embodiment in FIG. 4(d) is more robust to drastic changes while producing more faithful dimension reduction results.
Example 2
This embodiment provides a dimension reduction system for visually comparing a plurality of high-dimensional data, comprising:
an input module configured to receive the input high-dimensional time series data and calculate a k-nearest-neighbor graph for each frame of data;
a similarity module configured to calculate, using a primitive kernel function, the topological structure similarity between corresponding nodes of the k-nearest-neighbor graphs of adjacent frames, and to calculate the edge similarity between corresponding common edges;
an energy optimization equation establishment module configured to establish, based on the similarity, a unified energy optimization equation over the coordinates of the current frame in the dimension-reduced space, the coordinates of the current frame in the high-dimensional space, and the coordinates of the previous frame in the dimension-reduced space; in this embodiment, the dimension-reduced space is generally a two-dimensional space;
and a visualization module configured to solve the energy optimization equation to obtain the optimized dimension reduction positions, and to map the class labels of the data points to colors using a color table provided by the user to obtain the final visualization result.
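The color mapping performed by the visualization module amounts to a lookup from class label to color, as in this minimal sketch. The function name, the hex colors, and the tuple output format are hypothetical; the text only says that a user-provided color table is applied to the class labels of the data points.

```python
def colour_points(positions, labels, colour_table):
    """Attach the user-chosen colour of each point's class label to its 2-D position."""
    return [(x, y, colour_table[label]) for (x, y), label in zip(positions, labels)]


positions = [(0.1, 0.2), (1.5, -0.3)]          # optimized 2-D coordinates
labels = ["tiger shark", "canary"]
colour_table = {"tiger shark": "#1f77b4", "canary": "#ff7f0e"}  # hypothetical colours
print(colour_points(positions, labels, colour_table))
```

Keeping the same color table across all frames lets a viewer track each class between the compared embeddings.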
Example 3
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the steps of the method provided by the above embodiments.
Example 4
A terminal device comprising a processor and a computer readable storage medium, the processor configured to implement instructions; the computer readable storage medium is for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the methods provided by the above-described embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (8)

1. A dimension reduction method for visually comparing a plurality of high-dimensional data, characterized by comprising the following steps:
receiving two high-dimensional data sets to be processed;
computing edge similarity for the two data sets; specifically, receiving the two input high-dimensional data sets and constructing a k-nearest-neighbor graph for each using KD-trees; finding all primitives in the k-nearest-neighbor graph that contain a node, and taking the normalized primitive frequency distribution as the feature vector of the node; calculating the similarity of the common edges in the k-nearest-neighbor graphs of two adjacent time frames based on the feature vector of each node; wherein taking the normalized primitive frequency distribution as the feature vector of the node specifically comprises counting, for the nodes in the two k-nearest-neighbor graphs respectively, the frequency distribution of all primitives containing them;
performing dimension reduction on the first data set using t-distributed stochastic neighbor embedding (t-SNE);
based on the edge similarity, introducing an edge vector constraint into the optimization equation of t-SNE, and obtaining the dimension reduction result of the second data set by solving the optimization, specifically: solving an energy optimization equation to obtain optimized dimension reduction positions, and mapping the class labels of the data points to colors using a color table provided by a user to obtain a final visualization result;
wherein the method is applied to a data set from the convolutional neural network VGG-16: the original data are 700 images from the ImageNet data set covering ten categories, namely tiger cat, tabby cat, giant schnauzer, standard schnauzer, great white shark, tiger shark, canary, sparrow, station wagon, and military uniform; the images are input into the pretrained VGG-16 network, and the output feature vectors of the last four layers of the network are taken as the high-dimensional time series data of four time frames.
2. The dimension reduction method for visually comparing a plurality of high-dimensional data according to claim 1, wherein: the vertex similarity between corresponding nodes is calculated based on the frequency distribution of all primitives containing the nodes, and the similarity of corresponding edges in the two k-nearest-neighbor graphs is then calculated based on the vertex similarity.
3. The dimension reduction method for visually comparing a plurality of high-dimensional data according to claim 1, wherein: introducing the edge vector constraint into the optimization equation of t-SNE specifically comprises: establishing, based on the similarity, a unified energy optimization equation over the coordinates of the current frame in the dimension-reduced space, the coordinates of the current frame in the high-dimensional space, and the coordinates of the previous frame in the dimension-reduced space.
4. The dimension reduction method for visually comparing a plurality of high-dimensional data according to claim 1, wherein: the optimization is solved using a gradient descent algorithm.
5. A dimension reduction system for visual comparison of a plurality of high-dimensional data, characterized by comprising:
a first dimension reduction module configured to receive two high-dimensional data sets to be processed and to perform dimension reduction on the first data set using the t-distributed stochastic neighbor embedding (t-SNE) method;
a similarity calculation module configured to calculate edge similarity for the two data sets based on vertex similarity; specifically: receiving the two input high-dimensional data sets and constructing a k-nearest-neighbor graph for each using a KD tree; for each node, finding all graphlets in the k-nearest-neighbor graph that contain the node and taking the normalized graphlet frequency distribution as the feature vector of the node, wherein the frequency distribution of all graphlets containing the node is counted separately in each of the two k-nearest-neighbor graphs; and calculating, based on the feature vector of each node, the similarity of the common edges in the k-nearest-neighbor graphs of two adjacent time frames;
a second dimension reduction module configured to introduce an edge vector constraint into the optimization equation of the t-SNE method and to obtain the dimension reduction result of the second data set by solving the optimization, specifically: solving the energy optimization equation to obtain the optimized dimension-reduced positions, and mapping a color table provided by the user to the class labels of the data points to obtain the final visualization result;
wherein the method is applied to a data set from the convolutional neural network VGG-16: the original data are 700 images from the ImageNet data set covering ten categories, namely tiger cat, tabby cat, giant schnauzer, standard schnauzer, great white shark, tiger shark, canary, sparrow, traveling automobile, and military uniform; the images are input into the pretrained VGG-16 network, and the output feature vectors of the last four layers of the network are taken as high-dimensional time-series data of four time frames.
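The similarity calculation recited in claim 5 can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the claimed implementation: only the three roles a node can play in 3-node graphlets are counted (the claim does not fix the graphlet sizes), vertex similarity is taken as one minus the total variation distance between the two normalized distributions, and edge similarity as the product of the endpoint similarities. The function names are hypothetical. `scipy.spatial.cKDTree` supplies the KD tree when SciPy is installed, with a brute-force fallback otherwise.

```python
import numpy as np
from itertools import combinations

try:
    from scipy.spatial import cKDTree
except ImportError:          # fall back to brute-force search without SciPy
    cKDTree = None

def knn_graph(X, k):
    """Undirected k-nearest-neighbor graph as a list of neighbor sets."""
    n = len(X)
    if cKDTree is not None:
        _, idx = cKDTree(X).query(X, k=k + 1)   # nearest hit is normally the point itself
    else:
        dist = np.linalg.norm(X[:, None] - X[None], axis=2)
        idx = np.argsort(dist, axis=1)[:, :k + 1]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in idx[i]:
            j = int(j)
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def graphlet_features(adj):
    """Normalized counts of 3-node graphlet roles per node:
    (triangle member, center of an open path, endpoint of an open path)."""
    feats = np.zeros((len(adj), 3))
    for v, nb in enumerate(adj):
        tri = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        center = len(nb) * (len(nb) - 1) // 2 - tri
        end = sum(len(adj[u]) - 1 for u in nb) - 2 * tri
        feats[v] = (tri, center, end)
    total = feats.sum(axis=1, keepdims=True)
    return np.divide(feats, total, out=np.zeros_like(feats), where=total > 0)

def common_edge_similarity(adj_a, adj_b):
    """Similarity of every edge present in both graphs (assumed definition)."""
    fa, fb = graphlet_features(adj_a), graphlet_features(adj_b)
    vsim = 1.0 - 0.5 * np.abs(fa - fb).sum(axis=1)   # per-node similarity in [0, 1]
    return {(i, j): float(vsim[i] * vsim[j])
            for i in range(len(adj_a)) for j in adj_a[i]
            if i < j and j in adj_b[i]}
```

Calling `common_edge_similarity(knn_graph(X_prev, k), knn_graph(X_curr, k))` on two frames with matching point order would yield the weights of the shared edges; `X_prev`, `X_curr`, and `k` are placeholders for the caller's data.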
6. A dimension reduction system for visual comparison of a plurality of high-dimensional data as defined in claim 5, wherein: the system further comprises a visualization module configured to obtain a visualization result by mapping a color table selected in advance by the user to the class labels of the data points at the optimized dimension-reduced positions.
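A minimal sketch of the color mapping performed by such a visualization module might look like the following. The function name `colors_for_labels` and the rule of assigning colors in order of first appearance are assumptions made for illustration; the claim only states that a user-selected color table is mapped to the class labels.

```python
def colors_for_labels(labels, color_table):
    """Assign one color per distinct class label, in order of first appearance."""
    classes = list(dict.fromkeys(labels))          # distinct labels, order preserved
    if len(classes) > len(color_table):
        raise ValueError("color table has fewer entries than there are classes")
    lookup = dict(zip(classes, color_table))
    return [lookup[label] for label in labels]
```

The returned list holds one color per data point and can be passed to any scatter-plot routine together with the optimized dimension-reduced positions.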
7. A computer-readable storage medium, characterized in that: a plurality of instructions are stored therein, the instructions being adapted to be loaded by a processor of a terminal device and to perform the steps of the dimension reduction method for visual comparison of a plurality of high-dimensional data according to any one of claims 1-4.
8. A terminal device, characterized by: comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions, and the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor and to perform the steps of the dimension reduction method for visual comparison of a plurality of high-dimensional data as claimed in any one of claims 1 to 4.
CN202110576652.XA 2021-05-26 2021-05-26 Dimension reduction method for performing visual comparison on multiple high-dimension data Active CN113537281B (en)

Publications (2)

Publication Number Publication Date
CN113537281A (en) 2021-10-22
CN113537281B (en) 2024-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant