CN108280236B - Method for analyzing random forest visual data based on LargeVis - Google Patents

Method for analyzing random forest visual data based on LargeVis

Info

Publication number
CN108280236B
CN108280236B (application CN201810170150.5A)
Authority
CN
China
Prior art keywords
largevis
data
random forest
points
random
Prior art date
Legal status
Active
Application number
CN201810170150.5A
Other languages
Chinese (zh)
Other versions
CN108280236A (en)
Inventor
黄立勤
陈宋
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201810170150.5A priority Critical patent/CN108280236B/en
Publication of CN108280236A publication Critical patent/CN108280236A/en
Application granted granted Critical
Publication of CN108280236B publication Critical patent/CN108280236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for analyzing random forest visualization data based on LargeVis: a training data set is preprocessed; important features of the training data set are extracted through a random forest; dimensionality reduction is performed with LargeVis; and the random forest is visualized based on LargeVis. Aimed at high-dimensional data, the method uses the feature importance learned by the random forest to form new secondary high-dimensional data and feeds the data reduced by LargeVis into the random forest for predictive analysis and visualization, so that classification accuracy is improved, visualization time is reduced, and different data sets can be accommodated.

Description

Method for analyzing random forest visual data based on LargeVis
Technical Field
The invention relates to pattern recognition, machine learning and big data analysis, in particular to a method for analyzing random forest visual data based on LargeVis.
Background
In the big data era, the dimensionality of data features keeps growing, so analyzing data with a suitable dimensionality reduction method has become particularly important; at the same time, how to visualize high-dimensional data is also a research focus in the current environment. The most classical dimensionality reduction method is PCA (Principal Component Analysis), which not only reduces the dimensionality of high-dimensional data but also removes noise and discovers patterns in the data. PCA replaces the original n features with a smaller number of m features; the new features are linear combinations of the old ones that maximize the sample variance and are made as uncorrelated with each other as possible, so the mapping from old features to new features captures the inherent variability in the data. Researchers later proposed manifold learning, the main family of nonlinear dimensionality reduction algorithms, to which visualization research has been added. The main manifold learning algorithms are ISOMap (isometric mapping), LE (Laplacian eigenmaps) and LLE (locally linear embedding), under the assumption that the data are sampled from some manifold. ISOMap is a non-iterative global optimization algorithm: it modifies MDS (Multidimensional Scaling) by using the geodesic distance (distance along the manifold) instead of the original Euclidean distance between two points in space, so that data lying on a manifold of some dimension can be mapped to a Euclidean space. ISOMap connects the data points into an adjacency graph that discretely approximates the original manifold, and geodesic distances are correspondingly approximated by shortest paths on the graph. On this basis, Maaten recently published work improving the t-SNE algorithm with various tree-based algorithms, which contains two parts: first, a kNN graph is adopted to represent the similarity of points in the high-dimensional space; second, the gradient computation is optimized by splitting it into an attractive force and a repulsive force and applying several optimization techniques. In general, these dimensionality reduction algorithms reduce the number of predictor variables and can provide a framework for interpreting the final result.
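For context only, a minimal sketch of the PCA projection just described (assuming scikit-learn is available; the data and the choice of m = 2 components are illustrative):

import numpy as np
from sklearn.decomposition import PCA

# replace the original n features with m = 2 uncorrelated linear combinations
# that capture the largest sample variance
X = np.random.rand(500, 30)                 # placeholder high-dimensional data
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)                      # (500, 2)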
At present, the t-SNE algorithm in manifold learning is widely applied, but it has the following shortcomings: when processing large-scale high-dimensional data, the efficiency of t-SNE drops markedly (even for the improved algorithms); the parameters of t-SNE are sensitive to the data set, so parameters tuned to give a good visualization on one data set are often found to be unsuitable for another, a large amount of time is spent searching for suitable parameters, and this strongly limits the overall classification model; and when purely raw high-dimensional data enter a model for training and classification after dimensionality reduction, the accuracy is low and the training time is long. In addition, current dimensionality reduction practice generally reduces the raw data directly and classifies it with an existing model, which can suffer from low accuracy and a lack of interpretability of the reduced data.
The invention provides a random forest visualization data analysis algorithm based on LargeVis: for high-dimensional data, the feature importance trained by the random forest is used to form new secondary high-dimensional data, and the data reduced by LargeVis is fed into the random forest for predictive analysis and visualization. The invention thereby also provides a new solution to the problem of feature extraction, classification and visualization of fetal heart rate data.
Disclosure of Invention
The invention aims to provide a method for analyzing random forest visual data based on LargeVis, which is used for overcoming the defects in the prior art.
In order to achieve the purpose, the technical scheme of the invention is as follows: a method for analyzing random forest visual data based on LargeVis is realized according to the following steps:
step S1: preprocessing a training data set;
step S2: extracting, through a random forest, the sample features in the training data set whose importance weight is greater than a preset weight threshold;
step S3: performing dimensionality reduction by adopting LargeVis;
step S4: performing visualization processing of the random forest based on LargeVis.
In an embodiment of the present invention, in the step S1, the SMOTE method is adopted to handle data imbalance, and data outliers are handled by replacing them with the median or with a value that does not appear in the data.
In an embodiment of the present invention, in the step S2, the method further includes the following steps:
step S21: preliminary estimation and sorting;
step S211: sorting the feature variables in the random forest in descending order of VI (variable importance);
step S212: determining a deletion proportion, and removing the 20% of feature variables whose importance is smallest (below the preset weight threshold) from the currently descending-sorted feature variables, thereby obtaining a new feature set;
step S213: establishing a new random forest with the new feature set, calculating the VI of each feature in the feature set, and sorting again;
step S214: repeating the above steps until m features remain;
step S22: calculating the corresponding out-of-bag error rate for each feature set obtained in step S21 and the random forest built from it, and taking the feature set with the lowest out-of-bag error rate as the finally selected feature set.
In an embodiment of the present invention, in the step S3, according to the result obtained in step S2, a space partition is obtained through random projection trees, and on this basis the K nearest neighbors of each sample point are searched to obtain a preliminary kNN graph; then, based on the idea that a neighbor of a neighbor is likely also a neighbor, a neighbor search algorithm is used to find potential neighbors, the distances between these candidate neighbors and the current point are calculated and placed into a min-heap, and the k nodes with the smallest distances are taken as the k neighbors, giving the final kNN graph.
In one embodiment of the invention, for an unweighted network, let $y_i$ and $y_j$ denote two points in the low-dimensional space; the probability that the two points have a binary edge $e_{ij}$ in the kNN graph is

$$P(e_{ij}=1)=f\bigl(\|y_i-y_j\|^2\bigr)$$

where $f(\cdot)$ takes a heavy-tailed form similar to the t distribution used in t-SNE, e.g.

$$f(x)=\frac{1}{1+x}$$

The smaller the distance between $y_i$ and $y_j$, the higher the probability that the two points have a binary edge in the kNN graph; conversely, the larger the distance between $y_i$ and $y_j$, the smaller that probability.

For a weighted network, the probability that the edge has weight $w_{ij}$ is

$$P(e_{ij}=w_{ij})=P(e_{ij}=1)^{w_{ij}}$$

The overall optimization objective is to maximize the probability that positive-sample node pairs have connecting edges in the kNN graph and to minimize the probability that negative-sample node pairs have connecting edges in the kNN graph. Denoting by $\gamma$ the weight assigned to the negative-sample edges and taking logarithms, the optimization objective becomes

$$O=\sum_{(i,j)\in E} w_{ij}\log P(e_{ij}=1)+\sum_{(i,j)\in \bar{E}} \gamma\,\log\bigl(1-P(e_{ij}=1)\bigr)$$

For each point $i$, $M$ points are randomly selected according to a noise distribution $P_n(j)$ to form negative samples with $i$; the noise distribution adopted is

$$P_n(j)\propto d_j^{0.75}$$

where $d_j$ is the degree of point $j$, and the objective function becomes

$$O=\sum_{(i,j)\in E} w_{ij}\Bigl(\log P(e_{ij}=1)+\sum_{k=1}^{M}\mathbb{E}_{j_k\sim P_n(j)}\bigl[\gamma\,\log\bigl(1-P(e_{ij_k}=1)\bigr)\bigr]\Bigr)$$
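Under the illustrative kernel $f(x)=\frac{1}{1+x}$ written above, and writing $d_{ij}^2=\|y_i-y_j\|^2$, the per-edge gradients that drive the optimization can be sketched as

$$\frac{\partial}{\partial y_i}\,w_{ij}\log P(e_{ij}=1)=-\frac{2\,w_{ij}\,(y_i-y_j)}{1+d_{ij}^2}\qquad\text{(attraction along a positive edge)}$$

$$\frac{\partial}{\partial y_i}\,\gamma\log\bigl(1-P(e_{ij}=1)\bigr)=\frac{2\,\gamma\,(y_i-y_j)}{d_{ij}^2\,(1+d_{ij}^2)}\qquad\text{(repulsion along a sampled negative edge)}$$

These closed forms follow from that particular choice of $f$ and are the quantities updated by the stochastic gradient training described below; a different kernel would change them accordingly.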
In an embodiment of the invention, after the negative sampling and edge sampling optimizations are completed, asynchronous stochastic gradient descent is adopted for training.
In an embodiment of the present invention, the time complexity of LargeVis is linear in the number of nodes in the network.
In an embodiment of the invention, in the step S4, a distribution map of the low dimensional data is drawn according to the obtained low dimensional spatial data.
Compared with the prior art, the invention has the following beneficial effects:
(1) The LargeVis-based method improves the running speed, adapts well to different data sets, and effectively improves the performance of the whole model.
(2) The invention adopts the random forest, an interpretable model, to perform a first round of feature extraction on the data, discarding unnecessary features and keeping the important ones to form new feature samples; dimensionality reduction is then performed and the reduced data are input into the random forest for classification. On the one hand this improves the performance of the whole model; on the other hand the reduced data can be visualized more intuitively and are more interpretable for the user.
(3) The model of the invention contains only two basic models, yet it can realize classification, visualization, dimensionality reduction, data preprocessing and feature extraction, and is more usable than other algorithms.
Drawings
FIG. 1 is a flow chart of a method for analyzing random forest visual data based on LargeVis in the invention.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention relates to a method for analyzing random forest visual data based on LargeVis, which is realized according to the following steps:
step S1: preprocessing a training data set;
step S2: extracting, through a random forest, the sample features in the training data set whose importance weight is greater than a preset weight threshold;
step S3: performing dimensionality reduction by adopting LargeVis;
step S4: performing visualization processing of the random forest based on LargeVis.
In this embodiment, problems of sample imbalance and abnormal values may occur in practical applications and can lead to poor classification results. An unbalanced training data set causes many problems in pattern recognition: for example, the classifier tends to "learn" the class with the largest proportion of samples, i.e., in order to maximize its overall accuracy it becomes biased towards the majority class. In practical applications this bias is not acceptable. To obtain a more uniform distribution of sample data, this embodiment solves the problem with the synthetic minority oversampling technique (SMOTE), creating "synthetic" instances for minority classes with few samples, and solves the problem of outliers with the median.
In this embodiment, the synthetic minority oversampling algorithm is as follows:
1. For each sample x in the minority class, the distances from x to all samples in the minority class sample set D are calculated using the Euclidean distance, giving the k neighbors of x.
2. A sampling ratio is set according to the sample imbalance ratio to determine a sampling multiplier N; for each minority-class sample x, several samples are randomly selected from its k neighbors, a selected neighbor being denoted y.
3. For each randomly selected neighbor y, a new sample is constructed together with the original sample according to the following formula:

$$x_{\text{new}} = x + \mathrm{rand}(0,1)\cdot|x-y|$$
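A minimal sketch of this synthesis in Python (NumPy only; the function name and parameters are illustrative, and the interpolation follows the formula as written above, whereas the commonly cited SMOTE formula uses (y - x) instead of |x - y|):

import numpy as np

def smote_synthesize(minority, n_new, k=5, seed=0):
    # create n_new "synthetic" minority-class samples following steps 1-3 above
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        # 1. pick a minority sample x and find its k nearest minority neighbors (Euclidean)
        i = rng.integers(len(minority))
        x = minority[i]
        dists = np.linalg.norm(minority - x, axis=1)
        neighbors = np.argsort(dists)[1:k + 1]           # skip x itself
        # 2. randomly choose one neighbor y among the k
        y = minority[rng.choice(neighbors)]
        # 3. interpolate: x_new = x + rand(0,1) * |x - y|
        synthetic.append(x + rng.random() * np.abs(x - y))
    return np.vstack(synthetic)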
In this embodiment, an unbalanced training data set at the preprocessing stage causes many problems in pattern recognition, so the SMOTE method described above is adopted to solve it. Outliers often appear in the data and bias the precision of the trained model, so in this embodiment they are replaced with the median or with a value that does not occur in the data.
Further, in the stage of extracting important features with the random forest, i.e., after training of the random forest is completed, the sample features are sorted by importance weight; this stage extracts the features with higher weight in the samples and further comprises the following steps:
1: preliminary estimation and ranking
a) The feature variables in the random forest are sorted in descending order by VI (variable import).
b) Determining the deletion proportion, and removing 20% of the characteristic variables smaller than a preset specific gravity threshold value from the current characteristic variables which are arranged in a descending order, thereby obtaining a new characteristic set.
c) And establishing a new random forest by using the new feature set, and calculating and sequencing the VI of each feature in the feature set.
d) The above steps are repeated until m features remain. The m value is determined by the entire model, preferably the set of features with the lowest error rate.
2: and (3) calculating a corresponding out-of-bag error rate (OOB err) according to each feature set obtained in the step (1) and the random forest established by the feature sets, and taking the feature set with the lowest out-of-bag error rate as a final selected feature set.
In this embodiment, if high-dimensional data is directly input to the dimensionality reduction model for dimensionality reduction, the data processing time is too long, the calculation parameters are too many, and performance may be reduced.
Further, in the LargeVis dimensionality reduction processing stage:
inputting: data samples of new features selected by the random forest obtained through step S2.
Firstly, a space division is obtained by utilizing a random projection tree, and a K nearest neighbor (kNN, K-nearest neighbor) graph which does not require complete accuracy is obtained on the basis of searching the K nearest neighbor of each sample point. And then searching potential neighbors by using a neighbor search algorithm according to the idea of neighbor direct, calculating the distances between the neighbors and the current point and between the neighbors and the current point, putting the distances into a small root heap, and taking k nodes with the minimum distances as k neighbors to finally obtain an accurate kNN graph.
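A simplified, single-machine sketch of this two-stage construction (the single random projection split stands in for the full random projection trees, and the function name and parameters are illustrative):

import heapq
import numpy as np

def knn_graph(X, k, n_iter=3, seed=0):
    # approximate kNN graph: rough candidates from one random projection split,
    # then refinement based on the idea that a neighbor of a neighbor is likely a neighbor
    rng = np.random.default_rng(seed)
    n = len(X)
    proj = X @ rng.normal(size=X.shape[1])
    side = proj > np.median(proj)
    neighbors = []
    for i in range(n):
        pool = np.flatnonzero(side == side[i])
        pool = pool[pool != i]
        if len(pool) < k:                       # fall back to all other points
            pool = np.delete(np.arange(n), i)
        neighbors.append(list(rng.choice(pool, size=k, replace=False)))
    for _ in range(n_iter):                     # neighbor-of-neighbor exploration
        refined = []
        for i in range(n):
            cand = set(neighbors[i])
            for j in neighbors[i]:
                cand.update(neighbors[j])
            cand.discard(i)
            dists = [(np.linalg.norm(X[i] - X[j]), j) for j in cand]
            refined.append([j for _, j in heapq.nsmallest(k, dists)])
        neighbors = refined
    return neighbors                            # neighbors[i]: k nearest points to i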
1. For the case of an unweighted network, let $y_i$ and $y_j$ denote two points in the low-dimensional space; the probability that the two points have a binary edge $e_{ij}$ (an edge with weight 1) in the kNN graph is

$$P(e_{ij}=1)=f\bigl(\|y_i-y_j\|^2\bigr)$$

where $f(\cdot)$ takes a heavy-tailed form similar to the t distribution used in t-SNE, e.g.

$$f(x)=\frac{1}{1+x}$$

The smaller the distance between $y_i$ and $y_j$, the higher the probability that the two points have a binary edge in the kNN graph; conversely, the larger the distance between $y_i$ and $y_j$, the smaller that probability.

2. For the case of a weighted network, the probability that the edge has weight $w_{ij}$ is defined as

$$P(e_{ij}=w_{ij})=P(e_{ij}=1)^{w_{ij}}$$

The overall optimization goal is to maximize the probability that positive-sample node pairs have connecting edges in the kNN graph and to minimize the probability that negative-sample node pairs have connecting edges, where $\gamma$ is a weight assigned uniformly to the negative-sample edges. Taking logarithms, the optimization objective becomes

$$O=\sum_{(i,j)\in E} w_{ij}\log P(e_{ij}=1)+\sum_{(i,j)\in \bar{E}} \gamma\,\log\bigl(1-P(e_{ij}=1)\bigr)$$

The negative-sample term

$$\sum_{(i,j)\in \bar{E}} \gamma\,\log\bigl(1-P(e_{ij}=1)\bigr)$$

sums over all negative examples and is too expensive to compute directly, so it is handled with a negative sampling algorithm. For each point $i$, $M$ points are randomly selected according to a noise distribution $P_n(j)$ to form negative samples with $i$; the noise distribution takes a form similar to that used by Mikolov et al., i.e.

$$P_n(j)\propto d_j^{0.75}$$

where $d_j$ is the degree of point $j$. The objective function can then be redefined as

$$O=\sum_{(i,j)\in E} w_{ij}\Bigl(\log P(e_{ij}=1)+\sum_{k=1}^{M}\mathbb{E}_{j_k\sim P_n(j)}\bigl[\gamma\,\log\bigl(1-P(e_{ij_k}=1)\bigr)\bigr]\Bigr)$$
In this embodiment, LargeVis is likewise trained with asynchronous stochastic gradient descent after the negative sampling and edge sampling optimizations. This technique is very effective on sparse graphs, because the edges sampled by different threads rarely share nodes and there is little conflict between threads. In terms of time complexity, each stochastic gradient step costs O(sM), where M is the number of negative samples and s is the dimension of the low-dimensional space (2 or 3), and the number of stochastic gradient steps is generally proportional to the number of nodes N. Thus, the total time complexity is O(sMN), and it can be concluded that the time complexity of LargeVis is linear in the number of nodes in the network.
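A single-threaded sketch of one such stochastic gradient step, assuming the illustrative kernel f(x) = 1/(1 + x) written above; the asynchronous multi-threaded scheduling, edge sampling and learning-rate decay of the full method are omitted, and the names and default values are illustrative:

import numpy as np

def sgd_edge_step(Y, i, j, neg_nodes, gamma=7.0, lr=1.0):
    # one ascent step for the positive edge (i, j) and its sampled negatives,
    # with P(e_ij = 1) = 1 / (1 + ||y_i - y_j||^2); the edge weight w_ij is omitted
    d = Y[i] - Y[j]
    grad_i = -2.0 * d / (1.0 + d @ d)      # attraction: pulls y_i towards y_j
    Y[j] -= lr * grad_i                    # dO/dy_j is the negative of this term
    for m in neg_nodes:                    # repulsion from M sampled negative nodes
        d = Y[i] - Y[m]
        sq = max(d @ d, 1e-8)              # guard against division by zero
        g = 2.0 * gamma * d / (sq * (1.0 + sq))
        grad_i += g
        Y[m] -= lr * g
    Y[i] += lr * grad_i                    # gradient ascent on the log-objective
    return Y

In the full method, positive edges (i, j) are sampled with probability proportional to w_ij and this step is executed asynchronously by several threads, which is what keeps the overall cost at O(sMN).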
Further, in a random forest visualization stage based on the LargeVis, a distribution diagram of low-dimensional data is drawn.
In this embodiment, given a data set, feature extraction is performed on the original ultrasonic data to obtain data that has not yet been reduced in dimension; the result is still a high-dimensional data space, and low-dimensional data for visualization is then obtained through the LargeVis manifold-learning algorithm, so that the overall behaviour of the data can be observed.
In this embodiment, the algorithm process mainly includes the steps of:
inputting: data set { a1,a2,…an}, random forest parameters ntree, mtry
And (3) outputting: distribution map of low dimensional spatial data
In this embodiment, as shown in fig. 1, the specific process is as follows:
1. initialization
2. Read-in feature matrix
3. A space partition is obtained with random projection trees, and the k neighbors of each point are searched on this basis; then a neighbor search algorithm is used to find potential neighbors, the distances between these candidate neighbors and the current point are calculated and placed into a min-heap, and the k nodes with the smallest distances are taken as the k neighbors, finally giving an accurate kNN graph.
4. For (i in 1:k)
4.1. Train with asynchronous stochastic gradient descent
4.2. The time complexity is linear in the number of nodes in the network
5. The computed local optimum gives the low-dimensional space representation of the data, and the distribution map of the low-dimensional data is drawn
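A minimal sketch of steps 4-5, drawing the distribution map of the low-dimensional embedding and feeding it to the random forest for predictive analysis (assuming matplotlib and scikit-learn; the variable names, ntree and mtry defaults are illustrative):

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

def visualize_and_predict(Y_train, y_train, Y_test, ntree=500, mtry="sqrt"):
    # Y_train / Y_test are the 2-D LargeVis embeddings of the selected features
    plt.scatter(Y_train[:, 0], Y_train[:, 1], c=y_train, s=5, cmap="tab10")
    plt.title("Distribution map of the low-dimensional space data")
    plt.savefig("largevis_embedding.png", dpi=150)
    # predictive analysis on the reduced data with a random forest
    rf = RandomForestClassifier(n_estimators=ntree, max_features=mtry)
    rf.fit(Y_train, y_train)
    return rf.predict(Y_test)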
The above are preferred embodiments of the present invention, and all changes made according to the technical scheme of the present invention that produce functional effects do not exceed the scope of the technical scheme of the present invention belong to the protection scope of the present invention.

Claims (6)

1. A method for analyzing random forest visual data based on LargeVis is characterized by comprising the following steps:
step S1: preprocessing a training data set;
step S2: extracting, through a random forest, the sample features in the training data set whose importance weight is greater than a preset weight threshold;
in step S2, the method further includes the steps of:
step S21: preliminary estimation and sorting;
step S211: sorting the feature variables in the random forest in descending order of VI (variable importance);
step S212: determining a deletion proportion, and removing the 20% of feature variables whose importance is smallest (below the preset weight threshold) from the currently descending-sorted feature variables, thereby obtaining a new feature set;
step S213: establishing a new random forest with the new feature set, calculating the VI of each feature in the feature set, and sorting again;
step S214: repeating the above steps until m features remain;
step S22: calculating the corresponding out-of-bag error rate for each feature set obtained in step S21 and the random forest built from it, and taking the feature set with the lowest out-of-bag error rate as the finally selected feature set;
step S3: performing dimensionality reduction treatment by adopting LargeVis;
in step S3, according to the result obtained in step S2, a space partition is obtained through random projection trees, and on this basis the K nearest neighbors of each sample point are searched to obtain a preliminary kNN graph; then, based on the idea that a neighbor of a neighbor is likely also a neighbor, a neighbor search algorithm is used to find potential neighbors, the distances between these candidate neighbors and the current point are calculated and placed into a min-heap, and the k nodes with the smallest distances are taken as the k neighbors, giving the final kNN graph;
step S4: performing visualization processing of the random forest based on LargeVis.
2. The method as claimed in claim 1, wherein in step S1, the SMOTE method is used to handle data imbalance, and data outliers are handled by replacing them with the median or with a value that does not appear in the data.
3. The method for analyzing random forest visual data based on LargeVis according to claim 1,
for an unweighted network, let $y_i$ and $y_j$ denote two points in the low-dimensional space; the probability that the two points have a binary edge $e_{ij}$ in the kNN graph is

$$P(e_{ij}=1)=f\bigl(\|y_i-y_j\|^2\bigr)$$

where $f(\cdot)$ takes a heavy-tailed form similar to the t distribution used in t-SNE, e.g.

$$f(x)=\frac{1}{1+x}$$

the smaller the distance between $y_i$ and $y_j$, the higher the probability that the two points have a binary edge in the kNN graph; conversely, the larger the distance between $y_i$ and $y_j$, the smaller that probability;

for a weighted network, the probability that the edge has weight $w_{ij}$ is

$$P(e_{ij}=w_{ij})=P(e_{ij}=1)^{w_{ij}}$$

the overall optimization objective is to maximize the probability that positive-sample node pairs have connecting edges in the kNN graph and to minimize the probability that negative-sample node pairs have connecting edges in the kNN graph; denoting by $\gamma$ the weight assigned to the negative-sample edges and taking logarithms, the optimization objective becomes

$$O=\sum_{(i,j)\in E} w_{ij}\log P(e_{ij}=1)+\sum_{(i,j)\in \bar{E}} \gamma\,\log\bigl(1-P(e_{ij}=1)\bigr)$$

for each point $i$, $M$ points are randomly selected according to a noise distribution $P_n(j)$ to form negative samples with $i$, the noise distribution adopted being

$$P_n(j)\propto d_j^{0.75}$$

where $d_j$ is the degree of point $j$; the objective function is then

$$O=\sum_{(i,j)\in E} w_{ij}\Bigl(\log P(e_{ij}=1)+\sum_{k=1}^{M}\mathbb{E}_{j_k\sim P_n(j)}\bigl[\gamma\,\log\bigl(1-P(e_{ij_k}=1)\bigr)\bigr]\Bigr)$$
4. The method as claimed in claim 3, wherein the training is performed by asynchronous stochastic gradient descent after the negative sampling and edge sampling optimizations are completed.
5. A method for analyzing random forest visual data based on LargeVis as claimed in claim 3, wherein the time complexity of LargeVis is linear in the number of nodes in the network.
6. The method for analyzing random forest visual data based on LargeVis as claimed in claim 1, wherein in step S4, a distribution map of low dimensional data is drawn according to the obtained low dimensional spatial data.
CN201810170150.5A 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis Active CN108280236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810170150.5A CN108280236B (en) 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810170150.5A CN108280236B (en) 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis

Publications (2)

Publication Number Publication Date
CN108280236A CN108280236A (en) 2018-07-13
CN108280236B true CN108280236B (en) 2022-03-15

Family

ID=62808852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810170150.5A Active CN108280236B (en) 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis

Country Status (1)

Country Link
CN (1) CN108280236B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110491121B (en) * 2019-07-26 2022-04-05 同济大学 Heterogeneous traffic accident cause analysis method and equipment
CN111458145A (en) * 2020-03-30 2020-07-28 南京机电职业技术学院 Cable car rolling bearing fault diagnosis method based on road map characteristics
CN111783840A (en) * 2020-06-09 2020-10-16 苏宁金融科技(南京)有限公司 Visualization method and device for random forest model and storage medium
CN111815209A (en) * 2020-09-10 2020-10-23 上海冰鉴信息科技有限公司 Data dimension reduction method and device applied to wind control model
CN113792610B (en) * 2020-11-26 2024-05-31 上海智能制造功能平台有限公司 Health assessment method and device for harmonic reducer
CN112397146B (en) * 2020-12-02 2021-08-24 广东美格基因科技有限公司 Microbial omics data interaction analysis system based on cloud platform
CN113537281B (en) * 2021-05-26 2024-03-19 山东大学 Dimension reduction method for performing visual comparison on multiple high-dimension data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106955097A (en) * 2017-03-31 2017-07-18 福州大学 A kind of fetal heart frequency state classification method
CN107169284A (en) * 2017-05-12 2017-09-15 北京理工大学 A kind of biomedical determinant attribute system of selection
CN107301331A (en) * 2017-07-20 2017-10-27 北京大学 A kind of method for digging of the sickness influence factor based on microarray data
CN107395590A (en) * 2017-07-19 2017-11-24 福州大学 A kind of intrusion detection method classified based on PCA and random forest
CN107607723A (en) * 2017-08-02 2018-01-19 兰州交通大学 A kind of protein-protein interaction assay method based on accidental projection Ensemble classifier

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9740957B2 (en) * 2014-08-29 2017-08-22 Definiens Ag Learning pixel visual context from object characteristics to generate rich semantic images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106955097A (en) * 2017-03-31 2017-07-18 福州大学 A kind of fetal heart frequency state classification method
CN107169284A (en) * 2017-05-12 2017-09-15 北京理工大学 A kind of biomedical determinant attribute system of selection
CN107395590A (en) * 2017-07-19 2017-11-24 福州大学 A kind of intrusion detection method classified based on PCA and random forest
CN107301331A (en) * 2017-07-20 2017-10-27 北京大学 A kind of method for digging of the sickness influence factor based on microarray data
CN107607723A (en) * 2017-08-02 2018-01-19 兰州交通大学 A kind of protein-protein interaction assay method based on accidental projection Ensemble classifier

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jian Tang et al., "Visualizing Large-scale and High-dimensional Data", WWW '16: Proceedings of the 25th International Conference on World Wide Web, April 2016, pp. 287-297 *

Also Published As

Publication number Publication date
CN108280236A (en) 2018-07-13

Similar Documents

Publication Publication Date Title
CN108280236B (en) Method for analyzing random forest visual data based on LargeVis
CN103559504B (en) Image target category identification method and device
CN107229757B (en) Video retrieval method based on deep learning and Hash coding
Fogel et al. Clustering-driven deep embedding with pairwise constraints
CN106648654A (en) Data sensing-based Spark configuration parameter automatic optimization method
CN109753589A (en) A kind of figure method for visualizing based on figure convolutional network
Kpotufe et al. A tree-based regressor that adapts to intrinsic dimension
CN103942571B (en) Graphic image sorting method based on genetic programming algorithm
CN102799614B (en) Image search method based on space symbiosis of visual words
CN111125469B (en) User clustering method and device of social network and computer equipment
Li et al. Feature statistics guided efficient filter pruning
CN106886569A (en) A kind of ML KNN multi-tag Chinese Text Categorizations based on MPI
Nayini et al. A novel threshold-based clustering method to solve K-means weaknesses
CN114299362A (en) Small sample image classification method based on k-means clustering
Ahmed et al. Branchconnect: Image categorization with learned branch connections
Dhoot et al. Efficient Dimensionality Reduction for Big Data Using Clustering Technique
Kouzani et al. Face classification by a random forest
Purnawansyah et al. K-Means clustering implementation in network traffic activities
CN112465054B (en) FCN-based multivariate time series data classification method
Zhang et al. Color clustering using self-organizing maps
JP6230501B2 (en) Reduced feature generation apparatus, information processing apparatus, method, and program
Jia A study on the improvement of K-means algorithm based on community discovery
Pullissery et al. Application of Feature Selection Methods for Improving Classifcation Accuracy and Run-Time: A Comparison of Performance on Real-World Datasets
Nonaka et al. Graph-based Deep Learning Analysis and Instance Selection
Mikhailov An indexing-based approach to pattern and video clip recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant