CN108280236A - Random forest visualization data analysis method based on LargeVis - Google Patents

Random forest visualization data analysis method based on LargeVis

Info

Publication number
CN108280236A
Authority
CN
China
Prior art keywords
random forest
largevis
data
feature
analysing method
Prior art date
Legal status
Granted
Application number
CN201810170150.5A
Other languages
Chinese (zh)
Other versions
CN108280236B (en)
Inventor
黄立勤
陈宋
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201810170150.5A priority Critical patent/CN108280236B/en
Publication of CN108280236A publication Critical patent/CN108280236A/en
Application granted granted Critical
Publication of CN108280236B publication Critical patent/CN108280236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 - Visual data mining; Browsing structured data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a LargeVis-based random forest visualization data analysis method, comprising: preprocessing the training data set; extracting the important features of the training data set with a random forest; performing dimensionality reduction with LargeVis; and performing visualization with the random forest based on LargeVis. For high-dimensional data, the invention uses the feature importances learned by a random forest to form a new, reduced high-dimensional data set, then reduces the dimensionality of these data with LargeVis and feeds the result into a random forest for predictive analysis and visualization. The method improves classification accuracy, shortens the time needed to produce the visualization, and adapts to different data sets.

Description

Random forest visualization data analysis method based on LargeVis
Technical field
The present invention relates to pattern recognition, machine learning and big data analysis, and in particular to a LargeVis-based random forest visualization data analysis method.
Background technology
In the era of big data, the dimensionality of data features grows ever higher, and analyzing data through some form of dimensionality reduction becomes all the more important; at the same time, visualizing high-dimensional data is a research focus in the current environment. At present, the most classical dimensionality reduction method is PCA (Principal Component Analysis), which not only reduces the dimensionality of high-dimensional data but, more importantly, removes noise through the reduction and reveals the patterns in the data. PCA replaces the n original features with a smaller number m of new features; the new features are linear combinations of the old ones, chosen so as to maximize the sample variance and to make the m new features as orthogonal to each other as possible. The mapping from the old features to the new features captures the intrinsic variability of the data. Later, researchers proposed manifold learning and expanded the research on visualization. The main manifold learning algorithms, i.e. nonlinear dimensionality reduction algorithms, are ISOMap (isometric mapping), LE (Laplacian eigenmaps) and LLE (locally linear embedding). Manifold learning assumes that the data are sampled from some manifold. ISOMap is a non-iterative global optimization algorithm: it modifies MDS (Multidimensional Scaling) by using the geodesic distance (the distance along the manifold) instead of the original Euclidean distance between two points in space, so that data lying on a manifold in a high-dimensional space can be mapped into a Euclidean space. ISOMap connects the data points into a neighbourhood graph to discretely approximate the original manifold, and the geodesic distance is then approximated by the shortest path on the graph. On this basis, Maaten recently published a further improvement of the t-SNE algorithm that uses various tree-based algorithms and contains two parts: first, a kNN graph is used to represent the similarity between points in the high-dimensional space; second, the solution of the gradient is optimized by splitting the gradient into an attractive part and a repulsive part, together with some further optimization tricks. From the above it can be seen that the various dimensionality reduction algorithms can reduce the number of predictor variables and provide an interpretable framework for the final result.
At present, the t-SNE algorithm is widely applied in manifold learning, but it has the following drawbacks. When processing large-scale high-dimensional data, the efficiency of t-SNE drops significantly (including its improved variants). The parameters of t-SNE are sensitive to the data set: after tuning the parameters on one data set and obtaining a good visualization effect, one finds that they cannot be applied to another data set, and a great deal of time must be spent searching for suitable parameters, which is a severe limitation for the overall classification model. Simply reducing the dimensionality of the raw high-dimensional data and feeding it directly into a model for training and classification yields relatively low accuracy and requires more training time. In addition, current data dimensionality reduction methods essentially perform the reduction on the raw data and classify with an existing model, which may suffer from low accuracy and from the reduced data lacking interpretability.
The present invention proposes a LargeVis-based random forest visualization data analysis algorithm: for high-dimensional data, the feature importances learned by a random forest are used to form a new, reduced high-dimensional data set; the data are then reduced in dimension by LargeVis and fed into a random forest for predictive analysis and visualization. The present invention therefore provides a new solution to the problems of feature extraction and visualization for fetal heart rate classification.
Invention content
The purpose of the present invention is to provide a LargeVis-based random forest visualization data analysis method that overcomes the defects of the prior art.
To achieve the above object, the technical scheme of the present invention is a LargeVis-based random forest visualization data analysis method realized according to the following steps:
Step S1: preprocess the training data set;
Step S2: use a random forest to extract the sample features whose importance in the training data set exceeds a preset importance threshold;
Step S3: perform dimensionality reduction with LargeVis;
Step S4: perform visualization with the random forest based on LargeVis.
In an embodiment of the present invention, in step S1, class imbalance in the data is handled with the SMOTE method, and outliers are handled by replacing them with the median or with values that do not occur in the data.
In an embodiment of the present invention, step S2 further comprises the following steps:
Step S21: preliminary estimation and ranking;
Step S211: sort the feature variables in the random forest in descending order of VI;
Step S212: determine a deletion ratio: remove from the currently sorted feature variables the 20% whose importance is below the preset importance threshold, obtaining a new feature set;
Step S213: build a new random forest with the new feature set, compute the VI of every feature in the feature set, and sort them;
Step S214: repeat the above steps until m features remain;
Step S22: for each feature set obtained in step S21 and the random forest built from it, compute the corresponding out-of-bag error rate, and take the feature set with the smallest out-of-bag error as the finally selected feature set.
In an embodiment of the present invention, in step S3, based on the result obtained in step S2, a space partition is first obtained with a random projection tree, on the basis of which the k nearest neighbours of each sample point are found, giving a preliminary k-nearest-neighbour graph; then, following the idea that a neighbour of a neighbour is likely also to be a neighbour, a neighbour-exploring algorithm searches for potential neighbours, computes the distances from the current point to its neighbours and to its neighbours' neighbours, pushes them into a min-heap, and takes the k nodes with the smallest distances as the k nearest neighbours, yielding the final kNN graph.
In an embodiment of the present invention, for an unweighted network, let y_i and y_j denote two points in the low-dimensional space; the probability that the two points have a binary edge e_ij in the kNN graph is:

P(e_{ij}=1) = f(\lVert y_i - y_j \rVert)

where f(·) is similar to the Student-t distribution used in t-SNE, e.g. f(x) = 1/(1+x^2). If the distance between y_i and y_j is smaller, the probability that the two points have a binary edge in the kNN graph is larger; conversely, if the distance between y_i and y_j is larger, the probability that the two points have a binary edge in the kNN graph is smaller.
For a weighted network, the probability that the edge weight equals w_ij is:

P(e_{ij}=w_{ij}) = P(e_{ij}=1)^{w_{ij}}

The overall optimization objective is to maximize the probability that node pairs in the positive samples have a connecting edge in the kNN graph and to minimize the probability that node pairs in the negative samples have a connecting edge; denoting by γ the weight assigned to the negative edges and taking the logarithm, the optimization objective becomes:

O = \sum_{(i,j) \in E} w_{ij} \log P(e_{ij}=1) + \sum_{(i,j) \in \bar{E}} \gamma \log(1 - P(e_{ij}=1))

For each point i, M points are randomly selected according to a noise distribution P_n(j) to form negative samples with i; the noise distribution is P_n(j) ∝ d_j^{0.75}, where d_j is the degree of node j, and the objective function becomes:

O = \sum_{(i,j) \in E} w_{ij} \Big( \log P(e_{ij}=1) + \sum_{k=1}^{M} \mathbb{E}_{j_k \sim P_n(j)} \, \gamma \log(1 - P(e_{ij_k}=1)) \Big)
In an embodiment of the present invention, after the negative sampling and edge sampling optimizations are completed, training is performed with asynchronous stochastic gradient descent.
In an embodiment of the present invention, the time complexity of LargeVis is linear in the number of nodes in the network.
In an embodiment of the present invention, in step S4, a distribution map of the low-dimensional data is drawn from the obtained low-dimensional space data.
Compared with the prior art, the present invention has the following advantages:
(1) The present invention uses a LargeVis-based method, which firstly improves the running speed and secondly adapts well to different data sets, effectively improving the performance of the overall model.
(2) The present invention uses the interpretable random forest model: a round of feature extraction is first performed on the data to remove unnecessary features and keep the important ones, forming new feature samples; dimensionality reduction is then performed and the reduced data are fed into a random forest for classification. On the one hand this improves the performance of the overall model; on the other hand the reduced data can be visualized, which is more intuitive and more interpretable for the user.
(3) The present invention uses only two basic models, yet it can realize classification, visualization, dimensionality reduction, data preprocessing and feature extraction, and is therefore more widely applicable than other algorithms.
Description of the drawings
Fig. 1 is a flow chart of the LargeVis-based random forest visualization data analysis method of the present invention.
Specific implementation mode
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings.
The LargeVis-based random forest visualization data analysis method of the present invention is realized according to the following steps:
Step S1: preprocess the training data set;
Step S2: use a random forest to extract the sample features whose importance in the training data set exceeds a preset importance threshold;
Step S3: perform dimensionality reduction with LargeVis;
Step S4: perform visualization with the random forest based on LargeVis.
In the present embodiment, the problems of imbalanced data samples and outliers may appear in practical applications and lead to poor classification results. An imbalanced training data set causes many problems in pattern recognition: for example, the classifier tends to "learn" the class with the largest proportion of samples, that is, it achieves its highest accuracy by biasing its predictions toward the over-represented class. In practical applications this bias is unacceptable. To obtain a more uniform distribution of the sample data, this embodiment uses the synthetic minority oversampling technique to create "synthetic" examples for each minority class from the few available samples, and uses the median to handle outliers.
In the present embodiment, the synthetic minority oversampling algorithm is as follows:
1. For each sample x in the minority class, compute its distance to every sample in the minority class sample set D using the Euclidean distance, and obtain its k nearest neighbours.
2. Set an oversampling multiplier N according to the class imbalance ratio; for each minority-class sample x, randomly choose several samples from its k nearest neighbours; let a chosen neighbour be y.
3. For each randomly chosen neighbour y, construct a new sample from the original sample according to the following formula:
new sample = x + rand(0, 1) * |x - y|
In the present embodiment, the SMOTE method is therefore used in the preprocessing stage to address the problems caused by an imbalanced training data set. Outliers also frequently occur in the data and cause the accuracy of the trained model to deviate; therefore, in the present embodiment outliers are replaced with the median or with values that do not occur in the data.
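For illustration only, the following Python sketch implements the preprocessing described above. The synthetic-sample formula follows the text; the outlier-detection rule (a z-score test), the number of synthetic samples per minority point, and the use of scikit-learn's NearestNeighbors are assumptions not specified in the patent.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, k=5, n_new_per_sample=1, rng=None):
    # Generate synthetic minority samples: new = x + rand(0,1) * |x - y|,
    # where y is one of x's k nearest minority-class neighbours (step 3 above).
    rng = rng or np.random.default_rng(0)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)   # +1: each point is its own neighbour
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for i, x in enumerate(X_min):
        for _ in range(n_new_per_sample):
            y = X_min[rng.choice(idx[i][1:])]              # skip the point itself at position 0
            synthetic.append(x + rng.random(x.shape) * np.abs(x - y))
    return np.vstack(synthetic)

def replace_outliers_with_median(X, z=3.0):
    # Replace outliers with the column median; the z-score rule used to flag
    # outliers is an assumed choice, only the replacement value comes from the text.
    X = X.copy()
    median = np.median(X, axis=0)
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-12
    mask = np.abs(X - mean) > z * std
    X[mask] = np.broadcast_to(median, X.shape)[mask]
    return X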
Further, in the random-forest important-feature extraction stage, that is, after the random forest has been trained, the sample features are re-ranked by their importance and the features with higher importance are extracted; this stage comprises the following steps (a minimal code sketch is given after the steps):
1. Preliminary estimation and ranking:
a) Sort the feature variables in the random forest in descending order of VI (Variable Importance).
b) Determine a deletion ratio: remove from the currently sorted feature variables the 20% whose importance is below the preset importance threshold, obtaining a new feature set.
c) Build a new random forest with the new feature set, compute the VI of every feature in the feature set, and sort them.
d) Repeat the above steps until m features remain. The value of m is determined by the overall model, preferably by the feature set with the lowest error rate.
2. For each feature set obtained in step 1 and the random forest built from it, compute the corresponding out-of-bag error rate (OOB error), and take the feature set with the smallest out-of-bag error as the finally selected feature set.
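A minimal sketch of this recursive, importance-based elimination, assuming scikit-learn's RandomForestClassifier with its feature_importances_ and oob_score_ attributes; the fixed 20% deletion ratio follows the text, while the stopping size and the forest hyperparameters are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_features_by_oob(X, y, drop_frac=0.20, min_features=2, seed=0):
    # Iteratively drop the least-important fraction of features, rebuild the
    # forest each round, and keep the feature set with the lowest OOB error.
    features = np.arange(X.shape[1])
    candidates = []                                          # (oob_error, feature indices)
    while len(features) >= min_features:
        rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                    random_state=seed).fit(X[:, features], y)
        candidates.append((1.0 - rf.oob_score_, features.copy()))
        order = np.argsort(rf.feature_importances_)[::-1]    # descending VI
        keep = order[: max(min_features, int(len(order) * (1 - drop_frac)))]
        if len(keep) == len(features):                       # nothing more to drop
            break
        features = features[keep]
    best_error, best_features = min(candidates, key=lambda t: t[0])
    return best_features, best_error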
In the present embodiment, if the high-dimensional data were fed directly into the dimensionality reduction model, the data processing time would be long and too many parameters would have to be computed, which may degrade performance. Instead, the weighted voting of the many decision trees in the random forest is used to extract a new feature set, the resulting data samples are reduced in dimension, and classification is performed with a random forest, which improves the accuracy and speed of the overall model.
Further, in the LargeVis dimensionality reduction stage:
Input: the data samples with the new features selected by the random forest in step S2.
First, a space partition is obtained with random projection trees, on the basis of which the k nearest neighbours of each sample point are found, giving a preliminary k-nearest-neighbour (kNN) graph that is not required to be fully accurate. Then, following the idea that a neighbour of a neighbour is likely also to be a neighbour, a neighbour-exploring algorithm searches for potential neighbours: the distances from the current point to its neighbours and to its neighbours' neighbours are computed and pushed into a min-heap, and the k nodes with the smallest distances are taken as the k nearest neighbours, finally yielding an accurate kNN graph.
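As an illustration, the sketch below builds such a kNN graph using exact nearest-neighbour search from scikit-learn as a stand-in for the random-projection-tree search and min-heap refinement described above; the Gaussian conversion of distances to edge weights is an assumption borrowed from the t-SNE/LargeVis setting rather than a detail stated in the patent.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_knn_graph(X, k=10):
    # Return (neighbour indices, edge weights) of a kNN graph over the samples X.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]                 # drop the self-neighbour
    sigma = dist.mean(axis=1, keepdims=True) + 1e-12    # per-point bandwidth (assumed choice)
    weights = np.exp(-dist**2 / (2.0 * sigma**2))       # similarity weights w_ij
    return idx, weights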
1. For the case of an unweighted network, let y_i and y_j denote two points in the low-dimensional space; the probability that the two points have a binary edge e_ij (an edge with weight 1) in the kNN graph is:

P(e_{ij}=1) = f(\lVert y_i - y_j \rVert)

where f(·) is similar to the Student-t distribution used in t-SNE, e.g. f(x) = 1/(1+x^2). If the distance between y_i and y_j is smaller, the probability that the two points have a binary edge in the kNN graph is larger; conversely, if the distance between y_i and y_j is larger, the probability that the two points have a binary edge in the kNN graph is smaller.
2. For the case of a weighted network, the probability that the edge weight equals w_ij is defined as:

P(e_{ij}=w_{ij}) = P(e_{ij}=1)^{w_{ij}}

The overall optimization objective is to maximize the probability that node pairs in the positive samples have a connecting edge in the kNN graph and to minimize the probability that node pairs in the negative samples have a connecting edge, where γ is the unified weight assigned to the negative edges. Taking the logarithm, the optimization objective becomes:

O = \sum_{(i,j) \in E} w_{ij} \log P(e_{ij}=1) + \sum_{(i,j) \in \bar{E}} \gamma \log(1 - P(e_{ij}=1))

Using all the negative samples in the formula above is computationally too expensive, so a negative sampling algorithm is used for the solution. For each point i, M points are randomly selected according to a noise distribution P_n(j) to form negative samples with i; the noise distribution takes a form similar to the one used by Mikolov et al., i.e. P_n(j) ∝ d_j^{0.75}, where d_j is the degree of node j. The objective function can then be redefined as:

O = \sum_{(i,j) \in E} w_{ij} \Big( \log P(e_{ij}=1) + \sum_{k=1}^{M} \mathbb{E}_{j_k \sim P_n(j)} \, \gamma \log(1 - P(e_{ij_k}=1)) \Big)
In the present embodiment, after the negative sampling and edge sampling optimizations, LargeVis is trained with asynchronous stochastic gradient descent. This technique is very effective on sparse graphs, because the two nodes connected by edges sampled in different threads rarely coincide, so conflicts between threads hardly ever arise. In terms of time complexity, each step of stochastic gradient descent costs O(sM), where M is the number of negative samples and s is the dimensionality of the low-dimensional space (2 or 3), and the number of gradient steps is proportional to the number of nodes N; the total time complexity is therefore O(sMN). It follows that the time complexity of LargeVis is linear in the number of nodes in the network.
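The following sketch illustrates a single-threaded, synchronous version of one training epoch on the objective above, with f(x) = 1/(1+x^2); it is a simplified stand-in for the asynchronous multi-threaded optimizer used by LargeVis, and the learning rate, the default γ and the uniform epoch length are illustrative assumptions.

import numpy as np

def largevis_sgd_epoch(Y, edges, weights, degrees, M=5, gamma=7.0, lr=0.5, rng=None):
    # One epoch of edge-sampled stochastic gradient ascent on the LargeVis
    # objective with negative sampling. Y is the (n, s) low-dimensional
    # embedding (updated in place); edges is an (E, 2) array of node pairs,
    # weights an (E,) array of w_ij, degrees an (n,) array of node degrees.
    rng = rng or np.random.default_rng(0)
    p_neg = degrees ** 0.75                      # noise distribution P_n(j) ∝ d_j^0.75
    p_neg = p_neg / p_neg.sum()
    p_edge = weights / weights.sum()             # sample edges in proportion to w_ij
    for e in rng.choice(len(edges), size=len(edges), p=p_edge):
        i, j = edges[e]
        u = Y[i] - Y[j]
        d2 = u @ u
        g_attr = -2.0 * u / (1.0 + d2)           # gradient of log f(||y_i - y_j||) w.r.t. y_i
        Y[i] += lr * g_attr                      # attractive update (gradient ascent)
        Y[j] -= lr * g_attr
        for n in rng.choice(len(degrees), size=M, p=p_neg):   # M negative samples
            if n == i or n == j:
                continue
            v = Y[i] - Y[n]
            dn2 = v @ v + 1e-8
            g_rep = 2.0 * gamma * v / (dn2 * (1.0 + dn2))     # gradient of γ log(1 - f)
            Y[i] += lr * g_rep                   # repulsive update
            Y[n] -= lr * g_rep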
Further, in the LargeVis-based random forest visualization stage, the distribution map of the low-dimensional data is drawn.
In the present embodiment, given a data set, feature extraction is first carried out on the raw ultrasound data to obtain un-reduced data, which still form a high-dimensional data space; the LargeVis manifold learning algorithm then yields low-dimensional space data that can be visualized, so that the behaviour of the overall data can be observed.
In the present embodiment, the algorithm mainly comprises the following steps:
Input: data set {a_1, a_2, ..., a_n}, random forest parameters n_tree and m_try
Output: the distribution map of the low-dimensional space data
In the present embodiment, as shown in Fig. 1, the detailed process is as follows (a plotting sketch for the final step is given after the list):
1. Initialize.
2. Read in the feature matrix.
3. Obtain a space partition with random projection trees and find the k nearest neighbours of each point on this basis; then use the neighbour-exploring algorithm to find potential neighbours, compute the distances from the current point to its neighbours and to its neighbours' neighbours, push them into a min-heap, and take the k nodes with the smallest distances as the k nearest neighbours, finally obtaining an accurate kNN graph.
4. For (i in 1:k):
4.1. Train with asynchronous stochastic gradient descent.
4.2. The time complexity is linear in the number of nodes in the network.
5. From the computed local optimum, obtain the low-dimensional representation of the data and draw the distribution map of the low-dimensional data.
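To illustrate step 5, a minimal plotting sketch, assuming a two-dimensional embedding Y produced by the LargeVis stage, a selected-feature matrix X_selected and class labels y_labels; the variable names and the choice of colouring points by the random forest's predictions are illustrative.

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

def plot_low_dim_distribution(Y, X_selected, y_labels):
    # Draw the distribution map of the low-dimensional data, coloured by the
    # predictions of a random forest trained on the selected features.
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_selected, y_labels)
    predictions = rf.predict(X_selected)
    plt.figure(figsize=(6, 5))
    plt.scatter(Y[:, 0], Y[:, 1], c=predictions, s=8, cmap="tab10")
    plt.xlabel("LargeVis dimension 1")
    plt.ylabel("LargeVis dimension 2")
    plt.title("Distribution map of the low-dimensional data")
    plt.colorbar(label="predicted class")
    plt.tight_layout()
    plt.show()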
The above are preferred embodiments of the present invention; any modification made according to the technical solution of the present invention, whose resulting function and scope do not go beyond the technical solution of the present invention, falls within the protection scope of the present invention.

Claims (8)

1. A LargeVis-based random forest visualization data analysis method, characterized in that it is realized according to the following steps:
Step S1: preprocess the training data set;
Step S2: use a random forest to extract the sample features whose importance in the training data set exceeds a preset importance threshold;
Step S3: perform dimensionality reduction with LargeVis;
Step S4: perform visualization with the random forest based on LargeVis.
2. The LargeVis-based random forest visualization data analysis method according to claim 1, characterized in that, in step S1, class imbalance in the data is handled with the SMOTE method, and outliers are handled by replacing them with the median or with values that do not occur in the data.
3. The LargeVis-based random forest visualization data analysis method according to claim 1, characterized in that step S2 further comprises the following steps:
Step S21: preliminary estimation and ranking;
Step S211: sort the feature variables in the random forest in descending order of VI;
Step S212: determine a deletion ratio: remove from the currently sorted feature variables the 20% whose importance is below the preset importance threshold, obtaining a new feature set;
Step S213: build a new random forest with the new feature set, compute the VI of every feature in the feature set, and sort them;
Step S214: repeat the above steps until m features remain;
Step S22: for each feature set obtained in step S21 and the random forest built from it, compute the corresponding out-of-bag error rate, and take the feature set with the smallest out-of-bag error as the finally selected feature set.
4. The LargeVis-based random forest visualization data analysis method according to claim 1, characterized in that, in step S3, based on the result obtained in step S2, a space partition is obtained with a random projection tree, on the basis of which the k nearest neighbours of each sample point are found, giving a preliminary k-nearest-neighbour graph; then, following the idea that a neighbour of a neighbour is likely also to be a neighbour, a neighbour-exploring algorithm searches for potential neighbours, computes the distances from the current point to its neighbours and to its neighbours' neighbours, pushes them into a min-heap, and takes the k nodes with the smallest distances as the k nearest neighbours, giving the final kNN graph.
5. The LargeVis-based random forest visualization data analysis method according to claim 4, characterized in that,
for an unweighted network, let y_i and y_j denote two points in the low-dimensional space; the probability that the two points have a binary edge e_ij in the kNN graph is:

P(e_{ij}=1) = f(\lVert y_i - y_j \rVert)

where f(·) is similar to the Student-t distribution used in t-SNE, e.g. f(x) = 1/(1+x^2); if the distance between y_i and y_j is smaller, the probability that the two points have a binary edge in the kNN graph is larger; conversely, if the distance between y_i and y_j is larger, the probability that the two points have a binary edge in the kNN graph is smaller;
for a weighted network, the probability that the edge weight equals w_ij is:

P(e_{ij}=w_{ij}) = P(e_{ij}=1)^{w_{ij}}

the overall optimization objective is to maximize the probability that node pairs in the positive samples have a connecting edge in the kNN graph and to minimize the probability that node pairs in the negative samples have a connecting edge; denoting by γ the weight assigned to the negative edges and taking the logarithm, the optimization objective becomes:

O = \sum_{(i,j) \in E} w_{ij} \log P(e_{ij}=1) + \sum_{(i,j) \in \bar{E}} \gamma \log(1 - P(e_{ij}=1))

for each point i, M points are randomly selected according to a noise distribution P_n(j) to form negative samples with i, the noise distribution being P_n(j) ∝ d_j^{0.75}, where d_j is the degree of node j; the objective function then becomes:

O = \sum_{(i,j) \in E} w_{ij} \Big( \log P(e_{ij}=1) + \sum_{k=1}^{M} \mathbb{E}_{j_k \sim P_n(j)} \, \gamma \log(1 - P(e_{ij_k}=1)) \Big)
6. The LargeVis-based random forest visualization data analysis method according to claim 5, characterized in that, after the negative sampling and edge sampling optimizations are completed, training is performed with asynchronous stochastic gradient descent.
7. The LargeVis-based random forest visualization data analysis method according to claim 5, characterized in that the time complexity of LargeVis is linear in the number of nodes in the network.
8. The LargeVis-based random forest visualization data analysis method according to claim 1, characterized in that, in step S4, a distribution map of the low-dimensional data is drawn from the obtained low-dimensional space data.
CN201810170150.5A 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis Active CN108280236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810170150.5A CN108280236B (en) 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810170150.5A CN108280236B (en) 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis

Publications (2)

Publication Number Publication Date
CN108280236A true CN108280236A (en) 2018-07-13
CN108280236B CN108280236B (en) 2022-03-15

Family

ID=62808852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810170150.5A Active CN108280236B (en) 2018-02-28 2018-02-28 Method for analyzing random forest visual data based on LargeVis

Country Status (1)

Country Link
CN (1) CN108280236B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160063308A1 (en) * 2014-08-29 2016-03-03 Definiens Ag Learning Pixel Visual Context from Object Characteristics to Generate Rich Semantic Images
CN106955097A (en) * 2017-03-31 2017-07-18 福州大学 A kind of fetal heart frequency state classification method
CN107169284A (en) * 2017-05-12 2017-09-15 北京理工大学 A kind of biomedical determinant attribute system of selection
CN107395590A (en) * 2017-07-19 2017-11-24 福州大学 A kind of intrusion detection method classified based on PCA and random forest
CN107301331A (en) * 2017-07-20 2017-10-27 北京大学 A kind of method for digging of the sickness influence factor based on microarray data
CN107607723A (en) * 2017-08-02 2018-01-19 兰州交通大学 A kind of protein-protein interaction assay method based on accidental projection Ensemble classifier

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAN TANG, et al.: "Visualizing Large-scale and High-dimensional Data", WWW '16: Proceedings of the 25th International Conference on World Wide Web, April 2016 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110491121A (en) * 2019-07-26 2019-11-22 同济大学 A kind of heterogeneity traffic accident causation analysis method and apparatus
CN110491121B (en) * 2019-07-26 2022-04-05 同济大学 Heterogeneous traffic accident cause analysis method and equipment
CN111458145A (en) * 2020-03-30 2020-07-28 南京机电职业技术学院 Cable car rolling bearing fault diagnosis method based on road map characteristics
CN111783840A (en) * 2020-06-09 2020-10-16 苏宁金融科技(南京)有限公司 Visualization method and device for random forest model and storage medium
CN111815209A (en) * 2020-09-10 2020-10-23 上海冰鉴信息科技有限公司 Data dimension reduction method and device applied to wind control model
CN113792610A (en) * 2020-11-26 2021-12-14 上海智能制造功能平台有限公司 Harmonic reducer health assessment method and device
CN113792610B (en) * 2020-11-26 2024-05-31 上海智能制造功能平台有限公司 Health assessment method and device for harmonic reducer
CN112397146A (en) * 2020-12-02 2021-02-23 广东美格基因科技有限公司 Microbial omics data interaction analysis system based on cloud platform
CN112397146B (en) * 2020-12-02 2021-08-24 广东美格基因科技有限公司 Microbial omics data interaction analysis system based on cloud platform
CN113537281A (en) * 2021-05-26 2021-10-22 山东大学 Dimension reduction method for carrying out visual comparison on multiple high-dimensional data
CN113537281B (en) * 2021-05-26 2024-03-19 山东大学 Dimension reduction method for performing visual comparison on multiple high-dimension data

Also Published As

Publication number Publication date
CN108280236B (en) 2022-03-15

Similar Documents

Publication Publication Date Title
CN108280236A (en) A kind of random forest visualization data analysing method based on LargeVis
CN106096727B (en) A kind of network model building method and device based on machine learning
CN103559504B (en) Image target category identification method and device
CN109615014B (en) KL divergence optimization-based 3D object data classification system and method
CN107292350A (en) The method for detecting abnormality of large-scale data
CN110135494A (en) Feature selection method based on maximum information coefficient and Gini index
CN107066555B (en) On-line theme detection method for professional field
CN108280472A (en) A kind of density peak clustering method optimized based on local density and cluster centre
CN107832456B (en) Parallel KNN text classification method based on critical value data division
CN112861752B (en) DCGAN and RDN-based crop disease identification method and system
CN110598061A (en) Multi-element graph fused heterogeneous information network embedding method
CN111209939A (en) SVM classification prediction method with intelligent parameter optimization module
CN111259964A (en) Over-sampling method for unbalanced data set
CN111814979B (en) Fuzzy set automatic dividing method based on dynamic programming
Ibrahim et al. On feature selection methods for accurate classification and analysis of emphysema ct images
CN116759067A (en) Liver disease diagnosis method based on reconstruction and Tabular data
CN116030231A (en) Multistage classification BIM model intelligent light-weight processing method
CN113537339B (en) Method and system for identifying symbiotic or associated minerals based on multi-label image classification
CN109871894A (en) A kind of Method of Data Discretization of combination forest optimization and rough set
CN115017988A (en) Competitive clustering method for state anomaly diagnosis
Ma The Research of Stock Predictive Model based on the Combination of CART and DBSCAN
CN114626485A (en) Data tag classification method and device based on improved KNN algorithm
Mishra et al. Efficient intelligent framework for selection of initial cluster centers
CN108932550B (en) Method for classifying images based on fuzzy dense sparse dense algorithm
CN112308160A (en) K-means clustering artificial intelligence optimization algorithm

Legal Events

PB01 - Publication
SE01 - Entry into force of request for substantive examination
GR01 - Patent grant