CN108280236A - Method for analyzing random forest visual data based on LargeVis - Google Patents
- Publication number
- CN108280236A CN108280236A CN201810170150.5A CN201810170150A CN108280236A CN 108280236 A CN108280236 A CN 108280236A CN 201810170150 A CN201810170150 A CN 201810170150A CN 108280236 A CN108280236 A CN 108280236A
- Authority
- CN
- China
- Prior art keywords
- random forest
- largevis
- data
- feature
- analysing method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a LargeVis-based random forest visual data analysis method. The training data set is preprocessed; the important features of the training data set are extracted with a random forest; dimensionality reduction is performed with LargeVis; and visualization is performed with the LargeVis-based random forest. For high-dimensional data, the feature importances learned by a random forest are used to form a new, smaller high-dimensional data set; the data reduced by LargeVis are then fed into a random forest for predictive analysis and visualization. The method improves classification accuracy, shortens visualization time, and adapts to different data sets.
Description
Technical field
The present invention relates to pattern recognition, machine learning, and big data analysis, and in particular to a LargeVis-based random forest visual data analysis method.
Background technology
In the big-data era, the dimensionality of data features keeps growing, so analyzing data through dimensionality reduction becomes increasingly important, as does research on visualizing high-dimensional data in the current environment. The most classic dimension reduction method is PCA (Principal Component Analysis), which not only reduces the dimensionality of high-dimensional data but, more importantly, removes noise through dimensionality reduction and reveals patterns in the data. PCA replaces the original n features with a smaller number m of new features; each new feature is a linear combination of the old ones, and these linear combinations maximize the sample variance while keeping the m new features as orthogonal as possible. The mapping from old features to new features captures the intrinsic variability in the data. Later, researchers proposed manifold learning and extended it to visualization. The main manifold-learning (nonlinear dimension reduction) algorithms are ISOMap (Isometric Mapping), LE (Laplacian Eigenmaps), and LLE (Locally Linear Embedding). Manifold learning assumes that the data are sampled from some manifold. ISOMap is a non-iterative global optimization algorithm that modifies MDS (Multidimensional Scaling): it uses geodesic distance (distance along the manifold) in place of the original Euclidean distance between points in space, thereby mapping data lying on a manifold into a Euclidean space. ISOMap connects the data points into a neighborhood graph that discretely approximates the original manifold, and geodesic distances are then approximated by shortest paths on the graph. On this basis, Maaten more recently improved the t-SNE algorithm with various tree-based methods, comprising two parts: first, a kNN graph is used to represent the similarity of points in the high-dimensional space; second, the gradient computation in the optimization is split into an attractive part and a repulsive part, together with several optimization tricks. These dimension reduction algorithms reduce the number of predictor variables and provide a frame of interpretation for the final result.
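As context for the PCA description above, a minimal NumPy sketch (illustrative, not part of the patent): the data are centred and projected onto the top-m eigenvectors of the covariance matrix, so the new features are orthogonal linear combinations that maximize variance.

```python
import numpy as np

def pca(X, m):
    # Center the data, then project onto the m eigenvectors of the
    # covariance matrix with the largest eigenvalues.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:m]]     # top-m components
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Z = pca(X, 2)        # 100 samples reduced from 10 features to 2
```

Because the projection directions are eigenvectors of the covariance matrix, the reduced features are (numerically) uncorrelated.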
At present, the t-SNE algorithm is widely used in manifold learning, but it has the following drawbacks. When processing large-scale high-dimensional data, the efficiency of t-SNE drops significantly (including its improved variants). The parameters of t-SNE are sensitive to the data set: after tuning the parameters on one data set and obtaining a good visualization, one finds that they are not applicable to another data set, and a great deal of time must be spent finding suitable parameters, which is a severe limitation for the whole classification model. Simply feeding the raw high-dimensional data through a dimensionality reduction step directly into model training and classification yields low accuracy and long training times. Moreover, current data dimensionality reduction methods essentially reduce the raw data and classify with existing models, which may suffer from low accuracy and from reduced data that lack interpretability.
The present invention proposes a LargeVis-based random forest visual data analysis algorithm: for high-dimensional data, the feature importances learned by a random forest are used to form a new, smaller high-dimensional data set; the data reduced by LargeVis are then fed into a random forest for predictive analysis and visualization. The present invention thereby offers a new solution to the problem of feature extraction and visualization for fetal heart rate classification.
Invention content
The purpose of the present invention is to provide a LargeVis-based random forest visual data analysis method that overcomes the defects of the prior art.
To achieve the above object, the technical scheme of the present invention is a LargeVis-based random forest visual data analysis method, realized by the following steps:
Step S1: preprocess the training data set;
Step S2: extract, with a random forest, the sample features whose proportion in the training data set exceeds a preset proportion threshold;
Step S3: perform dimensionality reduction with LargeVis;
Step S4: perform visualization with the LargeVis-based random forest.
In an embodiment of the present invention, in step S1, data imbalance is handled with the SMOTE method, and outliers are handled by replacing them with the median or with a value that does not occur in the data.
In an embodiment of the present invention, step S2 further comprises the following steps:
Step S21: preliminary estimation and sorting;
Step S211: sort the feature variables in the random forest by VI in descending order;
Step S212: determine the deletion proportion; remove the bottom 20% of the currently sorted feature variables, i.e. those below the preset proportion threshold, obtaining a new feature set;
Step S213: build a new random forest with the new feature set, compute the VI of each feature in the set, and sort;
Step S214: repeat the above steps until m features remain;
Step S22: for each feature set obtained in step S21 and the random forest built on it, compute the corresponding out-of-bag error rate, and take the feature set with the lowest out-of-bag error as the finally selected feature set.
In an embodiment of the present invention, in step S3, based on the result of step S2, a space partition is obtained through random projection trees, the k neighbours of each sample point are found on this basis, and a preliminary k-nearest-neighbour graph is obtained; then, following the idea that a neighbour's neighbour is also likely to be a neighbour, a neighbour-exploring algorithm finds potential neighbours, the distances from each point to its neighbours and to its neighbours' neighbours are computed and placed in a small root heap, and the k nodes with the smallest distances are taken as the k neighbours, yielding the final kNN graph.
In an embodiment of the present invention, for the unweighted network, let y_i and y_j denote two points in the low-dimensional space; the probability that the two points have a binary edge e_ij in the kNN graph is:
P(e_ij = 1) = f(‖y_i − y_j‖²)
where f(·) is a t-distribution-like function as used in t-SNE, e.g. f(x) = 1/(1 + x). The smaller the distance between y_i and y_j, the larger the probability that the two points have a binary edge in the kNN graph; conversely, the larger the distance between y_i and y_j, the smaller that probability.
For the weighted network, the probability that the edge has weight w_ij is:
P(e_ij = w_ij) = P(e_ij = 1)^{w_ij}
The overall optimization objective is to maximize the probability that nodes forming a positive sample have a connecting edge in the kNN graph, and to minimize the probability that nodes forming a negative sample have a connecting edge. Let γ be the weight assigned to negative-sample edges; taking a logarithm, the optimization objective becomes:
O = Σ_{(i,j)∈E} w_ij log P(e_ij = 1) + Σ_{(i,j)∈Ē} γ log(1 − P(e_ij = 1))
For each point i, M points are randomly drawn from a noise distribution P_n(j) to form negative samples with i; the noise distribution is P_n(j) ∝ d_j^{0.75}, where d_j is the degree of node j, and the objective function becomes:
O = Σ_{(i,j)∈E} w_ij ( log P(e_ij = 1) + Σ_{k=1}^{M} E_{j_k∼P_n(j)} [ γ log(1 − P(e_{i j_k} = 1)) ] )
In an embodiment of the present invention, after negative sampling and edge sampling optimization are completed, training is carried out with asynchronous stochastic gradient descent.
In an embodiment of the present invention, the time complexity of LargeVis is linear in the number of nodes in the network.
In an embodiment of the present invention, in step S4, the distribution map of the low-dimensional data is drawn from the obtained low-dimensional space data.
Compared with the prior art, the invention has the following advantages:
(1) The present invention uses a LargeVis-based method, which, first, improves running speed and, second, adapts well to different data sets, effectively improving the performance of the overall model.
(2) The present invention uses an interpretable random forest model: a first round of feature extraction removes unnecessary features and keeps the important ones, forming new feature samples; dimensionality reduction is then performed, and the reduced data are fed into the random forest for classification. This improves overall model performance on the one hand, and on the other hand the visualized reduced data are more intuitive and more interpretable for the user.
(3) The model of the present invention contains only two basic models, yet it realizes classification, visualization, dimensionality reduction, data preprocessing, and feature extraction; compared with other algorithms it is more broadly applicable.
Description of the drawings
Fig. 1 is the flow chart of the LargeVis-based random forest visual data analysis method of the present invention.
Specific implementation mode
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings.
The LargeVis-based random forest visual data analysis method of the present invention is realized by the following steps:
Step S1: preprocess the training data set;
Step S2: extract, with a random forest, the sample features whose proportion in the training data set exceeds a preset proportion threshold;
Step S3: perform dimensionality reduction with LargeVis;
Step S4: perform visualization with the LargeVis-based random forest.
In the present embodiment, imbalanced data samples and outliers arise in practical applications and lead to poor classification results. An imbalanced training data set causes many problems in pattern recognition. For example, if the data set is imbalanced, the classifier tends to "learn" the majority-class samples, i.e. it achieves its highest accuracy by biasing its predictions toward the high-proportion class. In practical applications such a bias is unacceptable. To achieve an even distribution of the sample data, this embodiment applies the Synthetic Minority Over-sampling Technique (SMOTE), which creates "synthetic" examples for each minority class from its few samples, and handles outliers with the median.
In the present embodiment, the synthetic minority over-sampling algorithm is as follows:
1. For each sample x in the minority class, compute its Euclidean distance to all samples in the minority sample set D and obtain its k nearest neighbours.
2. Set an over-sampling ratio according to the class imbalance to determine the sampling multiple N; for each minority-class sample x, randomly choose several samples from its k neighbours; let a chosen neighbour be y.
3. For each randomly chosen neighbour y, build a new sample from the original sample according to:
new sample = x + rand(0, 1) × (y − x)
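A minimal NumPy sketch of the three steps above (illustrative, not the patent's code); note that it uses the standard SMOTE interpolation x + rand(0,1)·(y − x), and the function and parameter names are hypothetical:

```python
import numpy as np

def smote(minority, k=5, n_new=100, rng=None):
    # 1. find each sample's k nearest minority neighbours (Euclidean);
    # 2. pick a random neighbour y of a random minority sample x;
    # 3. interpolate: new = x + rand(0,1) * (y - x).
    if rng is None:
        rng = np.random.default_rng(0)
    minority = np.asarray(minority, dtype=float)
    out = []
    for _ in range(n_new):
        x = minority[rng.integers(len(minority))]
        d = np.linalg.norm(minority - x, axis=1)
        nbrs = np.argsort(d)[1:k + 1]      # skip x itself (distance 0)
        y = minority[rng.choice(nbrs)]
        out.append(x + rng.random() * (y - x))
    return np.array(out)
```

Since every synthetic sample lies on a segment between two minority points, the oversampled set stays inside the minority class's convex hull.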
In the present embodiment, at the preprocessing stage, an imbalanced training data set causes many problems in pattern recognition; the SMOTE method is used to solve this problem. Outliers also frequently occur in the data and bias the precision of the trained model; therefore, in the present embodiment, outliers are replaced with the median or with a value that does not occur in the data.
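One possible sketch of this outlier step; the patent does not specify how outliers are detected, so the median-absolute-deviation rule below is an assumption for illustration only:

```python
import numpy as np

def replace_outliers_with_median(col, thresh=5.0):
    # Flag values far from the median (relative to the median absolute
    # deviation, an assumed detection rule) and replace them with the
    # column median, as the embodiment describes.
    col = np.asarray(col, dtype=float).copy()
    med = np.median(col)
    mad = np.median(np.abs(col - med))     # median absolute deviation
    if mad > 0:
        col[np.abs(col - med) > thresh * mad] = med
    return col
```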
Further, the important-feature extraction stage of the random forest, i.e. the reordering of sample feature proportions after random forest training and the extraction of the features with higher proportion within the samples, comprises the following steps:
1. Preliminary estimation and sorting:
a) Sort the feature variables in the random forest by VI (Variable Importance) in descending order.
b) Determine the deletion proportion: remove the bottom 20% of the currently sorted feature variables, i.e. those below the preset proportion threshold, obtaining a new feature set.
c) Build a new random forest with the new feature set, compute the VI of each feature in the set, and sort.
d) Repeat the above steps until m features remain. The value of m is determined by the overall model, preferably by the feature set with the lowest error rate.
2. For each feature set obtained in step 1 and the random forest built on it, compute the corresponding out-of-bag error rate (OOB err), and take the feature set with the lowest out-of-bag error as the finally selected feature set.
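A minimal sketch of this two-stage selection, assuming scikit-learn's RandomForestClassifier as the forest (its feature_importances_ stands in for VI, and oob_score_ supplies the out-of-bag accuracy); the function and parameter names are illustrative, not the patent's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_features(X, y, m=5, drop_frac=0.2):
    # Iteratively: fit a forest, rank features by importance (VI),
    # drop the bottom 20%, and keep the candidate feature set whose
    # forest has the lowest out-of-bag (OOB) error rate.
    idx = np.arange(X.shape[1])
    best_idx, best_err = idx, 1.0
    while len(idx) >= m:
        rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                    random_state=0)
        rf.fit(X[:, idx], y)
        err = 1.0 - rf.oob_score_          # OOB error of this feature set
        if err <= best_err:
            best_idx, best_err = idx, err
        order = np.argsort(rf.feature_importances_)[::-1]   # VI descending
        keep = int(np.ceil(len(idx) * (1 - drop_frac)))
        if keep >= len(idx):
            keep = len(idx) - 1            # always shrink the set
        if keep < m:
            break
        idx = idx[order[:keep]]
    return best_idx, best_err

# Toy data: only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
selected, err = select_features(X, y, m=2)
```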
In the present embodiment, feeding high-dimensional data directly into the dimensionality reduction model leads to long data processing times and too many parameters to compute, which may degrade performance. Instead, a new feature set is extracted via the weighted voting of the many decision trees in the random forest, the resulting data samples are reduced in dimension, and classification is performed with the random forest, which improves the accuracy and speed of the whole model.
Further, in the LargeVis dimensionality reduction stage:
Input: the data samples with the new features selected by the random forest obtained in step S2.
First, a space partition is obtained with random projection trees, and the k neighbours of each sample point are found on this basis, giving a preliminary k-nearest-neighbour (kNN, k-NearestNeighbor) graph that is not required to be fully accurate. Then, following the idea that a neighbour's neighbour is also likely to be a neighbour, a neighbour-exploring algorithm finds potential neighbours; the distances from each point to its neighbours and to its neighbours' neighbours are computed and placed in a small root heap, and the k nodes with the smallest distances are taken as the k neighbours, finally yielding an accurate kNN graph.
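As a stand-in illustration (hypothetical code, not the patent's implementation), the graph that the random projection trees and neighbour-exploring step approximate at scale can be built brute-force for small data:

```python
import numpy as np

def knn_graph(X, k):
    # Brute-force k-nearest-neighbour graph. LargeVis builds the same
    # graph approximately: random projection trees give candidate
    # neighbours, and neighbour-exploring ("a neighbour of my neighbour
    # is likely my neighbour") refines them without the O(n^2) cost here.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude each point itself
    return np.argsort(d, axis=1)[:, :k]    # indices of the k neighbours
```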
1. For the case of the unweighted network, let y_i and y_j denote two points in the low-dimensional space; the probability that the two points have a binary edge e_ij (an edge of weight 1) in the kNN graph is:
P(e_ij = 1) = f(‖y_i − y_j‖²)
where f(·) is a t-distribution-like function as used in t-SNE, e.g. f(x) = 1/(1 + x). The smaller the distance between y_i and y_j, the larger the probability that the two points have a binary edge in the kNN graph; conversely, the larger the distance between y_i and y_j, the smaller that probability.
2. For the case of the weighted network, the probability that the edge has weight w_ij is defined as:
P(e_ij = w_ij) = P(e_ij = 1)^{w_ij}
The overall optimization objective is to maximize the probability that nodes forming a positive sample have a connecting edge in the kNN graph and to minimize the probability that nodes forming a negative sample have one, where γ is the unified weight assigned to negative-sample edges. Taking a logarithm, the optimization objective becomes:
O = Σ_{(i,j)∈E} w_ij log P(e_ij = 1) + Σ_{(i,j)∈Ē} γ log(1 − P(e_ij = 1))
Using all negative samples in Ē makes the computation too expensive, so a negative sampling algorithm is used instead. For each point i, M points are randomly drawn from a noise distribution P_n(j) to form negative samples with i; the noise distribution takes the form used by Mikolov et al., i.e. P_n(j) ∝ d_j^{0.75}, where d_j is the degree of node j. The objective function can then be redefined as:
O = Σ_{(i,j)∈E} w_ij ( log P(e_ij = 1) + Σ_{k=1}^{M} E_{j_k∼P_n(j)} [ γ log(1 − P(e_{i j_k} = 1)) ] )
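As an illustration (not part of the patent), the edge probability and the per-edge negative-sampling objective can be sketched as follows. The kernel f(x) = 1/(1 + x) on the squared distance and the default γ = 7 follow the published LargeVis formulation; the helper names are hypothetical:

```python
import numpy as np

def p_edge(yi, yj):
    # P(e_ij = 1) = 1 / (1 + ||y_i - y_j||^2): nearby low-dimensional
    # points get a high edge probability, distant points a low one.
    return 1.0 / (1.0 + np.sum((yi - yj) ** 2))

def edge_objective(yi, yj, negatives, w=1.0, gamma=7.0):
    # Per-edge objective under negative sampling: attract the positive
    # neighbour j, repel the M sampled non-neighbours (weight gamma).
    obj = w * np.log(p_edge(yi, yj))
    for yk in negatives:
        obj += gamma * np.log(1.0 - p_edge(yi, yk))
    return obj
```

In the full algorithm the negatives would be drawn from the degree-based noise distribution P_n(j) ∝ d_j^0.75; here they are simply passed in.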
In the present embodiment, after negative sampling and edge sampling optimization, LargeVis is trained with asynchronous stochastic gradient descent. This technique is very effective on sparse graphs, because the edges sampled by different threads rarely share endpoints, so conflicts between threads hardly ever occur. In terms of time complexity, each stochastic gradient step costs O(sM), where M is the number of negative samples and s is the dimension of the low-dimensional space (2 or 3), and the number of gradient steps is usually proportional to the number of nodes N. The total time complexity is therefore O(sMN); that is, the time complexity of LargeVis is linear in the number of nodes in the network.
Further, in the LargeVis-based random forest visualization stage, the distribution map of the low-dimensional data is drawn.
In the present embodiment, given a data set, feature extraction is performed on the raw ultrasound data to obtain non-reduced data; the data obtained at this point still form a high-dimensional data space. The LargeVis manifold learning algorithm then yields low-dimensional space data for visualization, so that the behaviour of the overall data can be observed.
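For step S4, a minimal (hypothetical) plotting sketch with matplotlib, rendering a 2-D embedding as the distribution map described above:

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")          # headless backend, no display needed
import matplotlib.pyplot as plt

def plot_embedding(Z, labels):
    # Scatter plot of the 2-D low-dimensional data, coloured by class
    # label, returned as PNG bytes.
    fig, ax = plt.subplots()
    ax.scatter(Z[:, 0], Z[:, 1], c=labels, cmap="tab10", s=8)
    ax.set_title("Low-dimensional data distribution")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()
```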
In the present embodiment, the algorithm mainly comprises the following steps:
Input: data set {a_1, a_2, …, a_n}, random forest parameters ntree, mtry
Output: the distribution map of the low-dimensional space data
In the present embodiment, as shown in Fig. 1, the detailed procedure is as follows:
1. Initialize.
2. Read in the feature matrix.
3. Obtain a space partition with random projection trees and find the k neighbours of each point on this basis; then use the neighbour-exploring algorithm to find potential neighbours, compute the distances from each point to its neighbours and to its neighbours' neighbours, place them in a small root heap, and take the k nodes with the smallest distances as the k neighbours, finally obtaining an accurate kNN graph.
4. For (i in 1:k)
4.1. Train with asynchronous stochastic gradient descent.
4.2. The time complexity is linear in the number of nodes in the network.
5. From the computed locally optimal solution, obtain the low-dimensional representation of the data and draw the distribution map of the low-dimensional data.
The above are preferred embodiments of the present invention. All changes made according to the technical solution of the present invention whose function and effect do not depart from the scope of the technical solution of the present invention belong to the protection scope of the present invention.
Claims (8)
1. A LargeVis-based random forest visual data analysis method, characterized in that it is realized by the following steps:
Step S1: preprocess the training data set;
Step S2: extract, with a random forest, the sample features whose proportion in the training data set exceeds a preset proportion threshold;
Step S3: perform dimensionality reduction with LargeVis;
Step S4: perform visualization with the LargeVis-based random forest.
2. The LargeVis-based random forest visual data analysis method according to claim 1, characterized in that, in step S1, data imbalance is handled with the SMOTE method, and outliers are handled by replacing them with the median or with a value that does not occur in the data.
3. The LargeVis-based random forest visual data analysis method according to claim 1, characterized in that step S2 further comprises the following steps:
Step S21: preliminary estimation and sorting;
Step S211: sort the feature variables in the random forest by VI in descending order;
Step S212: determine the deletion proportion; remove the bottom 20% of the currently sorted feature variables, i.e. those below the preset proportion threshold, obtaining a new feature set;
Step S213: build a new random forest with the new feature set, compute the VI of each feature in the set, and sort;
Step S214: repeat the above steps until m features remain;
Step S22: for each feature set obtained in step S21 and the random forest built on it, compute the corresponding out-of-bag error rate, and take the feature set with the lowest out-of-bag error as the finally selected feature set.
4. The LargeVis-based random forest visual data analysis method according to claim 1, characterized in that, in step S3, based on the result of step S2, a space partition is obtained through random projection trees, the k neighbours of each sample point are found on this basis, and a preliminary k-nearest-neighbour graph is obtained; then, following the idea that a neighbour's neighbour is also likely to be a neighbour, a neighbour-exploring algorithm finds potential neighbours, the distances from each point to its neighbours and to its neighbours' neighbours are computed and placed in a small root heap, and the k nodes with the smallest distances are taken as the k neighbours, yielding the final kNN graph.
5. The LargeVis-based random forest visual data analysis method according to claim 4, characterized in that:
for the unweighted network, let y_i and y_j denote two points in the low-dimensional space; the probability that the two points have a binary edge e_ij in the kNN graph is:
P(e_ij = 1) = f(‖y_i − y_j‖²)
where f(·) is a t-distribution-like function as used in t-SNE, e.g. f(x) = 1/(1 + x); the smaller the distance between y_i and y_j, the larger the probability that the two points have a binary edge in the kNN graph, and conversely, the larger the distance, the smaller that probability;
for the weighted network, the probability that the edge has weight w_ij is:
P(e_ij = w_ij) = P(e_ij = 1)^{w_ij}
the overall optimization objective is to maximize the probability that nodes forming a positive sample have a connecting edge in the kNN graph and to minimize the probability that nodes forming a negative sample have one; let γ be the weight assigned to negative-sample edges; taking a logarithm, the optimization objective becomes:
O = Σ_{(i,j)∈E} w_ij log P(e_ij = 1) + Σ_{(i,j)∈Ē} γ log(1 − P(e_ij = 1))
for each point i, M points are randomly drawn from a noise distribution P_n(j) ∝ d_j^{0.75} to form negative samples with i, where d_j is the degree of node j; the objective function then becomes:
O = Σ_{(i,j)∈E} w_ij ( log P(e_ij = 1) + Σ_{k=1}^{M} E_{j_k∼P_n(j)} [ γ log(1 − P(e_{i j_k} = 1)) ] ).
6. The LargeVis-based random forest visual data analysis method according to claim 5, characterized in that, after negative sampling and edge sampling optimization are completed, training is carried out with asynchronous stochastic gradient descent.
7. The LargeVis-based random forest visual data analysis method according to claim 5, characterized in that the time complexity of LargeVis is linear in the number of nodes in the network.
8. The LargeVis-based random forest visual data analysis method according to claim 1, characterized in that, in step S4, the distribution map of the low-dimensional data is drawn from the obtained low-dimensional space data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810170150.5A CN108280236B (en) | 2018-02-28 | 2018-02-28 | Method for analyzing random forest visual data based on LargeVis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810170150.5A CN108280236B (en) | 2018-02-28 | 2018-02-28 | Method for analyzing random forest visual data based on LargeVis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108280236A true CN108280236A (en) | 2018-07-13 |
CN108280236B CN108280236B (en) | 2022-03-15 |
Family
ID=62808852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810170150.5A Active CN108280236B (en) | 2018-02-28 | 2018-02-28 | Method for analyzing random forest visual data based on LargeVis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280236B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110491121A (en) * | 2019-07-26 | 2019-11-22 | 同济大学 | A kind of heterogeneity traffic accident causation analysis method and apparatus |
CN111458145A (en) * | 2020-03-30 | 2020-07-28 | 南京机电职业技术学院 | Cable car rolling bearing fault diagnosis method based on road map characteristics |
CN111783840A (en) * | 2020-06-09 | 2020-10-16 | 苏宁金融科技(南京)有限公司 | Visualization method and device for random forest model and storage medium |
CN111815209A (en) * | 2020-09-10 | 2020-10-23 | 上海冰鉴信息科技有限公司 | Data dimension reduction method and device applied to wind control model |
CN112397146A (en) * | 2020-12-02 | 2021-02-23 | 广东美格基因科技有限公司 | Microbial omics data interaction analysis system based on cloud platform |
CN113537281A (en) * | 2021-05-26 | 2021-10-22 | 山东大学 | Dimension reduction method for carrying out visual comparison on multiple high-dimensional data |
CN113792610A (en) * | 2020-11-26 | 2021-12-14 | 上海智能制造功能平台有限公司 | Harmonic reducer health assessment method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160063308A1 (en) * | 2014-08-29 | 2016-03-03 | Definiens Ag | Learning Pixel Visual Context from Object Characteristics to Generate Rich Semantic Images |
CN106955097A (en) * | 2017-03-31 | 2017-07-18 | 福州大学 | A kind of fetal heart frequency state classification method |
CN107169284A (en) * | 2017-05-12 | 2017-09-15 | 北京理工大学 | A kind of biomedical determinant attribute system of selection |
CN107301331A (en) * | 2017-07-20 | 2017-10-27 | 北京大学 | A kind of method for digging of the sickness influence factor based on microarray data |
CN107395590A (en) * | 2017-07-19 | 2017-11-24 | 福州大学 | A kind of intrusion detection method classified based on PCA and random forest |
CN107607723A (en) * | 2017-08-02 | 2018-01-19 | 兰州交通大学 | A kind of protein-protein interaction assay method based on accidental projection Ensemble classifier |
- 2018-02-28: CN application CN201810170150.5A filed (granted as CN108280236B, status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160063308A1 (en) * | 2014-08-29 | 2016-03-03 | Definiens Ag | Learning Pixel Visual Context from Object Characteristics to Generate Rich Semantic Images |
CN106955097A (en) * | 2017-03-31 | 2017-07-18 | 福州大学 | A kind of fetal heart frequency state classification method |
CN107169284A (en) * | 2017-05-12 | 2017-09-15 | 北京理工大学 | A kind of biomedical determinant attribute system of selection |
CN107395590A (en) * | 2017-07-19 | 2017-11-24 | 福州大学 | A kind of intrusion detection method classified based on PCA and random forest |
CN107301331A (en) * | 2017-07-20 | 2017-10-27 | 北京大学 | A kind of method for digging of the sickness influence factor based on microarray data |
CN107607723A (en) * | 2017-08-02 | 2018-01-19 | 兰州交通大学 | A kind of protein-protein interaction assay method based on accidental projection Ensemble classifier |
Non-Patent Citations (1)
Title |
---|
JIAN TANG, et al.: "Visualizing Large-scale and High-dimensional Data", WWW '16: Proceedings of the 25th International Conference on World Wide Web, April 2016 *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110491121A (en) * | 2019-07-26 | 2019-11-22 | 同济大学 | A kind of heterogeneity traffic accident causation analysis method and apparatus |
CN110491121B (en) * | 2019-07-26 | 2022-04-05 | 同济大学 | Heterogeneous traffic accident cause analysis method and equipment |
CN111458145A (en) * | 2020-03-30 | 2020-07-28 | 南京机电职业技术学院 | Cable car rolling bearing fault diagnosis method based on road map characteristics |
CN111783840A (en) * | 2020-06-09 | 2020-10-16 | 苏宁金融科技(南京)有限公司 | Visualization method and device for random forest model and storage medium |
CN111815209A (en) * | 2020-09-10 | 2020-10-23 | 上海冰鉴信息科技有限公司 | Data dimension reduction method and device applied to wind control model |
CN113792610A (en) * | 2020-11-26 | 2021-12-14 | 上海智能制造功能平台有限公司 | Harmonic reducer health assessment method and device |
CN113792610B (en) * | 2020-11-26 | 2024-05-31 | 上海智能制造功能平台有限公司 | Health assessment method and device for harmonic reducer |
CN112397146A (en) * | 2020-12-02 | 2021-02-23 | 广东美格基因科技有限公司 | Microbial omics data interaction analysis system based on cloud platform |
CN112397146B (en) * | 2020-12-02 | 2021-08-24 | 广东美格基因科技有限公司 | Microbial omics data interaction analysis system based on cloud platform |
CN113537281A (en) * | 2021-05-26 | 2021-10-22 | 山东大学 | Dimension reduction method for carrying out visual comparison on multiple high-dimensional data |
CN113537281B (en) * | 2021-05-26 | 2024-03-19 | 山东大学 | Dimension reduction method for performing visual comparison on multiple high-dimension data |
Also Published As
Publication number | Publication date |
---|---|
CN108280236B (en) | 2022-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108280236A (en) | A kind of random forest visualization data analysing method based on LargeVis | |
CN106096727B (en) | A kind of network model building method and device based on machine learning | |
CN103559504B (en) | Image target category identification method and device | |
CN109615014B (en) | KL divergence optimization-based 3D object data classification system and method | |
CN107292350A (en) | The method for detecting abnormality of large-scale data | |
CN110135494A (en) | Feature selection method based on maximum information coefficient and Gini index | |
CN107066555B (en) | On-line theme detection method for professional field | |
CN108280472A (en) | A kind of density peak clustering method optimized based on local density and cluster centre | |
CN107832456B (en) | Parallel KNN text classification method based on critical value data division | |
CN112861752B (en) | DCGAN and RDN-based crop disease identification method and system | |
CN110598061A (en) | Multi-element graph fused heterogeneous information network embedding method | |
CN111209939A (en) | SVM classification prediction method with intelligent parameter optimization module | |
CN111259964A (en) | Over-sampling method for unbalanced data set | |
CN111814979B (en) | Fuzzy set automatic dividing method based on dynamic programming | |
Ibrahim et al. | On feature selection methods for accurate classification and analysis of emphysema ct images | |
CN116759067A (en) | Liver disease diagnosis method based on reconstruction and Tabular data | |
CN116030231A (en) | Multistage classification BIM model intelligent light-weight processing method | |
CN113537339B (en) | Method and system for identifying symbiotic or associated minerals based on multi-label image classification | |
CN109871894A (en) | A kind of Method of Data Discretization of combination forest optimization and rough set | |
CN115017988A (en) | Competitive clustering method for state anomaly diagnosis | |
Ma | The Research of Stock Predictive Model based on the Combination of CART and DBSCAN | |
CN114626485A (en) | Data tag classification method and device based on improved KNN algorithm | |
Mishra et al. | Efficient intelligent framework for selection of initial cluster centers | |
CN108932550B (en) | Method for classifying images based on fuzzy dense sparse dense algorithm | |
CN112308160A (en) | K-means clustering artificial intelligence optimization algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |