CN110175191B - Modeling method for data filtering rule in data analysis - Google Patents

Publication number
CN110175191B
Authority
CN
China
Prior art keywords
data
column
analysis
cnt
filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910401717.XA
Other languages
Chinese (zh)
Other versions
CN110175191A (en)
Inventor
周鹏程
荆一楠
何震瀛
王晓阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-05-14
Filing date
2019-05-14
Publication date
2023-06-27
Application filed by Fudan University
Priority to CN201910401717.XA
Publication of CN110175191A
Application granted
Publication of CN110175191B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F16/248 Presentation of query results

Abstract

The invention belongs to the technical field of data analysis, and particularly relates to a method for modeling data filtering rules in data analysis. The method comprises three main parts: (1) data column analysis filtering; (2) data range analysis filtering; (3) automatic visualization of the result set. By setting the relevant rules appropriately, the invention solves the problem of how to apply data filtering rules in data analysis to build an analysis filtering model, and uses this model to analyze, filter, and intuitively display data. The invention helps users quickly screen data, find data subsets of interest, and analyze and mine the relationships between data items.

Description

Modeling method for data filtering rule in data analysis
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a data filtering rule modeling method in data analysis.
Background
In an age where data is ubiquitous, users' decisions are increasingly driven by data. Differences in the results of data analysis can significantly affect the decision-making process. Selecting improper data, whether intentionally or unintentionally, may lead to erroneous, misleading, or "fragile" decisions. Especially for users with little data analysis experience, the results of such poor data analysis may lead to serious economic losses. Guiding users toward good data selection can therefore give them a higher-quality data analysis and exploration experience.
Users without data analysis experience should be spared error-prone data exploration and complicated configuration of analysis filtering conditions as far as possible, so that they obtain good analysis filtering results in a straightforward manner. To this end, a standardized process is needed that determines how to select data for filtering analysis and how to model data filtering rules automatically according to the characteristics of the data.
Disclosure of Invention
The invention aims to provide a data filtering rule modeling method for an interactive data exploration scene, so that data on a data set can be quickly analyzed and mined, and a user can conveniently explore and analyze the data.
For recommendation rule modeling on a dataset, we expect the following characteristics:
1. interpretability: how to properly generate recommendations within a visualization system;
2. feasibility: the generated recommendations should have sufficient analytical significance to reveal potential associations between data;
3. quality: because users explore the data interactively, the construction of the model must be efficient and robust.
The data filtering rule modeling method provided by the invention comprises the following specific steps:
(1) Data column analysis filtering. Given a dataset D consisting of a large amount of data, the importance of each data column is calculated with a random forest feature selection method, taking into account whether the user has specified key data. The specific flow is as follows:
(1.1) The variable importance measure is denoted VIM and the Gini index is denoted GI. Suppose there are m data columns X_1, X_2, X_3, ..., X_m. For each column X_j, the Gini importance score VIM_j^(Gini) is calculated, that is, the average change in node-splitting impurity attributable to the j-th column over all decision trees of the random forest (RF). The Gini index of node m is:
$$GI_m = \sum_{k=1}^{K} \sum_{k' \neq k} p_{mk}\, p_{mk'} = 1 - \sum_{k=1}^{K} p_{mk}^{2}$$
where K is the number of classes, p_mk is the proportion of class k in node m, and p_mk′ is the proportion of another class k′ ≠ k in node m. Intuitively, GI_m is the probability that two samples drawn at random from node m have different class labels.
(1.2) The importance of data column X_j at node m, i.e., the change in the Gini index before and after node m is split, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where GI_l and GI_r are the Gini indices of the two new nodes after the split.
(1.3) If the nodes at which data column X_j appears in decision tree i form the set M, then the importance of X_j in the i-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m \in M} VIM_{jm}^{(Gini)}$$
(1.4) If the random forest contains n trees, the importance of data column X_j is:
$$VIM_{j}^{(Gini)} = \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
(1.5) The columns are ranked by the calculated importance, and the two most important data columns are returned to the user as the result of the column analysis filtering. They are denoted A and B, where A ranks higher than B.
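Steps (1.1) to (1.5) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `forest` structure (a list of trees, each a list of split records) and all function names are hypothetical, and the per-split importance is assumed to be the parent's Gini index minus the Gini indices of the two child nodes, unweighted. Practical random forest libraries usually weight the child nodes by their share of samples.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: 1 - sum_k p_k^2 over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_importance(parent, left, right):
    """Change in Gini index before and after a split (assumed unweighted)."""
    return gini_index(parent) - gini_index(left) - gini_index(right)


def column_importance(forest):
    """Sum the split importances of each column over all nodes of all trees.

    `forest` is a hypothetical structure: a list of trees, where each tree is
    a list of (column_name, parent_labels, left_labels, right_labels) records.
    """
    scores = Counter()
    for tree in forest:
        for column, parent, left, right in tree:
            scores[column] += split_importance(parent, left, right)
    return scores


def top_two_columns(forest):
    """Return the two highest-scoring columns (A first, then B)."""
    ranked = sorted(column_importance(forest).items(),
                    key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:2]]
```

Because only the ranking is used in step (1.5), dividing the summed scores by the number of trees would not change which two columns are returned.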
(2) Data range analysis filtering. Taking the two columns A and B as an example, the data range is analyzed and filtered as follows:
(2.1) The two columns A and B are first classified by data type into three classes: numerical N, discrete X, and time-series T. Numerical data N are first discretized, i.e., the data are binned to obtain the bins n′, and the count of each bin is calculated as CNT(n′); for discrete data X, the count of each discrete value is calculated as CNT(x);
because time-series data are often naturally segmented into periods, the invention divides the time bins automatically according to the time range of data column T, and the resulting time bins are denoted t′. For example: if T ranges from 2017 to 2019, the time bins t′ are divided by year; if T lies entirely within 2019, the time bins t′ are divided by month; if T covers only January 2019, the time bins t′ are divided by day.
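The binning in step (2.1) could look like the following Python sketch. The equal-width binning, the default of four bins, and the function names are assumptions made for illustration; the source does not specify how numerical bins are chosen. The time unit rule follows the year, month, and day example given above.

```python
from collections import Counter
from datetime import date


def bin_numeric(values, n_bins=4):
    """Equal-width binning (an assumed strategy); returns (low, high, count) per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def count_discrete(values):
    """CNT(x): the number of records for each discrete value."""
    return Counter(values)


def time_bin_unit(dates):
    """Pick the bin unit from the range of the column: year, month, or day."""
    first, last = min(dates), max(dates)
    if first.year != last.year:
        return "year"
    if first.month != last.month:
        return "month"
    return "day"


def bin_time(dates):
    """Return the chosen unit and CNT(t') keyed by the truncated date."""
    unit = time_bin_unit(dates)
    key = {
        "year": lambda d: (d.year,),
        "month": lambda d: (d.year, d.month),
        "day": lambda d: (d.year, d.month, d.day),
    }[unit]
    return unit, Counter(key(d) for d in dates)
```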
(2.2) Depending on which of the three data types columns A and B have, one of two combined analysis filtering models is formed and used to filter and analyze dataset D (in the following, every "/" means "or" and does not denote division). The two models are:
(2.2.1) A is time-series data and B is discrete or numerical. According to the unit of the time bins t′ obtained in step (2.1), a suitable recent time span of A is selected as the first filtering condition t_recent (e.g., the last three years, the last six months, or the last seven days; if there are not enough time bins, this filter is not produced). The dataset obtained by filtering on column A is denoted D*. Within D*, the filtered column B, denoted B*, either yields the discrete values x_1*, x_2*, ..., x_k* or, if numerical, is re-binned into (n_1*)′, (n_2*)′, ..., (n_k*)′, where k is the number of bins. The counts CNT(x*) / CNT((n*)′) are calculated, and the three discrete values x_max* or the value ranges of the three bins (n_max*)′ with the highest counts, CNT(x*)_top3 / CNT((n*)′)_top3, serve as the second filtering condition. The intersection of the two conditions, t_recent ∩ x_max* / (n_max*)′, is used as the filtering condition of the combined model to filter and analyze dataset D;
(2.2.2) A is discrete or numerical and B is time-series data. For A, the count CNT(x) of each discrete value or CNT(n′) of each bin is calculated, and the five discrete values x_top5 or the value ranges of the five bins (n_top5)′ with the highest counts are selected as the first filtering condition (if there are too few discrete values or bins, this filter is not produced). The dataset obtained by filtering on column A is denoted D*. The time range t_max of column B* that corresponds to the discrete value x_max or bin (n_max)′ with the highest count in A is selected as the second filtering condition. The intersection of the two conditions, x_top5 / (n_top5)′ ∩ t_max, is used as the filtering condition of the combined model to filter and analyze dataset D.
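As an illustration of the combined model in (2.2.1), the following Python sketch filters rows first by a recent time span and then by the three most frequent values of a discrete column. The row format (a list of dicts), the function names, and the choice to measure "recent" relative to the latest date in the data are assumptions; the source does not state the reference point.

```python
from collections import Counter
from datetime import date, timedelta


def recent_predicate(dates, unit):
    """First condition t_recent: last 3 years, 6 months, or 7 days,
    measured back from the latest date in the column (an assumption)."""
    latest = max(dates)
    if unit == "year":
        return lambda d: d.year >= latest.year - 2
    if unit == "month":
        month_index = lambda d: d.year * 12 + d.month
        return lambda d: month_index(d) >= month_index(latest) - 5
    return lambda d: d >= latest - timedelta(days=6)


def top_values(values, k):
    """The k values with the highest counts."""
    return [v for v, _ in Counter(values).most_common(k)]


def filter_time_then_discrete(rows, time_col, value_col, unit, k=3):
    """Apply t_recent, then keep rows whose value is among the top k in D*."""
    is_recent = recent_predicate([r[time_col] for r in rows], unit)
    d_star = [r for r in rows if is_recent(r[time_col])]  # D*
    keep = set(top_values([r[value_col] for r in d_star], k))
    return [r for r in d_star if r[value_col] in keep]
```

The case (2.2.2) would mirror this sketch with the roles swapped: the top five values of A are selected first, and the time range of the most frequent value is applied second.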
(3) In order to present the analysis-filtered data to the user, the present invention automatically visualizes the resulting dataset obtained by the two-step analysis filtering of steps (1), (2). The specific flow is as follows:
(3.1) For visualization, the following quantities are obtained from the result dataset: the cardinality d(X) of column X, the maximum max(X) and minimum min(X) of column X, the number of records |X| of column X, the data type type(X) of column X, each bin x′ and its count CNT(x′) (each discrete value of a discrete column X can be regarded as one bin), and the correlation coefficient correlation(x′, CNT(x′)) between the bins x′ and their counts CNT(x′).
(3.2) A set of pruning rules is defined according to the column type type(X) obtained in (3.1). When column X is time-series, the candidate charts are the bar chart and the line chart; when column X is discrete or numerical, the candidate charts are the bar chart, the pie chart, and the scatter plot.
(3.3) The invention uses a data analysis measure, relative information entropy, to decide how the result dataset obtained from the analysis filtering in steps (1) and (2) is visualized automatically. The core idea is to calculate, for each candidate chart type, the ratio of the information entropy of data column X rendered as that chart to the normalized chart information entropy, denoted C(X)_1, C(X)_2, ..., C(X)_k. The relative information entropies are compared, and the chart type corresponding to the largest value C(X)_max is taken as the visualization type of data column X. The specific method is as follows:
(3.3.1) The bar chart is one of the charts most commonly used by analysts; the differences in bar height make differences in the data easier to recognize. Bar charts suit many scenarios and show the details of the data well when the number of x′ elements (i.e., the number of bins) is large. The relative information entropy of the bar chart is calculated from the cardinality d(X) of column X, where |d(X)| denotes its value:
[Formula for C(X) of the bar chart; equation image not reproduced in this text]
(3.3.2) The pie chart can show several groups of data and represents each group's share of the whole. A pie chart needs well-differentiated CNT(x′) values to highlight the share of each part, so the Shannon entropy
$$\sum_{y \in CNT(x')} -P(y) \log P(y)$$
is introduced as part of the decision criterion, where y denotes each value of CNT(x′) and P(y) denotes the proportion of y, i.e., the probability of y occurring in CNT(x′);
[Formula for C(X) of the pie chart; equation image not reproduced in this text]
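The Shannon entropy term used for the pie chart can be computed as in this short Python sketch. It assumes that P(y) is each bin's share of the total count and that the logarithm is base 2; the source specifies neither, and the full pie chart formula for C(X) is not reproduced here.

```python
import math


def shannon_entropy(counts):
    """Sum of -P(y) * log2 P(y), with P(y) taken as each count's share of the total."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # empty bins contribute nothing
    return -sum(p * math.log2(p) for p in probs)
```

Under this reading, evenly sized bins give the highest entropy for a given number of bins, and a single dominant bin drives the value toward zero.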
(3.3.3) The strength of the line chart is that it reflects how the same thing develops and changes over time. When the data x′ and CNT(x′) follow a certain distribution (e.g., a linear, exponential, logarithmic, or power-law distribution), the fit is denoted distribution(x′, CNT(x′)) and the information entropy C(X) is 1; otherwise, the information entropy C(X) is 0;
C(X) = distribution(x′, CNT(x′));
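One way to decide whether x′ and CNT(x′) "follow a certain distribution" is to fit a straight line after the appropriate log transform and check the goodness of fit. The Python sketch below does this for the four families named above. The R² criterion and the 0.9 threshold are assumptions introduced for illustration; the source does not say how conformity is tested.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def distribution(xs, counts, threshold=0.9):
    """Return 1 if any of the four families fits well enough, else 0.

    xs are numeric bin positions (e.g. 1..k for time bins); threshold is assumed.
    """
    positive_x = all(x > 0 for x in xs)
    positive_y = all(y > 0 for y in counts)
    fits = [r_squared(xs, counts)]  # linear: y = a + b*x
    if positive_y:  # exponential: log y is linear in x
        fits.append(r_squared(xs, [math.log(y) for y in counts]))
    if positive_x:  # logarithmic: y is linear in log x
        fits.append(r_squared([math.log(x) for x in xs], counts))
    if positive_x and positive_y:  # power law: log y is linear in log x
        fits.append(r_squared([math.log(x) for x in xs],
                              [math.log(y) for y in counts]))
    return 1 if max(fits) >= threshold else 0
```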
(3.3.4) The scatter plot represents the relationship between two variables on coordinate axes; C(X) is calculated with the correlation coefficient correlation(x′, CNT(x′));
C(X) = correlation(x′, CNT(x′)).
(3.4) By comparing the relative information entropies of column X under the various chart types, the maximum relative information entropy C(X)_max is obtained. The result dataset obtained from the analysis filtering in steps (1) and (2) is then displayed using the chart type corresponding to C(X)_max.
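The selection in steps (3.2) and (3.4) amounts to pruning the candidate charts by column type and taking the chart with the largest C(X). A minimal Python sketch follows; the type labels, the `scores` mapping, and the function names are hypothetical, and the per-chart C(X) values are supplied by the caller because the bar and pie chart formulas are not reproduced in this text. A plain Pearson coefficient is included as one possible reading of correlation(x′, CNT(x′)) for the scatter plot.

```python
# Pruning rules from step (3.2): candidate charts per column type.
CANDIDATES = {
    "time": ["bar", "line"],
    "discrete": ["bar", "pie", "scatter"],
    "numeric": ["bar", "pie", "scatter"],
}


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def choose_chart(column_type, scores):
    """Return the allowed chart type with the largest C(X).

    `scores` is a hypothetical mapping from chart name to its C(X) value.
    """
    allowed = [chart for chart in CANDIDATES[column_type] if chart in scores]
    if not allowed:
        raise ValueError("no candidate chart has a score")
    return max(allowed, key=lambda chart: scores[chart])
```

For a time-series column, a high pie chart score is ignored because the pruning rules never offer a pie chart for that type.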
By setting the relevant rules appropriately, the invention solves the problem of how to apply data filtering rules in data analysis to build an analysis filtering model, and uses this model to analyze, filter, and intuitively display data. The invention helps users quickly screen data, find data subsets of interest, and analyze and mine the relationships between data items.
Drawings
FIG. 1 is a diagram of an example of data column analysis.
FIG. 2 shows the data analysis filtering process.
FIG. 3 is an example of data analysis filtering, where (a) shows filtering by sales date and (b) shows filtering by sales price.
FIG. 4 is a comparison of visualizations of the result dataset, where (a) shows the result dataset as a bar chart and (b) shows it as a line chart.
FIG. 5 is a flow chart of the method of the present invention.
Detailed Description
In this section we describe the invention by means of a specific data analysis system.
The dataset used in this example contains 33 columns and 344,355 records. Following the procedure described above, the data columns and data ranges are analyzed, and the resulting data are visualized and returned to the user. As shown in FIG. 1, the data column analysis takes the profit column as the key column and analyzes all remaining columns; the result is that the sales date and the sales price have the highest importance.
The invention builds a data filtering rule model according to the scheme of step (2), combining filtering conditions on the target columns sales date and sales price. Based on this model, the data analysis system derives the sequence of analysis operations shown in FIG. 2, which yields the records whose sales date falls within the last month and whose sales price lies in the highest-count bin, with range 0-57. The resulting filtering example displayed by the system is shown in FIG. 3.
The invention visualizes results automatically: the result dataset is analyzed autonomously and presented in a suitable chart. As shown in FIG. 4, the bar chart on the left is less suitable for these data, whereas the line chart on the right makes the trend easier to see. The invention therefore displays the sales price column as the line chart on the right.

Claims (1)

1. A method for modeling data filtering rules in data analysis, comprising the following specific steps:
(1) Given a dataset D consisting of a large amount of data, calculating the importance of each data column with a random forest feature selection method, taking into account whether the user has specified key data; the specific flow is as follows:
(1.1) the importance score is denoted VIM and the Gini index is denoted GI; assuming m data columns X_1, X_2, X_3, ..., X_m, the Gini importance score VIM_j^(Gini) of each column X_j is calculated, that is, the average change in node-splitting impurity attributable to the j-th column over all decision trees of the random forest RF; the Gini index is:
$$GI_m = \sum_{k=1}^{K} \sum_{k' \neq k} p_{mk}\, p_{mk'} = 1 - \sum_{k=1}^{K} p_{mk}^{2}$$
where K is the number of classes, p_mk is the proportion of class k in node m, and p_mk′ is the proportion of another class k′ ≠ k in node m;
(1.2) the importance of data column X_j at node m, i.e., the change in the Gini index before and after node m is split, is:
$$VIM_{jm}^{(Gini)} = GI_m - GI_l - GI_r$$
where GI_l and GI_r are the Gini indices of the two new nodes after the split;
(1.3) if the nodes at which data column X_j appears in decision tree i form the set M, the importance of X_j in the i-th tree is:
$$VIM_{ij}^{(Gini)} = \sum_{m \in M} VIM_{jm}^{(Gini)}$$
(1.4) if the random forest contains n trees, the importance of data column X_j is:
$$VIM_{j}^{(Gini)} = \sum_{i=1}^{n} VIM_{ij}^{(Gini)}$$
(1.5) ranking the columns by the calculated importance and returning the two most important data columns to the user as the result of the column analysis filtering, denoted A and B, where A ranks higher than B;
(2) Analyzing and filtering a data range; the specific flow is as follows:
(2.1) the two columns A and B are first classified by data type into three classes: numerical N, discrete X, and time-series T; numerical data N are first discretized, i.e., the data are binned to obtain the bins n′, and the count of each bin is calculated as CNT(n′); for discrete data X, the count of each discrete value is calculated as CNT(x);
for time-series data T, time bins are divided according to the time range of data column T, and the resulting time bins of column T are denoted t′;
(2.2) depending on which of the three data types columns A and B have, one of two combined analysis filtering models is formed and used to filter and analyze dataset D; the two models are:
(2.2.1) A is time-series data and B is discrete or numerical; according to the unit of the time bins t′ obtained in step (2.1), a suitable recent time span of A is selected as the first filtering condition t_recent; the dataset obtained by filtering on column A is denoted D*; within D*, the filtered column B, denoted B*, either yields the discrete values x_1*, x_2*, ..., x_k* or, if numerical, is re-binned into (n_1*)′, (n_2*)′, ..., (n_k*)′, where k is the number of bins; the counts CNT(x*) / CNT((n*)′) are calculated, and the three discrete values x_max* or the value ranges of the three bins (n_max*)′ with the highest counts, CNT(x*)_top3 / CNT((n*)′)_top3, serve as the second filtering condition; the intersection of the two conditions, t_recent ∩ x_max* / (n_max*)′, is used as the filtering condition of the combined model to filter and analyze dataset D;
(2.2.2) A is discrete or numerical and B is time-series data; for A, the count CNT(x) of each discrete value or CNT(n′) of each bin is calculated, and the five discrete values x_top5 or the value ranges of the five bins (n_top5)′ with the highest counts are selected as the first filtering condition; the dataset obtained by filtering on column A is denoted D*; the time range t_max of column B* that corresponds to the discrete value x_max or bin (n_max)′ with the highest count in A is selected as the second filtering condition; the intersection of the two conditions, x_top5 / (n_top5)′ ∩ t_max, is used as the filtering condition of the combined model to filter and analyze dataset D;
(3) Automatically visualizing the resulting dataset resulting from the analysis filtering of steps (1), (2) for presenting the analysis filtered data to a user; the specific flow is as follows:
(3.1) for visualization, the following quantities are obtained from the result dataset: the cardinality d(X) of column X, the maximum max(X) and minimum min(X) of column X, the number of records |X| of column X, the data type type(X) of column X, and the correlation coefficient correlation(x′, CNT(x′)) between each bin x′ and its corresponding count CNT(x′);
(3.2) a set of pruning rules is defined according to the column type type(X) obtained in (3.1); when column X is time-series, the candidate charts are the bar chart and the line chart; when column X is discrete or numerical, the candidate charts are the bar chart, the pie chart, and the scatter plot;
(3.3) a data analysis measure, relative information entropy, is used to decide how the result dataset obtained from the analysis filtering in steps (1) and (2) is visualized automatically; the core idea is to calculate, for each candidate chart type, the ratio of the information entropy of data column X rendered as that chart to the normalized chart information entropy, denoted C(X)_1, C(X)_2, ..., C(X)_k; the relative information entropies are compared, and the chart type corresponding to the largest value C(X)_max is taken as the visualization type of data column X; specifically:
(3.3.1) in the bar chart, the differences in bar height make differences in the data easier to recognize; the relative information entropy of the bar chart is calculated from the cardinality d(X) of column X, where |d(X)| denotes its value:
[Formula for C(X) of the bar chart; equation image not reproduced in this text]
(3.3.2) the pie chart can show several groups of data and represents each group's share of the whole; a pie chart needs well-differentiated CNT(x′) values to highlight the share of each part, so the Shannon entropy
$$\sum_{y \in CNT(x')} -P(y) \log P(y)$$
is introduced as part of the decision criterion, where y denotes each value of CNT(x′) and P(y) denotes the proportion of y, i.e., the probability of y occurring in CNT(x′);
[Formula for C(X) of the pie chart; equation image not reproduced in this text]
(3.3.3) the line chart reflects how the same thing develops and changes over time; when the data x′ and CNT(x′) follow a certain distribution, namely a linear, exponential, logarithmic, or power-law distribution, the fit is denoted distribution(x′, CNT(x′)) and the information entropy C(X) is 1; otherwise, the information entropy C(X) is 0;
C(X) = distribution(x′, CNT(x′))
(3.3.4) in the scatter plot, the relationship between two variables is represented on coordinate axes; C(X) is calculated with the correlation coefficient correlation(x′, CNT(x′));
C(X) = correlation(x′, CNT(x′))
(3.4) by comparing the relative information entropies of column X under the various chart types, the maximum relative information entropy C(X)_max is obtained; the result dataset obtained from the analysis filtering in steps (1) and (2) is displayed using the chart type corresponding to C(X)_max.
CN201910401717.XA 2019-05-14 2019-05-14 Modeling method for data filtering rule in data analysis Active CN110175191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910401717.XA CN110175191B (en) 2019-05-14 2019-05-14 Modeling method for data filtering rule in data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910401717.XA CN110175191B (en) 2019-05-14 2019-05-14 Modeling method for data filtering rule in data analysis

Publications (2)

Publication Number Publication Date
CN110175191A CN110175191A (en) 2019-08-27
CN110175191B true CN110175191B (en) 2023-06-27

Family

ID=67691033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910401717.XA Active CN110175191B (en) 2019-05-14 2019-05-14 Modeling method for data filtering rule in data analysis

Country Status (1)

Country Link
CN (1) CN110175191B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766167B (en) * 2019-10-29 2021-08-06 深圳前海微众银行股份有限公司 Interactive feature selection method, device and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550374A (en) * 2016-01-29 2016-05-04 湖南大学 Random forest parallelization machine studying method for big data in Spark cloud service environment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295983A (en) * 2016-08-08 2017-01-04 烟台海颐软件股份有限公司 Power marketing data visualization statistical analysis technique and system
CN106599325A (en) * 2017-01-18 2017-04-26 河海大学 Method for constructing data mining visualization platform based on R and HighCharts
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method
CN107193967A (en) * 2017-05-25 2017-09-22 南开大学 A kind of multi-source heterogeneous industry field big data handles full link solution
CN108171617A (en) * 2017-12-08 2018-06-15 全球能源互联网研究院有限公司 A kind of power grid big data analysis method and device
CN109409647A (en) * 2018-09-10 2019-03-01 昆明理工大学 A kind of analysis method of the salary level influence factor based on random forests algorithm

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550374A (en) * 2016-01-29 2016-05-04 湖南大学 Random forest parallelization machine studying method for big data in Spark cloud service environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on the random forest algorithm based on imbalanced data; 魏正韬; Information Science and Technology (2018, No. 04); full text *

Also Published As

Publication number Publication date
CN110175191A (en) 2019-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant