CN113868493A - Visual chart recommendation method - Google Patents
Visual chart recommendation method
- Publication number
- CN113868493A CN113868493A CN202111065907.2A CN202111065907A CN113868493A CN 113868493 A CN113868493 A CN 113868493A CN 202111065907 A CN202111065907 A CN 202111065907A CN 113868493 A CN113868493 A CN 113868493A
- Authority
- CN
- China
- Prior art keywords
- visualization
- chart
- meaningful
- data
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
(All under G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING.)
- G06F16/904 — Browsing; Visualisation therefor
- G06F16/9035 — Querying; Filtering based on additional data, e.g. user or group profiles
- G06F16/906 — Clustering; Classification
- G06F18/254 — Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/2411 — Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06F18/24155 — Bayesian classification
- G06F18/24323 — Tree-organised classifiers
Abstract
The invention relates to the field of chart visualization and particularly provides a chart visualization recommendation method. Compared with the prior art, the method learns the most meaningful visualization results from numerous real-world visualization data sets, labels them and builds an index, and then finds meaningful visualization types by searching that index. This avoids the problems that visualization type operations are complex and numerous and that the large enumeration search space causes redundant visualization results. The method can also be integrated into data analysis software as a chart visualization recommendation engine, improving the usability of that software.
Description
Technical Field
The invention relates to the field of chart visualization, and particularly provides a chart visualization recommendation method.
Background
Data visualization is used by an increasing number of people as an important means of data analysis. However, data visualization presents real difficulties for most users, who are not specialists in visualization techniques. The goal of visualization recommendation is therefore to automatically generate candidate results, by technical means, for analysts to explore and select, lowering the barrier to visualization.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a chart visualization recommendation method with strong practicability.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A chart visualization recommendation method first extracts a number of data features and the corresponding meaningful visualization chart types from a real visualization data set; then trains a classification model with each of several classifiers, learning meaningful visualizations from the models and testing accuracy on a test set; and finally fuses the results of the classifiers to select the meaningful charts suited to the data set.
Furthermore, the classifiers are a decision tree, a support vector machine and naive Bayes. These three classifiers construct classification models from data gathered in visualization practice and divide visualization results into meaningful and meaningless. During a new visualization exploration, meaningless results are discarded, meaningful visualization results are retained, and the results are recommended to the user.
Further, the ID3 algorithm in the decision tree uses information gain as the attribute selection metric. Entropy measures the uncertainty of things: the more uncertain something is, the larger its entropy. The expected information needed to classify the tuples in D is given by the formula:

Info(D) = -Σ_{i=1}^{m} p_i · log2(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i, and Info(D) is the average amount of information needed to identify the class label of a tuple in D. Suppose the tuples in D are partitioned on some attribute A, where A has v distinct values {a_1, a_2, …, a_v} corresponding to v outcomes on A; D is then divided by attribute A into v partitions or subsets {D_1, D_2, …, D_v}, where D_j contains those tuples in D whose value of A is a_j.

Since the partitions may contain tuples from different classes, more information is still needed for an exact classification:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) · Info(D_j)

where |D_j| / |D| serves as the weight of the j-th partition. Info_A(D) is the expected information required to classify the tuples of D after partitioning on A; the smaller the required expected information, the higher the purity of the partitions.

The information gain is defined as the difference between the original information requirement and the new information requirement, i.e.

Gain(A) = Info(D) - Info_A(D)

The attribute A with the highest Gain(A) is selected as the splitting attribute for node N.
Further, SVM classification is a machine learning method built on statistical learning theory and is applicable to both linearly separable and linearly inseparable samples. Given samples (x_i, y_i), with x_i ∈ R^d, y_i ∈ {-1, +1}, i = 1, 2, …, n, where x_i is a feature vector and y_i is a class label: if the samples are linearly separable, the problem is converted into a convex quadratic optimization problem, giving the formula:

min (1/2)·||ω||² + C·Σ_{i=1}^{n} ξ_i, subject to y_i[(ω·x_i) + b] ≥ 1 - ξ_i, ξ_i ≥ 0

where ω is the weight vector, C is a penalty factor, ξ_i is a slack factor and b is an offset. A dual description of the optimization problem is obtained with Lagrange multipliers; under the constraint y_i[(ω·x_i) + b] ≥ 1, the classification decision function is obtained as:

f(x) = sgn( Σ_{i=1}^{n} α_i · y_i · (x_i · x) + b )

If the samples are linearly inseparable, the samples in the input space can be mapped by a nonlinear mapping into a high-dimensional, linearly separable feature space; using a kernel function K, the optimal classification decision function in the feature space is obtained as:

f(x) = sgn( Σ_{i=1}^{n} α_i · y_i · K(x_i, x) + b )
Further, the Bayesian classifier is a probabilistic classifier. When data are classified on several features, the features are assumed to be mutually independent; each class probability is then obtained by the multiplication rule for conditional probabilities, and the class with the largest probability is selected as the machine's judgment. Given a training data set {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, where m is the number of samples and each sample contains n features, i.e. x_i = (x_i1, x_i2, …, x_in), and the set of class labels is {y_1, y_2, …, y_k}, judging the class of a new sample amounts to solving for the maximum posterior probability argmax_y p(y | x):

p(y | x) = p(y) · Π_{j=1}^{n} p(x_j | y) / p(x)

Since the denominator of the formula is the same for every p(y = y_i | x), the final discrimination formula is:

y* = argmax_y { p(y) · Π_{j=1}^{n} p(x_j | y) }
further, when the data set is collected, the data with the best visualization experiment effect usually contains 10 attribute columns by using a BI analysis tool, and the graph with the better visualization effect is marked and displayed.
Further, when the data features are extracted from the training data set, the column attribute features are described as follows:
length is the number of data rows, and type is the data type, divided into categorical (C), quantitative (Q) and temporal (T);
for categorical data, the number of distinct values (count0), their ratio, entropy and Gini index (gini) are counted;
for numerical data, a number of statistical features are calculated, including the maximum, minimum, mean, median, mode, variance, standard deviation, median absolute deviation and distribution;
many of the paired-column features depend on the single-column types determined by single-column feature extraction.
Further, when determining meaningful visualization chart types, the bar, pie, line and scatter charts are labeled 0-3 respectively, and the four visualization chart types are enumerated for single columns and column pairs together with the corresponding meaningful visualization results. For a data set with m column attributes there are 4m single-column chart display types and 2m(m-1) possible chart display types between pairs of columns.
Furthermore, when the accuracy is tested with the test set, each data set is labeled according to visualization practice in the BI analysis software; all possible visualization results are enumerated, and labels are attached to all of them with the help of the BI software, finally yielding the meaningful visualization results;
the output of the three classifiers is a judgment on every possible visualization result, 1 denoting a "meaningful" visualization result and 0 a "meaningless" one; the indexes of the meaningful visualization results in the data set are marked, and the type of a meaningful visual representation chart can be located through the index.
Further, to improve the accuracy of finding meaningful charts suited to a data set, an ensemble learning method is adopted: a model combining the three simple classifiers is trained, and the class receiving the largest number of votes is marked as the output result by relative majority voting; if several class labels tie for the highest number of votes, one of them is selected at random as the output.
Compared with the prior art, the chart visualization recommendation method has the following outstanding beneficial effects:
The method learns the most meaningful visualization results from numerous real-world visualization data sets, marks them and builds an index; meaningful visualization types are found by searching the index, avoiding the problems that visualization type operations are complex and numerous and that the large enumeration search space leads to redundant visualization results.
The adopted classifiers effectively learn the meaningful visualization results and achieve good accuracy on the test set.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of an ensemble learning model in a chart visualization recommendation method.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to better understand the technical solutions of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A preferred embodiment is given below:
according to the chart visualization recommendation method, firstly, a plurality of data features and corresponding meaningful visualization chart types are extracted from a real visualization data set, then classifiers are used for training classification models respectively, meaningful visualization is learned from the classification models, a test set is used for accuracy test, and finally, a plurality of classifier results are fused to select the meaningful charts suitable for the data set.
The method comprises the following specific steps:
s1, when the data set is collected:
With the help of the BI analysis tools used in daily work, a large number of data visualization results have been accumulated through long-term visualization practice. The numbers of columns and rows of these data sets vary greatly: although some data sets contain hundreds of attribute columns, most have fewer than 25 columns, and the data sets with the best visualization results usually contain about 10 attribute columns; the charts with better visualization effects are marked and displayed. These data sets typically contain time-series attributes, categorical attributes and numerical attributes. Details of part of the data sets are shown in Table (1):
Table (1)
S2, when data features are extracted:
For the training data set, the descriptions of the column attribute features are listed in Table (2); the features fall into 6 classes:
length is the number of data rows.
type is the data type, divided into categorical (C), quantitative (Q) and temporal (T).
For categorical data, the number of distinct values (count0), their ratio, entropy and Gini index (gini) are counted.
For numerical data, a number of statistical features are calculated, including the maximum, minimum, mean, median, mode, variance, standard deviation, median absolute deviation and distribution.
Many of the paired-column features depend on the single-column types determined by single-column feature extraction. For example, the Pearson correlation coefficient requires two numerical columns, while the χ² test requires two categorical columns.
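As an illustration, the single-column statistics above for a categorical column can be sketched as follows (a minimal sketch only; the function name and dictionary keys are assumptions, since the patent does not publish its code):

```python
import math
from collections import Counter

def column_features(values):
    """Compute the single-column statistics described above for a
    categorical column: row count, distinct count (count0), ratio,
    entropy and Gini index."""
    n = len(values)
    counts = Counter(values)
    probs = [c / n for c in counts.values()]
    return {
        "length": n,                                    # number of data rows
        "count0": len(counts),                          # number of distinct values
        "ratio": len(counts) / n,                       # distinct values / rows
        "entropy": -sum(p * math.log2(p) for p in probs),
        "gini": 1 - sum(p * p for p in probs),
    }

feats = column_features(["a", "a", "b", "b"])
# two equally likely values: entropy = 1 bit, gini = 0.5
```

A numerical column would instead get the max/min/mean/median/mode/variance family of statistics listed above.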
S3, when determining the type of the meaningful visual chart:
In daily visualization practice, more than 85% of visualization results can be represented by a bar chart (bar), pie chart (pie), line chart (line) or scatter chart (scatter), so only the recommendation of these four visualization charts is considered here. The bar, pie, line and scatter charts are labeled 0-3 respectively, and the four visualization chart types are enumerated for single columns and column pairs together with the corresponding "meaningful" visualization results. For a data set with m column attributes there are 4m single-column chart display types and 2m(m-1) possible chart display types between pairs of columns.
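The candidate enumeration above can be sketched as follows (illustrative only; the column names and function name are assumptions for the example):

```python
from itertools import combinations

CHARTS = {0: "bar", 1: "pie", 2: "line", 3: "scatter"}  # labels 0-3 as in the text

def enumerate_candidates(columns):
    """Enumerate every single-column and two-column chart candidate.
    For m columns this yields 4m single-column candidates and
    4 * m(m-1)/2 = 2m(m-1) two-column candidates, matching the counts above."""
    single = [(c, chart) for c in columns for chart in CHARTS.values()]
    paired = [(a, b, chart)
              for a, b in combinations(columns, 2)   # m(m-1)/2 column pairs
              for chart in CHARTS.values()]          # 4 chart types per pair
    return single, paired

single, paired = enumerate_candidates(["year", "region", "sales"])
# m = 3: 4*3 = 12 single-column and 2*3*2 = 12 two-column candidates
```

It is this full candidate set that the classifiers later label as meaningful or meaningless.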
S4, when the accuracy test is carried out by using the test set:
Data sets from the fields of communications, automobiles, chemicals, transportation, sales and others were selected; each data set was labeled according to visualization practice in BI analysis software, all possible visualization results were enumerated, and all 21453 visualization results were labeled with the help of the BI software. Finally, 680 meaningful visualization results were obtained. The 21453 collected samples were then used for model training with a decision tree (DT), a support vector machine (SVM) and a Bayes classifier, and the 6 test sets (see Table (1)) were input to obtain the test accuracies shown in Table (3). The DT method has the highest accuracy on the 6 test data sets, with an average accuracy of 0.8609; Bayes has the lowest, with an average accuracy of 0.7223. The output of the three classifiers is a judgment on every possible visualization result, 1 denoting a "meaningful" visualization result and 0 a "meaningless" one; the indexes of the meaningful visualization results in the data set are marked, and the type of a meaningful visual representation chart can be located through the index.
Table (3)
The classifiers are a decision tree, a support vector machine and naive Bayes, which construct classification models from data gathered in visualization practice and divide visualization results into meaningful and meaningless; during a new visualization exploration, meaningless results are discarded, meaningful visualization results are retained, and the results are recommended to the user.
The ID3 algorithm in the decision tree uses information gain as the attribute selection metric. The method is based on the concept of entropy in information theory: entropy measures the uncertainty of things, and the more uncertain something is, the larger its entropy. The expected information needed to classify the tuples in D is given by the formula:

Info(D) = -Σ_{i=1}^{m} p_i · log2(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i, and Info(D) is the average amount of information needed to identify the class label of a tuple in D. Suppose the tuples in D are partitioned on some attribute A, where A has v distinct values {a_1, a_2, …, a_v} corresponding to v outcomes on A; D is then divided by attribute A into v partitions or subsets {D_1, D_2, …, D_v}, where D_j contains those tuples in D whose value of A is a_j.

Since the partitions may contain tuples from different classes, more information is still needed for an exact classification:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) · Info(D_j)

where |D_j| / |D| serves as the weight of the j-th partition. Info_A(D) is the expected information required to classify the tuples of D after partitioning on A; the smaller the required expected information, the higher the purity of the partitions.

The information gain is defined as the difference between the original information requirement and the new information requirement, i.e.

Gain(A) = Info(D) - Info_A(D)

The attribute A with the highest Gain(A) is selected as the splitting attribute for node N.
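The information-gain computation Gain(A) = Info(D) - Info_A(D) can be sketched as follows (an illustrative sketch, not the patent's implementation; the toy weather data is invented for the example):

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum_i p_i * log2(p_i) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    """Gain(A) = Info(D) - Info_A(D): total entropy minus the
    partition-weighted entropy after splitting on attribute A."""
    total = entropy(labels)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(y)
    weighted = sum(len(part) / len(labels) * entropy(part)   # |D_j|/|D| * Info(D_j)
                   for part in partitions.values())
    return total - weighted

# toy data: the attribute perfectly separates the classes, so Gain(A) = Info(D) = 1 bit
rows = [("sunny",), ("sunny",), ("rainy",), ("rainy",)]
labels = [1, 1, 0, 0]
gain = info_gain(rows, labels, 0)
```

ID3 would evaluate this gain for every candidate attribute and split node N on the maximizer.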
SVM classification is a machine learning method built on statistical learning theory and is applicable to both linearly separable and linearly inseparable samples. Given samples (x_i, y_i), with x_i ∈ R^d, y_i ∈ {-1, +1}, i = 1, 2, …, n, where x_i is a feature vector and y_i is a class label: if the samples are linearly separable, the problem is converted into a convex quadratic optimization problem, giving the formula:

min (1/2)·||ω||² + C·Σ_{i=1}^{n} ξ_i, subject to y_i[(ω·x_i) + b] ≥ 1 - ξ_i, ξ_i ≥ 0

where ω is the weight vector, C is a penalty factor, ξ_i is a slack factor and b is an offset. A dual description of the optimization problem is obtained with Lagrange multipliers; under the constraint y_i[(ω·x_i) + b] ≥ 1, the classification decision function is obtained as:

f(x) = sgn( Σ_{i=1}^{n} α_i · y_i · (x_i · x) + b )

If the samples are linearly inseparable, the samples in the input space can be mapped by a nonlinear mapping into a high-dimensional, linearly separable feature space; using a kernel function K, the optimal classification decision function in the feature space is obtained as:

f(x) = sgn( Σ_{i=1}^{n} α_i · y_i · K(x_i, x) + b )
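The kernelised decision function above can be sketched as follows. Note this only evaluates f(x) for given multipliers α_i; solving the dual quadratic programme that produces them is not shown, and the toy support vectors are invented for the example:

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decide(x, support, alphas, labels, b, kernel=rbf_kernel):
    """Evaluate f(x) = sgn( sum_i alpha_i * y_i * K(x_i, x) + b ),
    the kernelised classification decision function from the text."""
    s = sum(a * y * kernel(sv, x) for sv, a, y in zip(support, alphas, labels))
    return 1 if s + b >= 0 else -1

# toy model: one positive support vector at (1, 1), one negative at (-1, -1)
support = [(1.0, 1.0), (-1.0, -1.0)]
alphas, labels, b = [1.0, 1.0], [1, -1], 0.0
# points near (1, 1) get +1, points near (-1, -1) get -1
```

Replacing `rbf_kernel` with a plain dot product recovers the linear decision function from the separable case.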
The Bayes classifier is a probabilistic classifier. When data are classified on several features, the features are assumed to be mutually independent; each class probability is then obtained by the multiplication rule for conditional probabilities, and the class with the largest probability is selected as the machine's judgment. Given a training data set {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, where m is the number of samples and each sample contains n features, i.e. x_i = (x_i1, x_i2, …, x_in), and the set of class labels is {y_1, y_2, …, y_k}, judging the class of a new sample amounts to solving for the maximum posterior probability argmax_y p(y | x):

p(y | x) = p(y) · Π_{j=1}^{n} p(x_j | y) / p(x)

Since the denominator of the formula is the same for every p(y = y_i | x), the final discrimination formula is:

y* = argmax_y { p(y) · Π_{j=1}^{n} p(x_j | y) }
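The discrimination formula above can be sketched for categorical features as follows (a minimal sketch with maximum-likelihood estimates and no smoothing; the fruit data is invented for the example):

```python
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Estimate the prior p(y) and the conditional counts for p(x_j | y)
    from categorical training data."""
    n = len(labels)
    prior = {y: c / n for y, c in Counter(labels).items()}
    cond = defaultdict(Counter)
    for x, y in zip(samples, labels):
        for j, v in enumerate(x):
            cond[(y, j)][v] += 1
    return prior, cond, Counter(labels)

def predict_nb(x, prior, cond, class_counts):
    """Return argmax_y p(y) * prod_j p(x_j | y); the common denominator
    p(x) is dropped, as in the final discrimination formula."""
    best, best_score = None, -1.0
    for y, p in prior.items():
        score = p
        for j, v in enumerate(x):
            score *= cond[(y, j)][v] / class_counts[y]
        if score > best_score:
            best, best_score = y, score
    return best

samples = [("red", "round"), ("red", "round"), ("green", "long")]
labels = ["apple", "apple", "banana"]
prior, cond, counts = train_nb(samples, labels)
```

A production classifier would add Laplace smoothing and work in log-space to avoid zero probabilities and underflow.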
s5, improving accuracy by adopting the idea of ensemble learning:
As shown in FIG. 1, to further improve the accuracy, an ensemble learning method is adopted: a model combining the three simple classifiers is trained, and the class receiving the largest number of votes is labeled as the output result by relative majority (plurality) voting; if several class labels tie for the highest number of votes, one class label is selected at random as the output.
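The relative majority (plurality) vote with random tie-breaking can be sketched as follows (an illustrative sketch of the fusion rule described above, not the patent's code):

```python
import random
from collections import Counter

def plurality_vote(predictions, rng=random):
    """Fuse the outputs of several classifiers by relative majority
    (plurality) voting; exact ties are broken at random."""
    counts = Counter(predictions)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else rng.choice(winners)

# three classifiers judge one candidate chart: 1 = meaningful, 0 = meaningless
fused = plurality_vote([1, 1, 0])   # two of three say meaningful -> 1
```

With three binary base classifiers a tie cannot occur, but the random tie-break matters when more classifiers or more class labels are fused.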
The models and their accuracies are shown in Table (4); as the table shows, the classification accuracy of the ensemble learning model is slightly higher than that of the DT model, the most accurate individual classifier.
Table (4)
The above embodiment is only one specific embodiment of the present invention; the scope of the present invention includes but is not limited to this embodiment, and any suitable changes or substitutions made in accordance with the claims of this chart visualization recommendation method by any person of ordinary skill in the art shall fall within the scope of protection of the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A chart visualization recommendation method, characterized in that a number of data features and the corresponding meaningful visualization chart types are first extracted from a real visualization data set; classification models are then trained with several classifiers, meaningful visualizations are learned from the models, and accuracy is tested on a test set; finally, the results of the classifiers are fused to select the meaningful charts suited to the data set.
2. A chart visualization recommendation method according to claim 1, wherein the classifiers are a decision tree, a support vector machine and naive Bayes; the three classifiers construct classification models from data gathered in visualization practice and divide visualization results into meaningful and meaningless; during a new visualization exploration, meaningless results are discarded, meaningful visualization results are retained, and the results are recommended to the user.
3. A chart visualization recommendation method according to claim 2, characterized in that the ID3 algorithm in the decision tree uses information gain as the attribute selection metric; entropy measures the uncertainty of things, and the more uncertain something is, the larger its entropy; the expected information needed to classify the tuples in D is given by the formula:

Info(D) = -Σ_{i=1}^{m} p_i · log2(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i, and Info(D) is the average amount of information needed to identify the class label of a tuple in D; the tuples in D are partitioned on some attribute A, where A has v distinct values {a_1, a_2, …, a_v} corresponding to v outcomes on A, so that D is divided by attribute A into v partitions or subsets {D_1, D_2, …, D_v}, where D_j contains those tuples in D whose value of A is a_j;

since the partitions may contain tuples from different classes, more information is still needed for an exact classification:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) · Info(D_j)

where |D_j| / |D| serves as the weight of the j-th partition, and Info_A(D) is the expected information required to classify the tuples of D after partitioning on A; the smaller the required expected information, the higher the purity of the partitions;

the information gain is defined as the difference between the original information requirement and the new information requirement, i.e.

Gain(A) = Info(D) - Info_A(D)

The attribute A with the highest Gain(A) is selected as the splitting attribute for node N.
4. A chart visualization recommendation method according to claim 3, wherein the SVM classification is a machine learning method built on statistical learning theory, applicable to both linearly separable and linearly inseparable samples; given samples (x_i, y_i), with x_i ∈ R^d, y_i ∈ {-1, +1}, i = 1, 2, …, n, where x_i is a feature vector and y_i is a class label, if the samples are linearly separable the problem is converted into a convex quadratic optimization problem, giving the formula:

min (1/2)·||ω||² + C·Σ_{i=1}^{n} ξ_i, subject to y_i[(ω·x_i) + b] ≥ 1 - ξ_i, ξ_i ≥ 0

where ω is the weight vector, C is a penalty factor, ξ_i is a slack factor and b is an offset; a dual description of the optimization problem is obtained with Lagrange multipliers, and under the constraint y_i[(ω·x_i) + b] ≥ 1 the classification decision function is obtained as:

f(x) = sgn( Σ_{i=1}^{n} α_i · y_i · (x_i · x) + b )

if the samples are linearly inseparable, the samples in the input space can be mapped by a nonlinear mapping into a high-dimensional, linearly separable feature space, and using a kernel function K the optimal classification decision function in the feature space is obtained as:

f(x) = sgn( Σ_{i=1}^{n} α_i · y_i · K(x_i, x) + b )
5. A chart visualization recommendation method according to claim 4, wherein the Bayesian classifier is a probabilistic classifier; when data are classified on several features, the features are assumed to be mutually independent, each class probability is obtained by the multiplication rule for conditional probabilities, and the class with the largest probability is selected as the machine's judgment; given a training data set {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, where m is the number of samples and each sample contains n features, i.e. x_i = (x_i1, x_i2, …, x_in), and the set of class labels is {y_1, y_2, …, y_k}, judging the class of a new sample amounts to solving for the maximum posterior probability argmax_y p(y | x):

p(y | x) = p(y) · Π_{j=1}^{n} p(x_j | y) / p(x)

since the denominator of the formula is the same for every p(y = y_i | x), the final discrimination formula is:

y* = argmax_y { p(y) · Π_{j=1}^{n} p(x_j | y) }
6. A chart visualization recommendation method according to claim 5, wherein, when the data sets are collected with a BI analysis tool, the data sets with the best visualization results usually contain about 10 attribute columns, and the charts with better visualization effects are marked and displayed.
7. A chart visualization recommendation method according to claim 6, wherein, when the data features are extracted from the training data set, the column attribute features are described as follows:
length is the number of data rows, and type is the data type, divided into categorical (C), quantitative (Q) and temporal (T);
for categorical data, the number of distinct values (count0), their ratio, entropy and Gini index (gini) are counted;
for numerical data, a number of statistical features are calculated, including the maximum, minimum, mean, median, mode, variance, standard deviation, median absolute deviation and distribution;
many of the paired-column features depend on the single-column types determined by single-column feature extraction.
8. A chart visualization recommendation method according to claim 7, wherein, in determining the meaningful visualization chart types, the bar, pie, line and scatter charts are labeled 0-3 respectively, and the four visualization chart types are enumerated for single columns and column pairs together with the corresponding meaningful visualization results; for a data set with m column attributes there are 4m single-column chart display types and 2m(m-1) possible chart display types between pairs of columns.
9. The chart visualization recommendation method according to claim 8, wherein, when the accuracy test is performed with the test set, each data set is labeled according to visualization practice in the BI analysis software: all possible visualization results are enumerated, and each is labeled in combination with the BI analysis software, so that the meaningful visualization results are finally obtained;
the output of the three classifiers is then a judgment on every possible visualization result, where 1 denotes a "meaningful" and 0 a "meaningless" visualization result; the indices of the visualization results in the data set are recorded, and the meaningful chart types can be located through these indices.
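A minimal sketch of this index mapping (the function and variable names are hypothetical): the classifiers' 0/1 output vector is aligned position-by-position with the enumerated candidates, and the positions holding 1 locate the meaningful charts.

```python
def meaningful_charts(candidates, flags):
    """Given the enumerated chart candidates and the classifiers' 0/1
    judgments (1 = meaningful, 0 = meaningless), return the indices and
    candidates judged meaningful, as described in claim 9."""
    return [(i, candidates[i]) for i, flag in enumerate(flags) if flag == 1]
```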
10. A chart visualization recommendation method according to claim 9, wherein, to further improve the accuracy of finding meaningful charts for a data set, an ensemble learning method is adopted: a model combining the three simple classifiers is trained, and by relative majority voting the class receiving the most votes is labeled as the output result; if several class labels tie for the highest number of votes, one of them is selected at random as the output.
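The relative-majority voting rule of claim 10 can be sketched as follows (the function name and seeded tie-breaking generator are assumptions for reproducibility, not from the claims):

```python
import random
from collections import Counter

def relative_majority_vote(predictions, rng=None):
    """Combine base-classifier labels: the label with the most votes wins;
    ties between top-voted labels are broken by a random choice."""
    rng = rng or random.Random()
    counts = Counter(predictions)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else rng.choice(winners)
```

With three base classifiers a tie can still occur when all three disagree, which is why the random tie-break branch is needed.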
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111065907.2A CN113868493A (en) | 2021-09-13 | 2021-09-13 | Visual chart recommendation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111065907.2A CN113868493A (en) | 2021-09-13 | 2021-09-13 | Visual chart recommendation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113868493A true CN113868493A (en) | 2021-12-31 |
Family
ID=78995333
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111065907.2A Pending CN113868493A (en) | 2021-09-13 | 2021-09-13 | Visual chart recommendation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113868493A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117540107A (en) * | 2024-01-09 | 2024-02-09 | 浙江同花顺智能科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN117540107B (en) * | 2024-01-09 | 2024-05-07 | 浙江同花顺智能科技有限公司 | Data processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Waegeman et al. | ROC analysis in ordinal regression learning | |
CA2886581C (en) | Method and system for analysing sentiments | |
CN106570109B (en) | Method for automatically generating question bank knowledge points through text analysis | |
Govindasamy et al. | Analysis of student academic performance using clustering techniques | |
Yang et al. | Using weighted k-means to identify Chinese leading venture capital firms incorporating with centrality measures | |
CN115359873B (en) | Control method for operation quality | |
CN112347352A (en) | Course recommendation method and device and storage medium | |
CN115312183A (en) | Intelligent interpretation method and system for medical inspection report | |
Hric et al. | Stochastic block model reveals maps of citation patterns and their evolution in time | |
CN113868493A (en) | Visual chart recommendation method | |
CN114817454A (en) | NLP knowledge graph construction method combining information content and BERT-BilSTM-CRF | |
Wang et al. | SpecVAT: Enhanced visual cluster analysis | |
Nanayakkara et al. | Evaluation measure for group-based record linkage | |
Mazanec et al. | Usage patterns of advanced analytical methods in tourism research 1988–2008: A six journal survey | |
Elouataoui et al. | An End-to-End Big Data Deduplication Framework based on Online Continuous Learning | |
Suerdem | Multidimensional scaling of qualitative data | |
Papayiannis et al. | On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters | |
CN112488236B (en) | Integrated unsupervised student behavior clustering method | |
Abdelfattah | Variables Selection Procedure for the DEA Overall Efficiency Assessment Based Plithogenic Sets and Mathematical Programming | |
Cohen et al. | Factor analysis, cluster analysis and structural equation modelling | |
Dey | Prediction and analysis of student performance by data mining in WEKA | |
Ćurić et al. | Improvement of hierarchical clustering results by refinement of variable types and distance measures | |
An et al. | Multi-Attribute Classification of Text Documents as a Tool for Ranking and Categorization of Educational Innovation Projects | |
WO2024131524A1 (en) | Depression diet management method based on food image segmentation | |
Dobrska et al. | Ordinal regression with continuous pairwise preferences |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||