CN113868493A - Visual chart recommendation method - Google Patents

Visual chart recommendation method Download PDF

Info

Publication number
CN113868493A
CN113868493A CN202111065907.2A
Authority
CN
China
Prior art keywords
visualization
chart
meaningful
data
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111065907.2A
Other languages
Chinese (zh)
Inventor
魏世超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Communication Information System Co Ltd
Original Assignee
Inspur Communication Information System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Communication Information System Co Ltd filed Critical Inspur Communication Information System Co Ltd
Priority to CN202111065907.2A priority Critical patent/CN113868493A/en
Publication of CN113868493A publication Critical patent/CN113868493A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/904Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9035Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of chart visualization, and particularly provides a chart visualization recommendation method. Compared with the prior art, the method learns the most meaningful visualization results from numerous real-world visualization data sets, marks them, and builds an index; meaningful visualization types are then found by searching that index. This avoids the problems that visualization-type operations are complex and numerous and that the large enumeration search space produces redundant visualization results. The method can also be integrated into data analysis software as a chart visualization recommendation engine, improving the usability of such software.

Description

Visual chart recommendation method
Technical Field
The invention relates to the field of chart visualization, and particularly provides a chart visualization recommendation method.
Background
Data visualization is used by an increasing number of people as an important means of data analysis. However, data visualization presents certain difficulties for most people who are not specialized in visualization techniques. The goal of visualization recommendation is to automatically generate candidate results, by technical means, for analysts to explore and select, thereby lowering the barrier to visualization.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a chart visualization recommendation method with strong practicability.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A chart visualization recommendation method includes the following steps: first, a plurality of data features and the corresponding meaningful visualization chart types are extracted from a real visualization data set; then classification models are trained with several classifiers, meaningful visualizations are learned from them, and accuracy is tested on a test set; finally, the results of the multiple classifiers are fused to select meaningful charts suitable for the data set.
Furthermore, the classifiers are a decision tree, a support vector machine, and naive Bayes. The three classifiers construct classification models from data in visualization practice and divide visualization results into meaningful and meaningless. When a new visualization exploration is carried out, meaningless results are discarded, meaningful visualization results are retained, and the results are recommended to the user.
Further, the ID3 algorithm in the decision tree uses information gain as the attribute selection metric. Entropy measures the uncertainty of a quantity: the more uncertain it is, the larger its entropy. The expected information needed to classify the tuples in D is given by the formula:

Info(D) = −Σ_{i=1}^{m} p_i log₂(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i, and Info(D) is the average amount of information needed to identify the class label of a tuple in D. Suppose the tuples in D are partitioned by some attribute A having v distinct values {a_1, a_2, …, a_v}; splitting on A divides D into v partitions or subsets {D_1, D_2, …, D_v}, where D_j contains those tuples in D whose value of A is a_j.

Since a partition may contain tuples from different classes, additional information is still needed for an accurate classification:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

where |D_j| / |D| serves as the weight of the j-th partition. Info_A(D) is the expected information required to classify the tuples of D after partitioning on A; the smaller the required expected information, the higher the partition purity.

The information gain is defined as the difference between the original information requirement and the new information requirement, i.e.

Gain(A) = Info(D) − Info_A(D)

The attribute A with the highest Gain(A) is selected as the splitting attribute for node N.
Further, SVM classification is a machine learning method built on statistical learning theory, applicable to both linearly separable and inseparable samples. Given samples (x_i, y_i), x_i ∈ R^d, y_i ∈ {−1, +1}, i = 1, 2, …, n, where x_i is a feature vector and y_i is a class label: if the samples are linearly separable, the problem is converted into a convex quadratic optimization problem, giving the formula:

min_{ω,b,ξ} (1/2)‖ω‖² + C Σ_{i=1}^{n} ξ_i, subject to y_i[(ω·x_i) + b] ≥ 1 − ξ_i, ξ_i ≥ 0

where ω is the weight vector, C is the penalty factor, ξ_i is a slack variable, and b is the offset. A dual description of the optimization problem is obtained with Lagrange multipliers α_i; under the condition y_i[(ω·x_i) + b] ≥ 1, the classification decision function is obtained as:

f(x) = sgn(Σ_{i=1}^{n} α_i y_i (x_i · x) + b)

If the samples are linearly inseparable, the samples in the input space can be mapped into a high-dimensional linearly separable feature space through a nonlinear mapping, and the optimal classification decision function in that feature space is obtained with a kernel function:

f(x) = sgn(Σ_{i=1}^{n} α_i y_i K(x_i, x) + b)

where K(x_i, x) is the kernel function.
Further, the Bayes classifier is a probabilistic classifier. When classifying data by a plurality of features, the features are assumed to be mutually independent; the probability of each class is then obtained by multiplying conditional probabilities, and the class with the maximum probability is selected as the machine's judgment. Given a training data set {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, where m is the number of samples and each sample contains n features, i.e. x_i = (x_{i1}, x_{i2}, …, x_{in}), and the class label set is {y_1, y_2, …, y_k}, judging the category of a new sample x means solving for the maximum posterior probability argmax p(y|x):

p(y|x) = p(x|y) p(y) / p(x) = p(y) Π_{j=1}^{n} p(x_j|y) / p(x)

Since the denominator p(x) is the same for every p(y = y_i | x), the final discriminant formula is:

y = argmax_y p(y) Π_{j=1}^{n} p(x_j | y)
further, when the data set is collected, the data with the best visualization experiment effect usually contains 10 attribute columns by using a BI analysis tool, and the graph with the better visualization effect is marked and displayed.
Further, when extracting the data features, for the training data set, the column-attribute features are described as follows:

length; type, classified as categorical (C), quantitative (Q), or temporal (T);

for categorical data, the number of distinct values (count0), their ratio, the entropy, and the Gini coefficient (gini) are counted;

for numerical data, a plurality of statistical features are calculated, including maximum, minimum, mean, median, mode, variance, standard deviation, median absolute deviation, and distribution;

many of the paired-column features depend on the single-column types determined by single-column feature extraction.
Further, in determining meaningful visualization chart types, the bar chart, pie chart, line chart, and scatter chart are labeled 0-3 respectively; the four chart types are enumerated for single columns and for column pairs together with the corresponding meaningful visualization results. For a data set with m column attributes, there are 4m chart display types over the single columns and 2m(m-1) possible chart display types between pairs of columns.
Furthermore, when the accuracy is tested on the test set, each data set is labeled according to visualization practice in the BI analysis software; all possible visualization results are enumerated, labels are attached to all of them in combination with the BI software, and the meaningful visualization results are finally obtained.

The output of the three classifiers is a judgment on every possible visualization result: 1 denotes a "meaningful" visualization result and 0 a "meaningless" one. The indexes of the meaningful visualization results in the data set are marked, and the type of a meaningful visualization chart can be located through these indexes.
Further, in order to further improve the accuracy of finding meaningful charts suitable for a data set, an ensemble learning method is adopted: a model combining the three simple classifiers is trained, and by relative majority voting the class with the largest number of votes is labeled as the output; if several class labels tie for the highest number of votes, one of them is selected at random as the output.
Compared with the prior art, the chart visualization recommendation method has the following outstanding beneficial effects:
the method and the device learn the most significant visual results from numerous visual practical data sets, mark the most significant visual results and establish indexes, find the significant visual types by searching the indexes, and avoid the problems that visual type operations are responsible and numerous and the visual result redundancy caused by the large enumeration search space is solved.
The adopted classifier effectively learns the meaningful visual result and obtains good accuracy performance on the test set.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of an ensemble learning model in a chart visualization recommendation method.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to better understand the technical solutions of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A preferred embodiment is given below:
according to the chart visualization recommendation method, firstly, a plurality of data features and corresponding meaningful visualization chart types are extracted from a real visualization data set, then classifiers are used for training classification models respectively, meaningful visualization is learned from the classification models, a test set is used for accuracy test, and finally, a plurality of classifier results are fused to select the meaningful charts suitable for the data set.
The method comprises the following specific steps:
s1, when the data set is collected:
With the aid of the BI analysis tools used in daily work, a large number of data visualization results have been accumulated in long-term visualization practice. The numbers of columns and rows of these data sets vary greatly: although some data sets contain hundreds of attribute columns, most have fewer than 25 columns, and the data sets with the best visualization effect in practice usually contain about 10 attribute columns; the charts with better visualization effect are marked and displayed. These data sets typically contain time-series attributes, categorical attributes, and numerical attributes. Details of a portion of the data sets are shown in Table (1):
Table (1) (reproduced as an image in the original document)
S2, when data features are extracted:
for the training data set, the characterization of the column attributes is listed in table (2), and these features can be classified into 6 classes:
length is the number of data lines.
type is a data type divided into categorical (C), temporal (Q), and quantitative (temporal (T)).
For the classification data, the number of different values thereof (count0), the ratio (radio), the entropy (entrypy), and the kini coefficient (gini) were counted.
For numerical data, a number of statistical features are calculated, including maximum, minimum, mean, median, mode, variance, standard, median absolute deviation, and distribution.
Many of the paired-column features depend on a single column type determined by single-column feature extraction. For example, Pearson's correlation coefficient requires two columns of digits, χ2Two classification columns are required.
Figure BDA0003258353270000091
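As an illustration of the column-feature extraction described above, the following Python sketch computes the single-column features (length, type, count0, ratio, entropy, gini, and the numeric statistics). It is not part of the original disclosure; pandas is assumed, and the function name `column_features` is hypothetical.

```python
import numpy as np
import pandas as pd

def column_features(col: pd.Series) -> dict:
    """Sketch of the single-column features described above (names illustrative)."""
    feats = {"length": len(col)}
    if pd.api.types.is_numeric_dtype(col):
        feats["type"] = "Q"  # quantitative
        feats.update(
            maximum=col.max(), minimum=col.min(), mean=col.mean(),
            median=col.median(), variance=col.var(), std=col.std(),
            mad=(col - col.median()).abs().median(),  # median absolute deviation
        )
    elif pd.api.types.is_datetime64_any_dtype(col):
        feats["type"] = "T"  # temporal
    else:
        feats["type"] = "C"  # categorical
        counts = col.value_counts()
        p = counts / counts.sum()
        feats["count0"] = counts.size                # number of distinct values
        feats["ratio"] = counts.size / len(col)      # distinct / total
        feats["entropy"] = float(-(p * np.log2(p)).sum())
        feats["gini"] = float(1.0 - (p ** 2).sum())
    return feats
```

Paired-column features (e.g. Pearson correlation, χ²) would then be computed only when both columns have the required single-column type.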
S3, when determining the type of the meaningful visual chart:
In daily visualization practice, more than 85% of visualization results can be represented by a bar chart (bar), pie chart (pie), line chart (line), or scatter chart (scatter), so only the recommendation of these four chart types is considered here. Bar, pie, line, and scatter charts are labeled 0-3 respectively, and the four chart types are enumerated for single columns and for column pairs together with the corresponding "meaningful" visualization results. For a data set with m column attributes, there are 4m chart display types over the single columns and 2m(m-1) possible chart display types between pairs of columns.
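The candidate enumeration in this step can be sketched as follows (illustrative Python, not part of the original disclosure; all names are hypothetical). For m columns it yields the 4m single-column candidates and the 2m(m-1) two-column candidates stated above:

```python
from itertools import combinations

# Label 0-3 for the four chart types considered in the method.
CHART_LABELS = {0: "bar", 1: "pie", 2: "line", 3: "scatter"}

def enumerate_charts(columns):
    """Enumerate all single- and two-column chart candidates as
    (column tuple, chart label) pairs."""
    single = [((c,), t) for c in columns for t in CHART_LABELS]        # 4m
    paired = [((a, b), t) for a, b in combinations(columns, 2)
              for t in CHART_LABELS]                                   # 4·C(m,2) = 2m(m-1)
    return single + paired
```

Each candidate would then be judged "meaningful" or "meaningless" by the classifiers described below.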
S4, when the accuracy test is carried out by using the test set:
the method comprises the steps of selecting data sets in the fields of communication, automobiles, chemical industry, transportation industry, sales and the like, labeling each data set according to visualization practices in BI analysis software, enumerating all possible visualization results, and labeling all 21453 visualization results respectively by combining the BI software. Finally, 680 meaningful visualization results are obtained. Then 21453 collected data are used for model training by adopting a Decision Tree (DT), a support vector machine (svm) and a Bayes classifier, and then 6 test sets (shown in the table 1) are input to obtain the test accuracy shown in the table (3). The DT method has the highest accuracy on 6 test data sets, the average accuracy is 0.8609, the Bayes accuracy is the lowest, and the average accuracy is 0.7223. The output of the three classifiers is the judgment of all possible visualization results, 1 represents a 'meaningful' visualization result, 0 represents a 'meaningless' visualization result, and marks the index of the 'meaningless' visualization result in the data set, and the type of the meaningful visualization representation can be positioned through the index.
Table (3) (reproduced as an image in the original document)
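The training step above can be sketched with scikit-learn (a library choice assumed here; the patent does not name one). Random toy features and labels stand in for the 21453 labeled visualization candidates:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Toy stand-ins for the column-feature vectors and 0/1 meaningful labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for clf in (DecisionTreeClassifier(random_state=0), SVC(kernel="rbf"), GaussianNB()):
    clf.fit(X_tr, y_tr)                       # train each classifier separately
    print(type(clf).__name__, round(clf.score(X_te, y_te), 3))
```

On the patent's real data the decision tree performed best (average accuracy 0.8609); the toy numbers printed here carry no such meaning.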
The classifiers adopted are a decision tree, a support vector machine, and naive Bayes. They construct classification models from data in visualization practice and divide visualization results into meaningful and meaningless. When a new visualization exploration is carried out, meaningless results are discarded, meaningful visualization results are retained, and the results are recommended to the user.
The ID3 algorithm in the decision tree uses information gain as the attribute selection metric. The method is based on the concept of entropy in information theory: entropy measures the uncertainty of a quantity, and the more uncertain it is, the larger its entropy. The expected information needed to classify the tuples in D is given by the formula:

Info(D) = −Σ_{i=1}^{m} p_i log₂(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i, and Info(D) is the average amount of information needed to identify the class label of a tuple in D. Suppose the tuples in D are partitioned by some attribute A having v distinct values {a_1, a_2, …, a_v}; splitting on A divides D into v partitions or subsets {D_1, D_2, …, D_v}, where D_j contains those tuples in D whose value of A is a_j.

Since a partition may contain tuples from different classes, additional information is still needed for an accurate classification:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

where |D_j| / |D| serves as the weight of the j-th partition. Info_A(D) is the expected information required to classify the tuples of D after partitioning on A; the smaller the required expected information, the higher the partition purity.

The information gain is defined as the difference between the original information requirement and the new information requirement, i.e.

Gain(A) = Info(D) − Info_A(D)

The attribute A with the highest Gain(A) is selected as the splitting attribute for node N.
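The entropy and information-gain formulas above can be checked with a short Python sketch (illustrative, not part of the original disclosure; function names are hypothetical):

```python
import math
from collections import Counter

def info(labels):
    """Info(D) = -sum(p_i * log2(p_i)): expected information to classify D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D) for a discrete attribute A given as `values`."""
    n = len(labels)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, []).append(y)
    # Info_A(D): weighted sum of the partition entropies.
    info_a = sum(len(part) / n * info(part) for part in by_value.values())
    return info(labels) - info_a
```

ID3 would evaluate `gain` for every candidate attribute and split node N on the attribute with the highest value.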
SVM classification is a machine learning method built on statistical learning theory, applicable to both linearly separable and inseparable samples. Given samples (x_i, y_i), x_i ∈ R^d, y_i ∈ {−1, +1}, i = 1, 2, …, n, where x_i is a feature vector and y_i is a class label: if the samples are linearly separable, the problem is converted into a convex quadratic optimization problem, giving the formula:

min_{ω,b,ξ} (1/2)‖ω‖² + C Σ_{i=1}^{n} ξ_i, subject to y_i[(ω·x_i) + b] ≥ 1 − ξ_i, ξ_i ≥ 0

where ω is the weight vector, C is the penalty factor, ξ_i is a slack variable, and b is the offset. A dual description of the optimization problem is obtained with Lagrange multipliers α_i; under the condition y_i[(ω·x_i) + b] ≥ 1, the classification decision function is obtained as:

f(x) = sgn(Σ_{i=1}^{n} α_i y_i (x_i · x) + b)

If the samples are linearly inseparable, the samples in the input space can be mapped into a high-dimensional linearly separable feature space through a nonlinear mapping, and the optimal classification decision function in that feature space is obtained with a kernel function:

f(x) = sgn(Σ_{i=1}^{n} α_i y_i K(x_i, x) + b)

where K(x_i, x) is the kernel function.
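A minimal scikit-learn illustration of the kernel trick described above (the library choice and the XOR toy data are assumptions, not part of the original disclosure):

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: not linearly separable in the input space, but separable
# after the implicit RBF kernel mapping.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 10, dtype=float)
y = np.array([-1, 1, 1, -1] * 10)

clf = SVC(kernel="rbf", C=10.0).fit(X, y)

# f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b): the predicted class is the
# sign of the decision function.
pred = np.sign(clf.decision_function(X)).astype(int)
print((pred == clf.predict(X)).all(), clf.score(X, y))
```

A linear kernel would fail on this data; the RBF kernel realizes the nonlinear mapping without computing the feature space explicitly.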
The Bayes classifier is a probabilistic classifier. When classifying data by a plurality of features, the features are assumed to be mutually independent; the probability of each class is then obtained by multiplying conditional probabilities, and the class with the maximum probability is selected as the machine's judgment. Given a training data set {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, where m is the number of samples and each sample contains n features, i.e. x_i = (x_{i1}, x_{i2}, …, x_{in}), and the class label set is {y_1, y_2, …, y_k}, judging the category of a new sample x means solving for the maximum posterior probability argmax p(y|x):

p(y|x) = p(x|y) p(y) / p(x) = p(y) Π_{j=1}^{n} p(x_j|y) / p(x)

Since the denominator p(x) is the same for every p(y = y_i | x), the final discriminant formula is:

y = argmax_y p(y) Π_{j=1}^{n} p(x_j | y)
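The discriminant above, argmax_y p(y) Π_j p(x_j | y), can be sketched for categorical features as follows (illustrative Python, not part of the original disclosure; the Laplace-style smoothing of empty counts is an addition):

```python
import math
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Fit a categorical naive Bayes model: class counts for the prior p(y)
    and per-feature value counts for the conditionals p(x_j | y)."""
    priors = Counter(y)
    cond = defaultdict(Counter)              # (class, feature index) -> value counts
    for xs, label in zip(X, y):
        for j, v in enumerate(xs):
            cond[(label, j)][v] += 1
    return priors, cond

def nb_predict(priors, cond, xs):
    """argmax_y p(y) * prod_j p(x_j | y), computed in log space."""
    n = sum(priors.values())
    best, best_lp = None, -math.inf
    for label, cnt in priors.items():
        lp = math.log(cnt / n)               # log prior
        for j, v in enumerate(xs):
            c = cond[(label, j)]
            lp += math.log((c[v] + 1) / (sum(c.values()) + len(c) + 1))  # smoothed
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Working in log space avoids underflow when multiplying many small conditional probabilities.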
s5, improving accuracy by adopting the idea of ensemble learning:
As shown in Fig. 1, in order to further improve the accuracy, an ensemble learning method is adopted: a model combining the three simple classifiers is trained, and by relative majority voting (plurality voting) the class with the largest number of votes is labeled as the output; if several class labels tie for the highest number of votes, one of them is selected at random as the output.
The models and their accuracies are shown in Table (4); the table shows that the classification accuracy of the ensemble learning model is slightly improved relative to the DT model, which had the highest individual accuracy.
Table (4) (reproduced as an image in the original document)
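The relative-majority voting model of step S5 can be sketched with scikit-learn's `VotingClassifier` (hard voting implements plurality voting; the library choice and the toy data are assumptions, not part of the original disclosure):

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Toy stand-ins for the labeled visualization candidates.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

vote = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("svm", SVC(kernel="rbf")),
                ("nb", GaussianNB())],
    voting="hard",                   # hard voting = relative-majority vote
)
vote.fit(X, y)
print(round(vote.score(X, y), 3))
```

With three base classifiers and two classes, a tie cannot occur; in the general tied case scikit-learn breaks ties by ascending class order rather than at random, a minor difference from the random tie-break described in the patent.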
The above embodiments are only specific examples of the present invention; the scope of the present invention includes but is not limited to them, and any suitable change or substitution made by a person of ordinary skill in the art in accordance with the claims of the chart visualization recommendation method of the present invention shall fall within the protection scope of the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A chart visualization recommendation method, characterized in that: first, a plurality of data features and the corresponding meaningful visualization chart types are extracted from a real visualization data set; then classification models are trained with several classifiers, meaningful visualizations are learned from them, and accuracy is tested on a test set; finally, the results of the classifiers are fused to select meaningful charts suitable for the data set.
2. The chart visualization recommendation method according to claim 1, characterized in that the classifiers are a decision tree, a support vector machine, and naive Bayes; the three classifiers construct classification models from data in visualization practice and divide visualization results into meaningful and meaningless; when a new visualization exploration is carried out, meaningless results are discarded, meaningful visualization results are retained, and the results are recommended to the user.
3. The chart visualization recommendation method according to claim 2, characterized in that the ID3 algorithm in the decision tree uses information gain as the attribute selection metric; entropy measures the uncertainty of a quantity, and the more uncertain it is, the larger its entropy; the expected information needed to classify the tuples in D is given by the formula:

Info(D) = −Σ_{i=1}^{m} p_i log₂(p_i)

wherein p_i is the probability that an arbitrary tuple in D belongs to class C_i, and Info(D) is the average amount of information needed to identify the class label of a tuple in D; the tuples in D are partitioned by some attribute A having v distinct values {a_1, a_2, …, a_v}; splitting on A divides D into v partitions or subsets {D_1, D_2, …, D_v}, wherein D_j contains those tuples in D whose value of A is a_j;

since a partition may contain tuples from different classes, additional information is still needed for an accurate classification:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

wherein |D_j| / |D| serves as the weight of the j-th partition, Info_A(D) is the expected information required to classify the tuples of D after partitioning on A, and the smaller the required expected information, the higher the partition purity;

the information gain is defined as the difference between the original information requirement and the new information requirement, i.e.

Gain(A) = Info(D) − Info_A(D)

and the attribute A with the highest Gain(A) is selected as the splitting attribute for node N.
4. The chart visualization recommendation method according to claim 3, characterized in that SVM classification is a machine learning method built on statistical learning theory, applicable to both linearly separable and inseparable samples; given samples (x_i, y_i), x_i ∈ R^d, y_i ∈ {−1, +1}, i = 1, 2, …, n, wherein x_i is a feature vector and y_i is a class label, if the samples are linearly separable the problem is converted into a convex quadratic optimization problem, giving the formula:

min_{ω,b,ξ} (1/2)‖ω‖² + C Σ_{i=1}^{n} ξ_i, subject to y_i[(ω·x_i) + b] ≥ 1 − ξ_i, ξ_i ≥ 0

wherein ω is the weight vector, C is the penalty factor, ξ_i is a slack variable, and b is the offset; a dual description of the optimization problem is obtained with Lagrange multipliers α_i, and under the condition y_i[(ω·x_i) + b] ≥ 1 the classification decision function is obtained as:

f(x) = sgn(Σ_{i=1}^{n} α_i y_i (x_i · x) + b)

if the samples are linearly inseparable, the samples in the input space are mapped into a high-dimensional linearly separable feature space through a nonlinear mapping, and the optimal classification decision function in that feature space is obtained with a kernel function:

f(x) = sgn(Σ_{i=1}^{n} α_i y_i K(x_i, x) + b)

wherein K(x_i, x) is the kernel function.
5. The chart visualization recommendation method according to claim 4, characterized in that the Bayes classifier is a probabilistic classifier: when classifying data by a plurality of features, the features are assumed to be mutually independent, the probability of each class is obtained by multiplying conditional probabilities, and the class with the maximum probability is selected as the machine's judgment; given a training data set {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, wherein m is the number of samples and each sample contains n features, i.e. x_i = (x_{i1}, x_{i2}, …, x_{in}), and the class label set is {y_1, y_2, …, y_k}, judging the category of a new sample x means solving for the maximum posterior probability argmax p(y|x):

p(y|x) = p(x|y) p(y) / p(x) = p(y) Π_{j=1}^{n} p(x_j|y) / p(x)

since the denominator p(x) is the same for every p(y = y_i | x), the final discriminant formula is:

y = argmax_y p(y) Π_{j=1}^{n} p(x_j | y)
6. a chart visualization recommendation method according to claim 5, wherein when collecting data sets, using BI analysis tools, the data with the best visualization experiment effect usually contains 10 attribute columns, and the chart with the better visualization effect is marked and displayed.
7. The chart visualization recommendation method according to claim 6, characterized in that when extracting the data features, for the training data set, the column-attribute features are described as follows:

length; type, classified as categorical (C), quantitative (Q), or temporal (T);

for categorical data, the number of distinct values (count0), their ratio, the entropy, and the Gini coefficient (gini) are counted;

for numerical data, a plurality of statistical features are calculated, including maximum, minimum, mean, median, mode, variance, standard deviation, median absolute deviation, and distribution;

many of the paired-column features depend on the single-column types determined by single-column feature extraction.
8. The chart visualization recommendation method according to claim 7, characterized in that in determining meaningful visualization chart types, the bar chart, pie chart, line chart, and scatter chart are labeled 0-3 respectively, and the four chart types are enumerated for single columns and for column pairs together with the corresponding meaningful visualization results; for a data set with m column attributes, there are 4m chart display types over the single columns and 2m(m-1) possible chart display types between pairs of columns.
9. The chart visualization recommendation method according to claim 8, wherein, when the accuracy test is performed with the test set, each data set is labeled according to visualization practice in the BI analysis software; all possible visualization results are enumerated, and each is labeled in combination with the BI analysis software, so that the meaningful visualization results are finally obtained;
the output of the three classifiers is a judgment on every possible visualization result, where 1 denotes a "meaningful" visualization result and 0 a "meaningless" one; the index of each visualization result in the data set is recorded, and the meaningful chart types can be located through these indices.
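Mapping the 0/1 classifier outputs back to chart candidates by index, as the claim describes, can be sketched as follows (the function name and candidate representation are hypothetical):

```python
def meaningful_charts(candidates, predictions):
    """Given the enumerated chart candidates and the classifiers'
    0/1 judgments (1 = 'meaningful', 0 = 'meaningless'), return the
    indices and candidates judged meaningful."""
    assert len(candidates) == len(predictions)
    return [(i, candidates[i])
            for i, p in enumerate(predictions) if p == 1]
```

The index pairs the prediction vector with the enumeration order, so a meaningful chart type can be located without re-deriving the candidate list.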
10. The chart visualization recommendation method according to claim 9, wherein, in order to further improve the accuracy of finding meaningful charts for a data set, an ensemble learning method is adopted: a model combining the three simple classifiers is trained, and by relative-majority voting the class receiving the most votes is output; if several class labels tie for the highest vote count, one of them is selected at random as the output.
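The relative-majority (plurality) voting rule of claim 10 is straightforward to sketch. This is an illustration of the voting rule only, not the patent's classifier combination; the seeded random generator is an assumption added for reproducibility.

```python
import random
from collections import Counter

def plurality_vote(labels, rng=random.Random(0)):
    """Relative-majority voting over the classifiers' outputs:
    the label with the most votes wins; ties between top labels
    are broken by a random choice, as in the claim."""
    counts = Counter(labels)
    top = max(counts.values())
    winners = [lbl for lbl, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else rng.choice(winners)
```

With three classifiers and binary "meaningful"/"meaningless" labels a tie cannot occur, but the random tie-break matters if the ensemble is extended to an even number of voters or to multi-class outputs.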
CN202111065907.2A 2021-09-13 2021-09-13 Visual chart recommendation method Pending CN113868493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111065907.2A CN113868493A (en) 2021-09-13 2021-09-13 Visual chart recommendation method

Publications (1)

Publication Number Publication Date
CN113868493A true CN113868493A (en) 2021-12-31

Family

ID=78995333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111065907.2A Pending CN113868493A (en) 2021-09-13 2021-09-13 Visual chart recommendation method

Country Status (1)

Country Link
CN (1) CN113868493A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540107A (en) * 2024-01-09 2024-02-09 浙江同花顺智能科技有限公司 Data processing method and device, electronic equipment and storage medium
CN117540107B (en) * 2024-01-09 2024-05-07 浙江同花顺智能科技有限公司 Data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Waegeman et al. ROC analysis in ordinal regression learning
CN106570109B (en) Method for automatically generating question bank knowledge points through text analysis
Yang et al. A feature-metric-based affinity propagation technique for feature selection in hyperspectral image classification
CA2886581A1 (en) Method and system for analysing sentiments
Govindasamy et al. Analysis of student academic performance using clustering techniques
CN115312183A (en) Intelligent interpretation method and system for medical inspection report
CN112347352A (en) Course recommendation method and device and storage medium
Hric et al. Stochastic block model reveals maps of citation patterns and their evolution in time
CN113868493A (en) Visual chart recommendation method
CN107992613A (en) A kind of Text Mining Technology protection of consumers' rights index analysis method based on machine learning
Wang et al. SpecVAT: Enhanced visual cluster analysis
Nanayakkara et al. Evaluation measure for group-based record linkage
Suerdem Multidimensional scaling of qualitative data
CN114238439B (en) Task-driven relational data view recommendation method based on joint embedding
Abdelfattah Variables Selection Procedure for the DEA Overall Efficiency Assessment Based Plithogenic Sets and Mathematical Programming
Papayiannis et al. On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters
CN114817454A (en) NLP knowledge graph construction method combining information content and BERT-BilSTM-CRF
Barret et al. Predicting the Environment of a Neighborhood: A Use Case for France.
Cohen et al. Factor analysis, cluster analysis and structural equation modelling
Dey Prediction and analysis of student performance by data mining in WEKA
Gnaldi et al. The role of extended IRT models for composite indicators construction
Ćurić et al. Improvement of hierarchical clustering results by refinement of variable types and distance measures
CN113139143B (en) Web page table data and relational database data integration method oriented to smart campus
Gensch et al. Issues and advances in product positioning models in marketing research

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination