CN110489449B - Chart recommendation method and device and electronic equipment - Google Patents

Info

Publication number
CN110489449B
CN110489449B CN201910693374.9A CN201910693374A CN110489449B CN 110489449 B CN110489449 B CN 110489449B CN 201910693374 A CN201910693374 A CN 201910693374A CN 110489449 B CN110489449 B CN 110489449B
Authority
CN
China
Prior art keywords
field
combined
aggregation
target
chart
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910693374.9A
Other languages
Chinese (zh)
Other versions
CN110489449A (en)
Inventor
刘译璟
于帮付
代其锋
肖洋
徐林杰
赵丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Percent Technology Group Co ltd
Original Assignee
Beijing Percent Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Percent Technology Group Co ltd
Priority to CN201910693374.9A
Publication of CN110489449A
Application granted
Publication of CN110489449B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24554Unary operations; Data partitioning operations
    • G06F16/24556Aggregation; Duplicate elimination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data

Abstract

The embodiments of this specification disclose a chart recommendation method, a chart recommendation device, and electronic equipment. The chart recommendation scheme mainly comprises: inputting a combined field, determined based on a dimension field and an index field, into a preset aggregation model in word-vector form, and determining a target aggregation function corresponding to the combined field; and generating at least one chart based on the at least one combined field, the target aggregation function corresponding to the combined field, and a target chart type, and recommending the chart. The dimensions and indexes in the recommended chart are therefore meaningful, and because the target aggregation function is predicted from word vectors, the accuracy and effectiveness of chart recommendation are improved.

Description

Chart recommendation method and device and electronic equipment
Technical Field
The specification relates to the technical field of computer software, in particular to a chart recommendation method and device and electronic equipment.
Background
At present, in order to visually display a worksheet of a user, data in the worksheet is generally visualized in a chart form.
However, the existing chart recommendation scheme is too general to recommend an accurate and effective chart according to the worksheet of the user.
Disclosure of Invention
The embodiment of the specification aims to provide a chart recommendation method, a chart recommendation device and electronic equipment so as to improve the accuracy and effectiveness of chart recommendation.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
In a first aspect, a chart recommendation method is provided, including:
determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
inputting the combined field into a preset aggregation model in a word vector mode, and determining a target aggregation function corresponding to the combined field;
generating at least one chart based on the at least one combined field, a target aggregation function corresponding to the combined field and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set;
recommending the chart.
In a second aspect, a chart recommendation apparatus is provided, including:
a first determining module, configured to determine at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field includes a dimension field for describing data and an index field for measuring data;
a second determining module, configured to input the combined field into a preset aggregation model in word-vector form and determine a target aggregation function corresponding to the combined field;
a generating module, configured to generate at least one chart based on the at least one combined field, the target aggregation function corresponding to the combined field, and a target chart type, where the target chart type is determined according to a preset adaptation rule based on the data field set;
a recommending module, configured to recommend the chart.
In a third aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
inputting the combined field into a preset aggregation model in a word vector mode, and determining a target aggregation function corresponding to the combined field;
generating at least one chart based on the at least one combined field, a target aggregation function corresponding to the combined field and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set;
recommending the chart.
In a fourth aspect, a computer-readable storage medium is presented, the computer-readable storage medium storing one or more programs that, when executed by an electronic device that includes a plurality of application programs, cause the electronic device to:
determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
inputting the combined field into a preset aggregation model in a word vector mode, and determining a target aggregation function corresponding to the combined field;
generating at least one chart based on the at least one combined field, a target aggregation function corresponding to the combined field and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set;
recommending the chart.
According to the technical scheme provided by the embodiments of this specification, a combined field determined based on a dimension field and an index field is input into a preset aggregation model in word-vector form, and the target aggregation function corresponding to the combined field is determined; at least one chart is then generated based on the at least one combined field, the target aggregation function corresponding to the combined field, and the target chart type, and the chart is recommended. The dimensions and indexes in the recommended chart are therefore meaningful, and because the target aggregation function is predicted from word vectors, the accuracy and effectiveness of chart recommendation are improved.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments in this specification; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic diagram of the steps of a chart recommendation method provided in an embodiment of the present specification.
Fig. 2 is a schematic diagram of the principle of predicting a target aggregation function using the preset aggregation model (an LSTM model) according to an embodiment of the present specification.
Fig. 3 is a second schematic diagram of the steps of a chart recommendation method according to an embodiment of the present specification.
Fig. 4 is a schematic structural diagram of a chart recommendation device provided in an embodiment of the present specification.
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
Example one
Referring to fig. 1, a schematic diagram of steps of a chart recommendation method provided in an embodiment of the present disclosure is shown, where the chart recommendation method may include the following steps:
Step 102: determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field includes a dimension field for describing data and an index field for measuring data.
In this embodiment, the data analysis management module may be utilized to obtain all data fields from the user's worksheet and determine the data fields as a set of data fields. These data fields may include field names, field types, enumerated values for the fields, field sample data, and so forth.
A chart generally includes dimensions and indexes. A dimension is a descriptive attribute or characteristic of an object that can take different values, and is mainly used to describe data. An index is a specific element of a dimension that can be measured as a total, a ratio, or the like, and is mainly used to measure data. For example, dimensions may include "brand", "department", or "city name"; the values of the dimension "city name" may be "San Francisco", "Berlin", or "Singapore". As another example, the dimension "city" may be associated with the index category "population": when the value of the "city" dimension is "Hong Kong", the "population" index may be "total number of residents" or "total number of female residents". As a further example, if the dimension "city name" is combined with the index "population", the content of the combined field or chart may be that "Hong Kong" has a population of "7.4 million", or that "Singapore" has a population of "5.6 million".
Some of the data fields in the set determined from the worksheet may be dimensions, and some may be indexes. Although dimensions and indexes can each be used on their own, they can be used in combination, as a combined field, so that the displayed data is meaningful and the chart types are varied.
Optionally, when determining at least one combined field based on the set of data fields of the user's worksheet, step 102 may specifically be performed as:
first, classifying the data fields in the data field set into a dimension field class and an index field class;
second, combining any dimension field in the dimension field class with any index field in the index field class to obtain at least one combined field.
For example, assume that the data fields in the set of data fields include "province", "year", "profit", and "cost". The four data fields are classified into a dimension field class and an index field class, where the dimension field class contains "province" and "year", and the index field class contains "profit" and "cost". Combining each dimension field with each index field yields province-profit, province-cost, year-profit, and year-cost.
In fact, when the dimension field and the index field are combined, exhaustive combination can be performed in the above manner to ensure that all dimensions and indexes can be covered.
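The classification and exhaustive combination described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `fields` mapping, the type labels, and the function names are assumptions, and for brevity the sketch treats only numerical fields as index fields.

```python
from itertools import product

# Hypothetical field metadata: field name -> field type.
fields = {"province": "text", "year": "date", "profit": "number", "cost": "number"}

def classify_fields(fields):
    """Split fields into a dimension class and an index class by field type."""
    dimensions = [name for name, ftype in fields.items() if ftype in ("text", "date")]
    indexes = [name for name, ftype in fields.items() if ftype == "number"]
    return dimensions, indexes

def combine_fields(dimensions, indexes):
    """Exhaustively pair every dimension field with every index field."""
    return [(d, i) for d, i in product(dimensions, indexes) if d != i]

dimensions, indexes = classify_fields(fields)
combined = combine_fields(dimensions, indexes)
print(combined)
```

With the four example fields this yields the four pairs province-profit, province-cost, year-profit, and year-cost.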
Optionally, in an embodiment of this specification, classifying the data fields in the data field set into a dimension field class and an index field class may be specifically implemented as:
classifying the data fields in the data field set into a dimension field class and an index field class according to the field types; the dimension field in the dimension field class comprises a text field and a date field, and the index field in the index field class comprises a numerical value field and a text field.
An index field is paired with a preset aggregation function, but numerical fields such as ID card numbers and telephone numbers cannot be aggregated, and some text fields among the dimension fields carry no representational meaning. The data fields can therefore be filtered; specifically, binary classification with a support vector machine (SVM) can be used to filter out dimension fields and/or index fields that have no representational meaning. For example, an SVM binary classifier may be used to remove numerical fields that cannot be aggregated from the index fields.
In addition, enumerated values or sample data can be inspected to check whether the values of a dimension field are all numeric. For example, a "number" field may be stored as a text field, yet it is unsuitable as a dimension and has no representational meaning, so such dimension fields can be eliminated.
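A minimal sketch of the sample-data check, assuming the field's sample values arrive as a list of strings. The function names are hypothetical, and the SVM-based filter is not reproduced here.

```python
def looks_numeric(value):
    """Return True when the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def is_meaningful_dimension(samples):
    """Reject a text field whose non-empty sample values are all numeric."""
    values = [s.strip() for s in samples if s.strip()]
    return bool(values) and not all(looks_numeric(v) for v in values)

print(is_meaningful_dimension(["1001", "1002", "1003"]))    # numeric codes
print(is_meaningful_dimension(["Beijing", "Shanghai"]))     # descriptive text
```

A field with no usable samples is also rejected in this sketch, which is a design choice rather than something the text specifies.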
It should be understood that when the index field is a text field, the aggregation function output by the LSTM model is COUNT or OTHER, where OTHER indicates that the combined field is not appropriate. For example, for the combined field "city-province", the index is a text field and the LSTM model outputs OTHER: counting how many provinces each city has is not logical, whereas counting how many cities each province has is reasonable, so the combined field "city-province" can be culled.
Step 104: inputting the combined field into a preset aggregation model in word-vector form to obtain a target aggregation function corresponding to the combined field.
It should be understood that the pre-set aggregation model may be a Long Short-Term Memory (LSTM) model that is trained in advance. The LSTM model is a time-recursive neural network suitable for processing and predicting significant events of relatively long intervals and delays in a time series.
The LSTM model used in the embodiments of this specification takes a sequence of vectors as input and outputs an aggregation function type. Specifically, referring to Fig. 2, the vector of each single character is input into the model. Compared with prior-art schemes that learn and predict from vectors of whole words, the prediction is more accurate: the vocabulary of Chinese words is very large, whereas the set of Chinese characters can be enumerated exhaustively, so fewer parameters are needed and the learning effect is better. For example, when the input is "area name, profit", its 7 characters (6 characters and 1 symbol) are input into the LSTM model as vectors to obtain the probability of each aggregation type, and the aggregation type with the maximum probability is selected as the target aggregation function of the current combined field. The aggregation types may include the SUM function, the average (AVG) function, and the COUNT function. The dimension of each vector may be 128, 64, or the like, which the embodiments of this specification do not limit. The specific internal structure of the LSTM model is shown at part a of Fig. 2; the connections within that structure are not described in detail here and may be implemented with reference to the prior art.
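The character-level input can be illustrated with a short sketch. It assumes the Chinese form of the example input is 地区名称,利润 (a guess at the untranslated original) and shows only the step that turns a combined field into a sequence of character ids; the embedding table and the LSTM itself are omitted, and `to_char_ids` is a hypothetical helper.

```python
def to_char_ids(text, vocab):
    """Map each character to an integer id, growing the vocabulary as needed."""
    ids = []
    for ch in text:
        if ch not in vocab:
            vocab[ch] = len(vocab)
        ids.append(vocab[ch])
    return ids

vocab = {}
combined_field = "地区名称,利润"  # assumed original of "area name, profit"
chars = list(combined_field)      # 6 Chinese characters and 1 symbol
ids = to_char_ids(combined_field, vocab)
print(len(chars), ids)
```

Each id would then index a row of an embedding table (for example 64 or 128 dimensions) before being fed to the model one character at a time.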
During training of the LSTM model, historical inputs are used to predict aggregation function types, and each aggregation type receives a probability (for example, SUM 0.7 and AVG 0.3). A cross-entropy loss is calculated from this probability distribution and the true result (for example, SUM 0 and AVG 1), and gradient descent is then used to minimize the loss and update the model parameters, which achieves the learning effect. It should be understood that model training in this step can follow existing training schemes, which this application does not limit, provided that the combined field is input in word-vector form.
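As a worked instance of the loss computation, the following sketch evaluates the cross entropy for the example distribution given in the text (predicted SUM 0.7 and AVG 0.3, true label AVG). The dictionary representation and the function name are illustrative assumptions.

```python
import math

def cross_entropy(predicted, target):
    """Cross entropy between a predicted distribution and a one-hot target."""
    return -sum(target[k] * math.log(predicted[k]) for k in target if target[k] > 0)

predicted = {"SUM": 0.7, "AVG": 0.3}
target = {"SUM": 0.0, "AVG": 1.0}
loss = cross_entropy(predicted, target)
print(round(loss, 3))  # 1.204, i.e. -ln(0.3)
```

The loss is large here because the model put most of its probability on the wrong function; gradient descent would adjust the parameters to raise the probability of AVG for this input.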
Because the amount of sample data available for training is limited, possibly only a few hundred samples, a single model does not have enough capacity to learn to predict the aggregation function accurately. The embodiments of this specification therefore train numerical fields and text fields separately, which improves the prediction markedly: accuracy rises from the original 70% to 80% and 90%. In other words, when training with sample data, the numerical fields are trained independently to obtain one aggregation model, and the text fields are trained independently to obtain another aggregation model.
Specifically, the preset aggregation model comprises a first aggregation model trained on numerical fields and a second aggregation model trained on text fields. When the combined field is input into the preset aggregation model in word-vector form to obtain the corresponding target aggregation function, step 104 may specifically be performed as:
judging whether the index field in the combined field is of a numerical type;
if so, inputting the combined field into the first aggregation model in word-vector form to obtain the target aggregation function corresponding to the combined field;
otherwise, inputting the combined field into the second aggregation model in word-vector form to obtain the target aggregation function corresponding to the combined field.
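The branching above can be sketched as a simple dispatch. The two model functions below are hypothetical stand-ins that return fixed probabilities; in practice each would be a separately trained model.

```python
def numeric_model(combined_field):
    """Stand-in for the first aggregation model (trained on numerical fields)."""
    return {"SUM": 0.8, "AVG": 0.2}

def text_model(combined_field):
    """Stand-in for the second aggregation model (trained on text fields)."""
    return {"COUNT": 0.6, "OTHER": 0.4}

def predict_distribution(combined_field, index_field_type):
    """Route the combined field to the model that matches its index field type."""
    model = numeric_model if index_field_type == "number" else text_model
    return model(combined_field)

print(predict_distribution("region-profit", "number"))
print(predict_distribution("city-province", "text"))
```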
Numerical fields and text fields are trained separately to obtain two aggregation models, and each field type is predicted with its own model. Separating the field types at the training stage and using a different aggregation model for each field type further improves the prediction accuracy of the aggregation models.
When the target aggregation function is obtained with the preset aggregation model, the model outputs a probability for each aggregation function after the combined field is input, and the aggregation function with the maximum probability can be selected as the target aggregation function. For example, if the word vector sequence of the combined field "region-profit" is input and the predicted probability of SUM is 0.8 while that of AVG is 0.2, the aggregation function with the maximum probability, SUM, is selected as the prediction result.
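Selecting the function with the maximum probability is a one-line operation. A minimal sketch, assuming the model output is available as a dictionary of probabilities (the function name is hypothetical):

```python
def pick_target_aggregation(probabilities):
    """Return the aggregation function with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)

print(pick_target_aggregation({"SUM": 0.8, "AVG": 0.2}))  # SUM
```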
Optionally, after determining the plurality of combined fields, the method further comprises:
recombining those combined fields that share the same dimension field and whose index fields have a similarity greater than a threshold; the combined field obtained by recombination is used to determine the target chart type according to the preset adaptation rule.
In a specific implementation, combined fields with the same dimension are found first, and then, for the index in each combined field, other combined fields whose index field names have a similar or identical length are found. The name length here is the number of characters for Chinese, and the number of words after word segmentation for English and French. For example, for "province-male population", the combined fields "province-female population" and "province-rural population", whose index field names have the same length, can be found (excluding "province-male population" itself). The cosine similarity of the different index fields is then calculated from their word vectors, and the two most similar index fields are used to recombine the two combined fields; for example, "province-male population" and "province-female population" are recombined into one combined field "province-(male population; female population)".
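The similarity step can be sketched as follows. The three-dimensional vectors are made-up toy values, not real word vectors, and the threshold of 0.9 is an assumption; only the cosine-similarity comparison and the recombination follow the description.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy vectors for three index fields that share the dimension "province".
vectors = {
    "male population": [0.9, 0.1, 0.3],
    "female population": [0.8, 0.2, 0.3],
    "rural population": [0.1, 0.9, 0.4],
}

def most_similar_pair(index_fields, vectors, threshold):
    """Return (score, a, b) for the most similar pair above the threshold, or None."""
    best = None
    for pos, a in enumerate(index_fields):
        for b in index_fields[pos + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score > threshold and (best is None or score > best[0]):
                best = (score, a, b)
    return best

merged = None
best = most_similar_pair(list(vectors), vectors, threshold=0.9)
if best is not None:
    _, first, second = best
    merged = ("province", (first, second))
print(merged)
```

With these toy vectors the male and female population fields are the closest pair, so they are recombined under the shared dimension.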
Step 106: generating at least one chart based on the at least one combined field, the target aggregation function corresponding to the combined field, and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set.
After the combined field and its target aggregation function are determined, the corresponding chart may be rendered on the display interface based on the target chart type. The number of rendered charts depends on the number of target chart types and also on the number of combined fields: the more combined fields there are, the more charts are rendered.
The target chart type is determined according to a preset adaptation rule based on the data field set. The preset adaptation rule may be determined from empirical values, from user preference, or from chart attributes. For example, if there is only one index, an indicator card may be used; if there is only one dimension and one index, a pie chart, a bar chart, a horizontal bar chart, a word cloud, or the like may be chosen according to the number of enumerated values of the dimension field; if the dimension is a time field, a line chart may be used; if the dimension is a province, a map may be used.
The chart types may include: an indicator card, a pie chart, a bar chart, a horizontal bar chart, a word cloud, a line chart, a map, and so on.
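The adaptation rules in the example can be written as a small rule function. This is a sketch only: the cut-off values 6 and 30 for the number of enumerated values are hypothetical, since the text gives no thresholds, and the field representation is an assumption.

```python
def candidate_chart_types(dimensions, indexes, distinct_values=0):
    """Pick chart types from (name, type) dimension pairs and index field names."""
    if not dimensions and len(indexes) == 1:
        return ["indicator card"]
    if len(dimensions) == 1 and len(indexes) == 1:
        name, field_type = dimensions[0]
        if field_type == "date":
            return ["line chart"]
        if name == "province":
            return ["map"]
        # Hypothetical thresholds on the number of enumerated dimension values.
        if distinct_values <= 6:
            return ["pie chart", "bar chart"]
        if distinct_values <= 30:
            return ["bar chart", "horizontal bar chart"]
        return ["word cloud"]
    return []

print(candidate_chart_types([], ["profit"]))
print(candidate_chart_types([("year", "date")], ["profit"]))
print(candidate_chart_types([("province", "text")], ["cost"]))
```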
Step 108: recommending the chart.
After charts are recommended, the user may select one of them, and the selected chart is then rendered with the specific data in the worksheet; that is, the data in the worksheet is presented visually using the selected chart.
Optionally, in this embodiment of the present specification, when recommending a chart, step 108 may be specifically performed as:
first, sorting the generated charts; then, recommending the charts that rank within the top N, where N is a positive integer. In the embodiments of this specification, N may be 10 or 20, and may be set according to the user's display requirements and the system configuration.
Specifically, the sorting of the generated plurality of charts may include:
based on the value of the sorting attribute, distributing corresponding weight to the sorting attribute of each chart;
calculating a sum of weights for each graph based on the weights;
ranking the plurality of graphs based on the calculated sum of weights;
wherein the sorting attribute comprises at least one or more of the following combinations:
the distinct count of dimension values; the dimension sample length; the word frequency of the dimension field; the word frequency of the index field; the perplexity of the combined field; the chart type; the similarity to the chart name.
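The weighted ranking can be sketched as below. The chart names, attribute keys, and weight values are invented for illustration; only the scheme of summing per-attribute weights and sorting by the sum follows the description.

```python
def rank_charts(charts):
    """Sort charts by the sum of their sorting-attribute weights, highest first."""
    return sorted(charts, key=lambda chart: sum(chart["weights"].values()), reverse=True)

# Hypothetical charts with weights already assigned per sorting attribute.
charts = [
    {"name": "business-unit price pie chart",
     "weights": {"index_word_frequency": 0.4, "combined_field_score": 0.3, "dimension_sample_length": 0.7}},
    {"name": "product-profit bar chart",
     "weights": {"index_word_frequency": 0.9, "combined_field_score": 0.8, "dimension_sample_length": 0.7}},
    {"name": "province-cost map",
     "weights": {"index_word_frequency": 0.6, "combined_field_score": 0.7, "dimension_sample_length": 0.9}},
]

ranked = rank_charts(charts)
print([chart["name"] for chart in ranked])
```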
Taking the perplexity of the combined field as an example, a language model is used to judge how common the combined field is; for instance, "product-profit" is more common than "business-unit price". The language model may be a bigram language model and supports at least Chinese, English, and French. The language model is obtained from statistics: a bigram model estimates the probability that a given word, such as "eat", is followed by another word, such as "apple", from the number of occurrences of "eat apple" and the number of occurrences of "eat", so that the probability P("apple" | "eat") can be calculated with the Bayes formula. As another example, to obtain the probability of "profit of different products", the phrase may be segmented into the words "different", "product", and "profit", and the Bayes formula gives P("profit of different products") = P("different") × P("product" | "different") × P("profit" | "product"). The weight follows the probability of the segmented combined field: the higher the probability, the higher the weight, and vice versa.
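A count-based bigram model of this kind can be sketched in a few lines. The tiny corpus is invented, the function names are hypothetical, and no smoothing is applied, so unseen word pairs get probability zero; a production model would need smoothing and a far larger corpus.

```python
from collections import Counter

def train_bigram(corpus):
    """Count single words and adjacent word pairs over segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams

def sequence_probability(words, unigrams, bigrams):
    """P(w1) * P(w2 | w1) * ... estimated from raw counts (no smoothing)."""
    if not words:
        return 0.0
    probability = unigrams[words[0]] / sum(unigrams.values())
    for previous, current in zip(words, words[1:]):
        if unigrams[previous] == 0:
            return 0.0
        probability *= bigrams[(previous, current)] / unigrams[previous]
    return probability

# Hypothetical segmented corpus.
corpus = [["eat", "apple"], ["eat", "apple"], ["eat", "rice"], ["product", "profit"]]
unigrams, bigrams = train_bigram(corpus)

p_apple_given_eat = bigrams[("eat", "apple")] / unigrams["eat"]
print(p_apple_given_eat)
print(sequence_probability(["product", "profit"], unigrams, bigrams))
```

Here "eat apple" occurs twice and "eat" three times, so the conditional probability is 2/3.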
Taking the word frequency of the index field as an example, the word frequency is calculated from big data. For example, of two index fields, "profit" occurs more frequently than "tax". Weights are assigned according to word frequency: the higher the frequency, the higher the weight, and vice versa.
Taking the dimension sample length as an example, if the values of the dimension field are too long, they are difficult to display and reduce the readability of the chart, so a low weight is assigned.
During sorting, several charts that are adjacent in the sequence may share the same chart type. To avoid the same chart type appearing three or more times in a row, charts of different types may be interspersed during sorting. For example, suppose 20 charts are obtained according to the sorting attributes, where the charts ranked 1 to 5 are all bar charts, the charts ranked 6 to 8 are all line charts, and the remaining charts are more evenly mixed. The first 5 charts can then be inserted among the charts ranked 6 to 20, and the line charts ranked 6 to 8 are handled in the same way, so that charts of different types are distributed evenly over the whole sequence, the charts recommended to the user are more varied, and a single chart type does not dominate.
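One simple way to obtain this effect is a greedy pass over the ranked list that defers any chart which would create a run of three identical types. This greedy strategy is a swapped-in illustration, not necessarily the insertion procedure the text describes; the chart tuples and the function name are assumptions.

```python
def spread_chart_types(charts, max_run=2):
    """Reorder (name, type) pairs so no type repeats more than max_run times in a row.

    Ranking order is kept where possible; a chart is deferred only when it
    would extend a run of its own type beyond max_run.
    """
    remaining = list(charts)
    result = []
    while remaining:
        for position, chart in enumerate(remaining):
            tail = result[-max_run:]
            if len(tail) < max_run or any(prev[1] != chart[1] for prev in tail):
                result.append(remaining.pop(position))
                break
        else:
            # Only charts of the repeated type are left; keep their order.
            result.extend(remaining)
            remaining = []
    return result

ranked = ([(f"bar {i}", "bar") for i in range(1, 6)]
          + [(f"line {i}", "line") for i in range(1, 4)]
          + [(f"pie {i}", "pie") for i in range(1, 3)])
spread = spread_chart_types(ranked)
print([chart_type for _, chart_type in spread])
```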
It should be noted that, in the embodiments of this specification, the sorting operation is unsupervised and supports online optimization. Each component of the sorting (for example, the implementation module corresponding to each sorting attribute, or other modules participating in the sorting) has a manually set weight. If a user saves a chart during use, this is positive feedback to the system, and the system may increase the weight of that chart accordingly.
Optionally, after determining the set of data fields in the user's worksheet, before determining at least one combined field based on the set of data fields, referring to fig. 3, the method further comprises:
step 110: performing word segmentation processing on the data fields in the data field set;
step 112: predicting the language model to be used for the chart recommendation operation based on the probability of the language type of the words obtained by word segmentation.
The optional steps above can be understood as a language identification operation. Because the data fields of the user's worksheet may be in English while the system language is Chinese, the language of the data fields can be determined before subsequent operations are performed. One way to do so is to traverse all data fields and perform word segmentation on each: if a data field contains Chinese, the Chinese score is increased by 1; if a data field is entirely English, the English score is increased by 1; French and other languages are handled in the same way. Finally, the scores of all languages are compared, and the language with the highest score is taken as the language of the worksheet.
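The scoring scheme can be sketched as follows. The per-field detection is a deliberately crude assumption (a CJK character range for Chinese, accented letters for French, plain ASCII for English) and stands in for the word-segmentation-based detection the text describes; the function names are hypothetical.

```python
import re
from collections import Counter

def field_language(name):
    """Guess the language of one field name with crude character tests."""
    if re.search(r"[\u4e00-\u9fff]", name):
        return "zh"
    if re.search(r"[àâçéèêëîïôùûüÿœæ]", name.lower()):
        return "fr"
    if re.fullmatch(r"[A-Za-z0-9 _\-]+", name):
        return "en"
    return None

def identify_language(field_names):
    """Add one point per field to its language and return the top-scoring language."""
    scores = Counter(lang for lang in map(field_language, field_names) if lang)
    return scores.most_common(1)[0][0] if scores else None

print(identify_language(["province", "year", "profit", "成本"]))
print(identify_language(["省份", "年份", "profit"]))
```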
It should be appreciated that this step identifies the language and that subsequent operations may perform a series of operations based on the identified language.
According to this technical scheme, a combined field determined based on a dimension field and an index field is input into a preset aggregation model in word-vector form, and the target aggregation function corresponding to the combined field is determined; at least one chart is then generated based on the at least one combined field, the target aggregation function corresponding to the combined field, and the target chart type, and the chart is recommended. The dimensions and indexes in the recommended chart are therefore meaningful, and because the target aggregation function is predicted from word vectors, the accuracy and effectiveness of chart recommendation are improved.
Example two
Fig. 4 is a schematic structural diagram of a chart recommendation device 200 according to an embodiment of the present disclosure. Referring to FIG. 4, in one software implementation, the chart recommendation device 200 may include:
a first determination module 202 for determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
a second determining module 204, configured to input the combined field to a preset aggregation model in a word vector manner, and determine a target aggregation function corresponding to the combined field;
a generating module 206, configured to generate at least one chart based on the at least one combined field, the target aggregation function corresponding to the combined field, and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set;
a recommending module 208 for recommending the chart.
According to this technical solution, a combined field determined from a dimension field and an index field is input, in the form of word vectors, to a preset aggregation model to determine the target aggregation function corresponding to the combined field; at least one chart is then generated based on the at least one combined field, the target aggregation function corresponding to the combined field, and the target chart type, and the chart is recommended. As a result, the dimensions and indexes in the recommended chart are meaningful, and predicting the target aggregation function from word vectors improves the accuracy and effectiveness of chart recommendation.
Optionally, as an embodiment, the first determining module 202 is specifically configured to:
classifying the data fields in the data field set into a dimension field class and an index field class;
and combining any dimension field in the dimension field class with any index field in the index field class to obtain at least one combined field.
In a specific implementation manner of the embodiment of this specification, when classifying the data fields in the data field set into the dimension field class and the index field class, the first determining module 202 may specifically be configured to:
classifying the data fields in the data field set into a dimension field class and an index field class according to the field types; the dimension field in the dimension field class comprises a text field and a date field, and the index field in the index field class comprises a numerical value field and a text field.
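The classification and combination steps performed by the first determining module can be sketched as follows. This is a hedged illustration only: the type labels `"text"`, `"date"` and `"number"`, the function names, and the choice to skip pairs in which a text field would be combined with itself are assumptions that do not come from the specification.

```python
from itertools import product


def classify_fields(fields):
    """Split fields into a dimension class and an index class by field type.

    `fields` maps a field name to a hypothetical type label:
    "text", "date" or "number".
    """
    # Dimension fields: text fields and date fields.
    dimensions = [n for n, t in fields.items() if t in ("text", "date")]
    # Index fields: numerical fields and text fields.
    indexes = [n for n, t in fields.items() if t in ("number", "text")]
    return dimensions, indexes


def combine_fields(dimensions, indexes):
    """Pair every dimension field with every index field."""
    # Assumption: a field is never combined with itself.
    return [(d, m) for d, m in product(dimensions, indexes) if d != m]
```

Because a text field can belong to both classes, the number of combined fields grows quickly with the number of fields, which is one reason the filtering and recombination steps described in this embodiment are useful.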
In a specific implementation manner of the embodiment of the present specification, the chart recommendation apparatus 200 further includes:
a filtering module, configured to filter out, by means of a support vector machine binary classification method, data fields in the dimension field class and/or the index field class that have no representational significance.
In a specific implementation manner of the embodiment of the present specification, after determining the plurality of combined fields, the first determining module 202 may further be configured to:
recombining those of the plurality of combined fields that have the same dimension field and whose index fields have a similarity greater than a threshold;
wherein the combined fields obtained by recombination are used for determining the target chart type according to the preset adaptation rule.
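The recombination step can be sketched as a grouping of index fields under a shared dimension field. The specification does not state how similarity between index fields is measured, so the sketch below assumes a string similarity computed with `difflib.SequenceMatcher` and a hypothetical threshold of 0.6; both are placeholders, and a real system might compare word vectors instead.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed similarity measure: character-level ratio between field names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recombine(combined_fields, threshold=0.6):
    """Merge (dimension, index) pairs that share a dimension and have similar indexes.

    Returns a list of (dimension, tuple_of_index_fields).
    """
    groups = {}  # dimension field -> list of clusters of index fields
    for dim, idx in combined_fields:
        clusters = groups.setdefault(dim, [])
        for cluster in clusters:
            # Greedy rule (an assumption): compare only with the first member.
            if similarity(cluster[0], idx) > threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return [(dim, tuple(c)) for dim, clusters in groups.items() for c in clusters]
```

With this sketch, pairs such as `("city", "sales_2018")` and `("city", "sales_2019")` end up in one recombined field, which could then be matched to a multi-series chart type by the adaptation rule.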
In a specific implementation manner of the embodiment of the present specification, the preset aggregation model includes a first aggregation model obtained based on numerical field training and a second aggregation model obtained based on text field training; the second determining module 204 is specifically configured to:
judging whether the index field in the combined field is a numerical value type;
if yes, inputting the combined field into the first aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field;
otherwise, inputting the combined field into the second aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field.
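The routing performed by the second determining module can be sketched as below. The two models are represented by placeholder functions that return a probability for each aggregation function, and the function with the highest probability is selected, as recited in the claims. The names `pick_aggregation`, `toy_numeric_model` and `toy_text_model`, the aggregation function labels, and the fixed probabilities are all hypothetical.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], Dict[str, float]]


def toy_numeric_model(vector: Vector) -> Dict[str, float]:
    """Stand-in for the first aggregation model (trained on numerical fields)."""
    return {"SUM": 0.7, "AVG": 0.2, "MAX": 0.1}


def toy_text_model(vector: Vector) -> Dict[str, float]:
    """Stand-in for the second aggregation model (trained on text fields)."""
    return {"COUNT": 0.6, "COUNT_DISTINCT": 0.4}


def pick_aggregation(index_is_numeric: bool, vector: Vector,
                     numeric_model: Model, text_model: Model) -> str:
    """Route the word vector to the matching model and take the arg-max."""
    model = numeric_model if index_is_numeric else text_model
    probabilities = model(vector)
    # The aggregation function with the maximum probability is the target.
    return max(probabilities, key=probabilities.get)
```

Keeping the two models behind the same callable interface means the caller only needs to know the type of the index field, not which classifier was trained for it.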
In a specific implementation manner of the embodiment of the present specification, the recommending module 208 is specifically configured to:
sorting the plurality of generated charts;
and recommending the charts whose rank number is greater than N, where N is a positive integer.
In a specific implementation manner of the embodiment of the present specification, when sorting the generated multiple charts, the recommending module 208 may specifically be configured to:
assigning a corresponding weight to each sorting attribute of each chart based on the value of that sorting attribute;
calculating a sum of weights for each chart based on the weights;
ranking the plurality of charts based on the calculated sums of weights;
wherein the sorting attributes comprise one or a combination of the following: distinct count of the dimension; dimension sample length; word frequency of the dimension field; word frequency of the index field; perplexity of the combined field; chart type; similarity to the chart name.
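The weighted ranking can be sketched as follows. The specification only states that a weight is assigned according to the value of each sorting attribute and that the charts are ranked by the sum of the weights; the concrete weight rules, attribute keys, and the descending order used here are assumptions for illustration.

```python
def rank_charts(charts, weight_rules):
    """Rank charts by the sum of per-attribute weights (highest first).

    `charts` is a list of dicts with an "attrs" mapping; `weight_rules` maps an
    attribute name to a function that turns the attribute value into a weight.
    """
    def total_weight(chart):
        attrs = chart["attrs"]
        return sum(rule(attrs[name])
                   for name, rule in weight_rules.items()
                   if name in attrs)

    # Assumption: a larger weight sum means a more recommendable chart.
    return sorted(charts, key=total_weight, reverse=True)


# Hypothetical weight rules for two of the sorting attributes.
example_rules = {
    # Dimensions with a moderate number of distinct values get a higher weight.
    "distinct_count": lambda v: 1.0 if 2 <= v <= 20 else 0.2,
    # Some chart types are preferred over others.
    "chart_type": lambda v: {"bar": 1.0, "line": 0.8, "pie": 0.5}.get(v, 0.1),
}
```

A caller could then slice the ranked list to obtain the charts to recommend; the remaining attributes (word frequency, perplexity, name similarity) would be added as further entries in the rule table.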
In a specific implementation manner of the embodiment of the present specification, the chart recommendation apparatus 200 further includes:
a language identification module, configured to, before the first determining module 202 determines at least one combined field based on the set of data fields of the user's worksheet, perform word segmentation on the data fields in the data field set, and predict the language model corresponding to the chart recommendation operation based on the probability of the language type of the words obtained by word segmentation.
It should be understood that the chart recommendation apparatus in the embodiments of this specification may also perform the method performed by the chart recommendation apparatus (or device) in fig. 1 and 3, and implement the functions of the chart recommendation apparatus (or device) in the embodiments shown in fig. 1 and 3, which are not described herein again.
Example three
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to Fig. 5, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the chart recommendation device at the logical level. The processor is used for executing the program stored in the memory, and is specifically used for executing the following operations:
determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
inputting the combined field into a preset aggregation model in a word vector mode, and determining a target aggregation function corresponding to the combined field;
generating at least one chart based on the at least one combined field, a target aggregation function corresponding to the combined field and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set;
recommending the chart.
According to this technical solution, a combined field determined from a dimension field and an index field is input, in the form of word vectors, to a preset aggregation model to determine the target aggregation function corresponding to the combined field; at least one chart is then generated based on the at least one combined field, the target aggregation function corresponding to the combined field, and the target chart type, and the chart is recommended. As a result, the dimensions and indexes in the recommended chart are meaningful, and predicting the target aggregation function from word vectors improves the accuracy and effectiveness of chart recommendation.
The method performed by the chart recommendation device disclosed in the embodiments shown in Fig. 1 and Fig. 3 of this specification may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of this specification. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the embodiments of this specification may be embodied directly as being performed by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further execute the method in fig. 1, and implement the functions of the chart recommendation device in the embodiments shown in fig. 1 and fig. 3, which are not described herein again in this specification.
Of course, besides the software implementation, the electronic device of the embodiment of the present disclosure does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
Example four
Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular for performing the method of:
determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
inputting the combined field into a preset aggregation model in a word vector mode, and determining a target aggregation function corresponding to the combined field;
generating at least one chart based on the at least one combined field, a target aggregation function corresponding to the combined field and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set;
recommending the chart.
According to this technical solution, a combined field determined from a dimension field and an index field is input, in the form of word vectors, to a preset aggregation model to determine the target aggregation function corresponding to the combined field; at least one chart is then generated based on the at least one combined field, the target aggregation function corresponding to the combined field, and the target chart type, and the chart is recommended. As a result, the dimensions and indexes in the recommended chart are meaningful, and predicting the target aggregation function from word vectors improves the accuracy and effectiveness of chart recommendation.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present specification shall be included in the protection scope of the present specification.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (10)

1. A chart recommendation method, comprising:
determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
inputting the combined field into a preset aggregation model in a word vector mode, and determining a target aggregation function corresponding to the combined field;
generating at least one chart based on the at least one combined field, the target aggregation function corresponding to the combined field, and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set; after the combined fields and their corresponding target aggregation functions are determined, the corresponding charts are rendered on a display interface based on the target chart types, wherein the number of rendered charts is determined by the number of target chart types and is also related to the number of combined fields, such that the more combined fields there are, the more charts are rendered;
recommending the chart;
separately training a numerical field and a text field, wherein when sample data is used for training, the numerical field is independently trained to obtain one aggregation model, and the text field is independently trained to obtain the other aggregation model; the preset aggregation model comprises a first aggregation model obtained based on numerical field training and a second aggregation model obtained based on text field training;
inputting the combined field into a preset aggregation model in a word vector manner, and determining a target aggregation function corresponding to the combined field, specifically comprising:
judging whether the index field in the combined field is a numerical value type;
if yes, inputting the combined field into the first aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field;
otherwise, inputting the combined field into the second aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field;
wherein the numerical field and the text field are trained separately to obtain two aggregation models, and predictions for numerical fields and text fields are made with the respective aggregation model, so that the field types are separated at the training stage and different aggregation models are used for different field types;
when a target aggregation function is obtained by using a preset aggregation model, after the combined field is input, the probability corresponding to each aggregation function is output, and the aggregation function with the maximum probability value is selected as the target aggregation function.
2. The method of claim 1, wherein determining at least one combined field based on a set of data fields of a user's worksheet comprises:
determining a set of data fields of a user's worksheet;
classifying the data fields in the set of data fields into a dimension field class and an index field class;
and combining any dimension field in the dimension field class with any index field in the index field class to obtain at least one combined field.
3. The method of claim 2, wherein classifying the data fields in the set of data fields into a dimension field class and an index field class specifically comprises:
classifying the data fields in the data field set into a dimension field class and an index field class according to the field types; the dimension field in the dimension field class comprises a text field and a date field, and the index field in the index field class comprises a numerical value field and a text field.
4. The method of claim 3, further comprising:
filtering out, by means of a support vector machine binary classification method, data fields in the dimension field class and/or the index field class that have no representational significance.
5. The method of claim 1, wherein after determining the plurality of combined fields, the method further comprises:
recombining those of the plurality of combined fields that have the same dimension field and whose index fields have a similarity greater than a threshold;
wherein the combined fields obtained by recombination are used for determining the target chart type according to the preset adaptation rule.
6. The method of claim 1, wherein recommending the chart specifically comprises:
sorting the generated plurality of charts;
and recommending the charts whose rank number is greater than N, where N is a positive integer.
7. The method of claim 6, wherein ranking the generated plurality of charts comprises:
assigning a corresponding weight to each sorting attribute of each chart based on the value of that sorting attribute;
calculating a sum of weights for each chart based on the weights;
ranking the plurality of charts based on the calculated sums of weights;
wherein the sorting attributes comprise one or a combination of the following:
distinct count of the dimension; dimension sample length; word frequency of the dimension field; word frequency of the index field; perplexity of the combined field; chart type; similarity to the chart name.
8. The method of any of claims 1-7, wherein after determining a set of data fields in a user's worksheet, before determining at least one combined field based on the set of data fields, the method further comprises:
performing word segmentation processing on the data fields in the data field set;
and predicting the language model corresponding to the chart recommendation operation based on the probability of the language type of the words obtained by word segmentation.
9. A chart recommendation device, comprising:
a first determining module, configured to determine at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field includes a dimension field for describing data and an index field for measuring data;
a second determining module, configured to input the combined field to a preset aggregation model in the form of word vectors and determine a target aggregation function corresponding to the combined field;
a generating module, configured to generate at least one chart based on the at least one combined field, the target aggregation function corresponding to the combined field, and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set; after the combined fields and their corresponding target aggregation functions are determined, the corresponding charts are rendered on a display interface based on the target chart types, wherein the number of rendered charts is determined by the number of target chart types and is also related to the number of combined fields, such that the more combined fields there are, the more charts are rendered;
a recommending module, configured to recommend the chart;
separately training a numerical field and a text field, wherein when sample data is used for training, the numerical field is independently trained to obtain one aggregation model, and the text field is independently trained to obtain the other aggregation model; the preset aggregation model comprises a first aggregation model obtained based on numerical field training and a second aggregation model obtained based on text field training; the second determining module is specifically configured to:
judging whether the index field in the combined field is a numerical value type;
if yes, inputting the combined field into the first aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field;
otherwise, inputting the combined field into the second aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field;
wherein the numerical field and the text field are trained separately to obtain two aggregation models, and predictions for numerical fields and text fields are made with the respective aggregation model, so that the field types are separated at the training stage and different aggregation models are used for different field types;
when a target aggregation function is obtained by using a preset aggregation model, after the combined field is input, the probability corresponding to each aggregation function is output, and the aggregation function with the maximum probability value is selected as the target aggregation function.
10. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
determining at least one combined field based on a set of data fields of a user's worksheet, wherein each combined field comprises a dimension field for describing data and an index field for measuring data;
inputting the combined field into a preset aggregation model in a word vector mode, and determining a target aggregation function corresponding to the combined field;
generating at least one chart based on the at least one combined field, the target aggregation function corresponding to the combined field, and a target chart type, wherein the target chart type is determined according to a preset adaptation rule based on the data field set; after the combined fields and their corresponding target aggregation functions are determined, the corresponding charts are rendered on a display interface based on the target chart types, wherein the number of rendered charts is determined by the number of target chart types and is also related to the number of combined fields, such that the more combined fields there are, the more charts are rendered;
recommending the chart;
separately training a numerical field and a text field, wherein when sample data is used for training, the numerical field is independently trained to obtain one aggregation model, and the text field is independently trained to obtain the other aggregation model; the preset aggregation model comprises a first aggregation model obtained based on numerical field training and a second aggregation model obtained based on text field training;
inputting the combined field into a preset aggregation model in a word vector manner, and determining a target aggregation function corresponding to the combined field, specifically comprising:
judging whether the index field in the combined field is a numerical value type;
if yes, inputting the combined field into the first aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field;
otherwise, inputting the combined field into the second aggregation model in the form of word vectors, and determining the target aggregation function corresponding to the combined field;
wherein the numerical field and the text field are trained separately to obtain two aggregation models, and predictions for numerical fields and text fields are made with the respective aggregation model, so that the field types are separated at the training stage and different aggregation models are used for different field types;
when a target aggregation function is obtained by using a preset aggregation model, after the combined field is input, the probability corresponding to each aggregation function is output, and the aggregation function with the maximum probability value is selected as the target aggregation function.
CN201910693374.9A 2019-07-30 2019-07-30 Chart recommendation method and device and electronic equipment Active CN110489449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910693374.9A CN110489449B (en) 2019-07-30 2019-07-30 Chart recommendation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110489449A CN110489449A (en) 2019-11-22
CN110489449B true CN110489449B (en) 2022-02-22

Family

ID=68548618

Country Status (1)

Country Link
CN (1) CN110489449B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460102B (en) * 2020-03-31 2022-09-09 成都数之联科技股份有限公司 Chart recommendation system and method based on natural language processing
CN112015774B (en) * 2020-09-25 2023-08-29 北京百度网讯科技有限公司 Chart recommending method and device, electronic equipment and storage medium
CN112256789B (en) * 2020-10-19 2022-06-17 杭州比智科技有限公司 Intelligent visual data analysis method and device
CN113763502B (en) * 2020-11-13 2024-04-16 北京京东尚科信息技术有限公司 Chart generation method, device, equipment and storage medium
CN112434198A (en) * 2020-11-24 2021-03-02 深圳市明源云科技有限公司 Chart component recommendation method and device
CN112749224A (en) * 2020-12-31 2021-05-04 清华大学 Task-oriented visual recommendation method and device
CN116089474B (en) * 2023-03-07 2023-08-04 深圳市明源云科技有限公司 Data caching method, device, equipment and medium in custom editing mode
CN117350276B (en) * 2023-12-05 2024-02-13 卓世未来(天津)科技有限公司 Data enhancement method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180117A (en) * 2017-06-30 2017-09-19 东软集团股份有限公司 Chart recommends method, device and computer equipment
CN108268435A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 Chart matching process and device
CN109101631A (en) * 2018-08-14 2018-12-28 成都四方伟业软件股份有限公司 Data Modeling Method and device
CN109145277A (en) * 2018-08-24 2019-01-04 东软集团股份有限公司 Chart generation method, device, storage medium and electronic equipment
CN109446221A (en) * 2018-10-29 2019-03-08 北京百分点信息科技有限公司 A kind of interactive data method for surveying based on semantic analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117442B (en) * 2015-08-12 2018-05-04 东北大学 A kind of big data querying method based on probability

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
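"Design and Implementation of an Urban Management Business Intelligence System Based on Public Information Sources"; Zhang Lu; Wanfang; 20140917; thesis body, chapters 2-4 *
"Research on Multidimensional Data Visualization in Statistical Analysis of Application Software"; Chen Weimin; China Master's Theses Full-text Database, Information Science and Technology; 20190115; thesis body, chapters 2-4 *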


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100081 No.101, 1st floor, building 14, 27 Jiancai Chengzhong Road, Haidian District, Beijing

Applicant after: Beijing PERCENT Technology Group Co.,Ltd.

Address before: 100081 16 / F, block a, Beichen Century Center, building 2, courtyard 8, Beichen West Road, Chaoyang District, Beijing

Applicant before: BEIJING BAIFENDIAN INFORMATION SCIENCE & TECHNOLOGY Co.,Ltd.

GR01 Patent grant