WO2019127744A1 - Method and classifier for automatic modeling of an OLAP data model - Google Patents
- Publication number
- WO2019127744A1 (PCT/CN2018/073320)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- column
- modes
- rule
- difference degree
Classifications
- G06F16/212: Schema design and management with details for data modelling support
- G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
- G06F16/2282: Tablespace storage structures; Management thereof
- G06F16/2455: Query execution
- G06F16/2465: Query processing support for facilitating data mining operations in structured databases
- G06F16/258: Data format conversion from or to a database
- G06F16/283: Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
- G06F16/285: Clustering or classification
- G06N20/00: Machine learning
- G06N5/04: Inference or reasoning models
Definitions
- The invention belongs to the field of OLAP big data and specifically relates to a method and a classifier for automatically modeling an OLAP data model.
- The underlying architecture of OLAP analysis is a data warehouse, which contains a series of data tables.
- Modelers design data models on these tables according to business analysis requirements for analysts to use; ultimately, the analysts' operations are translated into a series of SQL queries against the data tables.
- The data model gives the data tables business meaning and decouples the underlying data from business needs; it is an integral part of the entire architecture.
- Data volumes can reach the 100-billion or even trillion level, the number of dimensions is very large, and business scenarios are complex and changeable, all of which make data modeling harder. In addition, frequent changes to the data model are discouraged when OLAP Cubes are used, which raises the cost of trial and error in modeling. All of this poses great challenges for modelers, so automating modeling with computer algorithms to reduce the cost of manual modeling is particularly important.
- The technical problem solved by the present invention is that existing OLAP modeling depends heavily on a person's understanding of the data tables and business requirements, so modeling is inefficient and costly, and automatic modeling cannot be achieved.
- the present invention provides a method for automatically modeling an OLAP data model, the method comprising:
- Beneficial effects of the present invention: the above method parses the input SQL samples, extracts their features, and finds the patterns the data models require; it then clusters and merges these patterns to generate all the required data models.
- The generated data models fully support all input SQL and have a degree of generalization capability.
- The generated data models fully support Cube operations and fast queries while keeping the Cube expansion rate within 10x; automatic modeling effectively reduces users' learning difficulty and trial-and-error costs and improves the user experience.
- the column information refers to all columns used in the SQL query statement, wherein each column includes: a column name of the column, a table in which the column is located, a type of the column, and a number of times the column appears.
- the table information refers to all the tables used in the SQL query statement, wherein each table includes: a table name of the table, a type of the table, and associated information of the table and the fact table.
- Classifying the at least N groups of query patterns in S4 includes:
- S41: perform a difference judgment on the at least N groups of query patterns using static rules. A static rule calculates the difference degree of the same table in any two groups of query patterns and checks whether it is greater than a first preset threshold; if it is, the two groups of query patterns cannot be classified into one class;
- after the table difference degrees have been calculated, calculate the difference degree of the same column in any two groups of query patterns and check whether it is greater than the first preset threshold; if it is, the two groups of query patterns cannot be classified into one class;
- S42: after the difference judgment, cluster the at least N groups of query patterns using statistical rules and a preset learning rule.
- Clustering the at least N groups of query patterns in S42 using the statistical rules and the preset learning rule includes:
- S421: calculate a feature vector for each group of query patterns from the column information and table information used in that group;
- S422: cluster all the feature vectors using a clustering algorithm from unsupervised machine learning together with the preset learning rule.
- The preset learning rule means that, after clustering, it is determined whether the clustering result meets a preset criterion; if not, a supervised machine learning algorithm is used to adjust the clustering result, and the adjusted clustering result is recorded.
- Converting the pattern relationship tree into a corresponding data model in S6 includes:
- converting the table information in the pattern relationship tree into table information of the corresponding data model; and
- converting the column information in the pattern relationship tree into column information of the corresponding data model.
- Converting the column information in the pattern relationship tree into column information of the data model further includes scoring the columns of the pattern relationship tree for partitioning, to determine the partition column of the data model, using:
- PartScore(i) = PartFunc(Score(i), Stats(i)), where PartFunc() is the scoring function, Score(i) is the score of column (i) in each group of query patterns, and Stats(i) is the feature statistic of column (i). When the partition score PartScore(i) of column (i) exceeds a predetermined score threshold, column (i) is set as the partition column.
- The invention also relates to a classifier comprising a static rule classification module and a statistical rule and preset learning rule classification module;
- the static rule classification module is configured to perform a difference judgment on at least N groups of query patterns using static rules, where a static rule calculates the difference degree of the same table in any two groups of query patterns and checks whether it is greater than a first preset threshold; if it is, the two groups of query patterns cannot be classified into one class. N is a natural number greater than or equal to 1;
- after the table difference degrees have been calculated, the difference degree of the same column in any two groups of query patterns is calculated and checked against the first preset threshold; if it is greater, the two groups of query patterns cannot be classified into one class;
- the statistical rule and preset learning rule classification module is configured to cluster the at least N groups of query patterns using statistical rules and a preset learning rule after the difference judgment.
- The preset learning rule means that, after clustering, it is determined whether the clustering result meets a preset criterion; if not, a supervised machine learning algorithm is used to adjust the clustering result, and the adjusted clustering result is recorded.
- FIG. 1 is a flow chart of a method for automatically modeling an OLAP data model according to the present invention
- FIG. 2 is a schematic structural view of a classifier of the present invention.
- A method for automatically modeling an OLAP data model comprises the following steps:
- In step S1, the obtained SQL query statements are pre-checked to test whether each input statement can function effectively in the subsequent stages.
- In step S2, an SQL syntax parser parses the input SQL query statements to determine whether each contains lexical or syntax errors. If so, an explicit error message guides the user in correcting the statement; if all checks pass, processing proceeds to step S3.
- In step S3, the input SQL query statements are executed in simulation: the engine does not return any meaningful query results; instead, the query-plan analysis produced during execution is collected and converted into a set of query patterns.
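The patent does not name the parser or the engine used in steps S2 and S3. As a minimal sketch, assuming SQLite via Python's built-in `sqlite3` module as a stand-in, both steps can be approximated by compiling each statement with `EXPLAIN QUERY PLAN` against an empty, hypothetical schema: a malformed statement raises an exception whose message can be shown to the user, and a valid one yields plan steps without the query being run. The table and column names are illustrative assumptions.

```python
import sqlite3

# Hypothetical schema; "order" is an SQL keyword, so the table is named "orders".
SCHEMA = """
CREATE TABLE sellers (seller_id INTEGER, seller_name TEXT);
CREATE TABLE buyers  (buyer_id INTEGER, buyer_name TEXT);
CREATE TABLE orders  (seller_id INTEGER, buyer_id INTEGER,
                      location TEXT, price REAL, time TEXT);
"""


def precheck_and_plan(sql):
    """Return (ok, detail) for one input SQL statement.

    ok is False: detail is the engine's error message (S2 feedback to the user).
    ok is True:  detail lists the query-plan steps; the statement is compiled
                 but not run (a stand-in for S3's simulated execution).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        # The last field of every plan row is its human-readable detail text.
        return True, [row[-1] for row in rows]
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()
```

SQLite raises the same exception type for unknown tables or columns as for syntax errors, so a fuller implementation would separate those cases; the plan text also varies between SQLite versions.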
- Many SQL statements contain several contexts: in SQL with sub-queries, for example, the main query and each sub-query each correspond to a context (i.e., a query structure), and each context generates one query pattern. One SQL statement may therefore generate multiple query patterns. For example, if the input SQL query statement is:
- Col(0) sellers.seller_name: Name(0): seller_name, Tab(0): sellers, Cat(0): dimension, Score(0): calculated from the number of occurrences (1) and the usage (GROUP BY);
- Col(1) order.location: Name(1): location, Tab(1): order, Cat(1): dimension, Score(1): calculated from the number of occurrences (1) and the usage (GROUP BY);
- Col(2) order.price: Name(2): price, Tab(2): order, Cat(2): metric, Score(2): calculated from the number of occurrences (1) and the usage (metric);
- Col(3) order.time: Name(3): time, Tab(3): order, Cat(3): dimension, Score(3): calculated from the number of occurrences (1) and the usage (filter).
- In this way, query structures are extracted from the SQL query statements to obtain at least N groups of query patterns.
- In step S4, the at least N groups of query patterns are classified. For example, one class contains the following query patterns:
- Col(0) sellers.seller_name: Name(0): seller_name, Tab(0): sellers, Cat(0): dimension, Score(0): calculated from the number of occurrences (1) and the usage (GROUP BY);
- Col(1) order.location: Name(1): location, Tab(1): order, Cat(1): dimension, Score(1): calculated from the number of occurrences (1) and the usage (GROUP BY);
- Col(2) order.price: Name(2): price, Tab(2): order, Cat(2): metric, Score(2): calculated from the number of occurrences (1) and the usage (metric);
- Col(0) order.time: Name(0): time, Tab(0): order, Cat(0): dimension, Score(0): calculated from the number of occurrences (1) and the usage (filter);
- Tab(0) order: Name(0): order, Cat(0): fact table;
- In step S5, the query patterns in each class are merged to obtain the corresponding pattern relationship tree; here the merge takes the union of the patterns:
- Col(0) sellers.seller_name: Name(0): seller_name, Tab(0): sellers, Cat(0): dimension, Score(0): calculated from the number of occurrences (1) and the usage (GROUP BY);
- Col(1) order.location: Name(1): location, Tab(1): order, Cat(1): dimension, Score(1): calculated from the number of occurrences (1) and the usage (GROUP BY);
- Col(2) order.price: Name(2): price, Tab(2): order, Cat(2): metric, Score(2): calculated from the number of occurrences (1) and the usage (metric);
- Col(3) order.time: Name(3): time, Tab(3): order, Cat(3): dimension, Score(3): calculated from the number of occurrences (1) and the usage (filter);
- Col(4) buyers.buyer_name: Name(4): buyer_name, Tab(4): buyers, Cat(4): dimension, Score(4): calculated from the number of occurrences (1) and the usage (GROUP BY);
- In step S6, each pattern relationship tree produced by classifying and merging the query patterns in the previous steps generates one data model.
- Going from a pattern relationship tree to a data model is mainly a data conversion: the table information in the pattern relationship tree (fact table, dimension tables, JOINs, etc.) is converted directly into table information in the data model, and likewise its column information (dimensions, measures, computed columns, etc.) is translated directly into column information in the data model. Once a set of data models has been created successfully, it collectively supports the input set of SQL queries and the business analysis needs behind them.
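The conversion in S6 is described as a direct mapping. A minimal sketch, assuming a hypothetical dictionary layout for the pattern relationship tree and a hypothetical `DataModel` class (neither is defined in the patent), routes table information to the model's fact table and lookup (dimension) tables, and column information to dimensions, measures, or computed columns according to its category:

```python
from dataclasses import dataclass, field


@dataclass
class DataModel:
    """Hypothetical target structure for one generated data model."""
    fact_table: str
    lookups: list = field(default_factory=list)           # (table, join_type, join_cond)
    dimensions: list = field(default_factory=list)        # "table.column"
    measures: list = field(default_factory=list)
    computed_columns: list = field(default_factory=list)


def tree_to_model(tree):
    """Convert a pattern-relationship-tree dict into a DataModel.

    Table information maps one-to-one onto the fact table and lookups;
    column information is routed by its category (Cat).
    """
    model = DataModel(fact_table=tree["fact_table"])
    for tab in tree["tables"]:
        if tab["cat"] == "dimension table":
            model.lookups.append((tab["name"], tab["join_type"], tab["join_cond"]))
    buckets = {"dimension": model.dimensions,
               "metric": model.measures,
               "computed": model.computed_columns}
    for col in tree["columns"]:
        buckets[col["cat"]].append(f'{col["tab"]}.{col["name"]}')
    return model
```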
- The column information refers to all columns used in the SQL query statement, where each column includes: the column name, the table in which the column is located, the type of the column, and the number of times the column appears.
- the table information refers to all the tables used in the SQL query statement, wherein each table includes: a table name of the table, a type of the table, and associated information of the table and the fact table.
- Column information: in Embodiment 2, the information on all columns used in the SQL query statement is denoted Col(i) and covers all columns used in the context, where:
- Name(i) represents the column name of this column
- Tab(i) represents the table in which the column is located
- Cat(i) represents the type of the column, such as dimensions, metrics, computed columns, etc.
- Score(i) represents a score calculated from this column's number of occurrences, usage, and so on.
- Table information: denoted Tab(j), covering all tables used in the context, where:
- Name(j) represents the table name of the table
- Cat(j) represents the type of the table, such as fact table, dimension table, etc.
- Join(j) represents the JOIN association between this table and the fact table, including the JOIN type and the JOIN condition.
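The Col(i) and Tab(j) records above map naturally onto small record types. The sketch below is illustrative only: the class names are hypothetical, and because the text says Score(i) depends on occurrences and usage without giving a formula, it assumes `score = occurrences * USAGE_WEIGHT[usage]` with invented weights. The instance reproduces the Col(0) to Col(3) example from step S3, with an invented JOIN condition.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical weights: the text names occurrences and usage as inputs to
# Score(i) but gives no formula.
USAGE_WEIGHT = {"group by": 1.0, "metric": 1.0, "filter": 0.5}


@dataclass(frozen=True)
class Col:
    name: str          # Name(i): column name
    tab: str           # Tab(i): table the column belongs to
    cat: str           # Cat(i): "dimension", "metric", "computed", ...
    occurrences: int   # how often the column appears in the context
    usage: str         # how it is used: "group by", "metric", "filter"

    @property
    def score(self) -> float:
        """Score(i) under the assumed formula occurrences * usage weight."""
        return self.occurrences * USAGE_WEIGHT[self.usage]


@dataclass(frozen=True)
class Tab:
    name: str                         # Name(j): table name
    cat: str                          # Cat(j): "fact table" or "dimension table"
    join_type: Optional[str] = None   # Join(j): JOIN type to the fact table
    join_cond: Optional[str] = None   # Join(j): JOIN condition


@dataclass
class QueryPattern:
    """One query pattern: the columns and tables used by one query context."""
    cols: List[Col] = field(default_factory=list)
    tabs: List[Tab] = field(default_factory=list)


# The Col(0)..Col(3) example from step S3; the JOIN condition is invented.
example = QueryPattern(
    cols=[Col("seller_name", "sellers", "dimension", 1, "group by"),
          Col("location", "order", "dimension", 1, "group by"),
          Col("price", "order", "metric", 1, "metric"),
          Col("time", "order", "dimension", 1, "filter")],
    tabs=[Tab("order", "fact table"),
          Tab("sellers", "dimension table", "INNER",
              "order.seller_id = sellers.seller_id")])
```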
- In another embodiment (Embodiment 3), classifying the at least N groups of query patterns in S4 includes:
- S41: perform a difference judgment on the at least N groups of query patterns using static rules. A static rule calculates the difference degree of the same table in any two groups of query patterns and checks whether it is greater than a first preset threshold; if it is, the two groups of query patterns cannot be classified into one class;
- after the table difference degrees have been calculated, calculate the difference degree of the same column in any two groups of query patterns and check whether it is greater than the first preset threshold; if it is, the two groups of query patterns cannot be classified into one class;
- S42: after the difference judgment, cluster the at least N groups of query patterns using statistical rules and a preset learning rule.
- Query pattern classification generally involves several rules. The static rules are a series of mutual-exclusion rules: two patterns that meet certain conditions cannot be classified into one class. They mainly include:
- calculating the difference degree Diff(i, j) for the same table in the two query patterns; if it is greater than the set threshold, the two query patterns cannot be classified into one class;
- calculating the difference degree Diff(i, j) for the same column in the two query patterns; if it is greater than the set threshold, the two query patterns cannot be classified into one class.
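The patent does not define how Diff(i, j) is computed. As a sketch under that caveat, the functions below assume a table differs (Diff = 1) when its role or its JOIN to the fact table differs, and a column differs when its category differs or, otherwise, by the relative gap between its scores. `cannot_group` applies the mutual-exclusion check in the order described, tables first and then columns; the dictionary layout and the 0.5 threshold are assumptions.

```python
def table_diff(a, b):
    """Hypothetical Diff for the same table; a and b are (cat, join) tuples.
    Returns 1.0 if the table's role or its JOIN to the fact table differs."""
    return 0.0 if a == b else 1.0


def column_diff(a, b):
    """Hypothetical Diff for the same column; a and b are (cat, score) tuples.
    Returns 1.0 if the category differs, else the relative score gap."""
    (cat_a, score_a), (cat_b, score_b) = a, b
    if cat_a != cat_b:
        return 1.0
    return abs(score_a - score_b) / max(score_a, score_b, 1e-9)


def cannot_group(p, q, threshold=0.5):
    """Static (mutual-exclusion) rule: True if p and q must not share a class.

    p and q are {"tables": {name: (cat, join)}, "cols": {name: (cat, score)}}.
    Only tables and columns present in both patterns are compared.
    """
    for name in p["tables"].keys() & q["tables"].keys():
        if table_diff(p["tables"][name], q["tables"][name]) > threshold:
            return True
    for name in p["cols"].keys() & q["cols"].keys():
        if column_diff(p["cols"][name], q["cols"][name]) > threshold:
            return True
    return False
```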
- After this difference judgment, the at least N groups of query patterns are clustered using statistical rules and the preset learning rule.
- In another embodiment (Embodiment 4), clustering the at least N groups of query patterns in S42 using statistical rules and the preset learning rule includes:
- S421: calculate a feature vector for each group of query patterns from the column information and table information used in that group;
- The feature vector is calculated from all the column and table information of the query pattern, for example:
- X(i) = (col1, col2, col3, ..., colN, tab1, tab2, ..., tabM), where each colX is the score Score(X) of column Col(X) and each tabY is the score Score(Y) of table Tab(Y).
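The feature vector X(i) and the clustering step S422 can be sketched as follows. The patent only says an unsupervised clustering algorithm is used; this sketch swaps in simple greedy "leader" clustering on cosine similarity, which is an assumption, as are the dictionary layout and the `min_sim` parameter.

```python
import math


def feature_vectors(patterns):
    """S421 sketch: X(i) = (col1..colN, tab1..tabM) over a shared vocabulary.

    Each pattern is {"cols": {name: score}, "tabs": {name: score}}; a column
    or table the pattern does not use contributes 0.
    """
    cols = sorted({c for p in patterns for c in p["cols"]})
    tabs = sorted({t for p in patterns for t in p["tabs"]})
    return [[p["cols"].get(c, 0.0) for c in cols] +
            [p["tabs"].get(t, 0.0) for t in tabs] for p in patterns]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, min_sim=0.5):
    """S422 stand-in: greedy leader clustering on cosine similarity.

    Each vector joins the first cluster whose leader (first member) is at
    least min_sim similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of pattern indices.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= min_sim:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

With the three patterns behind the S4/S5 example plus an unrelated fourth one, `cluster(feature_vectors(...), min_sim=0.25)` groups the first three together; the threshold would need tuning on real workloads, and the result depends on input order, unlike many standard clustering algorithms.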
- The preset learning rule means that, after clustering, it is determined whether the clustering result meets a preset criterion; if not, a supervised machine learning algorithm is used to adjust the clustering result, and the adjusted clustering result is recorded.
- Learning rule: the "static rules" and "statistical rules" produce a recommended classification, which the user can manually adjust or correct. For example, some query patterns may be placed in one class after the static and statistical rules are applied, while the user, from a business perspective, manually separates them.
- The learning rule saves all final classification results once the user has confirmed them, uses supervised machine learning to train a suitable classification model that predicts whether a group of patterns can be placed in one class, and adjusts the clustering results based on that prediction.
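The patent says only that supervised machine learning is trained on user-confirmed results. As one possible instance, the sketch below swaps in a one-feature logistic regression, trained by plain batch gradient descent on recorded pair decisions: the feature is the pair's similarity and the label records whether the user kept the pair together. `adjust` then splits off cluster members that the model predicts do not belong with the cluster's first member. All names and the single-feature design are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_pair_model(samples, epochs=2000, lr=0.5):
    """Fit P(same class) = sigmoid(w * similarity + b) by gradient descent.

    samples: (similarity, label) pairs saved from user-confirmed results;
    label 1 means the user kept the two patterns in one class, 0 means split.
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y   # gradient of the log-loss w.r.t. z
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def same_class(model, similarity):
    w, b = model
    return sigmoid(w * similarity + b) >= 0.5


def adjust(clusters, sim, model):
    """Split off members predicted not to belong with the cluster leader.

    sim(i, j) returns the similarity of patterns i and j (leader first).
    """
    adjusted = []
    for members in clusters:
        leader, keep, split = members[0], [members[0]], []
        for m in members[1:]:
            (keep if same_class(model, sim(leader, m)) else split).append(m)
        adjusted.append(keep)
        adjusted.extend([m] for m in split)
    return adjusted
```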
- In S5, the query patterns in each class are merged to obtain the corresponding pattern relationship tree:
- the structure formed by merging each class's query patterns contains a set of tables and their JOIN relationship tree, and also defines the columns selected on each table. Such a structure is called a pattern relationship tree.
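A minimal sketch of the S5 merge, assuming each query pattern is a star around one fact table stored in a hypothetical dictionary layout. The merge takes the union of JOIN edges and of the columns selected per table; as a strict stand-in for the second-preset-threshold check of claim 6, any conflicting JOIN condition is treated as too different to merge.

```python
def merge_patterns(patterns):
    """Merge the query patterns of one class into a pattern relationship tree.

    Each pattern: {"fact": str,
                   "joins": {dim_table: join_condition},
                   "cols": {table: set_of_selected_columns}}.
    Returns a tree with the same layout holding the union of all patterns.
    """
    fact = patterns[0]["fact"]
    tree = {"fact": fact, "joins": {}, "cols": {}}
    for p in patterns:
        if p["fact"] != fact:
            raise ValueError("patterns in one class must share a fact table")
        for dim, cond in p["joins"].items():
            # Assumed strict rule: a differing JOIN on the same table blocks the merge.
            if tree["joins"].setdefault(dim, cond) != cond:
                raise ValueError(f"conflicting JOIN for {dim}")
        for tab, cols in p["cols"].items():
            tree["cols"].setdefault(tab, set()).update(cols)
    return tree
```

Only star schemas are modeled here; a snowflake join tree would need nested edges.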
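- In another embodiment (Embodiment 7), converting the pattern relationship tree into the corresponding data model in S6 includes:
- converting the table information in the pattern relationship tree into table information of the corresponding data model; and
- converting the column information in the pattern relationship tree into column information of the corresponding data model.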
- In another embodiment (Embodiment 8), converting the column information in the pattern relationship tree into column information of the data model further includes scoring the columns of the pattern relationship tree for partitioning, using:
- PartScore(i) = PartFunc(Score(i), Stats(i)), where PartFunc() is the scoring function, Score(i) is the score of column (i) in each group of query patterns, and Stats(i) is the feature statistic of column (i) (i.e., a statistic of the column's data characteristics). When the partition score PartScore(i) of column (i) exceeds a predetermined score threshold, column (i) is set as the partition column.
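The scoring function PartFunc() and the statistics Stats(i) are not specified. The sketch below assumes one hypothetical choice that favours frequently used, date/time-like, low-cardinality columns, with Stats(i) reduced to a temporal flag and a distinct-value ratio; only the overall shape (score each column, compare with a threshold, pick the column) follows the text.

```python
def part_func(score, stats):
    """Hypothetical PartFunc(Score(i), Stats(i)).

    stats is an assumed dict {"temporal": bool, "distinct_ratio": float},
    where distinct_ratio = distinct values / rows, in [0, 1].
    """
    type_weight = 1.0 if stats["temporal"] else 0.2
    return score * type_weight * (1.0 - stats["distinct_ratio"])


def pick_partition_columns(columns, threshold=0.5):
    """Return the columns whose PartScore(i) exceeds the score threshold.

    columns maps a column name to its (Score(i), Stats(i)) pair.
    """
    return [name for name, (score, stats) in columns.items()
            if part_func(score, stats) > threshold]
```

With this choice, a time column used in filters would be picked over location or price; date columns are a common partitioning choice for fact tables, but the patent does not state that its PartFunc() behaves this way.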
- Embodiment 9 of the present invention further relates to a classifier comprising a static rule classification module and a statistical rule and preset learning rule classification module;
- the static rule classification module is configured to perform a difference judgment on at least N groups of query patterns using static rules, where a static rule calculates the difference degree of the same table in any two groups of query patterns and checks whether it is greater than a first preset threshold; if it is, the two groups of query patterns cannot be classified into one class;
- after the table difference degrees have been calculated, the difference degree of the same column in any two groups of query patterns is calculated and checked against the first preset threshold; if it is greater, the two groups of query patterns cannot be classified into one class;
- the statistical rule and preset learning rule classification module is configured to cluster the at least N groups of query patterns using statistical rules and a preset learning rule after the difference judgment.
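The two-module structure of the classifier can be sketched as constrained clustering: the static rule module turns its mutual-exclusion judgments into "cannot-link" pairs, and the clustering module never puts such a pair in one class. The class names, the greedy clustering, and the pluggable `rule`/`similar` callables are assumptions; the learning-rule adjustment is omitted here.

```python
from typing import Callable, List, Set, Tuple


class StaticRuleModule:
    """Collects the pairs of patterns that must not share a class."""

    def __init__(self, rule: Callable[[int, int], bool]):
        self.rule = rule  # rule(i, j) is True if i and j cannot be grouped

    def cannot_link(self, n: int) -> Set[Tuple[int, int]]:
        return {(i, j) for i in range(n) for j in range(i + 1, n) if self.rule(i, j)}


class ClusteringModule:
    """Greedy clustering that respects the static module's cannot-link pairs."""

    def __init__(self, similar: Callable[[int, int], bool]):
        self.similar = similar  # similar(leader, i): statistical-rule test

    def cluster(self, n: int, forbidden: Set[Tuple[int, int]]) -> List[List[int]]:
        clusters: List[List[int]] = []
        for i in range(n):
            for members in clusters:
                clash = any((min(i, m), max(i, m)) in forbidden for m in members)
                if not clash and self.similar(members[0], i):
                    members.append(i)
                    break
            else:
                clusters.append([i])
        return clusters


class Classifier:
    def __init__(self, static: StaticRuleModule, clustering: ClusteringModule):
        self.static = static
        self.clustering = clustering

    def classify(self, n: int) -> List[List[int]]:
        # Difference judgment first, then clustering under its constraints.
        return self.clustering.cluster(n, self.static.cannot_link(n))
```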
- The classifier classifies multiple query patterns, and query pattern classification involves three kinds of rules. Static rules are a series of mutual-exclusion rules: two patterns that meet certain conditions cannot be classified into one class.
- They mainly include: calculating the difference degree Diff(i, j) for the same table in the two patterns; if it is greater than the set threshold, the two patterns cannot be classified into one class;
- calculating the difference degree Diff(i, j) for the same column in the two query patterns; if it is greater than the set threshold, the two query patterns cannot be classified into one class;
- The merge rule is:
- the query patterns of each class are merged into a structure that contains a set of tables and their JOIN relationship tree and also defines the columns selected on each table; such a structure is called a pattern relationship tree.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (10)
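- 1. A method for automatically modeling an OLAP data model, characterized in that the method comprises: S1, obtaining SQL query statements; S2, parsing the SQL query statements and determining, according to the parsing result, whether the SQL query statements contain syntax or lexical errors; S3, if not, extracting query structures from the SQL query statements according to the column information and table information related to the source data that the SQL query statements use, to obtain at least N groups of query patterns, where N is a natural number greater than or equal to 1; S4, classifying the at least N groups of query patterns; S5, merging the query patterns in each class to obtain a corresponding pattern relationship tree; S6, converting the pattern relationship tree into a corresponding data model.
- 2. The method according to claim 1, characterized in that the column information refers to all columns used in the SQL query statements, each column including: the column name, the table in which the column is located, the type of the column, and the number of times the column appears; and the table information refers to all tables used in the SQL query statements, each table including: the table name, the type of the table, and the association information between the table and the fact table.
- 3. The method according to claim 2, characterized in that classifying the at least N groups of query patterns in S4 comprises: S41, performing a difference judgment on any two groups of the at least N groups of query patterns using static rules, wherein the static rules refer to calculating the difference degree of the same table in any two groups of query patterns and judging whether the difference degree of the table is greater than a first preset threshold, and if so, the two groups of query patterns cannot be classified into one class; and, after the difference degrees of the tables have been calculated, calculating the difference degree of the same column in any two groups of query patterns and judging whether the difference degree of the column is greater than the first preset threshold, and if so, the two groups of query patterns cannot be classified into one class; S42, after the difference judgment, clustering the at least N groups of query patterns using statistical rules and a preset learning rule.
- 4. The method according to claim 3, characterized in that clustering the at least N groups of query patterns using statistical rules and a preset learning rule in S42 comprises: S421, calculating a feature vector for each group of query patterns according to the column information and table information used in that group; S422, clustering all the feature vectors according to a clustering algorithm from unsupervised machine learning and the preset learning rule.
- 5. The method according to claim 4, characterized in that the preset learning rule means that, after clustering, it is judged whether the clustering result meets a preset standard, and if not, a supervised machine learning algorithm is used to adjust the clustering result, and the adjusted clustering result is recorded.
- 6. The method according to claim 3 or 4, characterized in that merging the query patterns in each class in S5 to obtain the corresponding pattern relationship tree comprises: calculating the difference degree of the same table in any two groups of query patterns in each class and judging whether the difference degree of the table is less than a second preset threshold, and if so, merging the two groups of query patterns to obtain the corresponding pattern relationship tree; and, after the difference degrees of the tables in each class have been calculated, calculating the difference degree of the same column in any two groups of query patterns in each class and judging whether the difference degree of the column is less than the second preset threshold, and if so, merging the two groups of query patterns to obtain the corresponding pattern relationship tree.
- 7. The method according to claim 3 or 4, characterized in that converting the pattern relationship tree into the corresponding data model in S6 comprises: converting the table information in the pattern relationship tree into the table information of the corresponding data model; and converting the column information in the pattern relationship tree into the column information of the corresponding data model.
- 8. The method according to claim 7, characterized in that converting the column information in the pattern relationship tree into the column information of the corresponding data model further comprises: performing partition scoring on the columns in the pattern relationship tree to determine the partition column of the data model, wherein the partition score of a column in the pattern relationship tree is calculated as: PartScore(i) = PartFunc(Score(i), Stats(i)), where PartFunc() is a scoring function, Score(i) is the score of the (i)th column in each group of query patterns, and Stats(i) is the feature statistic of the (i)th column; when the partition score PartScore(i) of the (i)th column exceeds a predetermined score threshold, the (i)th column is set as the partition column.
- 9. A classifier, characterized in that the classifier comprises: a static rule classification module, and a statistical rule and preset learning rule classification module; the static rule classification module is configured to perform a difference judgment on at least N groups of query patterns using static rules, wherein the static rules refer to calculating the difference degree of the same table in any two groups of query patterns and judging whether the difference degree of the table is greater than a first preset threshold, and if so, the two groups of query patterns cannot be classified into one class, where N is a natural number greater than or equal to 1; and, after the difference degrees of the tables have been calculated, calculating the difference degree of the same column in any two groups of query patterns and judging whether the difference degree of the column is greater than the first preset threshold, and if so, the two groups of query patterns cannot be classified into one class; the statistical rule and preset learning rule classification module is configured to cluster the at least N groups of query patterns using statistical rules and a preset learning rule after the difference judgment.
- 10. The classifier according to claim 9, characterized in that the preset learning rule means that, after clustering, it is judged whether the clustering result meets a preset standard, and if not, a supervised machine learning algorithm is used to adjust the clustering result, and the adjusted clustering result is recorded.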
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18894747.7A EP3709192A4 (en) | 2017-12-29 | 2018-01-19 | PROCESS AND CLASSIFIER FOR AUTOMATIC MODELING OF AN OLAP DATA MODEL |
US15/769,397 US11055307B2 (en) | 2017-12-29 | 2018-01-19 | Automatic modeling method and classifier for OLAP data model |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711487470.5 | 2017-12-29 | ||
CN201711487470.5A CN108153894B (zh) | 2017-12-29 | 2017-12-29 | Method and classifier device for automatic modeling of an OLAP data model |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019127744A1 (zh) | 2019-07-04 |
Family
ID=62460107
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/073320 WO2019127744A1 (zh) | 2017-12-29 | 2018-01-19 | Method and classifier for automatic modeling of an OLAP data model |
Country Status (4)
Country | Link |
---|---|
US (1) | US11055307B2 (zh) |
EP (1) | EP3709192A4 (zh) |
CN (1) | CN108153894B (zh) |
WO (1) | WO2019127744A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112445812A (zh) * | 2020-11-27 | 2021-03-05 | 中原银行股份有限公司 | Structured query statement processing method and device |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111209328A (zh) * | 2018-11-22 | 2020-05-29 | 厦门白山耘科技有限公司 | Method, device, client and server for automatically determining the chart attributes to be used for input data to be displayed |
CN110008239A (zh) * | 2019-03-22 | 2019-07-12 | 跬云(上海)信息科技有限公司 | Logic execution optimization method and system based on pre-computation optimization |
US11269880B2 (en) * | 2019-05-20 | 2022-03-08 | Google Llc | Retroreflective clustered join graph generation for relational database queries |
US11281671B2 (en) * | 2019-05-20 | 2022-03-22 | Google Llc | Retroreflective join graph generation for relational database queries |
CN110597876B (zh) * | 2019-08-30 | 2023-03-24 | 南开大学 | Approximate query method that predicts future queries by offline learning of historical queries |
CN111125147B (zh) * | 2019-12-12 | 2021-06-01 | 跬云(上海)信息科技有限公司 | Method and device for analyzing very large sets based on an extended pre-computation model and SQL functions |
CN111832661B (zh) * | 2020-07-28 | 2024-04-02 | 平安国际融资租赁有限公司 | Classification model construction method and device, computer equipment, and readable storage medium |
CN112132420B (zh) * | 2020-09-04 | 2023-11-28 | 广西大学 | Refined scoring method for SQL queries |
US11386053B2 (en) * | 2020-10-15 | 2022-07-12 | Google Llc | Automatic generation of a data model from a structured query language (SQL) statement |
CN113672615B (zh) * | 2021-07-22 | 2023-06-20 | 杭州未名信科科技有限公司 | Data analysis method and system for automatically generating SQL based on tree-structured inter-table relationships |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7571182B1 (en) * | 2005-01-26 | 2009-08-04 | Star Analytics, Inc. | Emulation of a balanced hierarchy from a nonbalanced hierarchy |
CN105468702A (zh) * | 2015-11-18 | 2016-04-06 | 中国科学院计算机网络信息中心 | Method for discovering association paths in large-scale RDF data |
CN105930388A (zh) * | 2016-04-14 | 2016-09-07 | 中国人民大学 | OLAP group-by aggregation method based on functional dependencies |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7225343B1 (en) * | 2002-01-25 | 2007-05-29 | The Trustees Of Columbia University In The City Of New York | System and methods for adaptive model generation for detecting intrusions in computer systems |
US6947929B2 (en) * | 2002-05-10 | 2005-09-20 | International Business Machines Corporation | Systems, methods and computer program products to determine useful relationships and dimensions of a database |
US7716167B2 (en) * | 2002-12-18 | 2010-05-11 | International Business Machines Corporation | System and method for automatically building an OLAP model in a relational database |
US7707143B2 (en) * | 2004-06-14 | 2010-04-27 | International Business Machines Corporation | Systems, methods, and computer program products that automatically discover metadata objects and generate multidimensional models |
JP4682030B2 (ja) * | 2005-11-30 | 2011-05-11 | 富士通株式会社 | Figure search program, recording medium recording the program, figure search device, and figure search method |
US10289637B2 (en) * | 2014-06-13 | 2019-05-14 | Excalibur Ip, Llc | Entity generation using queries |
CN104391895A (zh) * | 2014-11-12 | 2015-03-04 | 珠海世纪鼎利通信科技股份有限公司 | Cloud-computing-based SQL statement processing system |
US20160371355A1 (en) * | 2015-06-19 | 2016-12-22 | Nuodb, Inc. | Techniques for resource description framework modeling within distributed database systems |
US20170031980A1 (en) * | 2015-07-28 | 2017-02-02 | InfoKarta, Inc. | Visual Aggregation Modeler System and Method for Performance Analysis and Optimization of Databases |
CN106997386B (zh) * | 2017-03-28 | 2019-12-27 | 上海跬智信息技术有限公司 | OLAP pre-computation model, automatic modeling method, and automatic modeling system |
2017
- 2017-12-29 CN: CN201711487470.5A (CN108153894B), Active
2018
- 2018-01-19 US: US15/769,397 (US11055307B2), Active
- 2018-01-19 EP: EP18894747.7A (EP3709192A4), Ceased
- 2018-01-19 WO: PCT/CN2018/073320 (WO2019127744A1), Application Filing
Non-Patent Citations (1)
Title |
---|
See also references of EP3709192A4 * |
Also Published As
Publication number | Publication date |
---|---|
CN108153894A (zh) | 2018-06-12 |
US11055307B2 (en) | 2021-07-06 |
EP3709192A1 (en) | 2020-09-16 |
US20200394201A1 (en) | 2020-12-17 |
EP3709192A4 (en) | 2020-11-04 |
CN108153894B (zh) | 2020-08-14 |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 18894747.7; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2018894747; Country of ref document: EP; Effective date: 20200611 |
NENP | Non-entry into the national phase | Ref country code: DE |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18894747; Country of ref document: EP; Kind code of ref document: A1 |