CN103927346A - Query connection method on basis of data volumes - Google Patents

Query connection method on basis of data volumes

Info

Publication number
CN103927346A
Authority
CN
China
Prior art keywords
statistical information
data
query
data volume
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410124531.1A
Other languages
Chinese (zh)
Other versions
CN103927346B (en)
Inventor
陈岭
周强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201410124531.1A
Publication of CN103927346A
Application granted
Publication of CN103927346B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations
    • G06F16/2456 Join operations

Abstract

The invention discloses a query connection (join) method based on data volume. For real-time queries over big data, the method takes into account characteristics such as how columnar files are read, so that costs can be estimated more accurately and the optimal join order can be generated. The method mainly comprises: constructing a metadata server; collecting statistical information; querying the metadata server to obtain the statistical information of each table participating in the join; estimating the selectivity and related parameters such as data volume from the statistical information; and computing the cost of each execution plan to find the optimal join order. The method improves the accuracy of cost estimation, which helps the optimizer find the optimal execution plan and improves overall query efficiency.

Description

Query connection method based on data volume
Technical field
The present invention relates to the technical field of real-time query optimization for big data, and in particular to a query connection (join) method based on data volume.
Background
Real-time querying of big data is an important big data technology; existing big data query systems include Google Dremel, Cloudera Impala, Berkeley Shark, and Apache Drill. Real-time big data query systems generally adopt a distributed computing architecture and, because they weaken support for features such as transactions, scale better than relational database clusters. Because they also meet users' needs for real-time queries well, they have broad application prospects in fields such as the Internet and smart cities.
Multi-join order optimization is an important component of a database management system (DBMS) and is equally indispensable in real-time big data querying. Using a given optimization method, it traverses the search space of execution plans to find the best join order and thereby generate the best execution plan, which improves the performance of the big data query system and meets users' real-time query demands.
Cost estimation is a very important part of multi-join order optimization, so an effective method for estimating result sizes is key to making query optimization work. Traditional cost estimation is based on table cardinality, which effectively solves the traditional cost estimation problem and ensures that the optimal execution plan under the cost model is found. However, distributed database systems and data warehouses contain tables stored in columnar file formats, which are designed to optimize I/O performance when reading the underlying data and to reduce the volume of data transferred. Take RCFile as an example: it is a file format that first partitions a table horizontally by row and then vertically by column, so that only the required columns are read and transferred. When tables stored in a columnar file format participate in a join, estimates from the traditional cardinality-based method may deviate seriously. The execution plan that the join order optimization algorithm finds under the cost model is then not the best one, the join order is not optimal, and the latency of the whole query increases.
Summary of the invention
The technical problem to be solved by the present invention is how to improve the accuracy of cost estimation when a real-time big data query system performs multi-join order optimization, and thereby improve overall query efficiency. To address the problems of traditional cardinality-based cost estimation, the invention proposes a cost estimation method for multi-join queries based on data volume. Considering that some of the relations participating in the joins of a user's query may be stored as columnar files, the method takes into account characteristics such as how columnar files are read, adds finer-grained statistical information, and uses the average length of each field to estimate the size of the intermediate join results, which effectively improves the accuracy of cost estimation.
A query connection method based on data volume, comprising:
Step 1: submit a query request to the metadata server and obtain the statistical information of each table participating in the join;
Step 2: estimate, from the statistical information obtained, the data volume of all tables in the current execution plan;
Step 3: repeat Step 1 and Step 2, traversing the search space of execution plans, until the execution plan whose data volume minimizes the query cost is found, and join the tables in the join order of that execution plan.
The search space of execution plans is the set of table join orders given by all execution plans.
The invention uses data volume as the query cost to determine the join order in a multi-join query, which improves the accuracy of cost estimation when a real-time big data query system performs multi-join order optimization and thereby improves overall query efficiency.
The metadata server is built as follows: choose a relational database and design a column-level table schema, then create the metadata database and its table relations in that relational database according to the designed schema, which yields the metadata server.
To provide the query system with statistical information at three granularities (table level, partition level, and column level), the table schema must satisfy a suitable normal form and, provided that cost estimation can still be completed, keep unnecessary storage overhead as low as possible.
The statistical information in the metadata server is the statistical information of each table, obtained by gathering statistics on the tables according to the designed table schema.
The granularity of the statistical information follows the granularity of the table schema; because the schema is column-level, the statistical information includes column-level statistics.
The relational database is a MySQL database, a Derby database, or an Oracle database.
A suitable relational database is chosen as the metadata server of the real-time big data query system according to the actual needs of the enterprise user and the system.
The statistical information comprises: the column name, the lower bound of the data values in the column, the upper bound of the data values in the column, the number of null values in the column, the number of distinct values in the column, the average length of the field data in the column, the maximum length of the field data in the column, and the total number of rows of the table or view.
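One way to picture the column-level statistics listed above is as a small record type. The following is a minimal Python sketch under stated assumptions: the class names `ColumnStats` and `TableStats`, all field names, and the helper `avg_row_width` are illustrative and do not come from the patent, and lengths are assumed to be measured in bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ColumnStats:
    """Column-level statistics for one column (field names are hypothetical)."""
    name: str                     # column name
    low_value: Optional[float]    # lower bound of the data values in the column
    high_value: Optional[float]   # upper bound of the data values in the column
    num_nulls: int                # number of null values in the column
    num_distinct: int             # number of distinct values in the column
    avg_col_len: float            # average length of the field data (bytes assumed)
    max_col_len: int              # maximum length of the field data (bytes assumed)


@dataclass
class TableStats:
    """Table-level statistics plus the statistics of each column."""
    name: str
    num_rows: int                 # total number of rows of the table or view
    columns: Dict[str, ColumnStats] = field(default_factory=dict)

    def add(self, col: ColumnStats) -> None:
        self.columns[col.name] = col

    def avg_row_width(self, needed: List[str]) -> float:
        """Sum of average lengths over only the columns a query needs."""
        return sum(self.columns[c].avg_col_len for c in needed)


orders = TableStats(name="orders", num_rows=1000)
orders.add(ColumnStats("id", 1, 1000, 0, 1000, 8.0, 8))
orders.add(ColumnStats("note", None, None, 50, 900, 40.0, 255))
print(orders.avg_row_width(["id"]))          # width when only "id" is read
print(orders.avg_row_width(["id", "note"]))  # width when both columns are read
```

Keeping the average length per column, rather than one average row length per table, is what lets an estimator account for a columnar reader that fetches only the columns a query needs.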
Both the construction of the metadata server and the storing of statistical information in it are completed offline.
Because the construction of the metadata server and the collection of statistical information are both completed offline, returning the statistical information during an actual query incurs little runtime overhead, which greatly reduces the latency of cost estimation.
In Step 2, the data volume of each table is calculated from the table's selectivity, the average field data sizes, and the table's total number of rows.
The selectivity is derived from the statistical information obtained, such as the lower and upper bounds of the data values in a column, together with the relevant conditions of the join query; it is denoted selectivity.
The selectivity is estimated by computing, from the query conditions and the statistical information, the proportion of rows in the object set being queried that satisfy the query conditions.
The object set is a table, a view, or an intermediate result.
The data volume is computed as follows:
$$\text{size} = \text{selectivity} \times \text{numsOfTableLine} \times \sum_{i=1}^{j} \text{avgColSize}_i$$
where selectivity is the selectivity of the query, numsOfTableLine is the total number of rows of the table or view, avgColSize_i is the average data size of the i-th column field of the table that needs to be returned, and j is the number of columns of the table.
Compared with traditional cardinality-based estimation, this method depends not only on the number of rows produced by the intermediate query results but also takes the estimated data volume into account, which improves the accuracy of cost estimation.
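The difference between the two estimates can be made concrete with a short Python sketch. The function names and all numbers below are illustrative assumptions; only the formula itself (selectivity × total rows × sum of average column sizes) comes from the text.

```python
from typing import Sequence


def estimate_cardinality(selectivity: float, num_rows: int) -> float:
    """Traditional estimate: number of rows expected to survive the query conditions."""
    return selectivity * num_rows


def estimate_data_volume(selectivity: float, num_rows: int,
                         avg_col_sizes: Sequence[float]) -> float:
    """Data-volume estimate: selectivity * numsOfTableLine * sum(avgColSize_i).

    avg_col_sizes holds the average size of each column that must be returned.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must lie in [0, 1]")
    return selectivity * num_rows * sum(avg_col_sizes)


# Two hypothetical tables with the same row count and selectivity.
narrow_rows = estimate_cardinality(0.5, 1000)
wide_rows = estimate_cardinality(0.5, 1000)
narrow_size = estimate_data_volume(0.5, 1000, [4, 4])
wide_size = estimate_data_volume(0.5, 1000, [4, 4, 200])

print(narrow_rows == wide_rows)  # the cardinality estimate cannot tell the tables apart
print(narrow_size, wide_size)    # the data-volume estimate can
```

In this made-up case both tables yield the same row estimate, yet the second must move 26 times as much data (104000 versus 4000 size units), which is the kind of gap a purely cardinality-based cost model would miss.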
The advantages of the invention include:
To address the inaccurate estimates of traditional cardinality-based cost methods, the invention takes into account characteristics such as how columnar files are read and adds finer-grained statistical information, effectively improving the accuracy of cost estimation.
Storing and maintaining table statistics in the metadata server avoids repeating large amounts of analysis work, reduces runtime overhead, and improves the efficiency of cost estimation.
Brief description of the drawings
Fig. 1 is the overall flow chart of the data-volume-based query connection method in an embodiment of the invention;
Fig. 2 is the query processing architecture diagram used in the current embodiment of the invention;
Fig. 3 is the flow chart of metadata server construction in the current embodiment of the invention;
Fig. 4 is the flow chart of statistical information collection in the current embodiment of the invention;
Fig. 5 is the flow chart of statistical information querying in the current embodiment of the invention;
Fig. 6 is the flow chart of data volume estimation in the current embodiment of the invention;
Fig. 7 is the flow chart of join order generation in the current embodiment of the invention.
Embodiment
The invention proposes a query connection method based on data volume that performs cost estimation for multi-join queries at query time; the overall flow of the cost estimation method is shown in Fig. 1. First the metadata server is built; then the statistical information is collected; next the statistical information of each table participating in the join is obtained by querying the metadata server; then the selectivity and related parameters such as data volume are estimated from the statistical information; finally the data-volume-based estimation method is used to compute the cost of each execution plan and find the best join order.
To show more intuitively the role of the proposed method in query optimization, Fig. 2 gives the query processing architecture, which sets out the relationship between the data-volume-based cost estimation module and the join order generation module. In the join order generation module, the optimizer searches for execution plans; the data-volume-based cost estimation module consists mainly of two parts, the Cost Model and the MetaStore, and performs the cost estimation. A query submitted by a user is parsed and then passed to the multi-join order optimization method for order optimization. While searching execution plans, the optimizer calls the cost estimation module to estimate costs, so as to ensure that the best join order under the given cost model is found.
The steps of the data-volume-based multi-join query cost estimation method proposed by the invention are as follows:
Before query joins are performed, the metadata server must be built and the statistical information of the tables stored in it.
Choose a relational database, design the table schema, and build the metadata server.
For the data-volume-based cost estimation method to be implemented efficiently, the metadata server must first be built. The flow is shown in Fig. 3, and the specific steps are as follows:
According to the actual needs of the enterprise user and the system, choose a suitable relational database (such as a MySQL or Derby database) as the metadata server of the real-time big data query system;
To provide the query system with statistical information at three granularities (table level, partition level, and column level), design a table schema that satisfies a suitable normal form and, provided that cost estimation can still be completed, keeps unnecessary storage overhead as low as possible;
Create the metadata database and table relations in the corresponding database server according to the designed table schema, for use in the subsequent steps.
According to the designed table schema, analyze the relations in each table and store the corresponding statistical information in the metadata server to complete the collection of statistical information.
To optimize the join order of a parsed query, the collection of statistical information must be completed after the metadata server is created. The flow is shown in Fig. 4, and the specific steps are as follows:
To reduce the overhead of obtaining statistical information for cost estimation during join order optimization, first analyze the tables that are frequently joined, using the appropriate analysis statements or tools;
Collect the statistical information of the analyzed tables and store it in the corresponding tables of the metadata server. To better support data-volume-based cost estimation, column-level statistics including the average field length AVG_COL_LEN must be collected; this is provided for during table schema design. The statistical information comprises: the column name, the lower bound of the data values in the column, the upper bound of the data values in the column, the number of null values in the column, the number of distinct values in the column, the average data size of the field data in the column, the maximum data size of the field data in the column, and the total number of rows of the table or view.
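The offline build-and-collect flow can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL, Derby, or Oracle metadata server; the table and column names (`table_stats`, `column_stats`, `avg_col_len`, and so on) are hypothetical, loosely echoing the AVG_COL_LEN statistic named in the text; and a value's length is assumed to be the UTF-8 byte length of its string form.

```python
import sqlite3


def create_metastore(conn: sqlite3.Connection) -> None:
    """Create table-level and column-level statistics tables (hypothetical schema)."""
    conn.execute("""CREATE TABLE IF NOT EXISTS table_stats (
                        table_name TEXT PRIMARY KEY,
                        num_rows   INTEGER NOT NULL)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS column_stats (
                        table_name   TEXT NOT NULL,
                        column_name  TEXT NOT NULL,
                        low_value,
                        high_value,
                        num_nulls    INTEGER,
                        num_distinct INTEGER,
                        avg_col_len  REAL,
                        max_col_len  INTEGER,
                        PRIMARY KEY (table_name, column_name))""")


def collect_stats(conn, table_name, columns, rows):
    """Analyze in-memory rows (tuples) and store their statistics offline.

    Assumes each column holds values of a single comparable type.
    """
    conn.execute("INSERT OR REPLACE INTO table_stats VALUES (?, ?)",
                 (table_name, len(rows)))
    for i, col in enumerate(columns):
        values = [r[i] for r in rows]
        present = [v for v in values if v is not None]
        # Assumption: length = UTF-8 byte length of the value's string form.
        lengths = [len(str(v).encode("utf-8")) for v in present]
        conn.execute(
            "INSERT OR REPLACE INTO column_stats VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (table_name, col,
             min(present) if present else None,
             max(present) if present else None,
             len(values) - len(present),
             len(set(present)),
             sum(lengths) / len(lengths) if lengths else 0.0,
             max(lengths) if lengths else 0))
    conn.commit()


def fetch_stats(conn, table_name):
    """Query-time lookup: return (total rows, {column: average length})."""
    num_rows = conn.execute(
        "SELECT num_rows FROM table_stats WHERE table_name = ?",
        (table_name,)).fetchone()[0]
    cols = conn.execute(
        "SELECT column_name, avg_col_len FROM column_stats WHERE table_name = ?",
        (table_name,)).fetchall()
    return num_rows, dict(cols)


conn = sqlite3.connect(":memory:")
create_metastore(conn)
collect_stats(conn, "users", ["id", "name"], [(1, "ann"), (2, "bob"), (3, None)])
print(fetch_stats(conn, "users"))
```

Because `collect_stats` runs ahead of time, the only work left at query time is the two small lookups in `fetch_stats`, which is the reason the text gives for the low runtime overhead of obtaining statistics.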
The creation of the metadata server (i.e., the metadata database) and the collection of statistical information are both completed offline, after which queries are run.
Step 1: submit a query request to the metadata server to obtain the statistical information of each table participating in the join.
This step performs the lookup and retrieval of the statistical information. Its flow is shown in Fig. 5, and the specific steps are as follows:
To obtain the statistical information of each table participating in the join, the query optimization module submits a query request to the corresponding metadata server;
The metadata server returns the statistical information of each table relation, completing the retrieval of statistical information for use in calculating the parameters of the next stage.
Because the construction of the metadata server and the collection of statistical information are both completed offline, this step incurs little runtime overhead, which greatly reduces the latency of cost estimation.
Step 2: estimate, from the statistical information obtained, the data volume of all tables in the current execution plan.
An execution plan here refers to a query executed with a particular table join order.
Before the cost of an execution plan is estimated, the relevant parameters, namely the selectivity and the data volume, must be estimated. The flow is shown in Fig. 6, and the specific steps are as follows:
Step 2-1: using the statistical information obtained in the previous step, first calculate the selectivity of each table participating in the join. From the query conditions of the join query and the statistical information, compute the proportion of rows in the object set being queried that satisfy the conditions.
For any two query conditions contained in a query, the formula depends on the relation they must satisfy:
When the query must satisfy both query condition A and query condition B, the selectivity is computed as:
$$\text{selectivity}_{(A \text{ and } B)} = \text{selectivity}_{(A)} \times \text{selectivity}_{(B)} \tag{1}$$
where selectivity_(A) is the selectivity of the single query condition A and selectivity_(B) is the selectivity of the single query condition B;
When the query must satisfy query condition A or query condition B, the selectivity is computed as:
$$\text{selectivity}_{(A \text{ or } B)} = P(A) + P(B) - \text{selectivity}_{(A \text{ and } B)} \tag{2}$$
where P(A) is the probability of occurrence of query condition A and P(B) is the probability of occurrence of query condition B;
When the query excludes query condition A, the selectivity is computed as:
$$\text{selectivity}_{(\text{not } A)} = 1 - \text{selectivity}_{(A)} \tag{3}$$
The relation between any two query conditions A and B may be that both are satisfied or that A or B is satisfied; a query condition may also be the exclusion of A. When a query contains multiple query conditions related in several of these ways, the conditions can be combined pairwise using the formulas above, computing according to the relation of each combination, to obtain the final selectivity.
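Formulas (1) to (3) and their pairwise combination can be sketched in Python as below. Two points are assumptions rather than statements from the text: P(A) and P(B) in formula (2) are read as the selectivities of A and B, and the helper `sel_range`, which turns a column's lower and upper bounds into the selectivity of a range condition, assumes values are uniformly distributed between the bounds. All function names are illustrative.

```python
def sel_and(sel_a: float, sel_b: float) -> float:
    """Formula (1): both conditions must hold."""
    return sel_a * sel_b


def sel_or(sel_a: float, sel_b: float) -> float:
    """Formula (2), reading P(A) and P(B) as the selectivities of A and B."""
    return sel_a + sel_b - sel_and(sel_a, sel_b)


def sel_not(sel_a: float) -> float:
    """Formula (3): the condition is excluded."""
    return 1.0 - sel_a


def sel_range(query_low: float, query_high: float,
              col_low: float, col_high: float) -> float:
    """Selectivity of `query_low <= col <= query_high` from the column bounds.

    Assumption (not from the source): values are uniform on [col_low, col_high].
    """
    if col_high <= col_low:
        # Degenerate column holding a single value.
        return 1.0 if query_low <= col_low <= query_high else 0.0
    overlap = min(query_high, col_high) - max(query_low, col_low)
    return max(0.0, min(1.0, overlap / (col_high - col_low)))


# Hypothetical query: (A and B) or (not C), combined pairwise.
sel_a = sel_range(0, 50, 0, 100)   # condition A covers half of the column's range
sel_b = 0.25                       # assumed selectivity of condition B
sel_c = 0.5                        # assumed selectivity of condition C
combined = sel_or(sel_and(sel_a, sel_b), sel_not(sel_c))
print(combined)
```

Multiplying selectivities in formula (1) treats the conditions as statistically independent; when conditions are correlated the combined estimate can drift, which is a known limitation of this style of estimation rather than something specific to this sketch.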
Step 2-2: calculate the data volume of each table from the selectivity obtained in Step 2-1, using the following formula:
$$\text{size} = \text{selectivity} \times \text{numsOfTableLine} \times \sum_{i=1}^{j} \text{avgColSize}_i \tag{4}$$
where selectivity is the selectivity calculated in Step 2-1, numsOfTableLine is the total number of rows of the table or view, avgColSize_i is the average data size of the i-th column field of the table that needs to be returned, and j is the number of columns of the table.
The data volume of each table computed by formula (4) is fed into the cost model to estimate the cost of the multi-join query, which gives the cost of each execution plan. Compared with traditional cardinality-based estimation, this method depends not only on the number of rows produced by the intermediate query results but also takes the estimated data volume into account, which improves the accuracy of cost estimation.
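As a worked instance of formula (4), with purely illustrative numbers that are not taken from the patent: suppose a table has 1,000,000 rows, the query's selectivity on it is 0.1, and the two columns that must be returned have average sizes of 8 and 24 (bytes is an assumed unit; the source does not name one). Then

$$\text{size} = 0.1 \times 1{,}000{,}000 \times (8 + 24) = 3{,}200{,}000$$

A cardinality-based estimate for the same table would report only the 100,000 surviving rows, regardless of whether those rows are 32 bytes or several kilobytes wide.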
Step 3: repeat Step 1 and Step 2 until the search space of execution plans has been traversed, then find the table join order with the smallest data volume and perform the join.
To find the best join order, the search over execution plans uses the data-volume-based cost estimation method proposed by the invention. The flow is shown in Fig. 7, and the specific steps are as follows:
Search the space of execution plans (that is, repeat Step 1 and Step 2) according to the adopted join order optimization method, which also takes the characteristics of the real-time query system into account and adds corresponding pruning techniques to improve the performance of the execution plan search and reduce the latency of running the algorithm itself;
Obtain the estimated data volume of each execution plan through Step 2, find the execution plan that satisfies the given cost model, and store it;
Generate the best join order from the optimal execution plan found in the steps above. Because the cost estimation method proposed by the invention is used, the accuracy of cost estimation is effectively improved.

Claims (9)

1. A query connection method based on data volume, characterized by comprising:
Step 1: submit a query request to the metadata server and obtain the statistical information of each table participating in the join;
Step 2: estimate, from the statistical information obtained, the data volume of all tables in the current execution plan;
Step 3: repeat Step 1 and Step 2, traversing the search space of execution plans, until the execution plan whose data volume minimizes the query cost is found, and join the tables in the join order of that execution plan.
2. The query connection method based on data volume as claimed in claim 1, characterized in that the metadata server is built by choosing a relational database and designing a column-level table schema, and creating the metadata database and table relations in the corresponding relational database according to the designed table schema.
3. The query connection method based on data volume as claimed in claim 1, characterized in that the statistical information stored in the metadata server is the statistical information of each table, obtained by gathering statistics on the tables according to the designed table schema.
4. The query connection method based on data volume as claimed in claim 1, characterized in that the relational database is a MySQL database, a Derby database, or an Oracle database.
5. The query connection method based on data volume as claimed in claim 1, characterized in that the statistical information comprises: the column name, the lower bound of the data values in the column, the upper bound of the data values in the column, the number of null values in the column, the number of distinct values in the column, the average data size of the field data in the column, the maximum data size of the field data in the column, and the total number of rows of the table or view.
6. The query connection method based on data volume as claimed in claim 1, characterized in that the construction of the metadata server and the storing of statistical information in the metadata server are both completed offline.
7. The query connection method based on data volume as claimed in claim 1, characterized in that, in Step 2, the data volume of each table is calculated from the table's selectivity, the average field data sizes, and the table's total number of rows.
8. The query connection method based on data volume as claimed in claim 7, characterized in that the selectivity is estimated by computing, from the query conditions and the statistical information, the proportion of rows in the object set being queried that satisfy the query conditions.
9. The query connection method based on data volume as claimed in claim 8, characterized in that the data volume of each table is computed as follows:
$$\text{size} = \text{selectivity} \times \text{numsOfTableLine} \times \sum_{i=1}^{j} \text{avgColSize}_i$$
where selectivity is the selectivity of the query, numsOfTableLine is the total number of rows of the table or view, avgColSize_i is the average data size of the i-th column field of the table that needs to be returned, and j is the number of columns of the table.
CN201410124531.1A 2014-03-28 2014-03-28 Query connection method on basis of data volumes Expired - Fee Related CN103927346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410124531.1A CN103927346B (en) 2014-03-28 2014-03-28 Query connection method on basis of data volumes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410124531.1A CN103927346B (en) 2014-03-28 2014-03-28 Query connection method on basis of data volumes

Publications (2)

Publication Number Publication Date
CN103927346A true CN103927346A (en) 2014-07-16
CN103927346B CN103927346B (en) 2017-02-15

Family

ID=51145567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410124531.1A Expired - Fee Related CN103927346B (en) 2014-03-28 2014-03-28 Query connection method on basis of data volumes

Country Status (1)

Country Link
CN (1) CN103927346B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250567A (en) * 2016-08-31 2016-12-21 天津南大通用数据技术股份有限公司 In distributed data base system, table connects system of selection and the device of data distribution mode
CN106446170A (en) * 2016-09-27 2017-02-22 努比亚技术有限公司 Data querying method and device
CN107193813A (en) * 2016-03-14 2017-09-22 阿里巴巴集团控股有限公司 Tables of data connected mode processing method and processing device
CN108268536A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 Database aggregation processing method and device
CN108491516A (en) * 2018-03-26 2018-09-04 哈工大大数据(哈尔滨)智能科技有限公司 Distributed multi-table join selection method based on mixed integer linear programming and device
CN111625557A (en) * 2020-04-07 2020-09-04 上海熙菱信息技术有限公司 Method for rapidly estimating results of billion-level data volume multi-condition
CN112395372A (en) * 2020-12-10 2021-02-23 四川长虹电器股份有限公司 Quick statistical method based on two-dimensional table of relational database system
CN112905591A (en) * 2021-02-04 2021-06-04 成都信息工程大学 Data table connection sequence selection method based on machine learning
CN113010547A (en) * 2021-05-06 2021-06-22 电子科技大学 Database query optimization method and system based on graph neural network
CN113656437A (en) * 2021-07-02 2021-11-16 阿里巴巴新加坡控股有限公司 Method and device for determining optimal query plan
CN114090695A (en) * 2022-01-24 2022-02-25 北京奥星贝斯科技有限公司 Query optimization method and device for distributed database
CN114461677A (en) * 2022-04-12 2022-05-10 天津南大通用数据技术股份有限公司 Method for transmitting and adjusting connection sequence based on selection degree
CN117056361A (en) * 2023-07-03 2023-11-14 杭州拓数派科技发展有限公司 Data query method and device for distributed database

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060106777A1 (en) * 2004-11-18 2006-05-18 International Business Machines Corporation Method and apparatus for predicting selectivity of database query join conditions using hypothetical query predicates having skewed value constants
CN101739451A (en) * 2009-12-03 2010-06-16 南京航空航天大学 Joint query adaptive processing method for grid database
CN102929996A (en) * 2012-10-24 2013-02-13 华南理工大学 XPath query optimization method and system
CN103164495A (en) * 2011-12-19 2013-06-19 中国人民解放军63928部队 Half-connection inquiry optimizing method based on periphery searching and system thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
周强等: "基于改进DPhyp算法的Impala查询优化", 《计算机研究与发展》 *
孟凡辉: "数据库基于值的查询优化的研究与实践", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193813A (en) * 2016-03-14 2017-09-22 阿里巴巴集团控股有限公司 Tables of data connected mode processing method and processing device
US11650990B2 (en) 2016-03-14 2023-05-16 Alibaba Group Holding Limited Method, medium, and system for joining data tables
CN106250567A (en) * 2016-08-31 2016-12-21 天津南大通用数据技术股份有限公司 In distributed data base system, table connects system of selection and the device of data distribution mode
CN106446170A (en) * 2016-09-27 2017-02-22 努比亚技术有限公司 Data querying method and device
CN108268536A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 Database aggregation processing method and device
CN108491516A (en) * 2018-03-26 2018-09-04 哈工大大数据(哈尔滨)智能科技有限公司 Distributed multi-table join selection method based on mixed integer linear programming and device
CN108491516B (en) * 2018-03-26 2021-09-14 哈工大大数据(哈尔滨)智能科技有限公司 Distributed multi-table connection selection method and device based on mixed integer linear programming
CN111625557B (en) * 2020-04-07 2023-04-14 上海熙菱信息技术有限公司 Method for quickly estimating result of multi-condition billion-level data volume
CN111625557A (en) * 2020-04-07 2020-09-04 上海熙菱信息技术有限公司 Method for rapidly estimating results of billion-level data volume multi-condition
CN112395372A (en) * 2020-12-10 2021-02-23 四川长虹电器股份有限公司 Quick statistical method based on two-dimensional table of relational database system
CN112905591A (en) * 2021-02-04 2021-06-04 成都信息工程大学 Data table connection sequence selection method based on machine learning
CN113010547A (en) * 2021-05-06 2021-06-22 电子科技大学 Database query optimization method and system based on graph neural network
CN113656437A (en) * 2021-07-02 2021-11-16 阿里巴巴新加坡控股有限公司 Method and device for determining optimal query plan
CN113656437B (en) * 2021-07-02 2023-10-03 阿里巴巴新加坡控股有限公司 Model construction method for predicting execution cost stability of reference
CN114090695A (en) * 2022-01-24 2022-02-25 北京奥星贝斯科技有限公司 Query optimization method and device for distributed database
CN114461677A (en) * 2022-04-12 2022-05-10 天津南大通用数据技术股份有限公司 Method for transmitting and adjusting connection sequence based on selection degree
CN114461677B (en) * 2022-04-12 2022-07-26 天津南大通用数据技术股份有限公司 Method for transmitting and adjusting connection sequence based on selection degree
CN117056361A (en) * 2023-07-03 2023-11-14 杭州拓数派科技发展有限公司 Data query method and device for distributed database

Also Published As

Publication number Publication date
CN103927346B (en) 2017-02-15

Similar Documents

Publication Publication Date Title
CN103927346A (en) Query connection method on basis of data volumes
US10216793B2 (en) Optimization of continuous queries in hybrid database and stream processing systems
Zhao et al. Modeling MongoDB with relational model
CN110837585B (en) Multi-source heterogeneous data association query method and system
CN102722531B (en) Query method based on regional bitmap indexes in cloud environment
CN103176974A (en) Method and device used for optimizing access path in data base
JPH07319923A (en) Method and equipment for processing of parallel database of multiprocessor computer system
CN104834754A (en) SPARQL semantic data query optimization method based on connection cost
US20110022581A1 (en) Derived statistics for query optimization
CN112328578B (en) Database query optimization method based on reinforcement learning and graph attention network
CN105630881A (en) Data storage method and query method for RDF (Resource Description Framework)
CN107870949B (en) Data analysis job dependency relationship generation method and system
CN108052635A (en) Unified joint query method for heterogeneous data sources
CN104137095A (en) System for evolutionary analytics
CN103019728A (en) Effective complex report parsing engine and parsing method thereof
US10726006B2 (en) Query optimization using propagated data distinctness
Simitsis Modeling and managing ETL processes.
CN114691786A (en) Method and device for determining data blood relationship, storage medium and electronic device
CN103793467A (en) Method for optimizing real-time query on big data on basis of hyper-graphs and dynamic programming
CN103678589A (en) Database kernel query optimization method based on equivalence class
US9406027B2 (en) Making predictions regarding evaluation of functions for a database environment
CN104268298A (en) Method for creating database index and inquiring data
CN110795835A (en) Three-dimensional process model reverse generation method based on automatic synchronous modeling
CN111814458A (en) Rule engine system optimization method and device, computer equipment and storage medium
CN110750560A (en) System and method for optimizing network multi-connection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2017-02-15

Termination date: 2020-03-28