CN116521719A - Query optimization system based on cost estimation - Google Patents

Publication number
CN116521719A
Authority
CN
China
Prior art keywords
model
information
cost estimation
execution
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310401083.4A
Other languages
Chinese (zh)
Inventor
荆一楠
王嵩立
张寒冰
徐伟
陈振强
何震瀛
王晓阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Transwarp Technology Shanghai Co Ltd
Original Assignee
Fudan University
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University and Transwarp Technology Shanghai Co Ltd
Priority to CN202310401083.4A
Publication of CN116521719A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of database query, and particularly relates to a query optimization system based on cost estimation. The invention comprises a system information extractor and a cost estimation model based on deep learning. The system information extractor processes information such as the storage and execution model of the database management system into structured data for the model to use. The cost estimation model based on deep learning establishes a mapping from queries to costs from historical execution records under different system information, and uses it to estimate the cost of unseen queries. The cost estimation model is trained with a layered training strategy, which helps the model learn from batched training data, improves memory utilization during training, reduces training oscillation, and accelerates convergence. The invention helps the database optimizer select the correct execution plan and thereby improves the overall query execution efficiency of the database.

Description

Query optimization system based on cost estimation
Technical Field
The invention belongs to the technical field of database query, and particularly relates to a query optimization system based on cost estimation.
Background
Today, latency requirements on database systems are increasingly strict, and whether a query executes efficiently depends largely on the performance of the query optimizer. In query optimization, cost-based execution plan selection is an important step, and a good cost estimation model helps the optimizer choose a suitable execution plan. Most existing cost-based query optimization techniques rely heavily on estimating the cardinality of sub-queries: a reasonably accurate cardinality estimate is obtained first, and the cardinalities of all operators are then combined in a weighted sum to obtain the cost estimate.
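The conventional scheme described above, a weighted sum of per-operator cardinalities, can be sketched as follows. This is only an illustration: the operator names and the weights in `OPERATOR_WEIGHTS` are hypothetical values, not taken from this document or from any particular database system.

```python
# Hypothetical per-row weights for each operator type (illustrative only).
OPERATOR_WEIGHTS = {"scan": 1.0, "filter": 0.1, "join": 2.5, "aggregate": 0.5}


def weighted_cost(plan_ops):
    """Sum weight * estimated cardinality over all operators of a plan.

    plan_ops is a list of (operator_type, estimated_cardinality) pairs.
    """
    return sum(OPERATOR_WEIGHTS[op] * card for op, card in plan_ops)


# A toy plan: scan 1000 rows, filter them, join 200 rows, aggregate 200 rows.
plan = [("scan", 1000), ("filter", 1000), ("join", 200), ("aggregate", 200)]
print(weighted_cost(plan))
```

Because the cost is linear in the cardinalities, any error in a cardinality estimate carries straight through to the cost, and nothing in the formula reflects storage layout or parallelism.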
In recent years, cardinality estimation methods have advanced. Besides traditional statistics-based methods such as histograms and HyperLogLog, many machine-learning-based methods also achieve good results. Machine-learning-based cardinality estimation methods fall largely into two categories: query-based and data-based. Query-based methods use a deep neural network to learn a mapping from queries to cardinalities; they are accurate and adapt quickly to data changes, but adapt poorly to changes in the workload. Data-based methods use sum-product networks, autoregressive models, and the like to learn the joint probability distribution of the data and then compute the cardinality of a query from it; they are robust to workload changes, but their accuracy drops markedly when the data distribution changes.
In today's increasingly complex database systems, the relationship between query cardinality and query cost is no longer a simple linear one. From a storage perspective, relational data is stored by row in transaction-oriented systems and by column in analytical data warehouses; some hybrid transactional databases also use multi-copy hybrid row-column storage. The cost of reading the same amount of data differs considerably across these underlying storage layouts. On a distributed system, each operator can also execute in parallel over multiple data copies and data shards. The accuracy of existing cost estimation models in such complex execution scenarios is often unsatisfactory.
Disclosure of Invention
The invention aims to provide a query optimization system based on cost estimation, which overcomes the defects of the prior art.
The query optimization system based on cost estimation provided by the invention is aware of the storage information and the execution information of the system. As a core component of the optimizer in a database management system, it combines these sources of information to estimate the cost of the enumerated execution plans accurately, which helps the optimizer select the final execution plan. The system comprises a system information extractor and a cost estimation model based on deep learning:
(I) System information extractor
The system information extractor processes the storage and execution model information of the database management system into structured data for use by the model. It comprises two modules, a storage information extraction module and an execution information extraction module; the information extracted by the system information extractor is used as input to the cost estimation model. Wherein:
the storage information extraction module extracts information related to the storage of data from the database management system, specifically: the storage model, the number of copies, and the compression strategy of the data;
the execution information extraction module extracts information related to the physical execution of queries from the database management system, including: operator parallelism and cache size;
the system information extractor integrates the results of the two modules and inputs them into the cost estimation model.
Further specifically:
the system information extractor obtains and updates the storage information $I_s$ and the execution information $I_e$ of the system by periodic extraction, and combines them into the system information $I_{system}$, which serves as an input feature for constructing the cost estimation model:
$I_{system} = (I_s, I_e)$  (1)
for the storage information $I_s$, three items are considered: the storage model, the number of copies, and the compression strategy of the data;
the first item, the storage model, is one of two classes: row store $S_{row}$ or column store $S_{column}$;
the second item, the number of copies, is defined as the number of copies of a piece of data on multiple clusters, specifically the sum of the number of row-store copies $C_{row}$ and the number of column-store copies $C_{column}$, namely $C_{row} + C_{column}$;
the third item, the compression strategy, is defined as the compression policy that the column-store copy, if any, applies to the data; the set of strategies is extensible;
for the execution information $I_e$, two dimensions are considered: operator parallelism and cache size;
operator parallelism is defined as the parallelism of each SQL operator in the execution plan, including the parallelism $P_{index\_scan}$ of index scan operators, the parallelism $P_{join}$ of join operators, the parallelism $P_{aggr}$ of aggregation operators, and the parallelism $P_{proj}$ of projection operators; each of the four parallelism values is a positive integer ($\mathbb{Z}^+$); the cache size is also a positive integer ($\mathbb{Z}^+$), with MB as the default unit.
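One way to hold the extracted system information as structured data is sketched below. It is a minimal sketch under stated assumptions: the class names, the field names (including `p_aggr` and `p_proj` for the aggregation and projection parallelism), the compression labels, and the flat vector layout are all hypothetical and are not specified by the text.

```python
from dataclasses import dataclass
from enum import Enum


class StorageModel(Enum):
    ROW = "row"        # S_row
    COLUMN = "column"  # S_column


@dataclass
class StorageInfo:
    storage_model: StorageModel
    row_copies: int            # C_row
    column_copies: int         # C_column
    compression: str = "none"  # hypothetical strategy label

    @property
    def copy_count(self) -> int:
        # Number of copies is the sum of row-store and column-store copies.
        return self.row_copies + self.column_copies


@dataclass
class ExecutionInfo:
    p_index_scan: int
    p_join: int
    p_aggr: int   # assumed name for aggregation parallelism
    p_proj: int   # assumed name for projection parallelism
    cache_mb: int

    def __post_init__(self):
        # All execution values are positive integers.
        for name, value in vars(self).items():
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer")


@dataclass
class SystemInfo:
    storage: StorageInfo
    execution: ExecutionInfo

    def to_vector(self, compression_vocab):
        """Flatten into a numeric list; compression_vocab lists the known strategies."""
        s, e = self.storage, self.execution
        model = [float(s.storage_model is m) for m in StorageModel]
        compression = [float(s.compression == c) for c in compression_vocab]
        execution = [float(v) for v in
                     (e.p_index_scan, e.p_join, e.p_aggr, e.p_proj, e.cache_mb)]
        return model + [float(s.copy_count)] + compression + execution


info = SystemInfo(
    StorageInfo(StorageModel.COLUMN, row_copies=1, column_copies=2, compression="lz4"),
    ExecutionInfo(p_index_scan=4, p_join=2, p_aggr=2, p_proj=1, cache_mb=256),
)
print(info.to_vector(["none", "lz4"]))
```

Passing the compression vocabulary in at encoding time keeps the strategy set extensible: adding a strategy only lengthens the one-hot segment.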
(II) cost estimation model based on deep learning
The construction process of the cost estimation model is as follows: the features of historical execution plans and the system information extracted by the system information extractor are used as the input features of the cost estimation model; a neural network built from small sub-models is then fitted to the historical query information and the real execution results, establishing a mapping from execution plans to costs. The specific steps are:
(1) Encoding the input features;
(2) Building different sub-models $M_{node}$ according to operator type; specifically, sub-models $M_{aggr}$, $M_{join}$, $M_{filter}$, $M_{scan}$ are built for aggregation, join, filtering, scanning, and other operations, respectively, and the sub-models are assembled into the overall cost estimation model according to the tree structure of the execution plan;
(3) Adjusting a cost estimation model; comparing the predicted result with the true cost, calculating a quantized estimated error, and adjusting the neural network model using a back propagation mechanism based on the error.
In step (1), the input features are encoded. The input features consist of two parts: the features $I_{plan}$ of the historical execution plan and the system information $I_{system}$ extracted by the system information extractor. The input features $I_{input}$ are formally represented as:
$I_{input} = (I_{plan}, I_{system})$
The historical execution plan is a tree-shaped combination of operators, so its features $I_{plan}$ are the set of all operator features in the plan, namely: $I_{plan} = \bigcup I_{node}$
The features of an operator consist of two parts: the operator type and the operator's meta information. Operator types include full table scan, merge join, filter, hash aggregation, etc.; the meta information includes predicate information, estimated cardinality, etc. Predicate values are either numeric or string-typed: numeric values are converted to the (0, 1) range by min-max normalization, and string values are encoded with a word vector model.
The encoding of the system information $I_{system}$ is given by formula (1).
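The operator encoding in step (1) can be sketched as a one-hot operator type followed by min-max normalized numeric values. This is an illustrative sketch only: the operator list, the function names, and the choice to min-max normalize the estimated cardinality are assumptions, and the word vector encoding of string predicates is omitted.

```python
# Assumed operator vocabulary, taken from the examples named in the text.
OPERATOR_TYPES = ["full_table_scan", "merge_join", "filter", "hash_aggregate"]


def one_hot(op_type):
    """Encode the operator type as a one-hot list over OPERATOR_TYPES."""
    if op_type not in OPERATOR_TYPES:
        raise ValueError(f"unknown operator type: {op_type}")
    return [1.0 if op_type == t else 0.0 for t in OPERATOR_TYPES]


def min_max(value, lo, hi):
    """Scale value into the 0 to 1 range using the column minimum and maximum."""
    if hi == lo:
        return 0.0  # degenerate column: every value is identical
    return (value - lo) / (hi - lo)


def encode_node(op_type, predicate_value, col_min, col_max,
                est_cardinality, max_cardinality):
    """Build I_node for one operator with a single numeric predicate."""
    return one_hot(op_type) + [
        min_max(predicate_value, col_min, col_max),
        min_max(est_cardinality, 0, max_cardinality),  # assumption: same scaling
    ]


# A filter such as "year > 1995" on a column ranging over 1900..2000,
# with an estimated cardinality of 500 out of at most 1000 rows.
features = encode_node("filter", 1995, 1900, 2000, 500, 1000)
print(features)
```

The plan features $I_{plan}$ are then the collection of such per-node lists, one for every operator in the plan tree.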
In step (2), the sub-models $M_{aggr}$, $M_{join}$, $M_{filter}$, $M_{scan}$ are built as follows:
for the $N$ different operators in a database management system, each operator $O_i$, $i \in [1, N]$, has its own small neural network, denoted $NN_i$. Each small network is internally a multi-layer recurrent neural network, and the parameters of different networks are independent of one another. This design is lightweight and can extract the high-dimensional features of operators in more depth. Each small network $NN_i$ is dedicated to cost estimation for its class of operator. Finally, the sub-models are assembled into the overall cost estimation model according to the tree structure of the execution plan.
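How per-operator sub-models might be composed along the plan tree is sketched below in plain Python. Everything here is an assumption for illustration: a one-hidden-layer fully connected network stands in for each small network, each sub-model outputs a scalar estimate for its subtree, and child estimates are summed into one extra input slot. The names `SmallNet`, `build_models`, and `estimate` are hypothetical, and no training is shown.

```python
import math
import random

FEATURE_DIM = 3  # assumed length of each node's feature list


class SmallNet:
    """One-hidden-layer fully connected network with a non-negative scalar output."""

    def __init__(self, n_in, n_hidden, rng):
        # Parameters drawn from a normal distribution with mean 0, std 1.
        self.w1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # ReLU hidden layer.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        # Numerically stable softplus keeps the estimate non-negative.
        return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def build_models(op_types, seed=0):
    """One independent network per operator type; parameters are not shared."""
    rng = random.Random(seed)
    return {op: SmallNet(FEATURE_DIM + 1, 4, rng) for op in op_types}


def estimate(node, models):
    """Estimate bottom-up: children first, then the node's own sub-model."""
    child_sum = sum(estimate(child, models) for child in node.get("children", []))
    return models[node["op"]].forward(node["features"] + [child_sum])


models = build_models(["scan", "join"])
plan = {"op": "join", "features": [0.2, 0.5, 0.1], "children": [
    {"op": "scan", "features": [1.0, 0.0, 0.3]},
    {"op": "scan", "features": [0.4, 0.9, 0.0]},
]}
root_estimate = estimate(plan, models)
print(root_estimate)
```

With untrained random weights the printed number is meaningless as a cost; the point is only that the network topology follows the plan tree and that each operator type dispatches to its own parameters.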
In step (3), the cost estimation model is adjusted: the cost estimate is obtained layer by layer, upward from the leaf nodes; the estimate is compared with the real cost, and the estimation error is taken as the model loss $q$;
according to this error, the parameters of the neural network are adjusted by back propagation until the network converges to a better model.
In actual use, the cost estimation model receives candidate execution plans from the optimizer and feeds the estimated cost of each plan back to the optimizer.
In the invention, the cost estimation model is trained with a layered training strategy, which helps the model learn from batched training data, improves memory utilization during training, reduces training oscillation, and accelerates convergence.
Specifically, the nodes of multiple tree-shaped execution plans are organized by tree level, and the nodes belonging to the same level form a sub-batch. During training, the sub-batches are fed to the neural network of the cost estimation model in reverse level order, and an auxiliary index structure is used to locate the outputs of each node's child nodes, as shown in FIG. 2. The specific flow is as follows:
(1) For a batch of n execution plans, first layer the n plans according to their tree structure into k layers (k is the maximum depth of the n execution plan trees), with the nodes of the same layer forming a sub-batch; the layer containing the root operators of the execution plans is layer 1, and the i-th sub-batch contains all operators at layer i of the n execution plans;
(2) Index the child operators of each layer's operators: a two-dimensional tensor is introduced as an auxiliary data structure that stores, for each operator in layer i, the positions of its child operators in layer i+1;
(3) Organize the data of each layer in the batch into tensors and use them as model input for training; input proceeds from the bottom layer upward, starting at layer k, and the output of each layer is added to the input data of the layer above; during training, the mean loss over the batch is used as the training loss.
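The layering and child indexing in steps (1) to (3) can be sketched with nested lists standing in for tensors. This is a minimal sketch under assumptions: plans are nested dictionaries with an optional `children` key, layers are numbered from 0 rather than 1, missing children are padded with -1, and `node_fn` is a hypothetical stand-in for the per-operator network.

```python
def layer_batches(plans):
    """Group the nodes of several plan trees by depth.

    Returns (layers, child_index). layers[d] lists the nodes at depth d
    (roots are depth 0). child_index[d][j] lists the positions in
    layers[d + 1] of the children of layers[d][j], padded with -1 so that
    every row has the same width, like a two-dimensional tensor.
    """
    layers, child_lists = [], []
    frontier = list(plans)
    while frontier:
        layers.append(frontier)
        next_frontier, positions = [], []
        for node in frontier:
            kids = node.get("children", [])
            start = len(next_frontier)
            positions.append(list(range(start, start + len(kids))))
            next_frontier.extend(kids)
        child_lists.append(positions)
        frontier = next_frontier
    width = max((len(p) for layer in child_lists for p in layer), default=0)
    child_index = [[p + [-1] * (width - len(p)) for p in layer]
                   for layer in child_lists]
    return layers, child_index


def evaluate_bottom_up(layers, child_index, node_fn):
    """Process sub-batches from the deepest layer up to the roots.

    node_fn(node, child_outputs) returns the output for one node.
    Returns one output per root, in the order the plans were given.
    """
    outputs_below = []
    for depth in range(len(layers) - 1, -1, -1):
        outputs = []
        for node, idx in zip(layers[depth], child_index[depth]):
            kids = [outputs_below[i] for i in idx if i >= 0]
            outputs.append(node_fn(node, kids))
        outputs_below = outputs
    return outputs_below


plan_a = {"op": "join", "children": [{"op": "scan"}, {"op": "scan"}]}
plan_b = {"op": "scan"}
layers, child_index = layer_batches([plan_a, plan_b])

# Stand-in node function: count the operators in each subtree.
sizes = evaluate_bottom_up(layers, child_index, lambda node, kids: 1 + sum(kids))
print(child_index)
print(sizes)
```

Because all nodes of one layer are handled together, a real implementation could evaluate each sub-batch as a single tensor operation per operator type instead of recursing plan by plan.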
Compared with traditional cost estimation models, the invention trains the model end to end, which improves the model's ability to capture information about the execution plan as a whole. In addition, system information (storage information and execution information) is introduced, which further improves adaptability to complex systems such as distributed high-concurrency systems and hybrid row-column storage systems.
Drawings
FIG. 1 is a block diagram of a cost estimation based query optimization system for sensing system information in accordance with the present invention.
FIG. 2 shows the batch structure used when training the cost estimation model.
FIG. 3 is a flow chart of the use of the present invention.
Detailed Description
The following is a specific embodiment of the present invention, as shown in fig. 3.
Experimental setup: Dataset: the IMDb dataset is the dataset of the Internet Movie Database website. It contains millions of movies, television programs, and television movies, together with related personal data, reviews, ratings, awards, etc. for actors, directors, producers, and other staff. The dataset contains multiple tables, mainly the following seven:
title.basics: basic information about movies, television programs, television movies, etc., such as title, genre, release date, and country/region of production.
title.ratings: user rating data for movies, television programs, and television movies on the IMDb website, such as average rating and number of ratings.
title.crew: director and writer information for each movie, as well as other producer information.
name.basics: personal information about actors, directors, producers, and other staff, such as date of birth and profession.
title.principals: the roles and positions of actors and other staff in each movie.
title.episode: episode and season information for television programs and television movies.
title.akas: alternative titles, translated titles, the international standard movie code, and similar information.
Estimating cardinality and cost on the IMDb dataset is much harder than on TPC-H because of the correlations and skewed distributions in real-world data. The IMDb dataset includes 22 tables connected by primary and foreign keys. An index is built on each primary key.
Workload: JOB-light is a workload built on the IMDb dataset. It contains 70 queries, each with 1 to 4 joins.
The implementation process comprises the following steps: (1) First, training data is created from the original dataset. The user extracts query information from the query history and organizes it into pairs of the form <execution plan, real cost>. If the history is insufficient, query statements that simulate the real workload can be generated and submitted to the system for execution to supplement the training data. Specifically, query templates are generated first; taking JOB-light as an example, 22 query templates covering various tasks are generated, and 100,000 SQL queries are then generated uniformly at random from the templates. Of these, 90,000 are used as training data and 10,000 as test data.
(2) In the model training phase, the neural network parameters are first initialized randomly: each parameter's initial value is drawn from a normal distribution with mean 0 and standard deviation 1. The cost estimation model obtains the latest system information $I_{system}$ from the system, comprising the storage information $I_s$ and the execution information $I_e$, and extracts $I_{plan}$ from the execution plan as input. The storage information is obtained by reading the database metadata. The number of training epochs is set to 500. In each epoch, the cost estimation model uses the neural network to estimate the cost of the execution plan, computes the error loss between the estimated cost and the real cost, and trains by back propagation. After 500 epochs, the cost estimation model converges to a more accurate state. The estimation error is measured with q-error, $q\text{-}error = \max(\widehat{cost}/cost,\ cost/\widehat{cost})$, where $\widehat{cost}$ is the estimated cost and $cost$ is the real cost. In the experiments, the mean q-error is 1.17 and the 95th-percentile q-error is 1.36, which is 2 to 3 orders of magnitude better than traditional cost estimation models.
(3) In the use phase, when the target system receives a new query request $Q$, the features of each plan are extracted from the candidate plans $P_{candidate} = \{P_1, P_2, \ldots, P_n\}$ generated by the optimizer and input to the cost estimation model. At this point, the cost estimation model holds the latest system information $I_{system}$. The cost model then feeds the plan features and the system information together into the neural network to obtain the estimated cost of each plan, and returns it to the optimizer.
(4) The optimizer uses the costs estimated by the cost estimation model to select the execution plan with the minimum estimated cost as the optimal plan, and submits it for execution. Because the cost estimates are more accurate, the selection is more reliable.
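The bookkeeping around steps (1) to (4) can be sketched as follows: a 90/10 split of the samples, the q-error metric with its mean and 95th percentile, and selection of the cheapest candidate plan. This is a hedged sketch, not the patented implementation: the function names are hypothetical, the percentile uses the nearest-rank convention as an assumption, and a lookup table stands in for the trained cost estimation model.

```python
def split_samples(samples, train_percent=90):
    """Split <execution plan, real cost> samples into training and test sets."""
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]


def q_error(estimated, actual):
    """Symmetric ratio error; 1.0 means a perfect estimate."""
    if estimated <= 0 or actual <= 0:
        raise ValueError("costs must be positive")
    return max(estimated / actual, actual / estimated)


def summarize(pairs):
    """Mean and 95th-percentile q-error over (estimated, actual) pairs."""
    errors = sorted(q_error(e, a) for e, a in pairs)
    rank = (95 * len(errors) + 99) // 100  # nearest rank: ceil(0.95 * n)
    return sum(errors) / len(errors), errors[rank - 1]


def choose_plan(candidates, estimate_cost):
    """Return (plan, cost) for the candidate with the smallest estimated cost."""
    costs = [estimate_cost(plan) for plan in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best], costs[best]


train, test = split_samples(list(range(1000)))
mean_q, p95_q = summarize([(10, 10), (20, 10), (5, 10), (10, 40)])

# Stand-in for the trained model: fixed estimates per hypothetical plan name.
fake_estimates = {"hash_join_plan": 42.0, "merge_join_plan": 17.5, "nested_loop_plan": 93.0}
best_plan, best_cost = choose_plan(list(fake_estimates), fake_estimates.get)
print(len(train), len(test), mean_q, p95_q, best_plan, best_cost)
```

Since q-error is a ratio, over-estimating by a factor of two and under-estimating by a factor of two both score 2.0, which is why it is a common choice for comparing cost and cardinality estimators.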

Claims (5)

1. The query optimization system based on cost estimation can sense the stored information and the execution information of the system and is characterized by comprising a system information extractor and a cost estimation model based on deep learning; wherein:
the system information extractor processes the storage and execution model information of the database management system into structured data for use by a model; the system comprises two modules, namely a storage information extraction module and an execution information extraction module of a database management system; the information extracted by the system information extractor is used as the input of a cost estimation model; wherein:
the storage information extraction module extracts information related to the storage of data from a database management system, and specifically comprises the following steps: storage model, copy number and compression strategy of data;
the execution information extraction module extracts information related to the physical execution process of the query from the database management system, including: operator parallelism and cache size;
the system information extractor integrates the results of the two modules and inputs the results into a cost estimation model;
the cost estimation model based on deep learning is constructed as follows: the features of historical execution plans and the system information extracted by the system information extractor are used as the input features of the cost estimation model, and a neural network built from small sub-models is then fitted to the historical query information and the real execution results, so that a mapping from execution plans to costs is established.
2. The cost estimation based query optimization system of claim 1, wherein the system information extractor obtains and updates the storage information $I_s$ and the execution information $I_e$ of the system by periodic extraction and combines them into the system information $I_{system}$, which serves as an input feature for constructing the cost estimation model:
$I_{system} = (I_s, I_e)$  (1)
for the storage information $I_s$, three items are considered: the storage model, the number of copies, and the compression strategy of the data;
the first item, the storage model, is one of two classes: row store $S_{row}$ or column store $S_{column}$;
the second item, the number of copies, is defined as the number of copies of a piece of data on multiple clusters, specifically the sum of the number of row-store copies $C_{row}$ and the number of column-store copies $C_{column}$, namely $C_{row} + C_{column}$;
the third item, the compression strategy, is defined as the compression policy that the column-store copy, if any, applies to the data; the set of strategies is extensible;
for the execution information $I_e$, two dimensions are considered: operator parallelism and cache size;
operator parallelism is defined as the parallelism of each SQL operator in the execution plan, including the parallelism $P_{index\_scan}$ of index scan operators, the parallelism $P_{join}$ of join operators, the parallelism $P_{aggr}$ of aggregation operators, and the parallelism $P_{proj}$ of projection operators; each of the four parallelism values is a positive integer ($\mathbb{Z}^+$); the cache size is also a positive integer ($\mathbb{Z}^+$), with MB as the default unit.
3. The cost estimation-based query optimization system of claim 2, wherein the specific steps of constructing the cost estimation model are as follows:
(1) Encoding the input features;
(2) Building different sub-models $M_{node}$ according to operator type; specifically, sub-models $M_{aggr}$, $M_{join}$, $M_{filter}$, $M_{scan}$ are built for aggregation, join, filtering, and scanning operations, respectively, and the sub-models are assembled into the overall cost estimation model according to the tree structure of the execution plan;
(3) Adjusting a cost estimation model; comparing the predicted result with the true cost, calculating a quantized estimated error, and adjusting the neural network model using a back propagation mechanism based on the error.
4. A cost estimation based query optimization system as claimed in claim 3, wherein:
in step (1), the input features are encoded, wherein the input features comprise two parts: the features $I_{plan}$ of the historical execution plan and the system information $I_{system}$ extracted by the system information extractor; the input features $I_{input}$ are formally represented as:
$I_{input} = (I_{plan}, I_{system})$
the historical execution plan is a tree-shaped combination of operators, so its features $I_{plan}$ are the set of all operator features in the plan, namely: $I_{plan} = \bigcup I_{node}$
the operator features consist of two parts, namely the operator type and the meta information of the operator; operator types include full table scan, merge join, filter, and hash aggregation; the meta information of the operator comprises predicate information and estimated cardinality; predicate values are either numeric or string-typed; numeric values are converted to the (0, 1) range by min-max normalization; string values are encoded with a word vector model;
in step (2), the sub-models $M_{aggr}$, $M_{join}$, $M_{filter}$, $M_{scan}$ are built as follows:
for the $N$ different operators in a database management system, each operator $O_i$, $i \in [1, N]$, has its own small neural network, denoted $NN_i$; each small network is internally a multi-layer fully-connected neural network, and the parameters of different networks are independent of one another; each small network $NN_i$ is used for cost estimation of its class of operator; finally, the sub-models are assembled into the overall cost estimation model according to the tree structure of the execution plan;
in step (3), the cost estimation model is adjusted: the cost estimate is obtained layer by layer, upward from the leaf nodes; the estimate is compared with the real cost, and the estimation error is taken as the model loss $q$;
according to this error, the parameters of the neural network are adjusted by back propagation until the network converges to a better model;
in actual use, the cost estimation model receives candidate execution plans from the optimizer and feeds the estimated cost of each plan back to the optimizer.
5. The query optimization system based on cost estimation according to claim 4, wherein a layered training strategy is adopted for the cost estimation model; specifically, the nodes of multiple tree-shaped execution plans are organized by tree level, and the nodes belonging to the same level form a sub-batch; during training, the sub-batches are fed to the neural network in reverse level order, and an auxiliary index structure is used to locate the outputs of each node's child nodes; the specific flow is as follows:
(1) For a batch of n execution plans, first layer the n plans according to their tree structure into k layers, where k is the maximum depth of the n execution plan trees, with the nodes of the same layer forming a sub-batch; the layer containing the root operators of the execution plans is layer 1, and the i-th sub-batch contains all operators at layer i of the n execution plans;
(2) Index the child operators of each layer's operators: a two-dimensional tensor is introduced as an auxiliary data structure that stores, for each operator in layer i, the positions of its child operators in layer i+1;
(3) Organize the data of each layer in the batch into tensors and use them as model input for training; input proceeds from the bottom layer upward, starting at layer k, and the output of each layer is added to the input data of the layer above; the mean loss over the batch is used as the training loss.
CN202310401083.4A 2023-04-15 2023-04-15 Query optimization system based on cost estimation Pending CN116521719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310401083.4A CN116521719A (en) 2023-04-15 2023-04-15 Query optimization system based on cost estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310401083.4A CN116521719A (en) 2023-04-15 2023-04-15 Query optimization system based on cost estimation

Publications (1)

Publication Number Publication Date
CN116521719A true CN116521719A (en) 2023-08-01

Family

ID=87389541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310401083.4A Pending CN116521719A (en) 2023-04-15 2023-04-15 Query optimization system based on cost estimation

Country Status (1)

Country Link
CN (1) CN116521719A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881230A (en) * 2023-09-08 2023-10-13 北京谷器数据科技有限公司 Automatic relational database optimization method based on cloud platform
CN117520385A (en) * 2024-01-05 2024-02-06 凯美瑞德(苏州)信息科技股份有限公司 Database query optimization method based on exploration value and query cost
CN117520385B (en) * 2024-01-05 2024-04-16 凯美瑞德(苏州)信息科技股份有限公司 Database query optimization method based on exploration value and query cost

Similar Documents

Publication Publication Date Title
CN116521719A (en) Query optimization system based on cost estimation
Re et al. Efficient top-k query evaluation on probabilistic data
Babcock et al. Towards a robust query optimizer: a principled and practical approach
Kim et al. Learned cardinality estimation: An in-depth study
CN102867066A (en) Data summarization device and data summarization method
CN111930817A (en) Big data-based distributed unstructured database correlation query method
CN112434024A (en) Relational database-oriented data dictionary generation method, device, equipment and medium
CN104391908A (en) Locality sensitive hashing based indexing method for multiple keywords on graphs
CN106874425A (en) Real time critical word approximate search algorithm based on Storm
CN115238053A (en) BERT model-based new crown knowledge intelligent question-answering system and method
CN112749191A (en) Intelligent cost estimation method and system applied to database and electronic equipment
US9177024B2 (en) System, method, and computer-readable medium for optimizing database queries which use spools during query execution
CN110334290B (en) MF-Octree-based spatio-temporal data rapid retrieval method
CN110597857A (en) Online aggregation method based on shared sample
Wu et al. POLYTOPE: a flexible sampling system for answering exploratory queries
Pang et al. AQUA+: Query Optimization for Hybrid Database-MapReduce System
Trivedi et al. Codd: Constructing dataless databases
Gao et al. Automatic index selection with learned cost estimator
US11709858B2 (en) Mapping of unlabeled data onto a target schema via semantic type detection
CN112835920B (en) Distributed SPARQL query optimization method based on hybrid storage mode
US9378229B1 (en) Index selection based on a compressed workload
CN111797300A (en) Knowledge representation learning model based on importance negative sampling and negative sampling frame construction method
Tan et al. Query predicate selectivity using machine learning in Db2®
CN117390064B (en) Database query optimization method based on embeddable subgraph
Bajaj A survey on query performance optimization by index recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination