CN113157541B - Multi-concurrency OLAP type query performance prediction method and system for distributed database

Publication number
CN113157541B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110425574.3A
Other languages
Chinese (zh)
Other versions
CN113157541A (en)
Inventor
李晖
丁玺润
闵圣天
戴震宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Youlian Borui Technology Co ltd
Original Assignee
Guizhou Youlian Borui Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation

Abstract

The embodiment of the invention relates to the technical field of data processing, and discloses a multi-concurrency OLAP type query performance prediction method and system for a distributed database. The method comprises the following steps: calculating the interference degree, calculating the sensitivity, and predicting the delay. The system comprises a query interference degree calculation module, a query sensitivity calculation module, a cache module and a query delay calculation module. Compared with the prior art, the method and the system occupy few resources for query optimization, the algorithm is clear and simple, and the performance requirement is low, so they are easier to deploy in actual use, which ensures practicability.

Description

Multi-concurrency OLAP type query performance prediction method and system for distributed database
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to database data processing technology.
Background
Executing queries in parallel in a database brings many advantages. For example, it can shorten the overall run time of multiple queries and increase hardware utilization. For any one of the concurrent queries, however, the execution time may be longer or shorter than when it is executed alone. The main reason is the interaction among the queries: some interactions speed up the execution of a query, while others slow it down because of resource competition.
Concurrent query performance prediction has great application value for query scheduling control and similar tasks. For example, if query execution times are known in advance, the order of multiple queries can be changed so that the user's SLA requirements are met. Accurate query performance prediction can also be used to display query progress, so that the DBA knows how far the current query has progressed and can decide whether to wait for it to finish or to kill it. Query performance prediction also guides the query optimizer: for example, the optimizer can create query plans that are aware of concurrent queries, shortening the overall execution time.
Because query performance prediction is so valuable, it has been studied extensively. The research mainly addresses two types of queries. The first is OLTP-type queries, that is, transactional queries in relational databases with strict time requirements, whose execution times are generally short. The second is OLAP-type queries, which are mainly used in data warehouses, operate on relatively large data volumes, and have relatively long execution times. This document is primarily concerned with OLAP-type queries. Some existing techniques can predict the performance of analytical queries, but they have certain limitations in practicability and extensibility.
The inventors have found that at least the following problem exists in the prior art: existing techniques can predict the performance of analytical queries, but they have certain limitations in practicability and extensibility.
Disclosure of Invention
The embodiment of the invention aims to provide a multi-concurrency OLAP type query performance prediction method and system for a distributed database that occupy fewer resources for query optimization, which ensures practicability, and whose calculation process is clear, simple and easy to enhance, which ensures extensibility.
In order to solve the technical problems, the embodiment of the invention provides a multi-concurrency OLAP type query performance prediction method for a distributed database, which comprises the following steps:
calculating the interference degree: based on the query request, calculating the occupation condition of the computing resources related to the query request to obtain the query interference degree;
calculating sensitivity: based on the query request, calculating by combining the query interference degree to obtain the query sensitivity;
predicting the delay: a query delay is calculated based on the query sensitivity.
The computing resource occupation specifically includes: the time the query request takes when executed alone, the percentage of the query request's total running time that is I/O time, the I/O time shared by the primary query and a concurrent query, the I/O time shared among concurrent queries, and the network interference of a concurrent query on the primary query.
In the step of calculating the interference degree, the query interference degree is calculated in the following manner:
wherein the quantities are, in order: the time the query request takes when executed alone; the percentage of the query request's total running time that is I/O time; the I/O time shared by the primary query and a concurrent query; the I/O time shared among concurrent queries; and the network interference of a concurrent query on the primary query.
The query sensitivity is a linear function of the query interference degree, and the parameters of the linear relationship are obtained by training on multiple pairs of query sensitivity and query interference degree values; the query sensitivity used for training is calculated from the query delay, the time the query request takes in the worst environment, and the time the query request takes when executed alone.
The query delay used for training is obtained by measurement.
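The linear relationship between the query sensitivity and the query interference degree is
$c_{q,m} = \mu_q \gamma_{q,m} + b_q$
wherein $c_{q,m}$ is the query sensitivity, $\gamma_{q,m}$ is the query interference degree, and $\mu_q$ and $b_q$ are the linear relationship parameters.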
The query sensitivity for training is calculated by the following method:
wherein $\tau_{q,m}$ is the query delay, $\tau_{maxq}$ is the execution time of the query request in the worst-case environment, and $\tau_{minq}$ is the time the query request takes when executed alone.
In the step of predicting the delay, the query delay is calculated based on the following formula:
wherein $c_{q,m}$ is the query sensitivity, $\tau_{q,m}$ is the query delay, $\tau_{maxq}$ is the execution time of the query request in the worst-case environment, and $\tau_{minq}$ is the time the query request takes when executed alone.
The query request is a primary query and/or a concurrent query.
The embodiment of the invention also provides a multi-concurrency OLAP type query performance prediction system for a distributed database, which comprises a query interference degree calculation module, a query sensitivity calculation module, a cache module and a query delay calculation module, wherein
the query interference degree calculation module is configured to perform the step of calculating the interference degree described above;
the query sensitivity calculation module is configured to perform the step of calculating the sensitivity described above;
the cache module is configured to cache the query interference degree, query sensitivity and query delay data produced during the calculation;
the query delay calculation module is configured to perform the step of predicting the delay described above.
Compared with the prior art, the method and the system occupy few resources for query optimization, the algorithm is clear and simple, and the performance requirement is low, so they are easier to deploy in actual use, which ensures practicability.
In addition, the data processing of the embodiment of the invention is clear and easy to understand, and is easy to improve further, so the invention is easy to enhance, which ensures extensibility.
In addition, the method and the system not only calculate quickly but also, by taking network resource overhead into consideration, accurately calculate the query delay that reflects query performance.
The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments are described below.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.
Fig. 1 is a flowchart of a multi-concurrency OLAP type query performance prediction method for a distributed database according to the first to ninth embodiments of the present invention;
Fig. 2 is a schematic diagram of the module connections of a multi-concurrency OLAP type query performance prediction system for a distributed database according to the tenth embodiment of the present invention;
Fig. 3 is a flowchart of a multi-concurrency OLAP type query performance prediction method for a distributed database according to the eleventh embodiment of the present invention;
Fig. 4 is a chart comparing the query prediction results for different I/O contention optimizations in the experiment performed with the multi-concurrency OLAP type query performance prediction method for a distributed database according to the eleventh embodiment of the present invention;
Fig. 5 is a chart comparing the query prediction results for different numbers of concurrent queries in the experiment performed with the multi-concurrency OLAP type query performance prediction method for a distributed database according to the eleventh embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the embodiments so that the reader can better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments. The embodiments are divided as follows for convenience of description only; the division should not be construed as limiting the specific implementation of the present invention, and the embodiments can be combined with and refer to one another where they do not contradict.
The first embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. The core of the embodiment is to calculate the query interference degree and the query sensitivity and to derive the query delay from them, so that the performance of database queries (especially concurrent queries in a distributed database) is predicted accurately. Because the calculation of resource occupation takes network resource overhead into account, the query delay that reflects query performance is calculated accurately; and because no complex model such as deep learning is used, the method is sufficiently practical and extensible.
The flow of the method in this embodiment is shown in Fig. 1, and specifically includes the following steps:
calculating the interference degree: based on the query request, calculating the occupation of the computing resources related to the query request to obtain the query interference degree;
calculating sensitivity: based on the query request, calculating the query sensitivity in combination with the query interference degree;
predicting the delay: a query delay is calculated based on the query sensitivity.
The above division into steps is for clarity of description only. In implementation, steps may be combined into one step or split into multiple steps; as long as the same logical relationship is preserved, they fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
The second embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. The second embodiment is substantially the same as the first embodiment. In the second embodiment, the computing resource occupation specifically includes: the time the query request takes when executed alone, the percentage of the query request's total running time that is I/O time, the I/O time shared by the primary query and a concurrent query, the I/O time shared among concurrent queries, and the network interference of a concurrent query on the primary query. In addition, those skilled in the art will appreciate that the above data can be obtained during the calculation with existing techniques, directly from the operating system by comparing system times.
The third embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. In the third embodiment, in the step of calculating the interference degree, the query interference degree is calculated in the following manner:
wherein the quantities are, in order: the time the query request takes when executed alone; the percentage of the query request's total running time that is I/O time; the I/O time shared by the primary query and a concurrent query; the I/O time shared among concurrent queries; and the network interference of a concurrent query on the primary query.
The fourth embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. In the fourth embodiment, the query sensitivity is a linear function of the query interference degree, and the parameters of the linear relationship are obtained by training on multiple pairs of query sensitivity and query interference degree values; the query sensitivity used for training is calculated from the query delay, the time the query request takes in the worst environment, and the time the query request takes when executed alone.
The fifth embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. The fifth embodiment is substantially the same as the fourth embodiment. In the fifth embodiment, the query delay used for training is obtained by measurement, specifically by recording the system time before and after the query and subtracting one from the other.
The sixth embodiment of the invention relates to a multi-concurrency OLAP (on-line analytical processing) type query performance prediction method for a distributed database. The sixth embodiment is substantially the same as the fourth embodiment. In the sixth embodiment, the linear relationship between the query sensitivity and the query interference degree is
$c_{q,m} = \mu_q \gamma_{q,m} + b_q$
wherein $c_{q,m}$ is the query sensitivity, $\gamma_{q,m}$ is the query interference degree, and $\mu_q$ and $b_q$ are the linear relationship parameters.
The seventh embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. The seventh embodiment is substantially the same as the fourth embodiment. In the seventh embodiment, the query sensitivity for training is calculated as follows:
wherein $\tau_{q,m}$ is the query delay, $\tau_{maxq}$ is the execution time of the query request in the worst-case environment, and $\tau_{minq}$ is the time the query request takes when executed alone.
The eighth embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. The eighth embodiment is substantially the same as the first embodiment. In the eighth embodiment, the query delay is calculated based on the following formula:
wherein $c_{q,m}$ is the query sensitivity, $\tau_{q,m}$ is the query delay, $\tau_{maxq}$ is the execution time of the query request in the worst-case environment, and $\tau_{minq}$ is the time the query request takes when executed alone.
The ninth embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. The ninth embodiment is substantially the same as the first embodiment. In the ninth embodiment, the query request is a primary query and/or a concurrent query.
The tenth embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction system for a distributed database which, as shown in Fig. 2, comprises a query interference degree calculation module, a query sensitivity calculation module, a cache module and a query delay calculation module, wherein
the query interference degree calculation module is configured to perform the step of calculating the interference degree described above;
the query sensitivity calculation module is configured to perform the step of calculating the sensitivity described above;
the cache module is configured to cache the query interference degree, query sensitivity and query delay data produced during the calculation;
the query delay calculation module is configured to perform the step of predicting the delay described above.
It should be noted that this embodiment is the system embodiment corresponding to the first embodiment, and can be implemented in cooperation with the first embodiment. The technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here. Accordingly, the technical details mentioned in this embodiment can also be applied to the first embodiment.
It should also be noted that each module in this embodiment is a logical module. In practical application, one logical unit may be one physical unit, part of one physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the invention are not introduced in this embodiment, but this does not mean that no other units are present in this embodiment.
The eleventh embodiment of the invention relates to a multi-concurrency OLAP type query performance prediction method for a distributed database. The eleventh embodiment combines the first to ninth embodiments and is implemented in a specific production environment; its flow is shown in Fig. 3. This embodiment uses mathematical symbols, and the definitions of the main symbols are given in Table 1.
TABLE 1 Primary symbol meanings
A. Calculation of query interference
The query interference degree (CQI, Concurrent Query Interference) describes the quality of the current execution environment of the primary query, that is, the degree of resource contention.
Assume a query combination m that includes a primary query q and a set of queries executed in parallel with it, $C = \{c_1, c_2, \ldots, c_n\}$, where n is the number of concurrent queries. First, the I/O and network resources required by each concurrent query $c_i$ when it runs alone are obtained; at this point no resource contention occurs. Then, the impact on the primary query of each concurrent query competing with it for resources is estimated. Finally, the impact on the primary query of resource contention among the concurrent queries is evaluated.
Baseline I/O refers to the benchmark I/O of a query, that is, the percentage of the total execution time taken up by I/O time when the query is executed independently; the greater the percentage, the more I/O resources the query requires. This percentage is recorded for each concurrent query $c_i$.
When the primary query is executed together with concurrent queries, a concurrent query that scans a different table from the primary query will "interfere" with the primary query, because the two queries contend for I/O. When a concurrent query scans the same table as the primary query, the interference is greatly reduced and the primary query may even be sped up: when a table is scanned frequently, the database keeps its data in the shared cache, so later requests for that table's data are served directly from the shared cache and repeated I/O operations are avoided.
Let t be a table scanned by both the primary query q and the concurrent query $c_i$. The following value is defined:
As can be seen, this value is either 0 or 1. The shared I/O time is calculated as follows:
where n is the total number of tables scanned by the primary query and the concurrent query, and $S_t$ is the time taken to scan table t. A query statement of the form select * from [table] is used to obtain the total time spent scanning [table], that is, the table-scan time within the execution time of a query statement. In Greenplum, the data of a table is distributed across the nodes and queries are executed on each node, so if multiple queries contain a common table, the time to scan that table from disk repeatedly can be "saved". Equation (3) calculates the time saved due to shared I/O.
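The shared-scan saving described above can be sketched as follows. This is a minimal illustration, assuming the 0/1 value is 1 exactly when a table is scanned by both the primary query and the concurrent query, and that the saved time is the sum of the scan times $S_t$ of those tables. The function name, table names and scan times are hypothetical.

```python
def shared_io_time(primary_tables, concurrent_tables, scan_time):
    """Sum the scan times of tables scanned by both queries.

    primary_tables / concurrent_tables: sets of table names (hypothetical inputs).
    scan_time: dict mapping table name -> seconds to scan it alone (S_t).
    """
    all_tables = primary_tables | concurrent_tables  # the n tables in total
    saved = 0.0
    for t in all_tables:
        # Indicator: 1 if both queries scan table t, otherwise 0.
        indicator = 1 if (t in primary_tables and t in concurrent_tables) else 0
        saved += indicator * scan_time[t]
    return saved


# Illustrative values only, not measurements from the source.
scan_time = {"store_sales": 40.0, "item": 2.5, "customer": 6.0}
saved = shared_io_time({"store_sales", "item"}, {"store_sales", "customer"}, scan_time)
print(saved)  # 40.0: only store_sales is scanned by both queries
```

Keeping the indicator explicit, rather than iterating over the set intersection, mirrors the two-step definition in the text: first the 0/1 value per table, then the weighted sum.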
In addition to the I/O shared between the primary query and the concurrent queries, the I/O effects among the concurrent queries themselves must be measured. For example, when the primary query is executed together with two concurrent queries a and b, a and b save I/O time between themselves because they run concurrently. First, table t is defined as a table scanned by the concurrent query $c_i$ in common with other non-primary queries:
Define $d_t$ as the number of concurrent queries that scan table t, where $d_t$ must be greater than 1. In addition, because only table scans shared among concurrent queries are considered here, table t must not appear in the primary query. The time saved by shared I/O among concurrent queries is calculated as:
In the above formula, n is again the total number of tables scanned by the primary query and the concurrent queries.
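The selection rule for $d_t$ can be illustrated with a short sketch. It only counts, for each table, how many concurrent queries scan it, and keeps the tables that satisfy the two stated conditions ($d_t > 1$ and the table absent from the primary query). It does not compute the saved time itself. The function name and table names are hypothetical.

```python
from collections import Counter


def concurrent_share_counts(primary_tables, concurrent_table_sets):
    """Return {table: d_t} for tables shared among concurrent queries only."""
    counts = Counter()
    for tables in concurrent_table_sets:
        counts.update(set(tables))  # each query counts once per table
    return {
        t: d
        for t, d in counts.items()
        if d > 1 and t not in primary_tables  # the two conditions in the text
    }


# Illustrative table sets only.
primary = {"store_sales"}
concurrent = [
    {"item", "customer", "store_sales"},
    {"item", "store_sales"},
    {"customer", "item"},
]
d = concurrent_share_counts(primary, concurrent)
print(d)  # item is scanned by 3 concurrent queries, customer by 2
```

Note that store_sales is scanned by two concurrent queries but is excluded because the primary query also scans it.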
In a distributed database, data is distributed across the nodes of the cluster, and a table join in an SQL query necessarily causes data transmission, that is, data on one node is migrated to another node. Greenplum migrates data in two ways: broadcast and redistribution. Broadcast transmits the data on one node to all other nodes, so that each node holds the complete data of a table. Redistribution computes a hash value for the table's data according to the join key and then redistributes the data to the nodes according to that hash value. Assuming that a table has N records, the amount of data to be redistributed is N and the amount of data to be broadcast is N × the number of nodes; the data migration amount of a join operation can be calculated in this manner. The total data migration amount of the primary query is $t_q$, and the migration amount of each concurrent query is obtained in the same way. The network interference of a concurrent query on the primary query is defined as:
As can be seen from the above, the larger the data migration amount of concurrent query $c_i$, the greater its interference with the primary query, and conversely, the smaller the amount, the smaller the interference. This is because the network bandwidth of the system is fixed, so when other queries are transmitting data over the network, the data transmission of the primary query is necessarily affected.
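The migration amounts described above can be sketched in a few lines. The per-join rule (N for redistribution, N × the number of nodes for broadcast) comes from the text; summing the per-join amounts to get a query's total is an assumption made for illustration, and the function names, record counts and node count are hypothetical.

```python
def migration_volume(num_records, num_nodes, strategy):
    """Records moved by one join's data migration.

    strategy: "redistribute" moves each record once (N);
              "broadcast" sends the table to the nodes (N * number of nodes).
    """
    if strategy == "redistribute":
        return num_records
    if strategy == "broadcast":
        return num_records * num_nodes
    raise ValueError(f"unknown strategy: {strategy}")


def total_migration(joins, num_nodes):
    """Assumed total for a query: the sum over its join operations."""
    return sum(migration_volume(n, num_nodes, s) for n, s in joins)


# Illustrative query with two joins on a 3-node cluster.
joins = [(1_000_000, "redistribute"), (5_000, "broadcast")]
t_q = total_migration(joins, num_nodes=3)
print(t_q)  # 1015000
```

The example also shows why a planner would broadcast only small tables: the broadcast cost grows with the node count, while the redistribution cost does not.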
Once these variables have been obtained, the impact of a concurrent query $c_i$ on the primary query can be defined.
Equation (7) can be understood as the impact of the concurrent query $c_i$ on the primary query. The first half of equation (7) subtracts the time during which the concurrent query shares I/O with the primary query. With network contention fixed, the larger $r_{c_i}$ is, the shorter the time the concurrent query shares I/O with the primary query and the longer the time spent contending for I/O, in which case the query delay of the primary query is extended. Conversely, the smaller $r_{c_i}$ is, the smaller the resource competition between the concurrent query and the primary query, and the smaller the effect on the delay of the primary query.
In a query combination, the CQI value of the primary query is defined as $\gamma_{q,m}$ and is calculated as follows:
The above formula takes the average over the concurrent queries.
B. Calculation of query sensitivity
The query performance interval PR (Performance Range) is a range of query delay times; the values in this interval represent the execution time of the query in different environments. The maximum value of the interval, $\tau_{maxq}$, represents the execution time of the query in the worst resource environment. In this embodiment, the worst case is simulated by continuously reading large files and transferring them between different nodes. The minimum value, $\tau_{minq}$, represents the delay time in the current environment when only this query is executed. These two values correspond to executing the query in the two extreme environments, and the execution time of the query in any other environment lies within the query performance interval. The PRP (Performance Range Point) value of the primary query is defined as follows:
Once the value of $c_{q,m}$ is known, it is substituted into formula (9), which is solved in reverse to obtain $\tau_{q,m}$, that is, the query delay of the primary query.
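As a sketch of this back-derivation, assume that formula (9) places the measured delay at its normalized position within the performance interval. This form is an assumption made for illustration, chosen because it maps $\tau_{minq}$ to 0 and $\tau_{maxq}$ to 1. Under that assumption, recovering the delay is a one-line rearrangement:

```latex
c_{q,m} = \frac{\tau_{q,m} - \tau_{minq}}{\tau_{maxq} - \tau_{minq}}
\quad\Longrightarrow\quad
\tau_{q,m} = \tau_{minq} + c_{q,m}\,\bigl(\tau_{maxq} - \tau_{minq}\bigr)
```

For example, with the illustrative values $\tau_{minq} = 100$ s, $\tau_{maxq} = 300$ s and $c_{q,m} = 0.25$, the recovered delay would be $100 + 0.25 \times 200 = 150$ s.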
Given a query combination m and a primary query q, the CQI value can be calculated using equation (8), and a linear regression model is then used to predict the performance of the query. To further describe the linear relationship between query performance and CQI, the query sensitivity QS (Query Sensitivity) is introduced.
Assuming that CQI and PRP have a linear relationship, the following formula is defined:
$c_{q,m} = \mu_q \gamma_{q,m} + b_q$ (10)
where $\mu_q$ is the slope, $b_q$ is the intercept, and $c_{q,m}$ is linear in $\gamma_{q,m}$.
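The slope and intercept of formula (10) can be estimated from observed pairs by ordinary least squares. The sketch below uses the closed-form solution for a single explanatory variable; the function name and the sample values are hypothetical and chosen so that the points lie exactly on a line.

```python
def fit_qs_model(gammas, cs):
    """Ordinary least squares for c = mu * gamma + b.

    gammas: CQI values of the training combinations.
    cs: the matching PRP values.
    Returns (mu, b).
    """
    n = len(gammas)
    if n < 2 or n != len(cs):
        raise ValueError("need at least two matching (gamma, c) pairs")
    mean_g = sum(gammas) / n
    mean_c = sum(cs) / n
    sxx = sum((g - mean_g) ** 2 for g in gammas)
    if sxx == 0:
        raise ValueError("all gamma values are identical; slope is undefined")
    sxy = sum((g - mean_g) * (c - mean_c) for g, c in zip(gammas, cs))
    mu = sxy / sxx              # slope
    b = mean_c - mu * mean_g    # intercept
    return mu, b


# Illustrative pairs lying on c = 0.5 * gamma + 0.1.
gammas = [0.2, 0.4, 0.6, 0.8]
cs = [0.2, 0.3, 0.4, 0.5]
mu, b = fit_qs_model(gammas, cs)
print(round(mu, 6), round(b, 6))
```

One model is fitted per primary query q, which is why the parameters carry the subscript q.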
In summary, the flow of the present embodiment is as follows:
First, a query combination m that includes the primary query q is generated using LHS;
Second, $\tau_{minq}$, $\tau_{maxq}$ and $\tau_{q,m}$ are obtained for q: $\tau_{minq}$ and $\tau_{maxq}$ can be obtained in advance, and $\tau_{q,m}$ can be obtained from experimental data. These are substituted into formula (9) to obtain $c_{q,m}$. In this way, a large number of $(c_{q,m}, \gamma_{q,m})$ value pairs can be obtained from the experimentally generated test sets;
Third, the obtained $(c_{q,m}, \gamma_{q,m})$ value pairs are used to train the QS model (formula (10)) with least-squares linear regression, yielding the QS model of query q;
Fourth, when q appears in another query combination m′ and its query delay is to be predicted, the CQI value $\gamma_{q,m'}$ of q in that combination is calculated;
Fifth, $c'_{q,m'}$ is obtained from the QS model generated in the third step;
Sixth, $c'_{q,m'}$ is substituted back into formula (9) to obtain the query delay $\tau_{q,m'}$ of q in the query combination m′.
An experiment was performed in an experiment of the eleventh embodiment of the present invention in a Greemplum distributed cluster, greemplum version 5.0.0-alpha+79a3598. The cluster has 4 nodes in total, a master node and three slave nodes, the slave nodes are mainly used for storing data and executing inquiry, and the master node is responsible for distributing inquiry and summarizing results. The hardware of the master node is configured into a 32GB memory, the CPU is a 4-core Intel (R) Xeon (R) CPU E5-2630 v2@2.60GHz, the memory 16GB of the slave node, the core number and the model number of the CPU are the same as those of the master node, four database examples are arranged in each slave node, and each database example is equivalent to a complete PostgreSQL database and is used for processing a part of data. The operating systems of the master node and the slave node are both centOS 7.4, and the linux kernel version is 3.10. The table and data are generated by TPC-DS, which is a decision-supporting benchmark. The data size used in the experiment is 50G, 10 templates in TPC-DS are selected to generate 10 queries for training and testing the model, the 10 queries are mainly I/O sensitive queries, the execution time is long, and the accuracy of the prediction model is improved.
The impact of each component of the CQI on the error rate is evaluated first, and the CQI is then used to predict the query delay. When the number of queries running simultaneously, i.e. the MPL (multiprogramming level), is 3, the prediction error of each variable for the query delay is as shown in Fig. 4. In the figure:
Baseline I/O refers to the baseline I/O of a query, i.e. the percentage of the total execution time that is spent on I/O when the query is executed alone; a larger percentage indicates that the query requires more I/O resources.
Positive I/O refers to the situation where the concurrent queries "interfere" with the primary query by contending for I/O.
Concurrent I/O refers to the I/O time saved by the concurrent execution of a and b when a primary query is executed together with two concurrent queries, a and b.
Network refers to the optimized I/O occupancy of the eleventh embodiment.
It can be seen that the error is large when only Baseline I/O is used to predict the query delay, and that the error rate drops significantly once the interaction between concurrent queries is taken into account. Additionally considering Concurrent I/O and network contention does not improve the prediction accuracy noticeably, so Positive I/O is the main factor affecting the accuracy of the prediction model, while the other factors improve the accuracy only by a small margin. In summary, the eleventh embodiment takes the main influencing factors between concurrent queries into account and is therefore a better prediction model.
For a particular query q, the query combinations containing q are found, and the QS model of the eleventh embodiment is then constructed with q as the primary query. The model is used to predict the execution time, and the prediction is compared with the actual execution time to obtain the results shown in Fig. 5.
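Finding the query combinations that contain a given query can be sketched as follows. This is only an illustration under stated assumptions: it assumes that a combination consists of distinct queries and that all combinations of size MPL are enumerated, neither of which is specified in the embodiment. The function name and the query identifiers are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def mixes_containing(primary: str, queries: List[str], mpl: int) -> List[Tuple[str, ...]]:
    """Return every combination of `mpl` distinct queries that includes `primary`.

    The primary query is placed first; the remaining mpl - 1 entries are the
    concurrent queries that run alongside it.
    """
    others = [q for q in queries if q != primary]
    return [(primary,) + rest for rest in combinations(others, mpl - 1)]


# Hypothetical identifiers standing in for the 10 queries generated from TPC-DS templates.
queries = [f"q{i}" for i in range(10)]
mixes = mixes_containing("q0", queries, mpl=3)
```

With 10 queries and an MPL of 3, each primary query appears in 36 such combinations under these assumptions, since two concurrent queries are chosen from the remaining nine.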
It can be seen that, across the different MPLs, all queries except queries 61 and 62 have errors below 25%, and some are even below 20%. The higher errors of queries 61 and 62 arise because their execution times are shorter, which results in a larger relative error. The experimental results show that the QS model can adapt to different query execution environments (different query combinations under different MPLs), so that the execution delay of a query can be predicted more accurately.
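The effect of a short execution time on the error can be illustrated with a small sketch. The embodiment does not state how the error is computed; the relative error used here is an assumption, and the numbers are made up purely for illustration and are not experimental results.

```python
from typing import List, Tuple


def relative_error(predicted: float, actual: float) -> float:
    """Absolute deviation of the prediction as a fraction of the actual delay."""
    return abs(predicted - actual) / actual


def mean_relative_error(pairs: List[Tuple[float, float]]) -> float:
    """Average relative error over (predicted, actual) delay pairs."""
    return sum(relative_error(p, a) for p, a in pairs) / len(pairs)


# Illustrative values only: the same 5-second deviation on a short and a long query.
short_query_error = relative_error(predicted=15.0, actual=10.0)
long_query_error = relative_error(predicted=105.0, actual=100.0)
```

Under this metric, the same absolute deviation weighs ten times more on the query that runs ten times shorter, which is consistent with the explanation given for queries 61 and 62.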
In summary, the experimental results show that the prediction error rate of the method stays below 25% in most cases, so the query delay can be predicted more accurately.
That is, those skilled in the art will understand that all or part of the steps of the methods of the above embodiments may be implemented by a program stored in at least one storage medium, the program including several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor to perform all or part of the steps of the methods of the embodiments described herein. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention and that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (7)

1. A multi-concurrency OLAP type query performance prediction method for a distributed database, characterized by comprising the following steps:
calculating the interference degree: based on the query request, calculating the occupation of the computing resources related to the query request to obtain the query interference degree;
the query interference degree is calculated in the following manner:
wherein the terms denote, in order: the time taken by the query request when executed alone; the percentage of the total running time of the query request that is I/O time; the I/O time shared by the master query and the concurrent queries; the I/O time shared among the concurrent queries; and the network interference imposed on the master query by the concurrent queries;
calculating sensitivity: based on the query request, calculating the query sensitivity in combination with the query interference degree;
the linear relationship between the query sensitivity and the query interference degree is:
$c_{q,m} = \mu_q \gamma_{q,m} + b_q$
wherein $c_{q,m}$ is the query sensitivity, $\gamma_{q,m}$ is the query interference degree, and $\mu_q$ and $b_q$ are the linear relation parameters;
predicting delay: calculating the query delay based on the query sensitivity;
the query delay is calculated based on the following formula:
wherein $c_{q,m}$ is the query sensitivity, $\tau_{q,m}$ is the query delay, $\tau_{\max q}$ is the execution time of the query request in the worst-case environment, and $\tau_{\min q}$ is the time taken by the query request when executed alone.
2. The multi-concurrency OLAP type query performance prediction method for a distributed database according to claim 1, wherein the occupation of the computing resources specifically includes: the time taken by the query request when executed alone, the percentage of the total running time of the query request that is I/O time, the I/O time shared by the master query and the concurrent queries, the I/O time shared among the concurrent queries, and the network interference imposed on the master query by the concurrent queries.
3. The multi-concurrency OLAP type query performance prediction method for a distributed database according to claim 1, wherein the query sensitivity is a linear dependent variable of the query interference degree, and the linear relation parameters are obtained by training on a plurality of pairs of query sensitivity and query interference degree values; the query sensitivity used for training is calculated based on the query delay, the execution time of the query request in the worst-case environment, and the time taken by the query request when executed alone.
4. The multi-concurrency OLAP type query performance prediction method for a distributed database according to claim 3, wherein the query delay used for training is obtained by measurement.
5. The multi-concurrency OLAP type query performance prediction method for a distributed database according to claim 3, wherein the query sensitivity used for training is calculated by:
wherein $c_{q,m}$ is the query sensitivity, $\tau_{q,m}$ is the query delay, $\tau_{\max q}$ is the execution time of the query request in the worst-case environment, and $\tau_{\min q}$ is the time taken by the query request when executed alone.
6. The multi-concurrency OLAP type query performance prediction method for a distributed database according to claim 1, wherein the query request is a master query and/or a concurrent query.
7. A multi-concurrency OLAP type query performance prediction system for a distributed database, characterized by comprising: a query interference degree calculation module, a query sensitivity calculation module, a caching module and a query delay calculation module, wherein,
the query interference degree calculation module is configured to perform the step of calculating the interference degree of claim 1;
the query sensitivity calculation module is configured to perform the step of calculating sensitivity of claim 1;
the caching module is configured to cache the query interference degree, query sensitivity and query delay data produced during the calculation;
the query delay calculation module is configured to perform the step of predicting delay of claim 1.
CN202110425574.3A 2021-04-20 2021-04-20 Multi-concurrency OLAP type query performance prediction method and system for distributed database Active CN113157541B (en)


Publications (2)

Publication Number Publication Date
CN113157541A CN113157541A (en) 2021-07-23
CN113157541B true CN113157541B (en) 2024-04-05

Family

ID=76869343

Country Status (1)

Country Link
CN (1) CN113157541B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant