CN111737281B - Database query method, device, electronic equipment and readable storage medium - Google Patents

Info

Publication number
CN111737281B
Authority
CN
China
Prior art keywords
data
time point
data statistics
time
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010583303.6A
Other languages
Chinese (zh)
Other versions
CN111737281A (en
Inventor
朱博帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010583303.6A
Publication of CN111737281A
Application granted
Publication of CN111737281B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a database query method, a database query device, electronic equipment, and a readable storage medium, with the aim of improving data query efficiency. The database query method comprises: predicting the data statistics corresponding to the current period according to a mapping relation between the data statistics of the database and time; generating an execution plan for a database query statement obtained in the current period according to the data statistics corresponding to the current period; and processing the database query statement based on the execution plan. Because the data statistics for each period are determined from the mapping relation between data statistics and time, the business data in the database does not need to be counted frequently, so data query efficiency can be effectively improved.

Description

Database query method, device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data query method, a data query device, an electronic device, and a readable storage medium.
Background
In recent years, with the development of data processing technology, more and more enterprises have begun to use databases to store and manage the business data they generate in the course of doing business. Taking an internet business as an example, the business data it stores in a database includes, but is not limited to: user portrait data, user browsing history data, user order data, user rating data, merchandise information data, merchandise inventory data, audio-video data, hardware performance data, and the like. As the business develops, an enterprise not only needs to record the generated business data in the database, but also needs to write database query statements according to business needs, for example Structured Query Language (SQL) statements, and then query the corresponding business data from the database by processing those statements.
In the related art, in order to process a database query statement more efficiently, and thus query target data from a database more efficiently, an execution plan is generally generated for the database query statement based on data statistics of the business data in the database. When an execution plan is generated this way, the more up to date the data statistics are, that is, the more frequently the business data is counted, the more reasonable the generated execution plan is, and the more the data query efficiency improves. On the other hand, counting the business data too frequently is also undesirable: overly frequent statistics operations occupy and consume too many hardware resources, and the resulting shortage of hardware resources reduces data query efficiency.
Either way, low data query efficiency is a problem to be solved in the related art.
Disclosure of Invention
The embodiment of the invention aims to provide a database query method, a database query device, electronic equipment and a readable storage medium, aiming at improving data query efficiency. The specific technical scheme is as follows:
In a first aspect of an embodiment of the present invention, there is provided a database query method, including:
predicting data statistics corresponding to the current period according to the mapping relation between the data statistics of the database and the time;
and generating an execution plan for the database query statement obtained in the current period according to the data statistical information corresponding to the current period, and processing the database query statement based on the execution plan.
In a second aspect of the embodiment of the present invention, there is provided a database query apparatus, the apparatus including:
the statistical information determining module is used for predicting the data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and the time;
and the query statement processing module is used for generating an execution plan for the database query statement obtained in the current period according to the data statistics information corresponding to the current period, and processing the database query statement based on the execution plan.
In a third aspect of the embodiment of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the method steps of the first aspect of the embodiment of the invention when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform any of the database query methods described above.
In yet another aspect of the invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the database query methods described above.
In the database query method provided by the invention, the data statistics information corresponding to the current period is predicted according to the mapping relation between the data statistics information of the database and the time. Then, for a database query statement obtained in the current period, an execution plan is generated for the database query statement according to the data statistics predicted for the current period, and the database query statement is processed according to the execution plan.
When determining the data statistics for each period, the invention relies on invoking the mapping relation between the data statistics and time, so the business data in the database does not need to be counted frequently, and the hardware resources occupied by invoking the mapping relation are far lower than those occupied by counting the business data. Therefore, the invention can significantly reduce the hardware resources occupied when determining the data statistics. On the one hand, more hardware resources can be allocated to query tasks, which improves database query efficiency. On the other hand, the data statistics can be determined more frequently, that is, each period can be made shorter, which improves the timeliness of the data statistics, yields a more reasonable execution plan for each database query statement, and further improves database query efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a data query method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data query method according to another embodiment of the present invention;
FIG. 3 is a flowchart illustrating determining a mapping relationship according to another embodiment of the present invention;
FIG. 4 is a flowchart illustrating the monitoring of the validity of the mapping relationship according to an embodiment of the present invention;
FIG. 5 is an interactive schematic diagram of a data query method according to an embodiment of the present invention;
FIG. 6 (a) is a schematic diagram of a database query apparatus according to an embodiment of the present invention;
FIG. 6 (b) is a schematic diagram of a database query apparatus according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In the related art, when a database query engine obtains a database query statement, in order to process the statement efficiently and thereby improve data query efficiency, it generates an execution plan for the statement according to data statistics of the business data in the database, and processes the statement based on that execution plan.
For ease of understanding, take a take-away order database as an example. It stores the take-away orders generated within the past 1 hour, and each take-away order involves business data such as distribution distance and order amount. Suppose the business data in the order database is counted at 12:00:00 noon, and the count shows 3299 orders in total. Of these, 2973 orders have a distribution distance of less than 3 km and 326 orders have a distribution distance of 3 km or more; 842 orders are below 20 yuan, 1555 orders are between 20 and 30 yuan, and 902 orders are above 30 yuan. After these data statistics are obtained and stored, suppose a database query statement is obtained at some later time (for example, 12:11:32), and the statement requires querying the order database for orders with an order amount of 20 to 30 yuan and a distribution distance of 3 km or more. To improve query efficiency, the database query engine generates the following execution plan according to the data statistics: first screen out the orders with a distribution distance of 3 km or more, and then, from those orders, further screen out the orders with an amount of 20 to 30 yuan.
This is a reasonable execution plan. It executes the screening logic 3299+326 times in total: 3299 evaluations to screen the orders with a distribution distance of 3 km or more out of all 3299 orders, and 326 evaluations to further screen the orders with an amount of 20 to 30 yuan out of the 326 orders that passed the first screening.
To make the rationality of this execution plan easier to see, consider an unreasonable execution plan for comparison: first screen out the orders with an amount of 20 to 30 yuan, and then, from those orders, further screen out the orders with a distribution distance of 3 km or more. This plan executes the screening logic 3299+1555 times: 3299 evaluations to screen the orders with an amount of 20 to 30 yuan out of all 3299 orders, and 1555 evaluations to further screen the orders with a distribution distance of 3 km or more out of the 1555 orders that passed the first screening. This plan clearly executes the screening logic more times than the previous one; because the previous plan executes it fewer times, its database query efficiency is higher.
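The comparison between the two filter orders can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `filter_evaluations` is hypothetical, and the cost model (every row examined once by the first filter, every surviving row examined once by the second) is simply the counting used in the example above, not a statement about how any particular query engine measures cost.

```python
def filter_evaluations(total_rows, first_filter_matches):
    """Evaluations for a two-stage filter: all rows once, then the survivors once."""
    return total_rows + first_filter_matches


# Counts taken from the take-away order example in the text.
TOTAL_ORDERS = 3299
DISTANCE_GE_3KM = 326    # orders with a distribution distance of 3 km or more
AMOUNT_20_TO_30 = 1555   # orders with an amount between 20 and 30 yuan

distance_first = filter_evaluations(TOTAL_ORDERS, DISTANCE_GE_3KM)
amount_first = filter_evaluations(TOTAL_ORDERS, AMOUNT_20_TO_30)

print(distance_first, amount_first)  # 3625 4854
```

Filtering on distribution distance first saves 1229 evaluations in this example, which is why the statistics-driven plan is the more reasonable one.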
The above example shows that data statistics have an important influence on the generation of execution plans, and that the timeliness of the data statistics greatly affects the rationality of the execution plans. Continuing the example, assume that the business data in the take-away order database is counted every 4 hours, and that the last statistics operation took place at 12:00:00, producing and saving the data statistics above. Assume further that a database query statement is obtained at 14:28:52. Because the take-away orders recorded in the database may change greatly between 12:00:00 and 14:28:52, the data statistics obtained at 12:00:00 cannot accurately reflect the current data distribution in the order database. If an execution plan is still generated for the query statement obtained at 14:28:52 according to the data statistics from 12:00:00, it is difficult to ensure that the execution plan is reasonable.
In the related art, the timeliness of the data statistics may be improved by counting the business data more frequently. For example, the statistics frequency is raised from once every 4 hours to once every 0.5 hours, the data statistics obtained each time are saved, and the latest data statistics are used to generate an execution plan for each newly obtained database query statement. However, a higher statistics frequency increases the occupation and consumption of hardware resources, leaving insufficient hardware resources for processing the database query statements themselves and lowering data query efficiency. It can be seen that low data query efficiency is a problem to be solved in the related art.
In order to improve database query efficiency, referring to fig. 1, fig. 1 is a flowchart of a data query method according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
Step S11: predict the data statistics corresponding to the current period according to the mapping relation between the data statistics of the database and time.
The data statistics of the database characterize the distribution of the business data in the database. More specifically, the data statistics of the database may refer to the data distribution in a given column of a data table in the database.
The time may be a continuous span of time or a sequence of discrete time points. The time has a length, for example one day, one week, or one month. In specific implementations of the invention, the length of the time is typically equal to the length of the service period of the business data in the database, or to an integer multiple of that length. Note that the service period is different from the period in step S11, and the service period is longer than the period in step S11. For example, the service period may be one day, while the period in step S11 may be 10 minutes, 0.5 hours, 2 hours, or the like, or may have no fixed length but be shorter than one day. The relationship between the service period and the period in step S11 is described in the following embodiments and is not detailed here.
In the invention, the mapping relation between the data statistical information and the time of the database can be specifically understood as: each time point corresponds to a set of data statistics that characterize the data distribution of a given column of the data table at that time point.
For ease of understanding, take as a simple example a take-away order database that records the take-away orders generated within the last 1 hour. The database includes a plurality of data tables, and each row of each data table records one take-away order. Each data table comprises a plurality of columns: the first column records the order placing time of each take-away order, the second column records its distribution distance, the third column records its order amount, and the fourth column records its payment method. The data statistics of the database may then be, for example, the data distribution of the second and third columns of the data table, that is, the distribution of the distribution distances and the distribution of the order amounts of the take-away orders.
Referring to Table 1, Table 1 is a schematic table of data statistics. As shown in Table 1, the distribution distance is divided into the intervals 0.5 km or less, 0.5 to 1 km, 1 to 2 km, 2 to 4 km, 4 to 6 km, and 6 km or more, and the number of orders in each distribution distance interval is counted. Likewise, the order amount is divided into the intervals 20 yuan or less, 20 to 30 yuan, 30 to 50 yuan, 50 to 100 yuan, 100 to 300 yuan, and 300 yuan or more, and the number of orders in each order amount interval is counted.
Table 1 data statistics schematic table
The data statistics shown in table 1 are only data statistics corresponding to one point in time. It should be noted that different time points correspond to different data statistics information respectively, so that a mapping relationship between the data statistics information and time is formed.
Note that the distribution distance and the order amount are divided into the same intervals in the data statistics for every time point. In other words, the data statistics at every time point count the orders in the distribution distance intervals 0.5 km or less, 0.5 to 1 km, 1 to 2 km, 2 to 4 km, 4 to 6 km, and 6 km or more, and count the orders in the order amount intervals 20 yuan or less, 20 to 30 yuan, 30 to 50 yuan, 50 to 100 yuan, 100 to 300 yuan, and 300 yuan or more.
The data statistics at different time points differ in the number of orders that fall in the same distribution distance interval. For example, the data statistics at a first time point contain 652 orders with a distribution distance of less than 0.5 km, while the data statistics at a second time point contain 441 such orders. Similarly, the data statistics at different time points differ in the number of orders that fall in the same order amount interval. For example, the data statistics at the first time point contain 3463 orders between 20 and 30 yuan, while the data statistics at the second time point contain 2027 such orders.
The data statistics shown in table 1 are only examples, and the distribution distance section division method, the order amount section division method, the specific number of orders, and the like are not to be construed as limiting the present invention. In particular, the data statistics may be recorded in the form of a histogram, but the present invention is not limited to the recording form of the data statistics.
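As a rough illustration of histogram-style data statistics, the sketch below counts orders into the fixed intervals named in the text. Everything beyond the interval boundaries is an assumption made for illustration: the function name `histogram`, the choice to treat each upper boundary as inclusive, and the four sample orders are all hypothetical and do not come from Table 1.

```python
from bisect import bisect_left

# Interval boundaries from the text; the last bucket is open-ended.
DISTANCE_EDGES_KM = [0.5, 1, 2, 4, 6]
AMOUNT_EDGES_YUAN = [20, 30, 50, 100, 300]


def histogram(values, edges):
    """Count values per interval; a value equal to an edge goes to the lower bucket (assumption)."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_left(edges, value)] += 1
    return counts


# Hypothetical orders as (distribution distance in km, order amount in yuan).
sample_orders = [(0.3, 25.0), (0.5, 18.0), (3.2, 45.0), (7.0, 320.0)]

distance_counts = histogram([d for d, _ in sample_orders], DISTANCE_EDGES_KM)
amount_counts = histogram([a for _, a in sample_orders], AMOUNT_EDGES_YUAN)

# One entry of the mapping relation: a time point mapped to its data statistics.
statistics_for_time_point = {"distance": distance_counts, "amount": amount_counts}
```

Because the boundaries are shared by all time points, the statistics for two time points can be compared bucket by bucket.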
In the present invention, the period in step S11 may be set in at least the following three ways:
The first setting mode is: the period is set to a fixed length. For example, the period length is set to 30 minutes, so that every 30 minutes is regarded as one period.
The second setting mode is: according to the business characteristics of the business data in the database, the service period of the business data is subdivided into a plurality of smaller periods whose lengths are not all the same. Periods that fall in a span where the business data changes frequently are shorter, and periods that fall in a span where the business data changes slowly are longer. These subdivided periods are the periods described in step S11. For example, suppose the service period of the business data is 1 day, the business data changes frequently from 10:00 to 22:30, and it changes slowly during the rest of the day. Each period between 10:00 and 22:30 may then be 10 minutes long, and each period in the rest of the day may be 30 minutes long.
The third setting mode is: no period length is set explicitly; instead, the time span between two successive executions of the prediction operation (that is, the operation of predicting the data statistics) is regarded as one period. For example, if the previous prediction operation was performed at 10 minutes and 17 seconds and the current prediction operation is performed at 11 minutes and 53 seconds, the time span between those two time points is regarded as one period, and the data statistics predicted by the previous prediction operation are the data statistics corresponding to that period.
In a specific implementation of the present invention, the first setting mode or the second setting mode of the above three setting modes is preferable.
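The second setting mode can be sketched as a schedule builder that emits short periods inside the busy span and long periods elsewhere. This is a minimal sketch under stated assumptions: the function name `build_periods` and its parameters are hypothetical, times are expressed as minutes since midnight, and the busy-span boundaries are assumed to fall on period boundaries (as they do for 10:00 and 22:30 with 30-minute quiet periods).

```python
def build_periods(busy_start=10 * 60, busy_end=22 * 60 + 30,
                  busy_length=10, quiet_length=30):
    """Split one day into (start, end) periods, in minutes since midnight."""
    periods = []
    start = 0
    while start < 24 * 60:
        # Short periods while business data changes frequently, long ones otherwise.
        length = busy_length if busy_start <= start < busy_end else quiet_length
        periods.append((start, start + length))
        start += length
    return periods


periods = build_periods()
print(len(periods))  # 98
```

With the example values from the text, the day is covered by 20 thirty-minute periods before 10:00, 75 ten-minute periods from 10:00 to 22:30, and 3 thirty-minute periods after 22:30.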
In the specific implementation of the invention, the specific modes of predicting the data statistical information for the current period are different according to different period setting modes.
For the first setting manner and the second setting manner, when predicting the data statistics corresponding to the current period, specifically, based on a preset time point in the current period, determining the data statistics corresponding to the preset time point from the mapping relationship between the data statistics and time, and determining the data statistics corresponding to the preset time point as the data statistics corresponding to the current period. Wherein the preset time point is a start time point, an end time point, or a time point between the start time point and the end time point of the current period.
For ease of understanding, take the case where the time in the mapping relation spans 1 day and the period length is fixed at 30 minutes. Each day is then divided into 48 periods: 00:00 to 00:29, 00:30 to 00:59, 01:00 to 01:29, …, 23:00 to 23:29, and 23:30 to 23:59. If the current time is 08:21, the current period is the 17th period of the day. If the current time has just reached 08:30, the 18th period of the day has just begun, and at this point the data statistics may be predicted for the current period (that is, the 18th period of the day).
For the prediction, the data statistics corresponding to 08:30 may be determined from the mapping relation between the data statistics and time and taken as the data statistics of the current period (that is, the 18th period of the day). Alternatively, the data statistics corresponding to 08:59 may be determined from the mapping relation and taken as the data statistics of the current period. Alternatively, the data statistics corresponding to a time point between 08:30 and 08:59 may be determined from the mapping relation and taken as the data statistics of the current period.
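The fixed-length case can be sketched as a period index calculation followed by a lookup at a preset time point. The names `period_index`, `preset_time_point`, and `predict_statistics` are hypothetical, as is the representation of the mapping relation as a dictionary keyed by `(hour, minute)`; the string values stand in for real data statistics, which the text does not specify at this point.

```python
PERIOD_MINUTES = 30


def period_index(hour, minute):
    """1-based index of the 30-minute period of the day containing hour:minute."""
    return (hour * 60 + minute) // PERIOD_MINUTES + 1


def preset_time_point(index, position="start"):
    """(hour, minute) of the start, middle, or end time point of the given period."""
    start = (index - 1) * PERIOD_MINUTES
    end = start + PERIOD_MINUTES - 1
    minute_of_day = {"start": start, "middle": (start + end) // 2, "end": end}[position]
    return divmod(minute_of_day, 60)


def predict_statistics(mapping, hour, minute, position="start"):
    """Look up the statistics mapped to the preset time point of the current period."""
    return mapping[preset_time_point(period_index(hour, minute), position)]


# Placeholder mapping relation: time point -> data statistics (values are stand-ins).
mapping = {(8, 30): "statistics for 08:30", (8, 59): "statistics for 08:59"}

print(period_index(8, 21))                      # 17
print(period_index(8, 30))                      # 18
print(predict_statistics(mapping, 8, 45))       # statistics for 08:30
```

Any query arriving between 08:30 and 08:59 resolves to the same entry, so the lookup only needs to happen once per period.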
With respect to the third setting manner, when predicting the data statistics corresponding to the current period, specifically, the time point when the prediction operation execution instruction is generated may be taken as a basis, the data statistics corresponding to the time point may be determined from the mapping relationship between the data statistics and the time, and the data statistics corresponding to the time point may be determined as the data statistics corresponding to the current period.
For ease of understanding, assume that the database query engine generates one prediction operation execution instruction after every 1000 database query statements are executed. When the prediction operation execution instruction is generated, the data statistics corresponding to the generation time point of the instruction are determined from the mapping relation between the data statistics and time, and are taken as the data statistics corresponding to the current period. Here, the current period runs from the current time point (that is, the generation time point of the instruction) until 1000 further database query statements have been executed.
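The count-based trigger described above might look like the following sketch. The class name `QueryCountTrigger`, the `lookup` callable, and the initial prediction made at construction time are assumptions introduced for illustration; the text only specifies that a prediction is triggered after every 1000 executed statements.

```python
class QueryCountTrigger:
    """Re-predict the data statistics after every `batch_size` executed statements."""

    def __init__(self, lookup, start_time, batch_size=1000):
        self.lookup = lookup              # hypothetical: time point -> data statistics
        self.batch_size = batch_size
        self.executed_in_period = 0
        # Assumption: one initial prediction so the first period has statistics.
        self.current_statistics = lookup(start_time)
        self.predictions = 1

    def statement_executed(self, now):
        """Record one executed statement; start a new period when the batch is full."""
        self.executed_in_period += 1
        if self.executed_in_period == self.batch_size:
            self.current_statistics = self.lookup(now)
            self.executed_in_period = 0
            self.predictions += 1


# Usage with a stand-in lookup and integer time points.
trigger = QueryCountTrigger(lambda t: f"statistics at {t}", start_time=0)
for time_point in range(1, 2501):
    trigger.statement_executed(time_point)
```

After 2500 statements the trigger has predicted three times: once at start-up and once after each full batch of 1000.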
In the present invention, in particular, a prediction method (i.e., a method of predicting data statistics corresponding to the current period) corresponding to the first setting method and the second setting method is preferable.
Step S12: generating an execution plan for each database query statement obtained in the current period, according to the data statistics corresponding to the current period.
Step S13: processing the database query statement based on the execution plan.
In the specific implementation, after predicting and storing the data statistics information corresponding to the current period, when each database query statement is obtained in the current period, an execution plan is generated for the database query statement according to the data statistics information corresponding to the current period, and the database query statement is processed based on the execution plan.
Continuing the above example, assume that the current period is 08:30 to 08:59. Each time a database query statement is obtained between 08:30 and 08:59, an execution plan may be generated for the database query statement according to the data statistics predicted for the current period in the above step S11, and the database query statement may be processed based on the execution plan.
Or, continuing the other example above, assume that the current period runs from the generation time point of the prediction operation execution instruction until 1000 database query statements have been executed. For each of the 1st to 1000th database query statements obtained successively in the current period, an execution plan is generated according to the data statistics predicted for the current period, and the database query statement is processed based on the execution plan.
For ease of understanding, a simple order database is taken as an example, in which the orders generated within the past 1 hour are stored. Assume that the current period is 08:30 to 08:59 and that the data statistics determined for the current period through the above step S11 are as shown in Table 2. Table 2 is a schematic table of the data statistics and covers a total of 5872 orders.
Table 2 data statistics schematic table
If a database query statement is obtained at 08:30:13, and the database query statement requires querying, from the order database, orders with an amount of 50 to 100 yuan and a delivery distance of 1 to 2 km, then through the above step S12 the execution plan generated for the database query statement may be: first screen out the orders with an amount of 50 to 100 yuan from all orders, and then further screen out, from those orders, the orders with a delivery distance of 1 to 2 km. This is a reasonable execution plan that requires about 5872+402 filtering operations, where 5872 refers to screening the orders with an amount of 50 to 100 yuan out of the 5872 orders, and 402 refers to further screening the orders with a delivery distance of 1 to 2 km out of the 402 orders with an amount of 50 to 100 yuan.
The wording "requires about" is used because the order data in the order database is updated very quickly. The data statistics corresponding to a period can be guaranteed to match the actual data statistics at any moment within the period to a high degree, but cannot be guaranteed to match them exactly. The number of filtering operations indicated by the execution plan is therefore a fairly accurate estimate rather than an exact figure.
If another database query statement is obtained at 08:30:19, and the database query statement requires querying, from the order database, orders with an amount of 20 to 30 yuan and a delivery distance of less than 0.5 km, then through the above step S12 the execution plan generated for the database query statement may be: first screen out the orders with a delivery distance of less than 0.5 km from all orders, and then further screen out, from those orders, the orders with an amount of 20 to 30 yuan. This is a reasonable execution plan that requires about 5872+418 filtering operations, where 5872 refers to screening the orders with a delivery distance of less than 0.5 km out of the 5872 orders, and 418 refers to further screening the orders with an amount of 20 to 30 yuan out of the 418 orders with a delivery distance of less than 0.5 km.
It should be noted that the present invention does not limit the specific manner of generating an execution plan for a database query statement using the data statistics, nor the specific manner of processing the database query statement based on the execution plan. Provided that it does not contradict the present invention, an execution plan may be generated for a database query statement, and the database query statement may be processed based on the execution plan, in any existing or future manner. For example, when a database query statement includes a plurality of filtering conditions (e.g., a database query statement requires querying, from an order database, orders with an order amount of 20 to 30 yuan and a delivery distance of 3 km or more, so that the statement includes the two filtering conditions "order amount of 20 to 30 yuan" and "delivery distance of 3 km or more"), the filtering conditions are processed one after another in order of their screening rate from low to high, so that, as in the two examples above, the condition expected to retain the fewest orders is applied first.
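The ordering used in the two examples above can be sketched as follows. This is only an illustration under stated assumptions: the function name `plan_filters`, the condition labels, and the figure of 1500 orders for the 1 to 2 km interval are hypothetical (Table 2 is not reproduced in this text), and the cost is a simple upper-bound estimate that ignores any correlation between conditions.

```python
from typing import Dict, List, Tuple


def plan_filters(total_rows: int,
                 estimated_matches: Dict[str, int]) -> Tuple[List[str], int]:
    """Order filter conditions so the one expected to keep the fewest rows runs first.

    Returns the ordered condition names and an estimated number of filtering
    operations: every row is checked against the first condition, and only the
    rows expected to survive are checked against each later condition.
    """
    order = sorted(estimated_matches, key=estimated_matches.get)
    cost = 0
    rows_remaining = total_rows
    for name in order:
        cost += rows_remaining
        # Upper bound on survivors; correlation between conditions is ignored.
        rows_remaining = min(rows_remaining, estimated_matches[name])
    return order, cost


# First example: 402 orders of 50-100 yuan (from the text);
# 1500 orders at 1-2 km is a hypothetical figure.
order, cost = plan_filters(5872, {"amount 50-100 yuan": 402,
                                  "distance 1-2 km": 1500})
print(order, cost)
```

With these inputs the amount condition is applied first and the estimate equals 5872+402, matching the first example in the text.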
By executing the database query method comprising the steps S11 to S13, the mapping relation between the data statistics information and the time is called when the corresponding data statistics information is determined for each period, so that the statistics of the service data in the database is not required frequently, and the hardware resources required for calling the mapping relation are far lower than the hardware resources required for statistics of the service data. Therefore, the invention can obviously reduce the occupation amount of hardware resources caused when determining the data statistical information during implementation, on one hand, more hardware resources can be distributed to the query task, and the database query efficiency is improved. On the other hand, the frequency of determining the data statistics information can be improved, namely the time length of each period is shortened, so that the timeliness of the data statistics information is improved, a more reasonable execution plan is generated for the database query statement, and the database query efficiency is further improved.
Referring to fig. 2, fig. 2 is a flowchart of a data query method according to another embodiment of the present invention. As shown in fig. 2, the method includes, in addition to the above step S11 and step S12, the following steps before performing the above step S11:
Step S10: generating and storing the mapping relationship between the data statistics of the database and time.
After determining and saving the mapping relationship between the data statistics and the time, the mapping relationship may be invoked when the above step S11 is performed multiple times later.
For ease of understanding, take as an example the case where the time length of the time in the mapping relationship is equal to 1 day and the time length of the period is fixed at 30 minutes. Each day is then divided into 48 periods: 00:00 to 00:29, 00:30 to 00:59, 01:00 to 01:29, ..., 23:00 to 23:29, and 23:30 to 23:59. If the current time has just reached 08:00, the 17th period of the day has just begun. At this time, the predetermined and stored mapping relationship may be called, the data statistics corresponding to a preset time point in the current period (i.e., the 17th period of the day) may be determined from the mapping relationship, and those data statistics may be determined as the data statistics corresponding to the current period (i.e., the 17th period of the day).
As time passes, if the current time has just reached 08:30, the 18th period of the day has just begun. At this time, the predetermined and stored mapping relationship may be called again, the data statistics corresponding to a preset time point in the current period (i.e., the 18th period of the day) may be determined from the mapping relationship, and those data statistics may be determined as the data statistics corresponding to the current period (i.e., the 18th period of the day).
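The correspondence between a time of day and its period number can be illustrated with a short sketch. The helper names `period_index` and `period_start` are hypothetical and are introduced here only for illustration; periods are numbered from 1, as in the text.

```python
from typing import Tuple


def period_index(hour: int, minute: int, period_minutes: int = 30) -> int:
    """Return the 1-based index of the period of the day containing hour:minute."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("time of day out of range")
    return (hour * 60 + minute) // period_minutes + 1


def period_start(index: int, period_minutes: int = 30) -> Tuple[int, int]:
    """Return (hour, minute) at which the 1-based period `index` begins."""
    return divmod((index - 1) * period_minutes, 60)


print(period_index(8, 0))    # 17
print(period_index(8, 30))   # 18
print(period_start(18))      # (8, 30)
```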
Referring to fig. 3, fig. 3 is a flowchart illustrating determining a mapping relationship according to another embodiment of the present invention. As shown in fig. 3, in a specific implementation of the present invention, the mapping relationship between the data statistics of the database and the time may be generated and stored by the following sub-steps:
Substep S10-1: determining the service period of the service data according to the service characteristics of the service data recorded in the database.
The service data refers to data generated during the service development period, and the service data changes periodically along with the service and presents the characteristic of periodic fluctuation. The invention does not limit the kind of service data, for example, the service data may be: user volume data, user portrait data, user history browsing data, user order data, user rating data, merchandise information data, merchandise inventory data, audio-video data, or hardware performance data, etc.
Here, the service characteristic refers to the type of service. For example, if the service data is generated during the development of a take-out business, the service characteristic of the service data is the take-out business type. When determining the service period of the service data, the period of the take-out business may be analyzed and then determined as the period of the service data. For example, analysis or prior knowledge shows that the take-out order volume peaks between 11:00 and 13:00 and between 17:00 and 20:00 each day, while at other times of the day take-out orders are relatively few. Thus, the service period of the take-out business can be determined to be 1 day, and the service period of the service data of the take-out business can in turn be determined to be 1 day.
In addition, for some services which are difficult to quickly determine the service period through analysis or priori knowledge, such as e-commerce services of a comprehensive e-commerce platform, a coordinate graph of the service data can be made by collecting service data for a period of time, wherein the abscissa in the coordinate graph is time, and the ordinate is service data. By observing the coordinate graph, the fluctuation rule of the service data is determined, so that the service period of the service data is determined.
For some service data without a service period, the method is not applicable to the invention. In other words, the invention is applicable to service data with service period, and the invention can determine the mapping relation between the data statistical information and time of the service data with service period.
Substep S10-2: continuously collecting statistics on the service data within at least one service period to obtain multiple sets of data statistics of the service data, the multiple sets of data statistics being time series data.
In a specific implementation of the present invention, statistics on the service data may be collected once every fixed time length within at least one service period, each collection yielding one set of data statistics of the service data. After the at least one service period has elapsed, multiple sets of data statistics are obtained. The fixed time length is smaller than the service period; for example, the fixed time length may be 1/3600 of the service period. In that case, 3600 sets of data statistics are obtained if statistics are collected over one service period, and 7200 sets if statistics are collected over two service periods.
For ease of understanding, take the collection of statistics on the take-out order data in a take-out order database as an example, and assume that the service period of the take-out order data has been determined to be 1 day through the above substep S10-1. Then, when the above substep S10-2 is performed, statistics on the order data may be collected every 1 minute starting from 00:00 on a certain day, each collection yielding one set of data statistics. By 23:59 of that day, after one day of statistics, a total of 1440 sets of data statistics have been obtained. Each set of data statistics covers the number of orders in each delivery distance interval and the number of orders in each order amount interval. Each set of data statistics may be as shown in Table 1 or Table 2. Across different sets, the delivery distance intervals are divided in the same way and the order amount intervals are divided in the same way, but the number of orders in a given delivery distance interval differs, as does the number of orders in a given order amount interval.
The multiple sets of data statistics are called time series data because each set corresponds to a different statistics time, so that together they form a sequence of data ordered in time.
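One set of data statistics of the kind described here can be sketched as a pair of histograms. The sketch below is illustrative only: the function name `collect_statistics` and the representation of an order as a `(distance_km, amount_yuan)` pair are assumptions, the interval boundaries follow the intervals named in this description, and treating each upper boundary as inclusive is an assumption because the text does not say which interval a boundary value falls into.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

# Interval boundaries taken from the intervals named in the description.
DISTANCE_EDGES_KM = [0.5, 1, 2, 4, 6]       # <=0.5, 0.5-1, 1-2, 2-4, 4-6, >6
AMOUNT_EDGES_YUAN = [20, 30, 50, 100, 300]  # <=20, 20-30, 30-50, 50-100, 100-300, >300


def collect_statistics(orders: Iterable[Tuple[float, float]]) -> Dict[str, List[int]]:
    """Count orders per delivery-distance interval and per order-amount interval.

    Each order is a (distance_km, amount_yuan) pair. A value equal to a
    boundary is counted in the lower interval (an assumption).
    """
    distance_counts = [0] * (len(DISTANCE_EDGES_KM) + 1)
    amount_counts = [0] * (len(AMOUNT_EDGES_YUAN) + 1)
    for distance_km, amount_yuan in orders:
        distance_counts[bisect_left(DISTANCE_EDGES_KM, distance_km)] += 1
        amount_counts[bisect_left(AMOUNT_EDGES_YUAN, amount_yuan)] += 1
    return {"distance": distance_counts, "amount": amount_counts}


# Hypothetical orders, for illustration only.
sample = [(0.3, 25), (0.5, 20), (1.5, 75), (7.0, 350)]
print(collect_statistics(sample))
```

Calling such a function once per minute for one day would yield the 1440 sets of data statistics mentioned above.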
Substep S10-3: determining the mapping relationship between the data statistics of the service data and time according to the multiple sets of data statistics, and storing the mapping relationship, wherein the time length of the time is equal to or longer than the time length of the service period.
In the specific implementation of the invention, if the time in the expected mapping relationship is formed by discrete time points, a plurality of statistical time points and data statistical information corresponding to each statistical time point can be directly determined as the mapping relationship. In the mapping relation, each statistic time point corresponds to a group of data statistic information.
Or if the time in the expected mapping relationship is a continuous time, the multiple sets of data statistics information counted in the sub-step S10-2 may be fitted to obtain a function with the independent variable being time and the dependent variable being data statistics information, and the function is used as the mapping relationship.
For ease of understanding, each set of data statistics obtained through the above substep S10-2, for example, relates to the number of orders with a distribution distance of 0.5km or less, the number of orders with a distribution distance of 0.5 to 1km, the number of orders with a distribution distance of 1 to 2km, the number of orders with a distribution distance of 2 to 4km, the number of orders with a distribution distance of 4 to 6km, and the number of orders with a distribution distance of 6km or more. And then fitting a function corresponding to each distribution distance interval according to the order quantity of each distribution distance interval, wherein the function takes time as an independent variable and takes the order quantity of the distribution distance interval as a dependent variable. For example, a function is fitted to the number of orders with a distribution distance of 0.5km or less by using a least square method, and the independent variable in the function is time and the dependent variable is the number of orders with a distribution distance of 0.5km or less.
In addition, each set of data statistics obtained in the above substep S10-2 also covers the number of orders with an order amount of 20 yuan or less, of 20 to 30 yuan, of 30 to 50 yuan, of 50 to 100 yuan, of 100 to 300 yuan, and of 300 yuan or more. A function is then fitted for each order amount interval according to the number of orders in that interval, the function taking time as the independent variable and the number of orders in the order amount interval as the dependent variable. For example, a function is fitted by the least square method to the number of orders with an order amount of 20 yuan or less; the independent variable of the function is time, and the dependent variable is the number of orders with an order amount of 20 yuan or less.
In this way, a total of 12 functions are determined, which are the mapping between the data statistics and time. Given a point in time, the number of orders for different distribution distance intervals, and the number of orders for different order amount intervals, can be predicted by the 12 functions, respectively.
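The text does not specify the form of the fitted function. As one possible sketch, the example below fits a constant plus a single sine/cosine pair whose period equals the business cycle. The function name `fit_first_harmonic` and the choice of model are assumptions made for illustration. The samples are assumed to be equally spaced over exactly one cycle; under that assumption the least-squares coefficients reduce to the simple projections computed in the code.

```python
import math
from typing import Callable, Sequence


def fit_first_harmonic(counts: Sequence[float]) -> Callable[[float], float]:
    """Least-squares fit of c0 + a*cos(w*t) + b*sin(w*t) to samples at t = 0..n-1.

    Assumes the n samples cover exactly one business cycle at equal spacing.
    The three basis functions are then orthogonal over the sample points, so
    the least-squares solution is given by the projections below.
    """
    n = len(counts)
    if n < 3:
        raise ValueError("need at least 3 samples")
    w = 2 * math.pi / n
    c0 = sum(counts) / n
    a = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(counts))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(counts))
    return lambda t: c0 + a * math.cos(w * t) + b * math.sin(w * t)


# Synthetic per-minute order counts for one interval (illustrative data only).
n = 1440
w = 2 * math.pi / n
samples = [120 + 40 * math.cos(w * t) - 15 * math.sin(w * t) for t in range(n)]
predict = fit_first_harmonic(samples)
print(round(predict(630), 3))  # predicted order count at minute 630 (10:30)
```

Repeating the fit for each of the six delivery distance intervals and six order amount intervals would give the 12 functions described above. A real order curve with lunch and dinner peaks would likely need more harmonics or a different model.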
Furthermore, the 12 functions can be integrated into a total function, the independent variables of which are the time point and the target data interval. The target data interval may be, for example: 0.5 to 1km, 30 to 50 yuan, etc. Given a point in time and at least one target data interval, the data of a given target data interval at the given point in time can be predicted by the overall function.
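A total function of this kind can be sketched as a thin wrapper over the per-interval functions. The name `make_total_function`, the interval labels, and the two toy functions are hypothetical and serve only to show the shape of the interface.

```python
from typing import Callable, Dict, Iterable


def make_total_function(
    per_interval: Dict[str, Callable[[float], float]],
) -> Callable[[float, Iterable[str]], Dict[str, float]]:
    """Combine per-interval functions into one function of (time, intervals)."""

    def total(time_point: float, target_intervals: Iterable[str]) -> Dict[str, float]:
        # Evaluate only the functions for the requested target data intervals.
        return {name: per_interval[name](time_point) for name in target_intervals}

    return total


# Toy per-interval functions, for illustration only.
total = make_total_function({
    "0.5-1 km": lambda t: 300.0 + t,
    "30-50 yuan": lambda t: 500.0 - t,
})
print(total(10, ["0.5-1 km", "30-50 yuan"]))
```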
In the finally determined mapping relation, the time length of the time is equal to or longer than the time length of the service period. In some embodiments of the invention, the time length of the time is equal to an integer multiple of the time length of the traffic cycle.
For ease of understanding, take as an example a service period equal to 1 day and a time in the mapping relationship that is a continuous time with a time length of 1 day, starting at 00:00 and ending at 23:59. For any time point on any day during service development, the corresponding time point can be found in the time of the mapping relationship, so that the data statistics corresponding to that time point can be predicted. For example, at 10:30 on March 8, the corresponding time point (namely, 10:30) can be found in the time of the mapping relationship, and the data statistics of that corresponding time point in the mapping relationship are then predicted to be the data statistics of the period beginning at 10:30 on March 8. After March 8 ends and the current time reaches 00:00 on March 9, the corresponding time point (namely, 00:00) can be found in the time of the mapping relationship, and the data statistics of that corresponding time point in the mapping relationship are then predicted to be the data statistics of the period beginning at 00:00 on March 9.
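The lookup described in this example can be sketched for a mapping made of discrete time points, one per minute of a 1-day cycle. The names `minute_of_day` and `predict_statistics`, the dictionary layout, and the year used in the dates are assumptions for illustration only.

```python
from datetime import datetime
from typing import Dict


def minute_of_day(moment: datetime) -> int:
    """Map a calendar time to its corresponding time point in a 1-day cycle."""
    return moment.hour * 60 + moment.minute


def predict_statistics(mapping: Dict[int, dict], moment: datetime) -> dict:
    """Look up the stored data statistics for the time point matching `moment`."""
    return mapping[minute_of_day(moment)]


# Placeholder mapping: one entry per minute, holding a dummy order count.
mapping = {m: {"orders": m} for m in range(24 * 60)}

print(predict_statistics(mapping, datetime(2020, 3, 8, 10, 30)))  # entry for 10:30
print(predict_statistics(mapping, datetime(2020, 3, 9, 0, 0)))    # entry for 00:00
```

If the time in the mapping relationship spans several days (for example a 1-week cycle), the same idea applies with the day of the week folded into the lookup key.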
The invention determines and saves the mapping relation between the data statistics information and time of the database by executing the step S10 in advance. Thus, during the development of the service, the mapping relationship may be invoked every cycle as described in step S11, so that the data statistics are predicted for the current cycle in step S11.
It is also considered that, during service development, an adjustment of the business rules may cause a corresponding change in the service period. Once the service period changes, the predetermined and stored mapping relationship becomes invalid. For ease of understanding, take the take-out business as an example. Assume that before the business rules of the take-out business are adjusted, the period of the take-out business is 1 day, and the time length of the time in the predetermined and stored mapping relationship is also equal to 1 day. Suppose the business rules are then adjusted to launch a working-meal benefit activity every Wednesday, which applies a half-price discount to take-out orders placed on Wednesdays. The order volume on Wednesdays will rise significantly and will be clearly higher than the order volume on other days. The period of the take-out business therefore changes from 1 day to 1 week. If the data statistics are still predicted for the current period in step S11 according to the predetermined and stored mapping relationship, a large gap will exist between the predicted data statistics and the actual data statistics. In particular, on Wednesdays, the data statistics predicted for each period according to the predetermined and stored mapping relationship cannot truly reflect the actual data statistics in the take-out order database. As a result, when an execution plan is generated for a database query statement based on the predicted data statistics, it is difficult to ensure that the execution plan is reasonable.
To this end, in some embodiments of the present invention, the validity of the mapping relationship may be monitored during processing of each database query statement using the mapping relationship after determining and saving the mapping relationship between the data statistics of the database and the time, i.e. during repeatedly executing the above-mentioned step S11 and step S12.
In a specific implementation, the present invention may monitor the validity of the mapping relationship in the following manner: the validity of the mapping relationship is monitored according to the data statistics predicted for a target time point using the mapping relationship and the actual data statistics corresponding to the target time point, wherein the target time point is a pre-designated time point.
In the invention, the data statistical information predicted by the mapping relation for a target time point is used as a predicted value, the actual data statistical information corresponding to the target time point is used as a true value, and the validity of the mapping relation is monitored according to the similarity between the predicted value and the true value by comparing the predicted value with the true value.
Referring to fig. 4, fig. 4 is a flowchart illustrating the monitoring of the validity of the mapping relationship according to an embodiment of the present invention. As shown in fig. 4, the monitoring process includes the following steps:
Step S41: at intervals of a preset time length, obtaining the data statistics predicted for the current target time point using the mapping relationship, and obtaining the actual data statistics corresponding to the current target time point.
In order to obtain the actual data statistics information corresponding to the current target time point, the statistics operation may be performed on the service data in the database at the current target time point, so as to obtain the actual data statistics information.
As described above, a statistics operation on the service data occupies considerable hardware resources and affects data query efficiency while it is running, so the statistics operation should be performed at as low a frequency as possible. In other words, the preset time length in the above step S41 is set to a relatively long time length.
In a specific implementation of the present invention, the preset time length in the step S41 may be set to N times the time length of the period in the step S11, where N is an integer greater than 1. For example, the time length of the cycle in the above step S11 is set to 30 minutes, and N is set to 8, that is, the preset time length in the above step S41 is set to 4 hours. And obtaining data statistical information predicted for the current time point by using the mapping relation every four hours, and obtaining actual data statistical information corresponding to the current time point.
Taking the example of a service period equal to 1 day, referring to table 3, table 3 is a schematic record of the operations performed at various time points in the day. As shown in table 3, every 30 minutes in one day, it is necessary to predict the data statistics of the current period using a predetermined and stored mapping relationship. Every 4 hours in a day, a statistics operation needs to be performed on the service data in the database to obtain actual data statistics information.
Table 3 record schematic form of operations performed at various time points in the day
In Table 3, "-" indicates that no statistics operation is performed. Taking 00:30 as an example, a prediction operation needs to be performed at that time point, but no statistics operation needs to be performed.
As shown in Table 3, when the above step S11 is performed at 00:00 each day, the data statistics corresponding to 00:00 are determined from the mapping relationship, taking 00:00 as the basis, and are used as the data statistics of the current period (i.e., the period from 00:00 to 00:30). Meanwhile, 00:00 is taken as a target time point, and a statistics operation is performed on the service data in the database at 00:00 to obtain the actual data statistics. Thus, both the predicted data statistics and the actual data statistics corresponding to 00:00 are obtained.
In the same manner, 04:00, 08:00, 12:00, 16:00 and 20:00 are also taken as target time points, and the predicted data statistics and actual data statistics corresponding to each of 04:00, 08:00, 12:00, 16:00 and 20:00 are obtained.
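The daily schedule described above (a prediction every 30 minutes, plus a statistics operation every 4 hours) can be sketched as follows. The function name `daily_schedule` and the operation labels `"predict"` and `"count"` are hypothetical; the sketch only illustrates the timing and is not a reproduction of Table 3.

```python
from typing import List, Tuple


def daily_schedule(period_minutes: int = 30, n: int = 8) -> List[Tuple[str, List[str]]]:
    """List the operations performed at the start of each period of one day.

    A prediction is made at the start of every period; a statistics operation
    on the database is added at every n-th period start (every 4 hours for
    30-minute periods and n = 8).
    """
    schedule = []
    for i, start in enumerate(range(0, 24 * 60, period_minutes)):
        hour, minute = divmod(start, 60)
        operations = ["predict"]
        if i % n == 0:
            operations.append("count")
        schedule.append((f"{hour:02d}:{minute:02d}", operations))
    return schedule


for time_label, operations in daily_schedule()[:3]:
    print(time_label, operations)
```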
Step S42: determining the similarity between the predicted data statistics and the actual data statistics corresponding to the current target time point.
In the specific implementation of the invention, the predicted data statistical information corresponding to the current time point can be used as one vector, the actual data statistical information corresponding to the current time point is used as the other vector, then the vector distance between the two vectors is calculated, and the calculated vector distance is used as the similarity between the predicted data statistical information and the actual data statistical information.
The vector distance may be euclidean distance (Euclidean Distance), cosine distance, chebyshev distance (Chebyshev distance), etc., and the specific kind and calculation mode of the vector distance are not limited in the present invention.
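As one way of carrying out step S42, the sketch below computes the cosine similarity between a predicted vector and an actual vector of order counts. The function name and the sample vectors are hypothetical. Cosine similarity is chosen here only because a larger value means a closer match, which fits the "lower than a preset threshold" test of step S43; a Euclidean or Chebyshev distance grows as the vectors diverge and would first have to be converted into a similarity score.

```python
import math
from typing import Sequence


def cosine_similarity(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Cosine of the angle between two equally long statistics vectors."""
    if len(predicted) != len(actual):
        raise ValueError("vectors must have the same length")
    dot = sum(p * a for p, a in zip(predicted, actual))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_a = math.sqrt(sum(a * a for a in actual))
    if norm_p == 0 or norm_a == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_p * norm_a)


# Hypothetical order counts per delivery distance interval.
predicted = [418, 1200, 1500, 1600, 800, 354]
actual = [430, 1150, 1520, 1580, 790, 402]
print(cosine_similarity(predicted, actual))
```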
Step S43: determining that the mapping relationship is invalid in the case that the similarity determined N times is lower than a preset threshold, wherein N is a preset integer greater than or equal to 1.
The value of N may be preset or may be changed at any time during the service development. Similarly, the preset threshold value in step S43 may be preset, or may be changed at any time during the service development.
For ease of understanding, continuing with Table 3 above, assume that N is equal to 3 and the preset threshold is equal to 0.8. Assume that the similarities determined at 00:00, 04:00, 08:00, 12:00 and 16:00 of the current day are 0.76, 0.85, 0.82, 0.65 and 0.72 respectively, and that the similarity determined at the current time point (20:00) is 0.67. Since the similarities determined at the three consecutive time points 12:00, 16:00 and 20:00 are all lower than the preset threshold 0.8, the mapping relationship is determined to be invalid.
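The decision in this example can be sketched as a check on the most recent N similarities. The function name `mapping_invalid` is hypothetical, and reading "N times" as "the N most recent consecutive checks" follows the example above rather than an explicit statement of step S43.

```python
from typing import Sequence


def mapping_invalid(similarities: Sequence[float],
                    threshold: float = 0.8,
                    n: int = 3) -> bool:
    """Return True if the last n similarities are all below the threshold."""
    if len(similarities) < n:
        return False
    return all(s < threshold for s in similarities[-n:])


history = [0.76, 0.85, 0.82, 0.65, 0.72]  # 00:00 through 16:00
print(mapping_invalid(history))            # False: only two consecutive low values
print(mapping_invalid(history + [0.67]))   # True: 12:00, 16:00 and 20:00 are all low
```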
When the method is concretely implemented, under the condition that the mapping relation is determined to be invalid, the mapping relation between the data statistical information of the database and the time can be redetermined, and the redetermined mapping relation is stored.
As described above, the service period is changed due to the modification of the service rule, and thus the predetermined mapping relationship is disabled. Thus, in order to redetermine the mapping relationship between the data statistics and time of the database, the service period of the service data in the database may be redetermined, for example; and then, based on the redetermined service period, redetermining the mapping relation between the data statistical information of the database and the time. Wherein the time length of the time in the redetermined mapping relationship is equal to or greater than the time length of the redetermined service period.
For the specific implementation steps of redetermining the mapping relationship, reference may be made to the descriptions of the above substeps S10-1 to S10-3, which are not repeated here.
Further, during the redetermining of the mapping relationship between the data statistics of the database and the time, since the previously determined mapping relationship has failed, the newly obtained database query sentence cannot be processed based on the previously determined mapping relationship, in other words, the above-described step S11 and step S12 cannot be performed based on the previously determined mapping relationship.
In order not to influence the normal execution of the database query task, the method and the device can generate an execution plan for the database query statement according to a preset alternative mode when the database query statement is obtained during the period of redetermining the mapping relation between the data statistical information and the time of the database, and process the database query statement based on the execution plan.
Here, the alternative manner refers to a manner different from the above steps S11 and S12. More specifically, the alternative manner refers to a manner in which an execution plan can be generated for a database query statement without using the mapping relationship between the data statistics and time.
By way of example, the invention may be embodied in the following alternatives: and counting the business data in the database at regular time, so as to obtain the data statistic information of the database. And generating an execution plan for the obtained database query statement by utilizing the data statistical information before the next statistics of the business data in the database.
For ease of understanding, assume by way of example that statistics on the service data in the database are collected every 1 hour while the mapping relationship between the data statistics of the database and time is being redetermined. Assume that at 20:00 the mapping relationship is determined to be invalid, and redetermination of the mapping relationship between the data statistics of the database and time begins. At 20:00, statistics on the service data in the database are collected to obtain the data statistics of the database, and the data statistics are stored. For database query statements obtained between 20:00 and 21:00, execution plans are generated based on the data statistics obtained at 20:00. When the time reaches 21:00, statistics on the service data in the database are collected again to obtain the data statistics of the database, and the data statistics are stored. For database query statements obtained between 21:00 and 22:00, execution plans are generated based on the data statistics obtained at 21:00. And so on.
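The alternative manner in this example can be sketched as a small cache that recounts the database at most once per interval. The class name `TimedStatistics`, the `count_fn` callback, and the use of a plain increasing minute counter for the clock are all assumptions for illustration; a real engine would use its own clock and statistics routine.

```python
from typing import Callable, Optional


class TimedStatistics:
    """Serve the latest stored statistics, recounting once per interval."""

    def __init__(self, count_fn: Callable[[], dict], interval_minutes: int = 60):
        self._count_fn = count_fn          # performs the statistics operation
        self._interval = interval_minutes
        self._taken_at: Optional[int] = None
        self._snapshot: Optional[dict] = None

    def get(self, now_minutes: int) -> dict:
        """Return statistics valid at `now_minutes` (an increasing minute counter)."""
        if self._taken_at is None or now_minutes - self._taken_at >= self._interval:
            self._snapshot = self._count_fn()
            self._taken_at = now_minutes
        return self._snapshot


calls = []


def count_orders() -> dict:
    # Stand-in for a real statistics operation on the order database.
    calls.append(1)
    return {"snapshot": len(calls)}


fallback = TimedStatistics(count_orders)
print(fallback.get(20 * 60))       # counted at 20:00
print(fallback.get(20 * 60 + 30))  # 20:30 reuses the 20:00 statistics
print(fallback.get(21 * 60))       # counted again at 21:00
```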
Referring to fig. 5, fig. 5 is an interaction schematic diagram of a data query method according to an embodiment of the present invention. As shown in fig. 5, the data query method proposed by the present invention may be performed among a database query engine, a first daemon (daemon), and a second daemon (daemon). The first daemon is used for monitoring the validity of the mapping relation, and the second daemon is used for determining the mapping relation again under the condition that the mapping relation is invalid.
As shown in fig. 5, each time a preset time length elapses, the first daemon determines the similarity between the predicted data statistics and the actual data statistics corresponding to the current time point, and monitors the validity of the mapping relationship according to that similarity. If the first daemon determines at some time point that the mapping relationship has failed, it immediately notifies the database query engine, so that the database query engine can immediately mark the previously saved mapping relationship as invalid. In addition, the first daemon may issue an alarm, so that after receiving the alarm the database administrator can redetermine the service period of the service data, input the redetermined service period to the second daemon, and send the second daemon an instruction to redetermine the mapping relationship.
As shown in fig. 5, after receiving the instruction to redetermine the mapping relationship, the second daemon starts redetermining the mapping relationship between the data statistics of the database and time based on the redetermined service period.
As shown in fig. 5, the database query engine periodically predicts the latest data statistics based on the mapping relationship it has saved, and saves the predicted data statistics. In other words, the database query engine periodically executes step S11 described above and predicts the data statistics corresponding to the current period according to the mapping relationship between the data statistics of the database and time.
As shown in fig. 5, the database query engine also continually processes database query statements. After the database query engine obtains a database query statement (e.g., a Structured Query Language statement, abbreviated as SQL), it first determines whether the existing mapping relationship is in a valid state. If the existing mapping relationship is still valid, the database query engine may generate an execution plan for the database query statement based on the most recently predicted data statistics (i.e., the data statistics predicted for the current period), and process the database query statement based on the generated execution plan.
If the existing mapping is marked as invalid, it is determined that the existing mapping has failed. In this way, an execution plan may be generated for the database query statement according to a preset alternative manner, and the database query statement may be processed based on the generated execution plan.
As shown in FIG. 5, after the second daemon re-determines the mapping between the data statistics and time of the database, the re-determined mapping is passed to the database query engine. The database query engine maintains the redetermined mapping. In this way, the database query engine may process the database query statement in the manner of step S11 and step S12 described above based on the mapping relationship.
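The interaction described for fig. 5 can be illustrated with a small sketch of the query engine's choice of statistics. All names here (`QueryEngine`, `mark_invalid`, `install_mapping`, `stats_for_plan`) are hypothetical, the mapping and the alternative manner are modelled as plain callables, and plan generation itself is left out.

```python
class QueryEngine:
    """Toy model of the engine's choice between predicted and fallback statistics."""

    def __init__(self, mapping, fallback_stats):
        self.mapping = mapping                # callable: time -> predicted statistics
        self.fallback_stats = fallback_stats  # callable: time -> statistics from the alternative manner
        self.mapping_valid = True

    def mark_invalid(self):
        # Called when the first daemon reports that the mapping has failed.
        self.mapping_valid = False

    def install_mapping(self, mapping):
        # Called when the second daemon delivers a redetermined mapping.
        self.mapping = mapping
        self.mapping_valid = True

    def stats_for_plan(self, now):
        # Valid mapping: use predicted statistics; otherwise use the alternative manner.
        if self.mapping_valid:
            return ("predicted", self.mapping(now))
        return ("fallback", self.fallback_stats(now))
```

In this sketch the validity flag is the only state shared between the daemons and the engine, which mirrors the marking step described above.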
Based on the same inventive concept, an embodiment of the present invention provides a database query apparatus. Referring to fig. 6 (a), fig. 6 (a) is a schematic diagram of a database query device according to an embodiment of the invention. As shown in fig. 6 (a), the apparatus includes:
the statistical information determining module 61 is configured to predict data statistical information corresponding to the current period according to a mapping relationship between data statistical information and time of the database;
an execution plan generating module 62, configured to generate an execution plan for a database query statement obtained in the current period according to the data statistics information corresponding to the current period;
A query statement processing module 63, configured to process the database query statement based on the execution plan.
Optionally, the statistical information determining module 61 in the device is specifically configured to determine, based on a preset time point in the current period, data statistical information corresponding to the preset time point from the mapping relationship, and determine the data statistical information corresponding to the preset time point as the data statistical information corresponding to the current period; wherein the preset time point is a start time point, an end time point, or a time point between the start time point and the end time point of the current period.
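As a rough illustration of choosing the preset time point, the sketch below picks the start, the end, or the midpoint of the current period and looks up the statistics for it. The names `preset_time_point` and `stats_for_period` and the `position` parameter are hypothetical, and the midpoint is only one of the possible points between the start and the end.

```python
from datetime import datetime


def preset_time_point(start, end, position="start"):
    """Pick the time point of the period [start, end] used for the lookup."""
    if position == "start":
        return start
    if position == "end":
        return end
    if position == "middle":
        # One possible point between start and end (an assumption of this sketch).
        return start + (end - start) / 2
    raise ValueError(f"unknown position: {position}")


def stats_for_period(mapping, start, end, position="start"):
    """Use the statistics mapped to the preset time point for the whole period.

    mapping is assumed to be a callable: datetime -> statistics.
    """
    return mapping(preset_time_point(start, end, position))
```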
Optionally, referring to fig. 6 (b), fig. 6 (b) is a schematic diagram of a database query apparatus according to another embodiment of the present invention. As shown in fig. 6 (b), the apparatus may further include, in addition to the statistical information determination module and the query statement processing module:
the mapping relation determining module 60 is configured to determine and store a mapping relation between the data statistics information of the database and the time before predicting the data statistics information corresponding to the current period according to the mapping relation between the data statistics information of the database and the time.
Optionally, the mapping relation determining module 60 in the device is specifically configured to determine a service period of the service data according to a service characteristic of the service data recorded in the database; continuously counting the service data in at least one service period to obtain a plurality of groups of data statistic information of the service data, wherein the plurality of groups of data statistic information are time sequence data; and determining the mapping relation between the data statistics information of the service data and time according to the plurality of groups of data statistics information, and storing the mapping relation, wherein the time length of the time is equal to or longer than that of the service period.
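One way to picture the mapping relationship is as a table from the offset within a service period to the statistics observed at that offset. The sketch below is an assumption-laden illustration, not the method of the embodiment: the statistic is a single row count, samples from several periods are averaged, and prediction uses the nearest recorded offset without wrapping around the period boundary. The names `build_mapping` and `predict` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_mapping(samples, period, origin):
    """Average time-series samples by their offset within the service period.

    samples: iterable of (timestamp, row_count) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, row_count in samples:
        offset = (timestamp - origin) % period  # timedelta % timedelta -> timedelta
        buckets[offset].append(row_count)
    return {offset: sum(values) / len(values) for offset, values in buckets.items()}


def predict(mapping, period, origin, when):
    """Return the value stored for the recorded offset nearest to `when`."""
    offset = (when - origin) % period
    nearest = min(mapping, key=lambda recorded: abs(recorded - offset))
    return mapping[nearest]
```

With a one-day service period, samples taken at 20:00 on two consecutive days would be averaged into one entry, which then serves predictions for times near 20:00 on later days.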
Optionally, as shown in fig. 6 (b), the apparatus may further include:
the mapping relation monitoring module 64 is configured to monitor validity of the mapping relation according to data statistics information respectively predicted for at least one time point by using the mapping relation and actual data statistics information respectively corresponding to the at least one time point after determining and saving the mapping relation between the data statistics information and time of the database;
the mapping relation determining module 60 in the device is further configured to redetermine the mapping relation between the data statistics information of the database and the time, and to store the redetermined mapping relation, in case that the mapping relation fails.
Optionally, the mapping relation monitoring module 64 in the device is specifically configured to obtain, at intervals of a preset time length, data statistics information predicted for a current time point by using the mapping relation, and obtain actual data statistics information corresponding to the current time point; determining the similarity between the predicted data statistical information and the actual data statistical information corresponding to the current time point; and under the condition that the similarity determined for N times is lower than a preset threshold value, determining that the mapping relation fails, wherein N is a preset integer greater than or equal to 1.
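The monitoring logic can be sketched as follows. The source does not specify the similarity measure or whether the N low-similarity checks must be consecutive, so both are assumptions here: similarity is taken as one minus the mean relative error over shared statistics, and the counter resets after any check that meets the threshold. `similarity` and `MappingMonitor` are hypothetical names.

```python
def similarity(predicted, actual):
    """Assumed metric: 1 minus the mean relative error over shared keys, floored at 0."""
    keys = predicted.keys() & actual.keys()
    if not keys:
        return 0.0
    errors = []
    for key in keys:
        denominator = max(abs(predicted[key]), abs(actual[key]), 1e-9)
        errors.append(abs(predicted[key] - actual[key]) / denominator)
    return max(0.0, 1.0 - sum(errors) / len(errors))


class MappingMonitor:
    """Declares the mapping failed after n consecutive low-similarity checks."""

    def __init__(self, threshold=0.8, n=3):
        self.threshold = threshold
        self.n = n
        self.low_streak = 0
        self.valid = True

    def check(self, predicted, actual):
        # Intended to be called once per preset time length.
        if similarity(predicted, actual) < self.threshold:
            self.low_streak += 1
        else:
            self.low_streak = 0
        if self.low_streak >= self.n:
            self.valid = False
        return self.valid
```

Requiring several low readings before declaring failure is one way to avoid reacting to a single noisy measurement; with N equal to 1 the monitor fails on the first low reading.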
Optionally, the mapping relationship determining module 60 in the device is specifically configured to, when determining the mapping relationship again, determine a service period of the service data again according to the latest service characteristic of the service data recorded in the database; based on the redetermined service period, redetermining a mapping relationship between the data statistics of the database and the time, wherein a time length of the time in the redetermined mapping relationship is equal to or greater than a time length of the redetermined service period.
Optionally, as shown in fig. 6 (b), the apparatus may further include:
the query statement alternative processing module 65 is configured to, when a database query statement is obtained during redetermination of the mapping relationship between the data statistics information of the database and the time, generate an execution plan for the database query statement according to a preset alternative manner, and process the database query statement based on the execution plan.
The embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 perform communication with each other through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to execute the program stored in the memory 703, and implement the following steps:
predicting data statistics corresponding to the current period according to the mapping relation between the data statistics of the database and the time;
and generating an execution plan for the database query statement obtained in the current period according to the data statistical information corresponding to the current period, and processing the database query statement based on the execution plan.
Alternatively, the processor 701, when executing the program stored on the memory 703, implements the steps included in the other method embodiments of the present invention described above.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the bus is represented in the figure by only one bold line, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the database query method according to any of the above embodiments.
In yet another embodiment of the present invention, a computer program product comprising instructions, which when run on a computer, causes the computer to perform the database query method of any of the above embodiments is also provided.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, wireless, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in an interrelated manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A method of database query, the method comprising:
determining a service period of service data according to service characteristics of the service data recorded in a database;
continuously counting the service data in at least one service period to obtain a plurality of groups of data statistic information of the service data, wherein the plurality of groups of data statistic information are time sequence data;
determining a mapping relation between the data statistics information of the service data and time according to the plurality of groups of data statistics information, and storing the mapping relation, wherein the time length of the time is equal to or longer than that of the service period;
based on a preset time point in a current period, determining data statistics information corresponding to the preset time point from the mapping relation, and determining the data statistics information corresponding to the preset time point as the data statistics information corresponding to the current period;
Wherein the preset time point is a start time point, an end time point, or a time point between the start time point and the end time point of the current period;
generating an execution plan for a database query statement obtained in the current period according to the data statistical information corresponding to the current period;
and processing the database query statement based on the execution plan.
2. The method of claim 1, wherein after determining the mapping relationship between the data statistics and time of the traffic data, the method further comprises:
monitoring the validity of the mapping relation according to the data statistics information predicted for a target time point by using the mapping relation and the actual data statistics information corresponding to the target time point, wherein the target time point is a pre-designated time point;
and under the condition that the mapping relation is invalid, the mapping relation between the data statistical information of the database and the time is redetermined, and the redetermined mapping relation is stored.
3. The method according to claim 2, wherein the step of monitoring the validity of the mapping relationship based on the data statistics predicted for the target time point using the mapping relationship and the actual data statistics corresponding to the target time point comprises:
Obtaining data statistical information predicted for a current target time point by using the mapping relation every preset time length, and obtaining actual data statistical information corresponding to the current target time point;
determining the similarity between the predicted data statistics information and the actual data statistics information corresponding to the current target time point;
and under the condition that the similarity determined for N times is lower than a preset threshold value, determining that the mapping relation fails, wherein N is a preset integer greater than or equal to 1.
4. The method of claim 2, wherein the step of redetermining the mapping relationship between the data statistics of the database and time comprises:
the service period of the service data is redetermined according to the latest service characteristics of the service data recorded in the database;
based on the redetermined service period, redetermining a mapping relationship between the data statistics of the database and the time, wherein a time length of the time in the redetermined mapping relationship is equal to or greater than a time length of the redetermined service period.
5. The method of claim 2, wherein during the redetermining of the mapping relationship between the data statistics of the database and the time, the method further comprises:
When a database query statement is obtained, an execution plan is generated for the database query statement according to a preset alternative mode, and the database query statement is processed based on the execution plan.
6. A database query apparatus, the apparatus comprising:
the statistical information determining module is used for determining the service period of the service data according to the service characteristics of the service data recorded in the database; continuously counting the service data in at least one service period to obtain a plurality of groups of data statistic information of the service data, wherein the plurality of groups of data statistic information are time sequence data; according to the multiple groups of data statistics information, determining a mapping relation between the data statistics information of the service data and time, and storing the mapping relation; based on a preset time point in a current period, determining data statistics information corresponding to the preset time point from the mapping relation, and determining the data statistics information corresponding to the preset time point as the data statistics information corresponding to the current period; wherein the time length of the time is equal to or greater than the time length of the service period; the preset time point is a start time point, an end time point, or a time point between the start time point and the end time point of the current period;
The execution plan generation module is used for generating an execution plan for the database query statement obtained in the current period according to the data statistics information corresponding to the current period;
and the query statement processing module is used for processing the database query statement based on the execution plan.
7. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-5 when executing a program stored on a memory.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.
CN202010583303.6A 2020-06-23 2020-06-23 Database query method, device, electronic equipment and readable storage medium Active CN111737281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010583303.6A CN111737281B (en) 2020-06-23 2020-06-23 Database query method, device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111737281A CN111737281A (en) 2020-10-02
CN111737281B true CN111737281B (en) 2023-09-01

Family

ID=72650729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010583303.6A Active CN111737281B (en) 2020-06-23 2020-06-23 Database query method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111737281B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090085869A (en) * 2008-02-05 2009-08-10 엔에이치엔(주) Method and system for managing database
CN102262636A (en) * 2010-05-25 2011-11-30 中国移动通信集团浙江有限公司 Method and device for generating database partition execution plan
CN106599130A (en) * 2016-12-02 2017-04-26 中国银联股份有限公司 Method and device for selectively interfering with multiple indexes of relational database management system
CN108370324A (en) * 2015-11-13 2018-08-03 电子湾有限公司 Distributed data base work data tilt detection
CN108804459A (en) * 2017-05-02 2018-11-13 杭州海康威视数字技术股份有限公司 Data query method and device
CN108829768A (en) * 2018-05-29 2018-11-16 中国银行股份有限公司 A kind of collection method and device of statistical information
CN110399388A (en) * 2019-07-29 2019-11-01 中国工商银行股份有限公司 Data query method, system and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243555A1 (en) * 2003-05-30 2004-12-02 Oracle International Corp. Methods and systems for optimizing queries through dynamic and autonomous database schema analysis
US8060495B2 (en) * 2008-10-21 2011-11-15 International Business Machines Corporation Query execution plan efficiency in a database management system
US9990396B2 (en) * 2015-02-03 2018-06-05 International Business Machines Corporation Forecasting query access plan obsolescence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cost-based Spark SQL optimization; Lian Xin; China Master's Theses Full-text Database (No. 1); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant