CN111737281A - Database query method and device, electronic equipment and readable storage medium - Google Patents

Info

Publication number
CN111737281A
Authority
CN
China
Prior art keywords
data
statistical information
database
time
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010583303.6A
Other languages
Chinese (zh)
Other versions
CN111737281B (en)
Inventor
朱博帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010583303.6A
Publication of CN111737281A
Application granted
Publication of CN111737281B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Operations Research (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a database query method, a database query device, electronic equipment and a readable storage medium, and aims to improve data query efficiency. The database query method comprises the following steps: predicting the data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and time; and generating an execution plan for a database query statement obtained in the current period according to the data statistical information corresponding to the current period, and processing the database query statement based on the execution plan. Because the data statistical information for each period is obtained from the mapping relation between the data statistical information and time, the business data in the database does not need to be counted frequently, which effectively improves data query efficiency.

Description

Database query method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a database query method and apparatus, an electronic device, and a readable storage medium.
Background
In recent years, with the development of data processing technology, more and more enterprises have come to rely on databases to store and manage the business data generated in the course of their operations. Taking an internet enterprise as an example, the business data it stores in a database includes but is not limited to: user profile data, user browsing history data, user order data, user rating data, merchandise information data, merchandise inventory data, audio-video data, hardware performance data, and the like. Business data generated during operation must not only be recorded in a database; it must also be retrievable on demand. To that end, a database query statement, for example a Structured Query Language (SQL) statement, is written according to the business requirement, and the corresponding business data is obtained from the database by processing that statement.
In the related art, in order to process a database query statement efficiently, and thus retrieve target data from the database efficiently, an execution plan is generated for the statement according to data statistical information about the business data in the database. This creates two conflicting requirements. On the one hand, the data statistical information must be timely: the more frequently the business data is counted, the more reasonable the resulting execution plan and the higher the data query efficiency. On the other hand, the counting frequency should not be too high, because frequent statistical operations occupy and consume hardware resources, and the resulting shortage of hardware resources itself reduces data query efficiency.
In the related art, whichever requirement is favored, low data query efficiency remains an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention aims to provide a database query method, a database query device, electronic equipment and a readable storage medium, and aims to improve the data query efficiency. The specific technical scheme is as follows:
in a first aspect of the embodiments of the present invention, a database query method is provided, where the method includes:
predicting data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and time;
and generating an execution plan for the database query statement obtained in the current period according to the data statistical information corresponding to the current period, and processing the database query statement based on the execution plan.
In a second aspect of the embodiments of the present invention, there is provided a database query apparatus, including:
the statistical information determining module is used for predicting the data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and the time;
and the query statement processing module is used for generating an execution plan for the database query statement obtained in the current period according to the data statistical information corresponding to the current period, and processing the database query statement based on the execution plan.
In a third aspect of the embodiments of the present invention, there is further provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
the processor is configured to implement the method steps of the first aspect of the embodiments of the present invention when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute any one of the above-described database query methods.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above described database query methods.
According to the database query method provided by the invention, the data statistical information corresponding to the current period is predicted according to the mapping relation between the data statistical information of the database and time. Then, for each database query statement obtained in the current period, an execution plan is generated according to the data statistical information predicted for the current period, and the statement is processed according to that execution plan.
When determining the data statistical information for each period, the invention relies on the mapping relation between the data statistical information and time, so the business data in the database does not need to be counted frequently, and the hardware resources occupied by consulting the mapping relation are far lower than those occupied by counting the business data. The invention can therefore significantly reduce the hardware resources consumed in determining the data statistical information. On the one hand, more hardware resources can be allocated to query tasks, which improves the query efficiency of the database. On the other hand, the data statistical information can be determined more frequently, that is, the time length of each period can be shortened, which improves the timeliness of the data statistical information, yields a more reasonable execution plan for each database query statement, and further improves the database query efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a data query method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data query method according to another embodiment of the present invention;
FIG. 3 is a flow chart of determining a mapping relationship according to another embodiment of the present invention;
FIG. 4 is a flowchart illustrating monitoring the validity of a mapping relationship according to an embodiment of the present invention;
FIG. 5 is an interaction diagram of a data query method according to an embodiment of the present invention;
FIG. 6(a) is a schematic diagram of a database query device according to an embodiment of the present invention;
FIG. 6(b) is a schematic diagram of a database query device according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In the related art, when the database query engine obtains a database query statement, it generates an execution plan for the statement according to the data statistical information of the business data in the database, and processes the statement based on that execution plan, so as to process the statement efficiently and improve data query efficiency.
For ease of understanding, take a simple takeaway order database as an example. The database stores the takeaway orders generated within the past 1 hour, and each takeaway order involves business data such as a delivery distance and an order amount. The business data in the order database is counted at 12:00:00, and the count shows that the database contains 3299 orders in total. By delivery distance, 2973 orders have a delivery distance of less than 3km and 326 orders have a delivery distance greater than or equal to 3km. By order amount, 842 orders are below 20 yuan, 1555 orders are between 20 and 30 yuan, and 902 orders are above 30 yuan. After this data statistical information has been obtained and saved, suppose a database query statement is obtained at a later time (e.g. 12:11:32), and the statement requires: query the order database for orders with an order amount between 20 and 30 yuan and a delivery distance greater than or equal to 3km. To improve query efficiency, the database query engine generates the following execution plan according to the data statistical information: first screen out the orders with a delivery distance greater than or equal to 3km, and then, from those orders, screen out the orders with an amount between 20 and 30 yuan.
This is a reasonable execution plan. It executes the screening logic 3299+326 times, where 3299 is the cost of screening all 3299 orders for a delivery distance greater than or equal to 3km, and 326 is the cost of screening the 326 remaining orders for an amount between 20 and 30 yuan.
To show why this plan is reasonable, consider an unreasonable one for comparison: first screen out the orders with an amount between 20 and 30 yuan, and then, from those orders, screen out the orders with a delivery distance greater than or equal to 3km. This plan executes the screening logic 3299+1555 times, where 3299 is the cost of screening all 3299 orders for an amount between 20 and 30 yuan, and 1555 is the cost of screening the 1555 remaining orders for a delivery distance greater than or equal to 3km. The second plan clearly executes the screening logic more times than the first. The first plan therefore gives higher database query efficiency and is the more reasonable of the two.
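The comparison between the two execution plans can be reproduced with a few lines of Python. This is only an illustrative sketch: the order counts come from the example above, while the function names and the simple cost model (rows scanned by the first screening plus rows scanned by the second) are assumptions made for illustration and are not defined by the invention.

```python
# Order counts taken from the takeaway example in the text.
TOTAL_ORDERS = 3299
MATCHES = {
    "distance >= 3km": 326,
    "amount 20-30 yuan": 1555,
}

def screening_cost(first, total=TOTAL_ORDERS, matches=MATCHES):
    """Screening steps when `first` is applied first and the other condition second."""
    # The first condition is checked against every order;
    # the second condition is checked only against the orders that survive.
    return total + matches[first]

def choose_first_condition(total=TOTAL_ORDERS, matches=MATCHES):
    """Pick the condition whose early application needs the fewest screening steps."""
    return min(matches, key=lambda name: screening_cost(name, total, matches))

print(screening_cost("distance >= 3km"))    # 3299 + 326
print(screening_cost("amount 20-30 yuan"))  # 3299 + 1555
print(choose_first_condition())
```

Under this cost model the condition that matches the fewest orders is always applied first, which is why the plan depends directly on how accurate the data statistical information is.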
The above example shows that the data statistical information has an important influence on the generation of the execution plan, and that the timeliness of the data statistical information greatly affects how reasonable the execution plan is. Continuing the example, assume that the business data in the takeaway order database is counted every 4 hours, and that the last statistical operation took place at 12:00:00, when the data statistical information was obtained and saved. Assume that a database query statement is obtained at 14:28:52. The takeaway orders recorded in the database may have changed greatly between 12:00:00 and 14:28:52, so the data statistical information obtained at 12:00:00 cannot accurately reflect the current data distribution in the order database. If an execution plan is generated at 14:28:52 according to the data statistical information obtained at 12:00:00, it is difficult to ensure that the execution plan is reasonable.
In order to improve the timeliness of the data statistical information, the related art may increase the frequency at which the business data is counted. For example, the counting interval is shortened from once every 4 hours to once every 0.5 hours, the data statistical information obtained by each count is stored, and the latest data statistical information is used to generate the execution plan for each newly obtained database query statement. However, a higher counting frequency occupies and consumes more hardware resources, leaving insufficient hardware resources for processing the database query statements themselves, which also reduces data query efficiency. Therefore, in the related art, low data query efficiency is an urgent problem to be solved.
To improve the database query efficiency, referring to fig. 1, fig. 1 is a flowchart of a data query method according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
step S11: and predicting the data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and the time.
Wherein, the data statistical information of the database is used for representing: distribution of business data in the database. More specifically, the data statistics of the database may refer to: data distribution of a given column of a data table in the database.
The time may be a continuous time or a time sequence composed of a plurality of discrete time points. The time has a length of time, for example the length of time may be a day, a week or a month. In a specific implementation of the present invention, the time length of the time is generally equal to the time length of the service period of the service data in the database, or equal to an integer multiple of the time length of the service period. It should be noted that the service period is different from the period in the above step S11, and the time length of the service period is longer than the time length of the period in the above step S11. For example, the time length of the service period may be one day, while the time length of the period in the above step S11 may be 10 minutes, 0.5 hours, 2 hours, or the like, or the time length of the period in the above step S11 is not fixed but is less than one day. The following embodiments of the present invention will describe the relationship between the service period and the period in step S11, and will not be described herein again.
In the present invention, the mapping relationship between the data statistics information of the database and the time can be specifically understood as follows: each time point corresponds to a group of data statistical information, and the group of data statistical information is used for representing the data distribution condition of the designated column of the data table at the time point.
For ease of understanding, take as an example a simple takeaway order database that records the takeaway orders generated within the past 1 hour. The takeaway order database includes a plurality of data tables, and each row of each data table records one takeaway order. Each data table comprises a plurality of columns, wherein the first column records the order placing time of each takeaway order, the second column records the delivery distance of each takeaway order, the third column records the order amount of each takeaway order, and the fourth column records the payment mode of each takeaway order. For example, the data statistical information of the database may be the data distribution of the second column and the third column of the data table, that is, the distribution of delivery distances and the distribution of order amounts across the takeaway orders.
Referring to table 1, table 1 is a schematic table of data statistical information. As shown in table 1, the delivery distances are divided into 0.5km or less, 0.5 to 1km, 1 to 2km, 2 to 4km, 4 to 6km, and 6km or more, and the number of orders in each delivery distance interval is counted. Likewise, the order amounts are divided into 20 yuan or less, 20 to 30 yuan, 30 to 50 yuan, 50 to 100 yuan, 100 to 300 yuan, and 300 yuan or more, and the number of orders in each order amount interval is counted.
Table 1 Schematic table of data statistical information
[Table 1 is an image in the original publication (Figure BDA0002553796210000061) and is not reproduced here.]
The data statistics shown in table 1 are only data statistics corresponding to one time point. It should be noted that different time points correspond to different data statistics information, so that a mapping relationship between the data statistics information and time is formed.
It should be noted that, the distribution distance interval division method is the same between different data statistics information corresponding to different time points, and the order amount interval division method is the same. In other words, the statistics of data at different time points are statistics of the order quantity in the distribution distance sections of 0.5km or less, 0.5 to 1km, 1 to 2km, 2 to 4km, 4 to 6km, 6km or more, and statistics of the order quantity in the order amount sections of 20 yuan or less, 20 to 30 yuan, 30 to 50 yuan, 50 to 100 yuan, 100 to 300 yuan, 300 yuan or more.
The data statistical information at different time points differs in the number of orders that fall into the same delivery distance interval. For example, the data statistical information at a first time point includes 652 orders with a delivery distance of 0.5km or less, while the data statistical information at a second time point includes 441 such orders. Similarly, the data statistical information at different time points differs in the number of orders that fall into the same order amount interval. For example, the data statistical information at the first time point includes 3463 orders with an amount of 20 to 30 yuan, while the data statistical information at the second time point includes 2027 such orders.
Note that the statistical data shown in table 1 are merely examples, and the distribution distance interval division method, the order amount interval division method, the specific number of the order quantity, and the like are not to be construed as limiting the present invention. In addition, in the specific implementation, the data statistics information may be described in a form of a list or a histogram, and the description form of the data statistics information is not limited in the present invention.
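A minimal Python sketch of how one set of data statistical information could be built and stored against a time point is given below. The interval edges come from the text; the sample orders, the function names, the dictionary layout, and the rule that a value equal to an edge falls into the lower interval are all assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import time

# Interval edges from the text; the handling of boundary values is an assumption.
DISTANCE_EDGES_KM = [0.5, 1, 2, 4, 6]
AMOUNT_EDGES_YUAN = [20, 30, 50, 100, 300]

def histogram(values, edges):
    """Count values per interval; a value equal to an edge falls in the lower interval."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_left(edges, value)] += 1
    return counts

def collect_statistics(orders):
    """Build one set of data statistics from (distance_km, amount_yuan) pairs."""
    return {
        "distance": histogram([d for d, _ in orders], DISTANCE_EDGES_KM),
        "amount": histogram([a for _, a in orders], AMOUNT_EDGES_YUAN),
    }

# Hypothetical orders, for illustration only.
orders = [(0.3, 18.0), (0.8, 25.0), (3.5, 25.0), (7.2, 120.0)]

# Mapping relation: time point -> data statistics for that time point.
mapping = {time(12, 0): collect_statistics(orders)}
print(mapping[time(12, 0)])
```

Because every time point uses the same interval edges, the histograms stored for different time points can be compared position by position.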
In the present invention, the period described in the above step S11 may be set in at least three manners:
the first setting mode is as follows: the time length of the period is set to a fixed time length. The time length of the cycle is set to 30 minutes, for example, and thus every 30 minutes is regarded as one cycle.
The second setting mode is as follows: according to the service characteristics of the service data in the database, the service period of the service data is subdivided into a plurality of smaller periods whose time lengths are not all the same. Periods that fall within a time span in which the service data changes frequently are shorter, and periods that fall within a time span in which the service data changes slowly are longer. These subdivided periods are the periods described in the above step S11. For example, suppose the service period of the service data is 1 day, the service data changes frequently from 10:00 to 22:30 each day, and the service data changes slowly during the rest of the day. Then the span from 10:00 to 22:30 may be subdivided into a plurality of periods each 10 minutes long, and the rest of the day may be subdivided into a plurality of periods each 30 minutes long.
The third setting mode is as follows: no time length is set; instead, the time between two successive executions of the prediction operation (i.e., the operation of predicting the data statistical information) is regarded as one cycle. For example, if the last prediction operation was executed at 10:42:17 and the current prediction operation is executed at 11:03:53, the time from 10:42:17 to 11:03:53 is regarded as one cycle, and the data statistical information predicted by the last prediction operation is the data statistical information corresponding to that cycle.
In a specific embodiment of the present invention, the first setting mode or the second setting mode is preferably selected from the three setting modes.
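The first two setting modes can be sketched in Python as follows. This is an illustration under stated assumptions: periods are expressed as half-open (start, end) pairs in minutes since midnight, the function names are hypothetical, and the busy span and period lengths are the ones used in the example above.

```python
def fixed_periods(length_min=30, day_min=24 * 60):
    """First setting mode: every period has the same length."""
    return [(start, min(start + length_min, day_min))
            for start in range(0, day_min, length_min)]

def refined_periods(busy_start=10 * 60, busy_end=22 * 60 + 30,
                    busy_len=10, quiet_len=30, day_min=24 * 60):
    """Second setting mode: shorter periods while the data changes frequently."""
    periods = []
    start = 0
    while start < day_min:
        busy = busy_start <= start < busy_end
        step = busy_len if busy else quiet_len
        # Do not let a period cross into a span with a different change rate.
        if busy:
            limit = busy_end
        elif start < busy_start:
            limit = busy_start
        else:
            limit = day_min
        end = min(start + step, limit)
        periods.append((start, end))
        start = end
    return periods

print(len(fixed_periods()))    # number of 30-minute periods in one day
print(len(refined_periods()))  # 10-minute periods from 10:00 to 22:30, 30-minute otherwise
```

With these parameters the refined schedule spends most of its periods on the span in which the service data changes frequently, which is the intent of the second setting mode.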
In the specific implementation of the invention, the specific mode of predicting the data statistical information for the current period also has differences according to different period setting modes.
For the first setting mode and the second setting mode, when predicting the data statistics information corresponding to the current period, specifically, the data statistics information corresponding to the preset time point may be determined from the mapping relationship between the data statistics information and the time based on the preset time point in the current period, and the data statistics information corresponding to the preset time point is determined as the data statistics information corresponding to the current period. The preset time point is a starting time point, an ending time point or a time point between the starting time point and the ending time point of the current period.
For convenience of understanding, take the case where the time length of the time in the mapping relationship is equal to 1 day and the time length of the period is fixed at 30 minutes. Each day is then divided into 48 periods: 00:00-00:29, 00:30-00:59, 01:00-01:29, ..., 23:00-23:29, 23:30-23:59. If the current time is 08:21, the current period is the 17th period of the day. If the current time is exactly 08:30, the 18th period of the day has just begun, and the data statistical information can be predicted for the current period (i.e., the 18th period of the day).
In the prediction, the data statistical information corresponding to 08:30 may be determined from the mapping relationship between the data statistical information and time, and used as the data statistical information of the current period (i.e., the 18th period of the day). Alternatively, the data statistical information corresponding to 08:59 may be determined from the mapping relationship and used as the data statistical information of the current period. Alternatively, the data statistical information corresponding to some time point between 08:30 and 08:59 may be determined from the mapping relationship and used as the data statistical information of the current period.
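The period arithmetic in this example can be checked with a short Python sketch. The function names, the use of `datetime.time` keys, and the placeholder strings standing in for real statistics are assumptions made for illustration only.

```python
from datetime import time

PERIOD_MIN = 30  # fixed period length from the example

def period_index(now):
    """1-based index of the 30-minute period of the day that contains `now`."""
    return (now.hour * 60 + now.minute) // PERIOD_MIN + 1

def preset_time_point(index, offset_min=0):
    """Time point inside period `index`; offset 0 is its start, 29 its last minute."""
    minutes = (index - 1) * PERIOD_MIN + offset_min
    return time(minutes // 60, minutes % 60)

def predict_statistics(mapping, now, offset_min=0):
    """Read the statistics mapped to the preset time point of the current period."""
    return mapping[preset_time_point(period_index(now), offset_min)]

# Placeholder values stand in for real data statistics (hypothetical).
mapping = {
    time(8, 30): "statistics for 08:30",
    time(8, 59): "statistics for 08:59",
}

print(period_index(time(8, 21)))                   # period containing 08:21
print(predict_statistics(mapping, time(8, 41)))    # preset point = start of period
print(predict_statistics(mapping, time(8, 41), 29))  # preset point = end of period
```

Choosing the start, the end, or an intermediate point only changes `offset_min`; the lookup itself stays a single read from the mapping relationship.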
For the third setting manner, when predicting the data statistics information corresponding to the current cycle, specifically, the time point at which the prediction operation execution instruction is generated may be used as a basis, the data statistics information corresponding to the time point is determined from the mapping relationship between the data statistics information and time, and the data statistics information corresponding to the time point is determined as the data statistics information corresponding to the current cycle.
For ease of understanding, assume that the database query engine generates one prediction operation execution instruction after every 1000 database query statements are executed. When the prediction operation execution instruction is generated, the generation time point of the instruction is used as the basis: the data statistics corresponding to the generation time point are determined from the mapping relationship between the data statistics and time, and are determined as the data statistics corresponding to the current cycle. Here, the current cycle is the span that starts at the current time point (i.e., the generation time point of the instruction) and ends when 1000 database query statements have been executed.
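A count-driven trigger of this kind might be sketched as below. The class name `CountTriggeredPredictor` and the `lookup` callable (standing in for a query against the stored mapping relationship) are assumptions made for illustration only.

```python
from datetime import datetime
from typing import Callable, Optional


class CountTriggeredPredictor:
    """Refresh the predicted statistics after every `batch_size` executed queries."""

    def __init__(self, lookup: Callable[[datetime], dict], batch_size: int = 1000):
        self._lookup = lookup            # hypothetical: maps a time point to statistics
        self._batch_size = batch_size
        self._executed = 0
        self.current_stats: Optional[dict] = None

    def on_query_executed(self, now: datetime) -> bool:
        """Record one executed query; return True if a prediction was triggered."""
        self._executed += 1
        if self._executed % self._batch_size == 0:
            # The generation time point of the instruction is the lookup basis.
            self.current_stats = self._lookup(now)
            return True
        return False
```

In this sketch `current_stats` stays `None` until the first batch completes; how the very first cycle is initialized is not specified in the text and would need to be decided separately.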
In a specific implementation of the present invention, the prediction modes corresponding to the first setting mode and the second setting mode (i.e., the mode for predicting the data statistics information corresponding to the current cycle) are preferred.
Step S12: generating an execution plan for each database query statement obtained in the current cycle according to the data statistics corresponding to the current cycle.
Step S13: processing the database query statement based on the execution plan.
In specific implementation, after the data statistical information corresponding to the current period is predicted and stored, when each database query statement is obtained in the current period, an execution plan is generated for the database query statement according to the data statistical information corresponding to the current period, and the database query statement is processed based on the execution plan.
Following the above example, assume that the current cycle is 08:30 to 08:59. When a database query statement is obtained between 08:30 and 08:59, an execution plan may be generated for it according to the data statistics predicted for the current cycle in step S11, and the database query statement may be processed based on the execution plan.
Alternatively, following the other example above, assume that the current cycle is the span that starts at the generation time point of the prediction operation execution instruction and ends when 1000 database query statements have been executed. For each of the 1st to 1000th database query statements obtained successively in the current cycle, an execution plan is generated according to the data statistics predicted for the current cycle, and the database query statement is processed based on the execution plan.
For ease of understanding, a simple order database is taken as an example, where the order database stores the orders generated within the past 1 hour. Assuming that the current cycle is 08:30 to 08:59, the data statistics determined for the current cycle by step S11 above are shown in Table 2. Table 2 is a schematic table of data statistics and covers a total of 5872 orders.
Table 2: Schematic table of data statistics
[Table 2 is an image in the original publication and is not reproduced here.]
Suppose a database query statement is obtained at 08:30:13, and the statement requires that orders with an amount of 50-100 yuan and a delivery distance of 1-2 km be queried from the order database. Then, through step S12 above, the execution plan generated for the statement may be: first screen out the orders with an amount of 50-100 yuan from all orders, and then further screen out the orders with a delivery distance of 1-2 km from the screened orders with an amount of 50-100 yuan. This is a reasonable execution plan, which requires about 5872+402 executions of the screening logic, where 5872 refers to screening orders with an amount of 50-100 yuan from the 5872 orders, and 402 refers to further screening orders with a delivery distance of 1-2 km from the 402 screened orders with an amount of 50-100 yuan.
Because the order data in the order database is updated quickly, the data statistics corresponding to a cycle can be guaranteed to agree closely with the actual data statistics at any moment within that cycle, but cannot be guaranteed to match them exactly. The number of screening operations implied by the execution plan is therefore a fairly accurate estimate rather than an exact value, which is why it is described as "about".
Suppose another database query statement is obtained at 08:30:19, and the statement requires that orders with an amount between 20 and 30 yuan and a delivery distance below 0.5 km be queried from the order database. Then, through step S12 above, the execution plan generated for the statement may be: first screen out the orders with a delivery distance below 0.5 km from all orders, and then further screen out the orders with an amount between 20 and 30 yuan from the screened orders with a delivery distance below 0.5 km. This is a reasonable execution plan, which requires about 5872+418 executions of the screening logic, where 5872 refers to screening orders with a delivery distance below 0.5 km from the 5872 orders, and 418 refers to further screening orders with an amount between 20 and 30 yuan from the 418 screened orders with a delivery distance below 0.5 km.
It should be noted that the present invention does not limit the specific manner of generating an execution plan for a database query statement from the data statistics, nor the specific manner of processing the database query statement based on the execution plan. Any manner, existing or future, may be used to generate the execution plan and to process the database query statement based on it. Illustratively, when a database query statement includes a plurality of screening conditions (for example, a statement that requires orders with an order amount between 20 and 30 yuan and a delivery distance greater than or equal to 3 km to be queried from the order database includes two screening conditions, namely "the order amount is between 20 and 30 yuan" and "the delivery distance is greater than or equal to 3 km"), the screening conditions are processed in ascending order of their screening rates.
By executing the database query method including steps S11 to S13, the present invention determines the data statistics for each cycle by calling the mapping relationship between the data statistics and time, so the business data in the database does not need to be counted frequently, and the hardware resources required to call the mapping relationship are far lower than those required to count the business data. The invention can therefore significantly reduce the hardware resources occupied by determining the data statistics. On the one hand, more hardware resources can be allocated to query tasks, which improves the query efficiency of the database. On the other hand, the data statistics can be determined more frequently, that is, the length of each cycle can be shortened, which improves the timeliness of the data statistics, yields more reasonable execution plans for database query statements, and further improves database query efficiency.
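The ordering rule and the cost estimates of the two examples above can be sketched as follows. This is an illustrative sketch only: `plan_filters` is a hypothetical name, the counts 402 and 418 come from the text, and the other per-condition counts are made-up placeholders because Table 2 is not available. For more than two conditions the estimate additionally assumes that the conditions are statistically independent, which the text does not state.

```python
from typing import Dict, List, Tuple


def plan_filters(total_rows: int, matching_rows: Dict[str, int]) -> Tuple[List[str], int]:
    """Order screening conditions by screening rate and estimate the work.

    matching_rows maps each condition to the number of rows that, according
    to the data statistics, satisfy that condition on its own.
    """
    # Screening rate = fraction of rows that pass the condition; lowest first.
    order = sorted(matching_rows, key=lambda cond: matching_rows[cond] / total_rows)
    estimated_checks = 0.0
    survivors = float(total_rows)
    for cond in order:
        estimated_checks += survivors                      # rows examined at this step
        survivors *= matching_rows[cond] / total_rows      # rows expected to remain
    return order, round(estimated_checks)


# 402 is from the text; 1500 is a hypothetical placeholder.
order, checks = plan_filters(5872, {"amount 50-100 yuan": 402, "distance 1-2 km": 1500})
print(order[0], checks)  # amount 50-100 yuan 6274
```

With two conditions the estimate reduces to the total row count plus the number of rows passing the first condition, matching the 5872+402 figure in the text.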
Referring to fig. 2, fig. 2 is a flowchart of a data query method according to another embodiment of the present invention. As shown in fig. 2, in addition to the steps S11 and S12, before the step S11, the method further includes the following steps:
Step S10: generating and storing the mapping relationship between the data statistics of the database and time.
After the mapping relationship between the data statistics and time has been determined and saved, it can be called each time step S11 is subsequently executed.
For ease of understanding, suppose that the time span covered by the mapping relationship is 1 day and the cycle length is fixed at 30 minutes, so that each day is divided into 48 cycles: 00:00-00:29, 00:30-00:59, 01:00-01:29 … 23:00-23:29, 23:30-23:59. If the current time is exactly 08:00, the 17th cycle of the day has just begun. At this point, the predetermined and stored mapping relationship can be called; based on the preset time point in the current cycle (i.e., the 17th cycle of the day), the data statistics corresponding to the preset time point are determined from the mapping relationship and are determined as the data statistics corresponding to the current cycle.
As time passes, when the current time reaches exactly 08:30, the 18th cycle of the day has just begun. At this point, the predetermined and stored mapping relationship can be called again; based on the preset time point in the current cycle (i.e., the 18th cycle of the day), the data statistics corresponding to the preset time point are determined from the mapping relationship and are determined as the data statistics corresponding to the current cycle.
Referring to fig. 3, fig. 3 is a flowchart of determining a mapping relationship according to another embodiment of the present invention. As shown in fig. 3, when the present invention is implemented, the mapping relationship between the data statistics information of the database and the time may be generated and stored through the following sub-steps:
Substep S10-1: determining the service period of the service data according to the service characteristics of the service data recorded in the database.
The service data refers to data generated during the service development, and the service data shows periodic fluctuation along with the periodic change of the service. The invention does not limit the kind of the service data, for example, the service data may be: user volume data, user profile data, user historical browsing data, user order data, user rating data, merchandise information data, merchandise inventory data, audio-video data, or hardware performance data, among others.
Here, the service characteristic refers to the type of service. For example, if the service data is generated in the course of running a takeout service, the service characteristic of the service data is the takeout service type. When determining the service period of the service data, the period of the takeout service can be analyzed and taken as the period of the service data. For example, analysis or a priori knowledge shows that the volume of takeout orders peaks between 11:00 and 13:00 and between 17:00 and 20:00 each day, while takeout orders are relatively few at other times of the day. The service period of the takeout service can thus be determined to be 1 day, and accordingly the service period of the service data of the takeout service can be determined to be 1 day.
In addition, for some services for which it is difficult to quickly determine the service period through analysis or a priori knowledge, for example, the e-commerce services of a comprehensive e-commerce platform, a coordinate graph of the service data can be made by collecting service data for a period of time, wherein an abscissa in the coordinate graph is time and an ordinate in the coordinate graph is service data. And determining the fluctuation rule of the service data by observing the coordinate graph, thereby determining the service period of the service data.
Service data that has no service period is not suitable for the present invention. In other words, the invention applies to service data that has a service period, for which the mapping relationship between the data statistics and time can be determined.
Substep S10-2: continuously collecting statistics on the service data over at least one service period to obtain multiple groups of data statistics of the service data, where the multiple groups of data statistics are time series data.
In a specific implementation of the invention, the service data may be counted once every fixed time length within at least one service period, each count yielding one group of data statistics of the service data. After the at least one service period has elapsed, multiple groups of data statistics are obtained. The fixed time length is less than the service period. For example, if the fixed time length is 1/3600 of the service period, 3600 groups of data statistics are obtained after one service period, and 7200 groups are obtained after two service periods.
For ease of understanding, take statistics on the takeout order data in a takeout order database as an example, and assume that the service period of the takeout order data has been determined to be 1 day in substep S10-1 above. Then, when performing substep S10-2, the order data may be counted once every minute starting from 00:00 of a certain day, each count yielding one group of data statistics. By 23:59 of that day, 1440 groups of data statistics have been obtained. Each group of data statistics covers the order quantity in each delivery distance interval and the order quantity in each order amount interval. Each group may take the form shown in Table 1 or Table 2. Across different groups, the delivery distance intervals are divided in the same way and the order amount intervals are divided in the same way, but the order quantity in a given delivery distance interval differs from group to group, as does the order quantity in a given order amount interval.
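Collecting one group of data statistics might look like the sketch below. The interval boundaries are those named in the takeout example of this description; the function names and the rule that a boundary value falls into the upper interval are assumptions, since the text does not say how boundary values are assigned.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Sequence, Tuple

# Interval boundaries from the takeout example.
DISTANCE_EDGES_KM = [0.5, 1, 2, 4, 6]        # <0.5, 0.5-1, 1-2, 2-4, 4-6, >=6
AMOUNT_EDGES_YUAN = [20, 30, 50, 100, 300]   # <20, 20-30, 30-50, 50-100, 100-300, >=300


def histogram(values: Iterable[float], edges: Sequence[float]) -> List[int]:
    """Count how many values fall into each interval delimited by edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Assumption: a value equal to an edge is counted in the upper interval.
        counts[bisect_right(edges, v)] += 1
    return counts


def collect_statistics(orders: Iterable[Tuple[float, float]]) -> Dict[str, List[int]]:
    """Build one group of data statistics from (distance_km, amount_yuan) pairs."""
    rows = list(orders)
    return {
        "distance": histogram((d for d, _ in rows), DISTANCE_EDGES_KM),
        "amount": histogram((a for _, a in rows), AMOUNT_EDGES_YUAN),
    }
```

Calling `collect_statistics` once per minute over one day would yield the 1440 groups described above, each with the same interval layout but different counts.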
The reason why the plurality of sets of data statistical information are referred to as time series data is that the plurality of sets of data statistical information respectively correspond to different statistical times, and the plurality of sets of data statistical information are a series of data with time sequence, so the data are referred to as time series data.
Substep S10-3: determining the mapping relationship between the data statistics of the service data and time according to the multiple groups of data statistics, and storing the mapping relationship, where the time span covered by the mapping relationship is equal to or greater than the length of the service period.
In a specific implementation, if the time in the desired mapping relationship consists of discrete time points, the statistical time points, together with the data statistics obtained at each of them, can be used directly as the mapping relationship. In this mapping relationship, each statistical time point corresponds to one group of data statistics.
Or if the time in the expected mapping relationship is a continuous time, fitting the multiple sets of data statistics information counted in the sub-step S10-2 to obtain a function with the independent variable as time and the dependent variable as data statistics information, and using the function as the mapping relationship.
For ease of understanding, each set of statistical data obtained, for example, by the above-described sub-step S10-2, relates to the number of orders with a delivery distance of 0.5km or less, the number of orders with a delivery distance of 0.5 to 1km, the number of orders with a delivery distance of 1 to 2km, the number of orders with a delivery distance of 2 to 4km, the number of orders with a delivery distance of 4 to 6km, and the number of orders with a delivery distance of 6km or more. And then, aiming at the order number of each distribution distance interval, fitting a function corresponding to the distribution distance interval, wherein the function takes time as an independent variable and the order number of the distribution distance interval as a dependent variable. For example, a function is fitted to the number of orders with a delivery distance of 0.5km or less by the least square method, and the independent variable in the function is time and the dependent variable is the number of orders with a delivery distance of 0.5km or less.
In addition, each group of data statistics obtained by substep S10-2 above also covers the number of orders with an order amount of less than 20 yuan, of 20 to 30 yuan, of 30 to 50 yuan, of 50 to 100 yuan, of 100 to 300 yuan, and of 300 yuan or more. For each order amount interval, a function is then fitted to the order quantity of that interval, where the function takes time as the independent variable and the order quantity of the interval as the dependent variable. For example, a function is fitted by the least squares method to the number of orders with an order amount of less than 20 yuan; the independent variable of the function is time and the dependent variable is the number of orders with an order amount of less than 20 yuan.
Thus, a total of 12 functions are determined, and the 12 functions are the mapping relationship between the data statistics and the time. Given a point in time, the 12 functions can be used to predict the order quantity in different delivery distance intervals and the order quantity in different order amount intervals.
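One of the 12 functions could be fitted as in the sketch below. The text names only the least squares method; the model form used here (a constant plus one sine/cosine pair with a 1-day period) and the name `fit_daily_curve` are assumptions chosen for brevity. For evenly spaced samples covering exactly one period, the least-squares coefficients of this model have the closed form used in the code.

```python
import math
from typing import Callable, Sequence


def fit_daily_curve(counts: Sequence[float]) -> Callable[[float], float]:
    """Least-squares fit of a + b*cos(wt) + c*sin(wt) to one day's samples.

    counts[k] is the order quantity of one interval, sampled at minute
    k * (1440 / len(counts)) of the day.
    """
    n = len(counts)
    if n < 3:
        raise ValueError("need at least 3 evenly spaced samples")
    angles = [2 * math.pi * k / n for k in range(n)]
    a = sum(counts) / n
    b = 2 / n * sum(y * math.cos(x) for y, x in zip(counts, angles))
    c = 2 / n * sum(y * math.sin(x) for y, x in zip(counts, angles))

    def predict(minute_of_day: float) -> float:
        x = 2 * math.pi * minute_of_day / 1440
        return a + b * math.cos(x) + c * math.sin(x)

    return predict
```

Applying `fit_daily_curve` to the per-minute series of each of the 6 delivery distance intervals and 6 order amount intervals would give 12 callables, one per interval. A real daily order curve with two peaks would likely need a richer model than a single harmonic.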
In addition, the 12 functions can be integrated into a total function, and the independent variables of the total function are the time point and the target data interval. The target data interval may be, for example: 0.5 to 1km, 30 to 50 yuan, etc. Given a point in time and at least one target data interval, the data of the given target data interval at the given point in time can be predicted by the overall function.
In the mapping relationship finally determined by the invention, the time length of the time is equal to or greater than the time length of the service period. In some embodiments of the invention, the length of time is equal to an integer multiple of the length of time of the traffic period.
For ease of understanding, suppose that the service period is equal to 1 day, the time span covered by the mapping relationship is also equal to 1 day, and the time is a continuous time period that starts at 00:00 and ends at 23:59. During service operation, for any time point on any day, the corresponding time point within this time span can be determined, so that the data statistics corresponding to that time point can be predicted. For example, for 10:30 on March 8, the corresponding time point (i.e., 10:30) is found within the time span, and the data statistics of that time point in the mapping relationship are taken as the predicted data statistics for the cycle containing 10:30 on March 8. After March 8 has passed and the current time reaches 00:00 on March 9, the corresponding time point (i.e., 00:00) is found within the time span, and the data statistics of that time point in the mapping relationship are taken as the predicted data statistics for the cycle containing 00:00 on March 9.
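Finding the corresponding time point amounts to taking the current time modulo the time span of the mapping relationship. The sketch below illustrates this; the function name `position_in_mapping`, the choice of a Monday as `origin`, and the year used in the example are assumptions, since the text gives no year.

```python
from datetime import datetime, timedelta

# Assumed reference instant at which the mapping's time span begins
# (midnight on a Monday, so that weekly spans start on Monday).
ORIGIN = datetime(2020, 1, 6)


def position_in_mapping(now: datetime,
                        span: timedelta = timedelta(days=1),
                        origin: datetime = ORIGIN) -> timedelta:
    """Return the offset of `now` within the time span covered by the mapping."""
    return (now - origin) % span


print(position_in_mapping(datetime(2021, 3, 8, 10, 30)))  # 10:30:00
print(position_in_mapping(datetime(2021, 3, 9, 0, 0)))    # 0:00:00
```

The same function works when the time span is an integer multiple of the service period, for example `span=timedelta(weeks=1)`.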
The present invention determines and saves the mapping relationship between the data statistics and the time of the database by performing step S10 in advance. Thus, during the business development, the mapping relationship may be invoked at each cycle as described in step S11, so as to predict data statistics for the current cycle in step S11.
It should also be considered that if the business rules are adjusted during service operation, the service period may change accordingly. Once the service period changes, the predetermined and stored mapping relationship becomes invalid. For ease of understanding, the takeout service is again taken as an example. Assume that before the business rules of the takeout service are adjusted, the period of the takeout service is 1 day, and the time span covered by the predetermined and stored mapping relationship is also equal to 1 day. Suppose the business rules are then adjusted by launching a weekly Wednesday working-meal benefit promotion that applies a half-price discount to takeout orders placed on Wednesdays. The order volume on Wednesdays will then be significantly higher than on other days, so the period of the takeout service changes from 1 day to 1 week. If data statistics are still predicted for the current cycle in step S11 according to the predetermined and stored mapping relationship, the predicted data statistics will differ considerably from the actual data statistics. In particular, on each Wednesday, the data statistics predicted for each cycle according to the predetermined and stored mapping relationship cannot truly reflect the actual data statistics of the takeout order database. As a result, when an execution plan is generated for a database query statement based on the predicted data statistics, it is difficult to ensure that the execution plan is reasonable.
To this end, in some embodiments of the present invention, the validity of the mapping relationship may be monitored during processing of each database query statement using the mapping relationship after determining and saving the mapping relationship between the data statistics of the database and time, i.e., during the repeated execution of the above steps S11 and S12.
In a specific implementation, the validity of the mapping relationship can be monitored as follows: the validity is monitored according to the data statistics predicted for a target time point using the mapping relationship and the actual data statistics corresponding to that target time point, where the target time point is a pre-specified time point.
In the invention, the data statistical information predicted for a target time point by using the mapping relation is used as a predicted value, the actual data statistical information corresponding to the target time point is used as a true value, and the effectiveness of the mapping relation is monitored by comparing the predicted value with the true value according to the similarity between the predicted value and the true value.
Exemplarily, referring to fig. 4, fig. 4 is a flowchart for monitoring validity of a mapping relationship according to an embodiment of the present invention. As shown in fig. 4, the monitoring process includes the following steps:
Step S41: at intervals of a preset time length, acquiring the data statistics predicted for the current target time point using the mapping relationship, and acquiring the actual data statistics corresponding to the current target time point.
In order to obtain actual data statistical information corresponding to the current target time point, statistical operation can be performed on the service data in the database at the current target time point, so that the actual data statistical information is obtained.
As described above, the statistical operation on the service data occupies considerable hardware resources and affects data query efficiency while it runs, so the statistical operation should be performed at as low a frequency as possible. In other words, the preset time length in step S41 above is set to a relatively long time length.
In a specific implementation of the present invention, the preset time duration in step S41 may be N times the time duration of the period in step S11, where N is an integer greater than 1. For example, the time length of the cycle in the above step S11 is set to 30 minutes, and N is set to 8, i.e., the preset time length in the above step S41 is set to 4 hours. Every four hours, obtaining the data statistical information predicted for the current time point by using the mapping relation, and obtaining the actual data statistical information corresponding to the current time point.
Taking the service period equal to 1 day as an example, referring to table 3, table 3 is a record schematic table of operations performed at various time points in the day. As shown in table 3, data statistics of the current period need to be predicted by using a mapping relationship that is predetermined and stored every 30 minutes in a day. Every 4 hours in a day, a statistical operation needs to be performed on the service data in the database to obtain actual data statistical information.
Table 3: Schematic record of the operations performed at various time points in a day
[Table 3 is an image in the original publication and is not reproduced here.]
In Table 3, "-" indicates that no statistical operation is performed. Taking 00:30 as an example, a prediction operation needs to be performed at that time point, but no statistical operation needs to be performed.
As shown in Table 3, when step S11 is executed at 00:00 each day, the data statistics corresponding to 00:00 are determined from the mapping relationship, using 00:00 as the basis, and are used as the data statistics of the current cycle (i.e., the cycle from 00:00 to 00:30). Meanwhile, 00:00 is used as a target time point, and a statistical operation is performed on the service data in the database at 00:00 to obtain the actual data statistics. In this way, both the predicted data statistics and the actual data statistics corresponding to 00:00 are obtained.
In the same manner, with 04:00, 08:00, 12:00, 16:00 and 20:00 as target time points, the predicted data statistics and the actual data statistics corresponding to each of 04:00, 08:00, 12:00, 16:00 and 20:00 are obtained.
Step S42: determining the similarity between the predicted data statistics and the actual data statistics corresponding to the current target time point.
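The daily schedule described for Table 3 can be generated as in the following sketch, which assumes a 30-minute cycle and N = 8 as in the example; `daily_schedule` is a hypothetical name.

```python
from datetime import time
from typing import List, Tuple

CYCLE_MINUTES = 30  # length of the cycle in step S11
N = 8               # a statistical operation is performed every N cycles (4 hours)


def daily_schedule(cycle_minutes: int = CYCLE_MINUTES,
                   n: int = N) -> List[Tuple[time, bool, bool]]:
    """Return (time point, do_prediction, do_statistics) for each cycle start."""
    schedule = []
    for index, minute in enumerate(range(0, 24 * 60, cycle_minutes)):
        point = time(minute // 60, minute % 60)
        # A prediction happens at every cycle start; statistics only every n-th.
        schedule.append((point, True, index % n == 0))
    return schedule


target_points = [p for p, _, do_stats in daily_schedule() if do_stats]
print([p.strftime("%H:%M") for p in target_points])
# ['00:00', '04:00', '08:00', '12:00', '16:00', '20:00']
```

This gives 48 prediction operations and 6 statistical operations per day, consistent with the target time points listed above.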
In specific implementation, the predicted data statistical information corresponding to the current time point can be used as one vector, the actual data statistical information corresponding to the current time point can be used as the other vector, then the vector distance between the two vectors is calculated, and the calculated vector distance is used as the similarity between the predicted data statistical information and the actual data statistical information.
The vector distance may be a Euclidean distance, a cosine distance, a Chebyshev distance, or the like; the present invention does not limit the specific type or calculation method of the vector distance.
Step S43: determining that the mapping relationship is invalid when the similarity determined N consecutive times is lower than a preset threshold, where N is a preset integer greater than or equal to 1.
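Two of the measures named above can be sketched as follows. The cosine measure is written here as a cosine similarity (larger means more alike), which is an interpretive assumption made so that the value can be compared against a threshold such as 0.8; the Euclidean function returns a plain distance (smaller means more alike). The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Cosine of the angle between two statistics vectors of equal length."""
    dot = sum(p * a for p, a in zip(predicted, actual))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_a = math.sqrt(sum(a * a for a in actual))
    if norm_p == 0 or norm_a == 0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_p * norm_a)


def euclidean_distance(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Straight-line distance between two statistics vectors of equal length."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))
```

Each vector would hold the order quantities of the intervals in a fixed order, for example the six delivery distance counts followed by the six order amount counts.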
The value of N may be preset or may be changed at any time during the service development. Similarly, the preset threshold value in step S43 may be preset or may be changed at any time during the service development.
For ease of understanding, continue with Table 3 above, and take N equal to 3 and the preset threshold equal to 0.8 as an example. Assume that the similarities determined at 00:00, 04:00, 08:00, 12:00 and 16:00 of the current day are 0.76, 0.85, 0.82, 0.65 and 0.72, respectively, and that the similarity determined at the current time point (20:00) is 0.67. Since the similarities determined at the three consecutive time points 12:00, 16:00 and 20:00 are all lower than the preset threshold 0.8, the mapping relationship is determined to be invalid.
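The consecutive-count rule of this example can be sketched as below; `first_invalid_index` is a hypothetical name, and the sample values are those given in the example.

```python
from typing import Iterable, Optional


def first_invalid_index(similarities: Iterable[float],
                        threshold: float = 0.8,
                        n: int = 3) -> Optional[int]:
    """Return the index of the check at which n consecutive similarities
    have been below threshold, or None if that never happens."""
    run = 0
    for i, s in enumerate(similarities):
        run = run + 1 if s < threshold else 0  # any value at/above threshold resets the run
        if run >= n:
            return i
    return None


# Checks at 00:00, 04:00, 08:00, 12:00, 16:00, 20:00
print(first_invalid_index([0.76, 0.85, 0.82, 0.65, 0.72, 0.67]))  # 5
```

Index 5 corresponds to the 20:00 check, the third consecutive check below 0.8. The single low value at 00:00 does not trigger invalidation because the run is reset at 04:00.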
When the method is specifically implemented, under the condition that the mapping relation is determined to be invalid, the mapping relation between the data statistical information of the database and the time can be re-determined, and the re-determined mapping relation is stored.
As mentioned above, the change of the business period due to the change of the business rule may cause the predetermined mapping relation to fail. Therefore, in order to re-determine the mapping relationship between the data statistics information and the time of the database, for example, the service period of the service data in the database may be re-determined; and then, based on the re-determined service period, re-determining the mapping relation between the data statistical information and the time of the database. Wherein the time length of the time in the re-determined mapping relation is equal to or greater than the time length of the re-determined service period.
The specific implementation steps of re-determining the mapping relationship can refer to the descriptions of the sub-step S10-1 to the sub-step S10-3, which are not repeated herein.
Further, during the re-determination of the mapping relationship between the data statistical information and the time of the database, since the mapping relationship determined earlier has failed, the newly obtained database query statement cannot be processed based on the mapping relationship determined earlier, in other words, the above-described steps S11 and S12 cannot be performed based on the mapping relationship determined earlier.
In order not to affect the normal execution of the database query task, during the process of re-determining the mapping relationship between the data statistics information and the time of the database, when the database query statement is obtained, an execution plan is generated for the database query statement according to a preset alternative mode, and the database query statement is processed based on the execution plan.
Here, the alternative manner is a manner different from that of steps S11 and S12 described above. More specifically, the alternative manner is one that can generate an execution plan for the database query statement without using the mapping relationship between the data statistical information and time.
For example, in a specific implementation, the invention may adopt the following alternative manner: the service data in the database is counted at regular intervals to obtain the data statistical information of the database, and, until the next count is performed, an execution plan is generated for each obtained database query statement using the most recently obtained data statistical information.
For ease of understanding, suppose by way of example that the service data in the database is counted every hour while the mapping relationship between the data statistical information of the database and time is being re-determined. Suppose the mapping relationship is determined to be invalid at 20:00 and its re-determination begins. At 20:00, the service data in the database is counted to obtain the data statistical information of the database, and the data statistical information is stored. For database query statements obtained between 20:00 and 21:00, an execution plan is generated according to the data statistical information obtained at 20:00. At 21:00, the service data in the database is counted again, and the resulting data statistical information is stored. For database query statements obtained between 21:00 and 22:00, an execution plan is generated according to the data statistical information obtained at 21:00. And so on.
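As a non-limiting illustration, the timed alternative manner of the example above may be sketched in Python as follows. The names `FallbackStatsCache`, `collect`, and `stats_for`, and the representation of the statistics as a dictionary, are assumptions made for the sketch and do not appear in the original description. For simplicity the sketch counts the data lazily, on the first query of each one-hour slot, rather than with a timer that fires exactly on the hour.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Stats = Dict[str, int]  # assumed shape, e.g. {"row_count": 1200}


@dataclass
class FallbackStatsCache:
    """Holds the most recently counted statistics for the current time slot."""
    collect: Callable[[], Stats]   # hypothetical: counts the service data in the database
    interval_s: int = 3600         # one hour, as in the example above
    _slot: Optional[int] = None
    _stats: Optional[Stats] = None

    def stats_for(self, now_s: float) -> Stats:
        """Return the statistics to use for a query obtained at time now_s (seconds)."""
        slot = int(now_s // self.interval_s)
        if slot != self._slot:
            # A new slot has started (e.g. 21:00): count the service data again.
            self._stats = self.collect()
            self._slot = slot
        return self._stats
```

With this sketch, every query obtained between 20:00 and 21:00 is planned with the statistics counted for the 20:00 slot, and the first query at or after 21:00 triggers a new count.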
Referring to fig. 5, fig. 5 is an interaction diagram of a data query method according to an embodiment of the present invention. As shown in fig. 5, the data query method proposed by the present invention may be executed between a database query engine, a first daemon process (daemon), and a second daemon process (daemon). The first daemon process is used for monitoring the effectiveness of the mapping relation, and the second daemon process is used for re-determining the mapping relation under the condition that the mapping relation is invalid.
As shown in fig. 5, each time the first daemon process waits for a preset time length, the similarity between the predicted data statistical information and the actual data statistical information corresponding to the current time point is determined, so that the validity of the mapping relationship is monitored according to the similarity. If the first daemon process determines that the mapping relation is invalid at a certain time point, the database query engine is immediately informed, so that the database query engine can immediately mark the mapping relation stored earlier as invalid. In addition, the first daemon process can also send out an alarm, so that a database administrator re-determines the service period of the service data after receiving the alarm, inputs the re-determined service period to the second daemon process, and sends an instruction for re-determining the mapping relation to the second daemon process.
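As a non-limiting illustration, the validity check performed by the first daemon process may be sketched in Python as follows. The similarity measure (one minus the relative error), the names `similarity` and `MappingMonitor`, the default threshold of 0.8, and the choice to require several consecutive low readings before declaring the mapping relationship invalid are all assumptions made for the sketch; the description above does not fix any of them.

```python
from dataclasses import dataclass


def similarity(predicted: float, actual: float) -> float:
    """Assumed measure: 1 minus the relative error, clipped to the range [0, 1]."""
    if predicted == actual:
        return 1.0
    denom = max(abs(predicted), abs(actual))  # non-zero because the values differ
    return max(0.0, 1.0 - abs(predicted - actual) / denom)


@dataclass
class MappingMonitor:
    threshold: float = 0.8   # assumed preset similarity threshold
    n: int = 3               # assumed number of consecutive low readings required
    low_streak: int = 0
    invalid: bool = False

    def check(self, predicted: float, actual: float) -> bool:
        """Run one periodic check; return True once the mapping is deemed invalid."""
        if similarity(predicted, actual) < self.threshold:
            self.low_streak += 1
        else:
            self.low_streak = 0
        if self.low_streak >= self.n:
            self.invalid = True  # at this point the query engine would be notified
        return self.invalid
```

In such a sketch, `check` would be called once per preset waiting interval with, for example, the predicted and actual row counts for the current time point.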
As shown in fig. 5, after receiving the instruction of re-determining the mapping relationship, the second daemon process starts to re-determine the mapping relationship between the data statistics information of the database and the time based on the re-determined service period.
As shown in fig. 5, the database query engine periodically predicts the latest data statistics according to the mapping relationship stored therein, and stores the data statistics. In other words, the database query engine periodically executes the above step S11, and predicts the data statistics corresponding to the current period according to the mapping relationship between the data statistics of the database and the time.
As shown in fig. 5, the database query engine also continuously processes database query statements. After the database query engine obtains a database query statement (e.g., a Structured Query Language (SQL) statement), it first determines whether the existing mapping relationship is in a valid state. If the existing mapping relationship is still valid, the database query engine may generate an execution plan for the database query statement according to the most recently predicted data statistical information (i.e., the data statistical information predicted for the current period), and process the database query statement based on the generated execution plan.
If the existing mapping relationship has been marked as invalid, the database query engine determines that the existing mapping relationship has failed. In that case, an execution plan may be generated for the database query statement according to the preset alternative manner, and the database query statement may be processed based on the generated execution plan.
As shown in fig. 5, after the second daemon process redetermines the mapping relationship between the data statistics information of the database and the time, the redetermined mapping relationship is transmitted to the database query engine. The database query engine saves the re-determined mapping relationship. In this way, the database query engine may process the database query statement in the manner of the above steps S11 and S12 based on the mapping relationship.
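As a non-limiting illustration, the choice the database query engine makes in fig. 5 between the predicted statistics and the alternative manner may be sketched in Python as follows. The class `QueryEngine` and its method names are hypothetical, the mapping relationship is modelled as a plain function from a period index to statistics, and the generation of the execution plan itself is left out because the description does not tie it to a particular optimizer.

```python
from typing import Callable, Dict, Optional, Tuple

Stats = Dict[str, int]  # assumed shape, e.g. {"row_count": 1200}


class QueryEngine:
    def __init__(self, mapping: Callable[[int], Stats],
                 fallback: Callable[[], Stats]) -> None:
        self.mapping = mapping            # stored mapping relationship
        self.mapping_valid = True         # cleared when the first daemon reports failure
        self.fallback = fallback          # preset alternative manner
        self.predicted: Optional[Stats] = None

    def refresh_prediction(self, period_index: int) -> None:
        """Step S11: predict the statistics for the current period."""
        if self.mapping_valid:
            self.predicted = self.mapping(period_index)

    def mark_invalid(self) -> None:
        """Called when the first daemon reports that the mapping has failed."""
        self.mapping_valid = False

    def install_mapping(self, mapping: Callable[[int], Stats]) -> None:
        """Called when the second daemon delivers a re-determined mapping."""
        self.mapping = mapping
        self.mapping_valid = True
        self.predicted = None             # force a fresh prediction

    def stats_for_query(self) -> Tuple[str, Stats]:
        """Pick the statistics used to plan a newly obtained query statement."""
        if self.mapping_valid and self.predicted is not None:
            return "predicted", self.predicted
        return "fallback", self.fallback()
```

A caller would pass the returned statistics to whatever component generates the execution plan; the string tag is included only to make the chosen branch visible.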
Based on the same inventive concept, an embodiment of the present invention provides a database query device. Referring to fig. 6(a), fig. 6(a) is a schematic diagram of a database query device according to an embodiment of the present invention. As shown in fig. 6(a), the apparatus includes:
the statistical information determining module 61 is configured to predict the data statistical information corresponding to the current period according to a mapping relationship between the data statistical information of the database and time;
an execution plan generating module 62, configured to generate an execution plan for the database query statement obtained in the current period according to the data statistics information corresponding to the current period;
and a query statement processing module 63, configured to process the database query statement based on the execution plan.
Optionally, the statistical information determining module 61 in the device is specifically configured to determine, based on a preset time point in the current period, data statistical information corresponding to the preset time point from the mapping relationship, and determine the data statistical information corresponding to the preset time point as the data statistical information corresponding to the current period; the preset time point is a starting time point, an ending time point or a time point between the starting time point and the ending time point of the current period.
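As a non-limiting illustration, selecting the statistics of the current period from a preset time point may be sketched in Python as follows. The function name `stats_for_period`, the modelling of the mapping relationship as a function of time, and the use of the midpoint as the "time point between the starting time point and the ending time point" are assumptions made for the sketch.

```python
from typing import Callable, Dict

Stats = Dict[str, int]  # assumed shape, e.g. {"row_count": 1200}


def stats_for_period(mapping: Callable[[float], Stats],
                     start_s: float, end_s: float,
                     point: str = "start") -> Stats:
    """Return the statistics the mapping gives for a preset point of the period."""
    if point == "start":
        t = start_s
    elif point == "end":
        t = end_s
    elif point == "middle":
        t = (start_s + end_s) / 2   # one possible point between start and end
    else:
        raise ValueError(f"unknown preset time point: {point!r}")
    return mapping(t)
```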
Optionally, referring to fig. 6(b), fig. 6(b) is a schematic diagram of a database query device according to another embodiment of the present invention. As shown in fig. 6(b), the apparatus may further include, in addition to the statistical information determining module 61, the execution plan generating module 62, and the query statement processing module 63:
and a mapping relation determining module 60, configured to determine and store the mapping relation between the data statistics information of the database and the time before predicting the data statistics information corresponding to the current period according to the mapping relation between the data statistics information of the database and the time.
Optionally, the mapping relationship determining module 60 in the apparatus is specifically configured to determine a service period of the service data according to a service characteristic of the service data recorded in the database; continuously counting the service data in at least one service period to obtain multiple groups of data statistical information of the service data, wherein the multiple groups of data statistical information are time sequence data; and determining a mapping relation between the data statistical information of the service data and time according to the plurality of groups of data statistical information, and storing the mapping relation, wherein the time length of the time is equal to or greater than the time length of the service period.
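As a non-limiting illustration, deriving a mapping relationship from time-series statistics collected over one or more service periods may be sketched in Python as follows. How the mapping relationship is actually fitted is not specified in this passage; the sketch simply averages the samples that fall in the same position of the service period. The names `build_mapping` and `predict`, the bucket width, and the use of a single numeric statistic are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_mapping(samples: Iterable[Tuple[int, float]],
                  period_s: int, bucket_s: int) -> Dict[int, float]:
    """Average a statistic per position (bucket) within the service period.

    samples: (timestamp in seconds, observed statistic such as a row count).
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in samples:
        bucket = (ts % period_s) // bucket_s   # position inside the service period
        sums[bucket] += value
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


def predict(mapping: Dict[int, float], ts: int,
            period_s: int, bucket_s: int) -> float:
    """Look up the predicted statistic for the bucket that contains time ts."""
    return mapping[(ts % period_s) // bucket_s]
```

For instance, with a one-day service period and one-hour buckets, samples taken at the same hour on different days are averaged into one predicted value for that hour.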
Optionally, as shown in fig. 6(b), the apparatus may further include:
a mapping relation monitoring module 64, configured to monitor validity of the mapping relation according to data statistical information predicted for at least one time point respectively by using the mapping relation and actual data statistical information corresponding to the at least one time point respectively after determining and storing the mapping relation between the data statistical information of the database and time;
the mapping relation determining module 60 in the apparatus is further configured to, in the event that the mapping relation fails, re-determine the mapping relation between the data statistics information of the database and time, and store the re-determined mapping relation.
Optionally, the mapping relationship monitoring module 64 in the apparatus is specifically configured to, every preset time interval, obtain data statistical information predicted for a current time point by using the mapping relationship, and obtain actual data statistical information corresponding to the current time point; determining the similarity between the predicted data statistical information corresponding to the current time point and the actual data statistical information; and determining that the mapping relation is invalid under the condition that the similarity determined for N times is lower than a preset threshold, wherein N is a preset integer greater than or equal to 1.
Optionally, when the mapping relationship is determined again, the mapping relationship determining module 60 in the apparatus is specifically configured to determine again the service period of the service data according to the latest service characteristic of the service data recorded in the database; and re-determining the mapping relation between the data statistical information of the database and the time based on the re-determined service period, wherein the time length of the time in the re-determined mapping relation is equal to or greater than the time length of the re-determined service period.
Optionally, as shown in fig. 6(b), the apparatus may further include:
and the query statement alternative processing module 65 is configured to, during the process of re-determining the mapping relationship between the data statistics information of the database and time, generate an execution plan for the database query statement according to a preset alternative manner when the database query statement is obtained, and process the database query statement based on the execution plan.
An embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete mutual communication through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the following steps when executing the program stored in the memory 703:
predicting data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and time;
and generating an execution plan for the database query statement obtained in the current period according to the data statistical information corresponding to the current period, and processing the database query statement based on the execution plan.
Optionally, when executing the program stored in the memory 703, the processor 701 implements the steps included in the other method embodiments of the present invention described above.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented in the figure by only one thick line, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the database query method described in any of the above embodiments.
In yet another embodiment, the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the database query method described in any of the above embodiments.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, reference may be made to the corresponding description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method of database querying, the method comprising:
predicting data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and time;
generating an execution plan for the database query statement obtained in the current period according to the data statistical information corresponding to the current period;
processing the database query statement based on the execution plan.
2. The method according to claim 1, wherein the step of predicting the data statistics corresponding to the current cycle according to the mapping relationship between the data statistics of the database and the time comprises:
determining data statistical information corresponding to a preset time point from the mapping relation according to the preset time point in the current period, and determining the data statistical information corresponding to the preset time point as the data statistical information corresponding to the current period;
the preset time point is a starting time point, an ending time point or a time point between the starting time point and the ending time point of the current period.
3. The method of claim 1, wherein before predicting the data statistics corresponding to the current cycle according to the mapping relationship between the data statistics of the database and the time, the method further comprises:
determining a service period of the service data according to the service characteristics of the service data recorded in the database;
continuously counting the service data in at least one service period to obtain multiple groups of data statistical information of the service data, wherein the multiple groups of data statistical information are time sequence data;
and determining a mapping relation between the data statistical information of the service data and time according to the plurality of groups of data statistical information, and storing the mapping relation, wherein the time length of the time is equal to or greater than the time length of the service period.
4. The method of claim 3, wherein after determining the mapping relationship between the data statistics and the time of the traffic data, the method further comprises:
monitoring the validity of the mapping relation according to data statistical information predicted for a target time point by using the mapping relation and actual data statistical information corresponding to the target time point, wherein the target time point is a pre-specified time point;
and under the condition that the mapping relation is invalid, re-determining the mapping relation between the data statistical information of the database and the time, and storing the re-determined mapping relation.
5. The method of claim 4, wherein the step of monitoring the validity of the mapping relationship based on the data statistics predicted for the target time point using the mapping relationship and the actual data statistics corresponding to the target time point comprises:
acquiring data statistical information predicted for a current target time point by using the mapping relation and actual data statistical information corresponding to the current target time point every preset time length;
determining the similarity between the predicted data statistical information corresponding to the current target time point and the actual data statistical information;
and determining that the mapping relation is invalid under the condition that the similarity determined for N times is lower than a preset threshold, wherein N is a preset integer greater than or equal to 1.
6. The method of claim 4, wherein the step of re-determining the mapping relationship between the data statistics and the time of the database comprises:
re-determining the service period of the service data according to the latest service characteristics of the service data recorded in the database;
and re-determining the mapping relation between the data statistical information of the database and the time based on the re-determined service period, wherein the time length of the time in the re-determined mapping relation is equal to or greater than the time length of the re-determined service period.
7. The method of claim 4, wherein during the re-determining of the mapping relationship between the data statistics and the time of the database, the method further comprises:
when the database query statement is obtained, an execution plan is generated for the database query statement according to a preset alternative mode, and the database query statement is processed based on the execution plan.
8. An apparatus for querying a database, the apparatus comprising:
the statistical information determining module is used for predicting the data statistical information corresponding to the current period according to the mapping relation between the data statistical information of the database and the time;
the execution plan generating module is used for generating an execution plan for the database query statement obtained in the current period according to the data statistical information corresponding to the current period;
and the query statement processing module is used for processing the database query statement based on the execution plan.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010583303.6A 2020-06-23 2020-06-23 Database query method, device, electronic equipment and readable storage medium Active CN111737281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010583303.6A CN111737281B (en) 2020-06-23 2020-06-23 Database query method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010583303.6A CN111737281B (en) 2020-06-23 2020-06-23 Database query method, device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111737281A true CN111737281A (en) 2020-10-02
CN111737281B CN111737281B (en) 2023-09-01

Family

ID=72650729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010583303.6A Active CN111737281B (en) 2020-06-23 2020-06-23 Database query method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111737281B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243555A1 (en) * 2003-05-30 2004-12-02 Oracle International Corp. Methods and systems for optimizing queries through dynamic and autonomous database schema analysis
KR20090085869A (en) * 2008-02-05 2009-08-10 엔에이치엔(주) Method and system for managing database
US20100114868A1 (en) * 2008-10-21 2010-05-06 International Business Machines Corporation Query execution plan efficiency in a database management system
CN102262636A (en) * 2010-05-25 2011-11-30 中国移动通信集团浙江有限公司 Method and device for generating database partition execution plan
US20160224627A1 (en) * 2015-02-03 2016-08-04 International Business Machines Corporation Forecasting query access plan obsolescence
CN106599130A (en) * 2016-12-02 2017-04-26 中国银联股份有限公司 Method and device for selectively interfering with multiple indexes of relational database management system
CN108370324A (en) * 2015-11-13 2018-08-03 电子湾有限公司 Distributed data base work data tilt detection
CN108804459A (en) * 2017-05-02 2018-11-13 杭州海康威视数字技术股份有限公司 Data query method and device
CN108829768A (en) * 2018-05-29 2018-11-16 中国银行股份有限公司 A kind of collection method and device of statistical information
CN110399388A (en) * 2019-07-29 2019-11-01 中国工商银行股份有限公司 Data query method, system and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
连欣: "基于成本的Spark SQL优化", 《中国优秀硕士学位论文全文数据库》, no. 1 *

Also Published As

Publication number Publication date
CN111737281B (en) 2023-09-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant