WO2022252782A1 - Cloud computing index recommendation method and system - Google Patents
- Publication number
- WO2022252782A1 (application PCT/CN2022/083619)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- index
- cost
- computing
- historical
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
- G06F16/24545—Selectivity estimation or determination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9532—Query formulation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/1396—Protocols specially adapted for monitoring users' activity
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The present disclosure relates to the technical field of cloud computing, and in particular to a cloud computing index recommendation method and system.
- MPP: Massively Parallel Processing
- Embodiments of the present disclosure provide a cloud computing index recommendation method and system that can reduce the total cost of ownership on the cloud by converting computing cost into storage cost.
- A cloud computing index recommendation method, including:
- according to the plurality of current query indexes, determining the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
- recommending a target query index to the target user, wherein the target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.
- The method for determining the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used includes:
- determining the cost benefit of the query index based on the pre-acquired computing cost of the query index, and adding a cost-benefit label to the query index.
- The method for determining the computing resource usage and usage time corresponding to each current query index includes:
- The method further includes:
- determining the computing cost of the target query index.
- Before obtaining all historical query statements of the target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features, the method further includes:
- determining the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the inclusion relationship between the query statements and the query indexes.
- A cloud computing index recommendation system, including:
- a cloud computing and storage cost collection module, configured to obtain the unit computing cost and unit storage cost, per unit time, of the cloud computing server currently in use;
- a query history analysis and prediction module, configured to obtain all historical query statements of the target user, extract the common features of all the historical query statements, and determine the query indexes corresponding to the historical query statements according to the common features;
- a construction and storage cost analysis and prediction module, configured to determine the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used; to determine, based on the obtained current query statement of the target user, a plurality of current query indexes corresponding to the current query statement;
- and to determine, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
- an intelligent central judgment module, configured to recommend a target query index to the target user, wherein the target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.
- The construction and storage cost analysis and prediction module is further configured to:
- determine the cost benefit of the query index based on the pre-acquired computing cost of the query index, and add a cost-benefit label to the query index.
- The construction and storage cost analysis and prediction module is further configured to:
- The system further includes a cost calculation module, configured to:
- determine the computing cost of the target query index.
- The system further includes a model matching module, configured to:
- determine the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the inclusion relationship between the query statements and the query indexes.
- An embodiment of the present disclosure provides a cloud computing index recommendation method, the method including:
- according to the plurality of current query indexes, determining the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
- recommending a target query index to the target user, wherein the target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.
- Embodiments of the present disclosure provide intelligently recommended indexes that reduce query computing cost; making greater use of these indexes for precomputation converts computing cost into storage cost and thereby reduces the total cost of ownership on the cloud.
- The more queries there are, the more the precomputed results can be reused, and the fewer computing resources each query consumes.
- FIG. 1 is a schematic flowchart of a cloud computing index recommendation method according to an embodiment of the present disclosure;
- FIG. 2 is a logical schematic diagram of a cloud computing index recommendation system according to an embodiment of the present disclosure;
- FIG. 3 is a schematic structural diagram of a cloud computing index recommendation system according to an embodiment of the present disclosure.
- The magnitude of a process's sequence number does not imply its order of execution; the order of execution should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
- “Plurality” means two or more.
- “And/or” merely describes an association between associated objects and indicates that three relationships are possible; for example, “A and/or B” can mean that A exists alone, that A and B both exist, or that B exists alone.
- The character “/” generally indicates an “or” relationship between the objects before and after it.
- “Includes A, B and C” means that A, B, and C are all included; “includes A, B, or C” means that one of A, B, and C is included; “includes A, B and/or C” means that any one, any two, or all three of A, B, and C are included.
- “B corresponding to A” means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
- A matching B means that the similarity between A and B is greater than or equal to a preset threshold.
- Fig. 1 exemplarily shows a schematic flowchart of a cloud computing index recommendation method according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes:
- Step S101: obtaining the unit computing cost and unit storage cost, per unit time, of the cloud computing server currently in use;
- The cloud computing index recommendation method of the embodiments of the present disclosure is a scheme, in the OLAP field, for intelligently recommending indexes based on cloud cost. While meeting the customer's requirements for query performance and construction performance, the scheme analyzes the customer's query history, performs comprehensive, multi-round intelligent feedback tuning, and finally recommends a set of indexes. Adding these indexes increases the construction computing cost and the storage cost, but it greatly reduces the computing cost of queries and therefore greatly reduces the total cost.
- OLTP (On-Line Transaction Processing) applications generally have the following characteristics:
- The data volume is not very large; the amount of data in a production database is generally modest, and the data is processed and transferred in a timely manner.
- Transactions are generally deterministic. For example, the amount of a bank deposit or withdrawal is always definite, so OLTP accesses deterministic data.
- OLAP (On-Line Analytical Processing) applications generally have the following characteristics:
- Real-time requirements are not very high.
- The most common application updates data daily and then generates the corresponding reports.
- The data volume is large. Because OLAP supports dynamic queries, users may need to aggregate a great deal of data to obtain the information they want (time series analysis, for example), so the amount of data processed is large;
- Step S102: obtaining all historical query statements of the target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features;
- Query indexes can then be determined from the common features and reused, which reduces the cost of subsequent queries.
- Before obtaining all historical query statements of the target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features, the method further includes:
- determining the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the inclusion relationship between the query statements and the query indexes.
- Step S103: determining the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used;
- The method for determining the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used includes:
- determining the cost benefit of the query index based on the pre-acquired computing cost of the query index, and adding a cost-benefit label to the query index.
- Based on the frequency of each SQL query in the query history, the time and computing resources that the query consumed, and data sampling statistics of the source data, the query history analysis and prediction module can estimate how much computing resource usage, and therefore computing cost, each SQL query would save if a given index existed, and it labels each index with its query cost benefit.
- Step S104: determining, based on the obtained current query statement of the target user, a plurality of current query indexes corresponding to the current query statement;
- Step S105: determining, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
- The method for determining the computing resource usage and usage time corresponding to each current query index includes:
- The construction computing cost and storage cost required to build each index can be estimated from data sampling statistics of the source data.
- The skew rate and repetition rate of each dimension can be identified from data sampling statistics of the source data, so as to predict the CPU and memory resources and the construction time required to compute each index, and thereby estimate the usage and duration of computing resources.
- The storage size of each built index is estimated from the data characteristics; the total cost of each index is then calculated from the unit computing cost and unit storage cost provided by the cloud computing and storage cost collection module, and every candidate index is labeled with its construction cost.
- Step S106: recommending a target query index to the target user.
- The target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.
- The method further includes:
- determining the computing cost of the target query index.
- All construction costs can be analyzed, and indexes are then selected according to overall cost-benefit, so as to provide the index recommendation scheme with the lowest total cost.
- Fig. 2 exemplarily shows a logical schematic diagram of a cloud computing index recommendation system according to an embodiment of the present disclosure.
- The operating logic of the system includes:
- Cloud computing and storage cost collection module: this module automatically collects the type of computing host in use at the current cloud service provider, the usage cost of the computing server per unit time, and the storage cost per unit of stored data per unit time. It is adapted to several mainstream vendors and collects accurate unit computing and storage cost information to support the cost calculations of the query history analysis and prediction module and the construction and storage cost analysis and prediction module.
- Query history analysis and prediction module: this module collects all of the customer's historical analytical query statements, extracts common features from all query plan trees, and recommends models that can answer those queries. Because customers' analytical queries are complex and varied, a large number of indexes with inclusion relationships are recommended.
- Based on the frequency of each SQL query in the query history, the time and computing resources it consumed, and data sampling statistics of the source data, the query history analysis and prediction module estimates how much computing resource usage and computing cost each SQL query would save if a given index existed, and labels each index with its query cost benefit.
- Construction and storage cost analysis and prediction module: this module receives the candidate indexes passed in by the intelligent central judgment module and, from data sampling statistics of the source data, estimates the construction computing cost and storage cost required to build each index.
- The module identifies the skew rate and repetition rate of each dimension from data sampling statistics of the source data, predicts the CPU and memory resources and the construction time required to compute each index, and thereby estimates the usage amount and usage time of computing resources.
- The storage size of each built index is estimated from the data characteristics; the total cost of each index is then calculated from the unit computing cost and unit storage cost provided by the cloud computing and storage cost collection module, and every candidate index is labeled with its construction cost.
- Intelligent central judgment module: this module asks the query history analysis and prediction module for all candidate indexes and their query cost benefits, and passes them to the construction and storage cost analysis and prediction module, which analyzes all construction costs. It then selects indexes according to overall cost-benefit, so as to provide the index recommendation scheme with the lowest total cost.
- Precomputation module: this module builds precomputed indexes according to the indexes recommended by the intelligent central judgment module. It pulls the original ultra-large-scale data set for pre-aggregation and provides the built indexes to the query module, which speeds up the customer's analytical SQL, reduces the amount of data scanned, and further reduces query computing cost.
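The hand-off between the modules described above can be illustrated with a small Python sketch. It is only one possible reading: the function names, the input shapes, and the rule "keep an index when its estimated query benefit exceeds its construction cost" are assumptions made for illustration, not the selection logic stated in the disclosure.

```python
def query_history_module(history):
    """Hypothetical stand-in: maps index name -> estimated query cost benefit.

    `history` maps index name -> (number of queries, saving per query).
    """
    return {name: runs * saving for name, (runs, saving) in history.items()}


def construction_cost_module(candidates, build_costs):
    """Hypothetical stand-in: maps index name -> construction plus storage cost."""
    return {name: build_costs[name] for name in candidates}


def central_judgment(history, build_costs):
    """Ask for benefits, ask for costs, then rank candidates by net benefit."""
    benefits = query_history_module(history)
    costs = construction_cost_module(benefits.keys(), build_costs)
    net = {name: benefits[name] - costs[name] for name in benefits}
    # Assumed rule: keep only indexes that pay for themselves, best first.
    return sorted((n for n in net if net[n] > 0), key=lambda n: net[n], reverse=True)


# Illustrative numbers only.
history = {"idx_a": (100, 0.10), "idx_b": (10, 0.30), "idx_c": (400, 0.02)}
build_costs = {"idx_a": 4.0, "idx_b": 5.0, "idx_c": 1.0}
recommended = central_judgment(history, build_costs)
print(recommended)
```

In this sketch `idx_b` is dropped because its construction cost is higher than the query cost it is expected to save.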
- Fig. 3 exemplarily shows a schematic structural diagram of a cloud computing index recommendation system according to an embodiment of the present disclosure. As shown in Fig. 3, the system includes:
- a cloud computing and storage cost collection module, configured to obtain the unit computing cost and unit storage cost, per unit time, of the cloud computing server currently in use;
- a query history analysis and prediction module 32, configured to obtain all historical query statements of the target user, extract the common features of all the historical query statements, and determine the query indexes corresponding to the historical query statements according to the common features;
- according to the plurality of current query indexes, the total cost corresponding to each current query index is determined from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
- an intelligent central judgment module 34, configured to recommend a target query index to the target user, wherein the target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.
- The construction and storage cost analysis and prediction module 33 is further configured to:
- determine the cost benefit of the query index based on the pre-acquired computing cost of the query index, and add a cost-benefit label to the query index.
- The construction and storage cost analysis and prediction module 33 is further configured to:
- The system further includes a cost calculation module, configured to:
- determine the computing cost of the target query index.
- The system further includes a model matching module, configured to:
- determine the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the inclusion relationship between the query statements and the query indexes.
- The present disclosure also provides a program product that includes execution instructions stored in a readable storage medium. At least one processor of a device may read the execution instructions from the readable storage medium and execute them so that the device implements the methods provided in the foregoing implementations.
- The readable storage medium may be a computer storage medium or a communication medium.
- Communication media include any medium that facilitates transfer of a computer program from one place to another.
- Computer storage media can be any available media that can be accessed by a general-purpose or special-purpose computer.
- A readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium.
- The readable storage medium can also be a component of the processor.
- The processor and the readable storage medium may be located in an application-specific integrated circuit (ASIC). Additionally, the ASIC may be located in user equipment.
- The processor and the readable storage medium can also exist in a communication device as discrete components.
- The readable storage medium may be read-only memory (ROM), random access memory (RAM), a CD-ROM, magnetic tape, a floppy disk, an optical data storage device, or the like.
- The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like.
- A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the present disclosure may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Operations Research (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
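The present disclosure provides a cloud computing index recommendation method and system. The method includes: obtaining the unit computing cost and unit storage cost, per unit time, of the cloud computing server currently in use; obtaining all historical query statements of a target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features; determining the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used; determining, based on the obtained current query statement of the target user, a plurality of current query indexes corresponding to the current query statement; determining, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time; and recommending a target query index to the target user. By recommending indexes, the method of the present disclosure spends storage resources in exchange for computing resources.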
Description
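The present disclosure relates to the technical field of cloud computing, and in particular to a cloud computing index recommendation method and system.
In recent years the cloud computing industry has advanced rapidly, and more and more enterprises have begun to adopt cloud environments at scale. Both OLTP (On-Line Transaction Processing) applications and OLAP (On-Line Analytical Processing) applications are gradually migrating to the cloud, and the mainstream cloud vendors all provide reliable, elastically scalable computing and storage services to meet customer needs.
One trend in current big data architectures is the separation of computing and storage. In a cloud environment, computing services are deployed on the elastic cloud servers provided by the cloud vendor, while storage services can use the inexpensive block storage that the vendor provides, which can scale without limit.
The product pricing of several mainstream cloud service providers shows that block storage costs far less than computing. In today's OLAP field, much software is built on an MPP (Massively Parallel Processing) architecture, whose core idea is to distribute a task in parallel across multiple servers and nodes, complete the computation on each node, and then aggregate the per-node results into the final analysis result. In today's cloud environments, however, when extremely large data sets are processed, every query consumes a large amount of computing resources, so that even repeated query and analysis requests incur high analysis costs.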
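Summary of the Invention
Embodiments of the present disclosure provide a cloud computing index recommendation method and system that can reduce the total cost of ownership on the cloud by converting computing cost into storage cost.
A first aspect of the embodiments of the present disclosure provides a cloud computing index recommendation method, including:
obtaining the unit computing cost and unit storage cost, per unit time, of the cloud computing server currently in use;
obtaining all historical query statements of a target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features;
determining the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used;
determining, based on the obtained current query statement of the target user, a plurality of current query indexes corresponding to the current query statement;
determining, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
recommending a target query index to the target user, wherein the target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.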
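The last two steps can be sketched in Python as follows. The disclosure does not give a cost formula, so the linear model below (compute cost plus storage cost), the `IndexEstimate` fields, and the sample prices are all assumptions chosen only to make the selection step concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEstimate:
    """Hypothetical per-index estimate; field names are illustrative."""
    name: str
    compute_units: float    # computing resource usage, e.g. vCPUs occupied
    usage_hours: float      # how long those resources are occupied
    storage_gb: float       # size of the stored index
    retention_hours: float  # how long the index is kept in storage


def total_cost(est, unit_compute_cost, unit_storage_cost):
    """Assumed linear model: compute cost plus storage cost."""
    compute = unit_compute_cost * est.compute_units * est.usage_hours
    storage = unit_storage_cost * est.storage_gb * est.retention_hours
    return compute + storage


def recommend(estimates, unit_compute_cost, unit_storage_cost):
    """Return the candidate index with the lowest total cost."""
    return min(
        estimates,
        key=lambda e: total_cost(e, unit_compute_cost, unit_storage_cost),
    )


# Illustrative candidates and unit prices (per unit-hour and per GB-hour).
candidates = [
    IndexEstimate("idx_region_day", 8, 2.0, 50, 720),
    IndexEstimate("idx_region", 4, 1.0, 10, 720),
    IndexEstimate("idx_region_day_product", 16, 5.0, 400, 720),
]
best = recommend(candidates, unit_compute_cost=0.05, unit_storage_cost=0.0001)
print(best.name, round(total_cost(best, 0.05, 0.0001), 4))
```

A fuller model would presumably also fold the expected query cost under each index into `compute_units` and `usage_hours`; the sketch leaves that to the caller.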
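In an optional implementation, the method for determining the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used includes:
determining the query cost of each query index according to the frequency with which the query index queries the database, the time the query index takes to query the database, the computing resources used by the query, and pre-acquired data sampling statistics of the source data;
determining the cost benefit of the query index based on the pre-acquired computing cost of the query index, and adding a cost-benefit label to the query index.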
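A minimal Python sketch of this labeling step is shown below. The multiplicative cost model and the `expected_saving` factor (the share of query work an index is expected to remove, which in practice would come from the sampling statistics) are assumptions; the disclosure does not specify how the benefit is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryHistory:
    """Hypothetical usage history for one candidate index."""
    index_name: str
    runs_per_day: float     # query frequency
    hours_per_run: float    # observed query time
    compute_units: float    # computing resources occupied per run
    expected_saving: float  # assumed fraction of work the index saves (0..1)


def query_cost(h, unit_compute_cost):
    """Daily query cost: frequency x time x resources x unit price."""
    return h.runs_per_day * h.hours_per_run * h.compute_units * unit_compute_cost


def label_cost_benefit(h, unit_compute_cost):
    """Attach a cost-benefit label to the index (assumed label layout)."""
    baseline = query_cost(h, unit_compute_cost)
    return {
        "index": h.index_name,
        "query_cost": baseline,
        "cost_benefit": baseline * h.expected_saving,
    }


history = QueryHistory("idx_region_day", 100, 0.1, 4, 0.9)
label = label_cost_benefit(history, unit_compute_cost=0.05)
print(label)
```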
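In an optional implementation, the method for determining the computing resource usage and usage time corresponding to each current query index includes:
determining the skew rate and repetition rate of the query index in each dimension according to pre-acquired data sampling statistics of the source data;
predicting, based on the skew rate and repetition rate of each dimension, the computing resources, memory resources, and construction time required by each query index;
determining the computing resource usage and usage time corresponding to each current query index based on the computing resources, memory resources, and construction time required by each query index, together with the unit computing cost and unit storage cost.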
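The disclosure does not define "skew rate" or "repetition rate", so the Python sketch below adopts two simple working definitions as assumptions: skew rate is the share of sampled rows held by the most frequent value, and repetition rate is the fraction of sampled rows that repeat an already-seen value. The build-time heuristic (throughput-based, inflated by the worst skew) is likewise hypothetical.

```python
from collections import Counter


def skew_rate(values):
    """Assumed definition: share of rows held by the most frequent value."""
    counts = Counter(values)
    return max(counts.values()) / len(values)


def repetition_rate(values):
    """Assumed definition: fraction of rows repeating an already-seen value."""
    return 1 - len(set(values)) / len(values)


def predict_build(samples, source_rows, rows_per_cpu_hour):
    """Hypothetical heuristic for index size and build effort.

    `samples` maps dimension name -> sampled values of that dimension.
    """
    cardinality = 1
    worst_skew = 0.0
    for values in samples.values():
        cardinality *= len(set(values))  # lower bound taken from the sample
        worst_skew = max(worst_skew, skew_rate(values))
    index_rows = min(source_rows, cardinality)
    # Assumption: skewed dimensions create stragglers, so inflate the estimate.
    cpu_hours = source_rows / rows_per_cpu_hour * (1 + worst_skew)
    return {"index_rows": index_rows, "cpu_hours": cpu_hours}


samples = {
    "region": ["eu", "eu", "eu", "us", "asia", "eu", "us", "eu"],
    "day": ["d1", "d2", "d1", "d2", "d1", "d2", "d1", "d2"],
}
estimate = predict_build(samples, source_rows=1_000_000, rows_per_cpu_hour=500_000)
print(skew_rate(samples["region"]), repetition_rate(samples["day"]), estimate)
```

Multiplying `cpu_hours` by a unit computing cost, and the index size by a unit storage cost, would then give the construction cost used in the later steps.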
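In an optional implementation, after recommending the target query index to the target user, the method further includes:
constructing a precomputed index based on the target query index;
pre-aggregating the precomputed index based on the precomputed index and a pre-built data set;
analyzing, based on the pre-aggregated precomputed index, the efficiency with which the target user's query statements query the database and the amount of database data scanned;
determining the computing cost of the target query index based on the query efficiency and the amount of database data scanned.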
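To make the effect of pre-aggregation on scanned data concrete, here is a small Python sketch. It treats a precomputed index as a dictionary of sums keyed by dimension values; this representation, the function names, and the sample rows are assumptions for illustration, not the index format used by the disclosure.

```python
from collections import defaultdict


def build_preaggregated_index(rows, dimensions, measure):
    """Pre-aggregate `measure` over every combination of `dimensions`."""
    index = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        index[key] += row[measure]
    return dict(index)


def total_by(index, dimensions, group_dim):
    """Answer a GROUP BY on one dimension by scanning only the index."""
    pos = dimensions.index(group_dim)
    result = defaultdict(float)
    for key, value in index.items():
        result[key[pos]] += value
    return dict(result)


rows = [
    {"region": "eu", "day": "d1", "amount": 10.0},
    {"region": "eu", "day": "d1", "amount": 5.0},
    {"region": "eu", "day": "d2", "amount": 7.0},
    {"region": "us", "day": "d1", "amount": 3.0},
    {"region": "us", "day": "d1", "amount": 4.0},
    {"region": "us", "day": "d2", "amount": 1.0},
]
dims = ("region", "day")
index = build_preaggregated_index(rows, dims, "amount")
by_region = total_by(index, dims, "region")

# Entries scanned in the index versus rows scanned in the raw data.
scan_ratio = len(index) / len(rows)
print(by_region, scan_ratio)
```

On real data the ratio between index entries and source rows is what drives the reduction in scanned data, and hence in query computing cost.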
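In an optional implementation, before obtaining all historical query statements of the target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features, the method further includes:
constructing, based on pre-acquired historical query statements of a plurality of users, the query plan trees corresponding to all of those historical query statements;
extracting the common features of the query statements in the query plan trees and, based on the common features, matching a query analysis model corresponding to the common features;
determining the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the inclusion relationship between the query statements and the query indexes.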
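One way to picture "common features" and the "inclusion relationship" is sketched below in Python. It assumes each query plan has already been reduced to the set of dimensions it groups or filters by (no SQL parsing is shown), treats that dimension set as the common feature, and treats an index as covering a query when the query's dimensions are a subset of the index's. These modeling choices are assumptions, not the query analysis model of the disclosure.

```python
from collections import Counter


def common_features(plans):
    """Count how often each dimension set occurs across query plans."""
    return Counter(frozenset(p["dimensions"]) for p in plans)


def covers(index_dims, query_dims):
    """An index can answer a query if it contains all of the query's dimensions."""
    return set(query_dims) <= set(index_dims)


def inclusion_relations(candidates):
    """Pairs (larger, smaller) where one candidate strictly contains another."""
    return [(a, b) for a in candidates for b in candidates if b < a]


# Hypothetical, already-simplified query plans.
plans = [
    {"dimensions": ["region", "day"]},
    {"dimensions": ["region"]},
    {"dimensions": ["day", "region"]},
    {"dimensions": ["region", "day", "product"]},
]
features = common_features(plans)
relations = inclusion_relations(list(features))
print(features.most_common(1), len(relations))
```

Because a larger index covers every query that a contained index covers, such pairs are natural candidates for pruning when total cost is compared.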
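A second aspect of the embodiments of the present disclosure provides a cloud computing index recommendation system, the system including:
a cloud computing and storage cost collection module, configured to obtain the unit computing cost and unit storage cost, per unit time, of the cloud computing server currently in use;
a query history analysis and prediction module, configured to obtain all historical query statements of a target user, extract the common features of all the historical query statements, and determine the query indexes corresponding to the historical query statements according to the common features;
a construction and storage cost analysis and prediction module, configured to determine the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used; to determine, based on the obtained current query statement of the target user, a plurality of current query indexes corresponding to the current query statement;
and to determine, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
an intelligent central judgment module, configured to recommend a target query index to the target user, wherein the target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.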
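In an optional implementation, the construction and storage cost analysis and prediction module is further configured to:
determine the query cost of each query index according to the frequency with which the query index queries the database, the time the query index takes to query the database, the computing resources used by the query, and pre-acquired data sampling statistics of the source data;
determine the cost benefit of the query index based on the pre-acquired computing cost of the query index, and add a cost-benefit label to the query index.
In an optional implementation, the construction and storage cost analysis and prediction module is further configured to:
determine the skew rate and repetition rate of the query index in each dimension according to pre-acquired data sampling statistics of the source data;
predict, based on the skew rate and repetition rate of each dimension, the computing resources, memory resources, and construction time required by each query index;
determine the computing resource usage and usage time corresponding to each current query index based on the computing resources, memory resources, and construction time required by each query index, together with the unit computing cost and unit storage cost.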
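In an optional implementation, the system further includes a cost calculation module, configured to:
construct a precomputed index based on the target query index;
pre-aggregate the precomputed index based on the precomputed index and a pre-built data set;
analyze, based on the pre-aggregated precomputed index, the efficiency with which the target user's query statements query the database and the amount of database data scanned;
determine the computing cost of the target query index based on the query efficiency and the amount of database data scanned.
In an optional implementation, the system further includes a model matching module, configured to:
construct, based on pre-acquired historical query statements of a plurality of users, the query plan trees corresponding to all of those historical query statements;
extract the common features of the query statements in the query plan trees and, based on the common features, match a query analysis model corresponding to the common features;
determine the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the inclusion relationship between the query statements and the query indexes.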
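An embodiment of the present disclosure provides a cloud computing index recommendation method, the method including:
obtaining the unit computing cost and unit storage cost, per unit time, of the cloud computing server currently in use;
obtaining all historical query statements of a target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features;
determining the query cost of each query index according to the frequency and time with which the query index queries the database and the computing resources used;
determining, based on the obtained current query statement of the target user, a plurality of current query indexes corresponding to the current query statement;
determining, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost and unit storage cost together with the computing resource usage and usage time;
recommending a target query index to the target user, wherein the target query index includes the query index with the lowest total cost among the total costs corresponding to the current query indexes.
Embodiments of the present disclosure provide intelligently recommended indexes that reduce query computing cost. Making greater use of the intelligently recommended indexes for precomputation converts computing cost into storage cost and thereby reduces the total cost of ownership on the cloud. In high-concurrency scenarios in particular, the more queries there are, the more the precomputed results can be reused, and the fewer computing resources each query consumes.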
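The reuse argument can be made concrete with a short break-even calculation in Python. The model is an assumption: a one-off build cost plus a storage cost for the period, a fixed cost per query without the index, and a lower fixed cost per query with it. None of the numbers come from the disclosure.

```python
import math


def break_even_queries(build_cost, storage_cost, raw_cost, indexed_cost):
    """Smallest query count at which the index is no more expensive overall.

    Returns None when the index does not make individual queries cheaper.
    """
    saving = raw_cost - indexed_cost
    if saving <= 0:
        return None
    return math.ceil((build_cost + storage_cost) / saving)


def amortized_cost_per_query(n, build_cost, storage_cost, indexed_cost):
    """Average cost of one query once fixed costs are spread over n queries."""
    return (build_cost + storage_cost) / n + indexed_cost


# Illustrative numbers only.
n_star = break_even_queries(build_cost=4.0, storage_cost=1.0,
                            raw_cost=0.5, indexed_cost=0.25)
print(n_star, amortized_cost_per_query(n_star, 4.0, 1.0, 0.25))
```

Past the break-even count, every additional query lowers the amortized cost further, which is why high-concurrency workloads benefit most from precomputation.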
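Fig. 1 is a schematic flowchart of the cloud computing index recommendation method according to an embodiment of the present disclosure;
Fig. 2 is a schematic logic diagram of the cloud computing index recommendation system according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of the cloud computing index recommendation system according to an embodiment of the present disclosure.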
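To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present disclosure without creative effort fall within the scope of protection of the present disclosure.
The terms "first", "second", "third", "fourth", and the like (if any) in the description, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than those illustrated or described herein.
It should be understood that, in the various embodiments of the present disclosure, the sequence numbers of the processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and the numbering should not limit the implementation of the embodiments in any way.
It should be understood that, in the present disclosure, "including" and "having" and any variations of them are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be understood that, in the present disclosure, "a plurality of" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Including A, B and C" and "including A, B, C" mean that all three of A, B, and C are included; "including A, B or C" means that one of A, B, and C is included; "including A, B and/or C" means that any one, any two, or all three of A, B, and C are included.
It should be understood that, in the present disclosure, "B corresponding to A", "A corresponds to B", and "B corresponds to A" mean that B is associated with A and that B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information. A matches B when the similarity between A and B is greater than or equal to a preset threshold.
Depending on the context, "if" as used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting".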
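The technical solutions of the present disclosure are described in detail below through specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 exemplarily shows a schematic flowchart of the cloud computing index recommendation method according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes:
Step S101: acquiring the unit computing cost and the unit storage cost, per unit time, of the cloud computing server currently in use;
The cloud computing index recommendation method of the embodiments of the present disclosure is a solution in the OLAP field that intelligently recommends indexes based on cloud cost. On the premise of meeting the customer's requirements for query performance and build performance, the solution analyzes the customer's query history, performs comprehensive multi-round intelligent feedback tuning, and finally recommends a set of indexes. Adding these indexes increases the build computing cost and the storage cost, but greatly reduces the query computing cost, so the total cost is greatly reduced.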
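In the embodiments of the present disclosure, OLTP (On-Line Transaction Processing) applications generally have the following characteristics:
1. High real-time requirements.
2. The data volume is not very large. The data volume in a production database is generally not too large, and the data is processed and transferred in a timely manner.
3. Transactions are generally deterministic. For example, the amount of a bank deposit or withdrawal is always definite, so OLTP reads and writes deterministic data.
4. High concurrency, with the ACID principles required to hold. Examples are two people operating on the same bank card account at the same time, or tens of thousands of QPS during a flash sale on a large shopping website.
OLAP (On-Line Analytical Processing) applications generally have the following characteristics:
1. Real-time requirements are not very high. The most common application, for example, updates data daily and then produces the corresponding data reports.
2. The data volume is large. Because OLAP supports dynamic queries, users may need to aggregate a large amount of data to obtain the information they want, for example in time series analysis, so the amount of data processed is large.
3. The focus of an OLAP system is to provide decision support through data, so queries are generally dynamic and user-defined. The concept of a dimension is therefore particularly important in OLAP. Generally, all the dimension data that users care about is stored in the corresponding data platform.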
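Acquiring the unit computing cost and the unit storage cost, per unit time, of the cloud computing server currently in use allows the method to adapt to multiple mainstream cloud computing vendors and to collect accurate unit computing and storage cost information in support of the subsequent cost calculation.
Step S102: acquiring all historical query statements of the target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features;
By extracting the common features of all historical query statements, query indexes can be determined from those common features. The query indexes can be reused, which reduces the cost of subsequent queries.
In an optional implementation, before acquiring all historical query statements of the target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features, the method further includes:
constructing, based on all pre-acquired historical query statements of multiple users, the query plan trees corresponding to all the historical query statements;
extracting the common features of the query statements from the query plan trees, and matching, based on the common features, a query analysis model corresponding to the common features;
determining the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the containment relationship between the query statements and the query indexes.
By collecting all of the customer's historical analytical query statements and extracting the common features from all of the query plan trees, a model that can answer these queries is recommended.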
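The feature extraction and containment relationship described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each query plan tree is reduced to a hypothetical `QueryShape` holding only the dimensions and measures the query touches, and one index candidate "contains" another when it covers all of its dimensions and measures. All names are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical stand-in for a query plan tree: what the query touches."""
    dimensions: frozenset
    measures: frozenset


def candidate_indexes(history):
    """Count how often each distinct shape occurs in the query history."""
    return Counter(history)


def covers(index, query):
    """An index can answer a query if it holds all its dimensions and measures."""
    return query.dimensions <= index.dimensions and query.measures <= index.measures


def containment(candidates):
    """Map each candidate to the other candidates it contains."""
    return {
        idx: [other for other in candidates if other != idx and covers(idx, other)]
        for idx in candidates
    }


history = [
    QueryShape(frozenset({"region"}), frozenset({"sum_sales"})),
    QueryShape(frozenset({"region", "month"}), frozenset({"sum_sales"})),
    QueryShape(frozenset({"region"}), frozenset({"sum_sales"})),
]
counts = candidate_indexes(history)
relations = containment(list(counts))
```

Here the candidate on `region` and `month` contains the candidate on `region` alone, so a query served by the narrower index could also be served by the wider one. A real system would derive these shapes from the plan trees rather than construct them by hand.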
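Step S103: determining the query cost of each query index according to the frequency with which the query index queries the database, the time taken, and the computing resources used;
In an optional implementation, determining the query cost of each query index according to the frequency with which the query index queries the database, the time taken, and the computing resources used includes:
determining the query cost of each query index according to the frequency with which the query index queries the database, the time the query index takes to query the database, the computing resources used by the query, and pre-acquired data sampling statistics of the source data;
determining the cost benefit of the query index based on the pre-acquired computing cost of the query index, and adding a cost-benefit label to the query index.
Because the customer's analytical queries are complex and varied, a large number of indexes with containment relationships are recommended. The query history analysis and prediction module uses the frequency of the SQL queries in the history, their elapsed time and computing resource usage, and the data sampling statistics of the source data to estimate how much each SQL query's computing resource usage, and therefore computing cost, would fall if a given index existed. Each index is then labeled with its query cost benefit.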
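The query cost benefit label can be illustrated with a small calculation. This is a sketch under assumptions that are not in the disclosure: the saving is modeled as frequency times compute time times the fraction of work the index removes, and `scan_ratio`, the index name, and the unit price are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    frequency: int      # executions per billing period, from the query history
    cpu_seconds: float  # average compute time per execution without the index
    scan_ratio: float   # assumed fraction of work left once the index exists


def query_cost_benefit(stats, unit_compute_cost):
    """Compute cost saved per period if the index serves these queries."""
    saved_seconds = sum(
        s.frequency * s.cpu_seconds * (1.0 - s.scan_ratio) for s in stats
    )
    return saved_seconds * unit_compute_cost


# Two query patterns that a hypothetical index could answer.
covered = [QueryStats(1000, 2.0, 0.1), QueryStats(200, 10.0, 0.25)]
benefit = query_cost_benefit(covered, unit_compute_cost=0.0001)

# The label attached to the index candidate.
label = {"index": "idx_region_month", "query_cost_benefit": benefit}
```

In practice the remaining-work fraction would be estimated from the data sampling statistics of the source data rather than supplied as a constant.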
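Step S104: determining, based on the acquired current query statement of the target user, a plurality of current query indexes corresponding to the current query statement;
Step S105: determining, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost, the unit storage cost, and the computing resource usage and usage time;
In an optional implementation, determining the computing resource usage and usage time corresponding to each current query index includes:
determining the skew rate and the repetition rate of the query index in each dimension according to pre-acquired data sampling statistics of the source data;
predicting, based on the skew rate and the repetition rate of each dimension, the computing resources, memory resources, and build duration required by each query index;
determining the computing resource usage and usage time corresponding to each current query index based on the computing resources, memory resources, and build duration required by each query index, together with the unit computing cost and the unit storage cost.
For a plurality of candidate indexes, the build computing cost and the storage cost required to build each index can be estimated from the data sampling statistics of the source data. When estimating the build computing cost, the skew rate and the repetition rate of each dimension are identified from the data sampling statistics of the source data. The CPU and memory resources and the build duration required by each index are then predicted, which yields the computing resource usage and usage time.
When estimating the storage cost, the storage size of the built index is estimated from the characteristics of the data. The total cost of each index is then calculated from the unit computing cost and the unit storage cost provided by the cloud computing and storage cost collection module, and every candidate index is labeled with its build cost.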
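The estimate from sampling statistics can be sketched as below. The disclosure does not define the skew rate, the repetition rate, or the prediction model, so the definitions here are assumptions made for illustration: skew rate is the share of the most frequent value in the sample, repetition rate is one minus the distinct fraction, the index row count is bounded by the product of distinct counts, and build time grows linearly with the worst skew.

```python
from collections import Counter
from math import prod


def dimension_stats(sample_values):
    """Assumed definitions of skew and repetition for one sampled dimension."""
    counts = Counter(sample_values)
    total = len(sample_values)
    return {
        "skew_rate": max(counts.values()) / total,      # share of the top value
        "repetition_rate": 1.0 - len(counts) / total,   # 1 - distinct fraction
        "distinct": len(counts),
    }


def estimate_build_and_storage(sample, source_rows, bytes_per_row,
                               rows_per_cpu_second,
                               unit_compute_cost, unit_storage_cost):
    """Crude build and storage cost estimate for one candidate index."""
    stats = {dim: dimension_stats(vals) for dim, vals in sample.items()}
    # Aggregated rows cannot exceed the number of dimension combinations.
    index_rows = min(source_rows, prod(s["distinct"] for s in stats.values()))
    # Assumption: skewed dimensions slow the build down proportionally.
    worst_skew = max(s["skew_rate"] for s in stats.values())
    build_seconds = source_rows / rows_per_cpu_second * (1.0 + worst_skew)
    return {
        "stats": stats,
        "index_rows": index_rows,
        "build_seconds": build_seconds,
        "build_cost": build_seconds * unit_compute_cost,
        "storage_cost": index_rows * bytes_per_row * unit_storage_cost,
    }


sample = {
    "region": ["east", "east", "west", "east"],
    "month": ["jan", "feb", "jan", "feb"],
}
est = estimate_build_and_storage(
    sample, source_rows=1_000_000, bytes_per_row=32,
    rows_per_cpu_second=100_000,
    unit_compute_cost=0.0001, unit_storage_cost=1e-9,
)
```

Distinct counts taken from a small sample underestimate the true cardinality, so a production estimator would need a proper cardinality estimate. The sketch only shows how the sampled statistics feed the two cost figures.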
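Step S106: recommending a target query index to the target user.
The target query index includes the query index whose total cost is the lowest among the total costs corresponding to the current query indexes.
In an optional implementation, after the target query index is recommended to the target user, the method further includes:
constructing a precomputed index based on the target query index;
pre-aggregating the precomputed index based on the precomputed index and a pre-built data set;
analyzing, based on the pre-aggregated precomputed index, the efficiency with which the target user's query statements query the database and the amount of database data scanned;
determining the computing cost of the target query index based on the query efficiency and the amount of database data scanned.
From all the candidate indexes and their query cost benefits, the build cost of every candidate can be analyzed. Indexes are then selected according to the overall cost benefit, yielding the index recommendation with the lowest total cost.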
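The final selection can be sketched as a minimum over candidate totals. This is an illustrative assumption about how the labels are combined: each hypothetical `Candidate` carries a build cost, a storage cost, and the expected query computing cost when that index is used, and the recommendation is the candidate with the smallest sum. The names and figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    build_cost: float    # from the build cost label
    storage_cost: float  # from the estimated index size and unit storage cost
    query_cost: float    # expected query compute cost when this index is used

    @property
    def total_cost(self):
        return self.build_cost + self.storage_cost + self.query_cost


def recommend(candidates):
    """Return the candidate index with the lowest total cost."""
    return min(candidates, key=lambda c: c.total_cost)


candidates = [
    Candidate("idx_region", 1.0, 0.5, 9.0),
    Candidate("idx_region_month", 3.0, 1.5, 2.0),
    Candidate("idx_all_dims", 8.0, 6.0, 1.0),
]
best = recommend(candidates)
```

In this example the middle candidate wins: the narrow index is cheap to build but leaves queries expensive, while the widest index makes queries cheapest but costs too much to build and store.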
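Fig. 2 exemplarily shows a schematic logic diagram of the cloud computing index recommendation system according to an embodiment of the present disclosure. As shown in Fig. 2, the operating logic of the system includes:
The cloud computing and storage cost collection module. This module automatically collects the compute host type of the cloud service provider currently in use, the usage cost of the compute server per unit time, and the storage cost of a unit amount of stored data per unit time. The module is adapted to multiple mainstream vendors and collects accurate unit computing and storage cost information to support the cost calculations of the query history analysis and prediction module and the build and storage cost analysis and prediction module.
The query history analysis and prediction module. This module collects all of the customer's historical analytical query statements and extracts the common features from all of the query plan trees, so as to recommend a model that can answer these queries. Because the customer's analytical queries are complex and varied, a large number of indexes with containment relationships are recommended. The module uses the frequency of the SQL queries in the history, their elapsed time and computing resource usage, and the data sampling statistics of the source data to estimate how much each SQL query's computing resource usage, and therefore computing cost, would fall if a given index existed, and labels each index with its query cost benefit.
The build and storage cost analysis and prediction module. This module receives the index candidates to be built from the intelligent hub decision module and estimates, from the data sampling statistics of the source data, the build computing cost and the storage cost required to build each index. When estimating the build computing cost, the module identifies the skew rate and the repetition rate of each dimension from the data sampling statistics of the source data, predicts the CPU and memory resources and the build duration required by each index, and from these derives the computing resource usage and usage time. When estimating the storage cost, it estimates the storage size of the built index from the characteristics of the data. It then calculates the total cost of each index from the unit computing cost and the unit storage cost provided by the cloud computing and storage cost collection module, and labels every candidate index with its build cost.
The intelligent hub decision module. This module asks the query history analysis and prediction module to provide all the candidate indexes and their query cost benefits, and passes them to the build and storage cost analysis and prediction module, which analyzes the build cost of every candidate. It then selects indexes according to the overall cost benefit, yielding the index recommendation with the lowest total cost.
The precomputation and query engine module. This module builds precomputed indexes from the indexes recommended by the intelligent hub decision module. The precomputation module pulls the original very-large-scale data set for pre-aggregation and provides the built indexes to the query module, which speeds up the execution of the customer's analytical SQL, reduces the amount of data scanned, and further reduces the query computing cost.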
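Pre-aggregation as described above can be illustrated with a toy example. This is a sketch, not the engine of the disclosure: the raw rows, the dimension names, and the helper functions are hypothetical, and the "index" is simply a dictionary keyed by dimension values that holds a pre-summed measure.

```python
from collections import defaultdict


def build_preaggregate(rows, dimensions, measure):
    """Sum the measure once per distinct combination of dimension values."""
    agg = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        agg[key] += row[measure]
    return dict(agg)


def query_sum(preagg, dimensions, filters):
    """Answer a filtered SUM by scanning pre-aggregated cells, not raw rows."""
    positions = {d: i for i, d in enumerate(dimensions)}
    return sum(
        value
        for key, value in preagg.items()
        if all(key[positions[d]] == wanted for d, wanted in filters.items())
    )


rows = [
    {"region": "east", "month": "jan", "sales": 10.0},
    {"region": "east", "month": "jan", "sales": 5.0},
    {"region": "east", "month": "feb", "sales": 7.0},
    {"region": "west", "month": "jan", "sales": 3.0},
]
dims = ("region", "month")
preagg = build_preaggregate(rows, dims, "sales")
east_total = query_sum(preagg, dims, {"region": "east"})
```

The query scans three aggregated cells instead of four raw rows. On a real data set the ratio of raw rows to aggregated cells is what reduces the amount of data scanned and the query computing cost.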
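Fig. 3 exemplarily shows a schematic structural diagram of the cloud computing index recommendation system according to an embodiment of the present disclosure. As shown in Fig. 3, the system includes:
a cloud computing and storage cost collection module 31, configured to acquire the unit computing cost and the unit storage cost, per unit time, of the cloud computing server currently in use;
a query history analysis and prediction module 32, configured to acquire all historical query statements of a target user, extract the common features of all the historical query statements, and determine the query indexes corresponding to the historical query statements according to the common features;
a build and storage cost analysis and prediction module 33, configured to determine the query cost of each query index according to the frequency with which the query index queries the database, the time taken, and the computing resources used; to determine, based on the acquired current query statement of the target user, a plurality of current query indexes corresponding to the current query statement;
and to determine, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost, the unit storage cost, and the computing resource usage and usage time;
an intelligent hub decision module 34, configured to recommend a target query index to the target user, wherein the target query index includes the query index whose total cost is the lowest among the total costs corresponding to the current query indexes.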
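In an optional implementation, the build and storage cost analysis and prediction module 33 is further configured to:
determine the query cost of each query index according to the frequency with which the query index queries the database, the time the query index takes to query the database, the computing resources used by the query, and pre-acquired data sampling statistics of the source data;
determine the cost benefit of the query index based on the pre-acquired computing cost of the query index, and add a cost-benefit label to the query index.
In an optional implementation, the build and storage cost analysis and prediction module 33 is further configured to:
determine the skew rate and the repetition rate of the query index in each dimension according to pre-acquired data sampling statistics of the source data;
predict, based on the skew rate and the repetition rate of each dimension, the computing resources, memory resources, and build duration required by each query index;
determine the computing resource usage and usage time corresponding to each current query index based on the computing resources, memory resources, and build duration required by each query index, together with the unit computing cost and the unit storage cost.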
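In an optional implementation, the system further includes a cost calculation module, the cost calculation module being configured to:
construct a precomputed index based on the target query index;
pre-aggregate the precomputed index based on the precomputed index and a pre-built data set;
analyze, based on the pre-aggregated precomputed index, the efficiency with which the target user's query statements query the database and the amount of database data scanned;
determine the computing cost of the target query index based on the query efficiency and the amount of database data scanned.
In an optional implementation, the system further includes a model matching module, the model matching module being configured to:
construct, based on all pre-acquired historical query statements of multiple users, the query plan trees corresponding to all the historical query statements;
extract the common features of the query statements from the query plan trees, and match, based on the common features, a query analysis model corresponding to the common features;
determine the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the containment relationship between the query statements and the query indexes.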
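The present disclosure further provides a program product. The program product includes execution instructions stored in a readable storage medium. At least one processor of a device can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the device to implement the methods provided by the various implementations described above.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates the transfer of a computer program from one place to another. A computer storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, the readable storage medium is coupled to the processor so that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), and the ASIC may reside in user equipment. The processor and the readable storage medium may also exist as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the above terminal or server embodiments, it should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the present disclosure may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.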
Claims (13)
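- A cloud computing index recommendation method, characterized in that the method includes: acquiring the unit computing cost and the unit storage cost, per unit time, of the cloud computing server currently in use; acquiring all historical query statements of a target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features; determining the query cost of each query index according to the frequency with which the query index queries the database, the time taken, and the computing resources used; determining, based on the acquired current query statement of the target user, a plurality of current query indexes corresponding to the current query statement; determining, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost, the unit storage cost, and the computing resource usage and usage time; recommending a target query index to the target user, wherein the target query index includes the query index whose total cost is the lowest among the total costs corresponding to the current query indexes.
- The cloud computing index recommendation method according to claim 1, characterized in that determining the query cost of each query index according to the frequency with which the query index queries the database, the time taken, and the computing resources used includes: determining the query cost of each query index according to the frequency with which the query index queries the database, the time the query index takes to query the database, the computing resources used by the query, and pre-acquired data sampling statistics of the source data; determining the cost benefit of the query index based on the pre-acquired computing cost of the query index, and adding a cost-benefit label to the query index.
- The cloud computing index recommendation method according to claim 1, characterized in that determining the computing resource usage and usage time corresponding to each current query index includes: determining the skew rate and the repetition rate of the query index in each dimension according to pre-acquired data sampling statistics of the source data; predicting, based on the skew rate and the repetition rate of each dimension, the computing resources, memory resources, and build duration required by each query index; determining the computing resource usage and usage time corresponding to each current query index based on the computing resources, memory resources, and build duration required by each query index, together with the unit computing cost and the unit storage cost.
- The cloud computing index recommendation method according to claim 1, characterized in that, after the target query index is recommended to the target user, the method further includes: constructing a precomputed index based on the target query index; pre-aggregating the precomputed index based on the precomputed index and a pre-built data set; analyzing, based on the pre-aggregated precomputed index, the efficiency with which the target user's query statements query the database and the amount of database data scanned; determining the computing cost of the target query index based on the query efficiency and the amount of database data scanned.
- The cloud computing index recommendation method according to claim 1, characterized in that, before acquiring all historical query statements of the target user, extracting the common features of all the historical query statements, and determining the query indexes corresponding to the historical query statements according to the common features, the method further includes: constructing, based on all pre-acquired historical query statements of multiple users, the query plan trees corresponding to all the historical query statements; extracting the common features of the query statements from the query plan trees, and matching, based on the common features, a query analysis model corresponding to the common features; determining the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the containment relationship between the query statements and the query indexes.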
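- A cloud computing index recommendation system, characterized in that the system includes: a cloud computing and storage cost collection module, configured to acquire the unit computing cost and the unit storage cost, per unit time, of the cloud computing server currently in use; a query history analysis and prediction module, configured to acquire all historical query statements of a target user, extract the common features of all the historical query statements, and determine the query indexes corresponding to the historical query statements according to the common features; a build and storage cost analysis and prediction module, configured to determine the query cost of each query index according to the frequency with which the query index queries the database, the time taken, and the computing resources used; to determine, based on the acquired current query statement of the target user, a plurality of current query indexes corresponding to the current query statement; and to determine, according to the plurality of current query indexes, the total cost corresponding to each current query index from the unit computing cost, the unit storage cost, and the computing resource usage and usage time; an intelligent hub decision module, configured to recommend a target query index to the target user, wherein the target query index includes the query index whose total cost is the lowest among the total costs corresponding to the current query indexes.
- The cloud computing index recommendation system according to claim 6, characterized in that the build and storage cost analysis and prediction module is further configured to: determine the query cost of each query index according to the frequency with which the query index queries the database, the time the query index takes to query the database, the computing resources used by the query, and pre-acquired data sampling statistics of the source data; determine the cost benefit of the query index based on the pre-acquired computing cost of the query index, and add a cost-benefit label to the query index.
- The cloud computing index recommendation system according to claim 6, characterized in that the build and storage cost analysis and prediction module is further configured to: determine the skew rate and the repetition rate of the query index in each dimension according to pre-acquired data sampling statistics of the source data; predict, based on the skew rate and the repetition rate of each dimension, the computing resources, memory resources, and build duration required by each query index; determine the computing resource usage and usage time corresponding to each current query index based on the computing resources, memory resources, and build duration required by each query index, together with the unit computing cost and the unit storage cost.
- The cloud computing index recommendation system according to claim 6, characterized in that the system further includes a cost calculation module, the cost calculation module being configured to: construct a precomputed index based on the target query index; pre-aggregate the precomputed index based on the precomputed index and a pre-built data set; analyze, based on the pre-aggregated precomputed index, the efficiency with which the target user's query statements query the database and the amount of database data scanned; determine the computing cost of the target query index based on the query efficiency and the amount of database data scanned.
- The cloud computing index recommendation system according to claim 6, characterized in that the system further includes a model matching module, the model matching module being configured to: construct, based on all pre-acquired historical query statements of multiple users, the query plan trees corresponding to all the historical query statements; extract the common features of the query statements from the query plan trees, and match, based on the common features, a query analysis model corresponding to the common features; determine the query indexes corresponding to the historical query statements according to the query analysis model, wherein the query indexes include the containment relationship between the query statements and the query indexes.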
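- A cloud computing index recommendation system, the system including: a first module, configured to automatically collect the compute host type of the cloud service provider currently in use, the usage cost of the compute server per unit time, and the storage cost of a unit amount of stored data per unit time; a second module, configured to collect all of the customer's historical analytical query statements and extract the common features from all of the query plan trees, so as to recommend a model that answers the historical analytical query statements; a third module, configured to receive the index candidates to be built from the intelligent hub decision module and to estimate, from the data sampling statistics of the source data, the build computing cost and the storage cost required to build each index; a fourth module, configured to ask the second module to provide all the candidate indexes and their query cost benefits, and to pass them to the third module, which analyzes the build cost of every candidate; a fifth module, configured to build precomputed indexes from the indexes recommended by the fourth module, pulling the original very-large-scale data set for pre-aggregation and building the indexes.
- An electronic device, characterized by including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to call the instructions stored in the memory to perform the method according to any one of claims 1 to 5.
- A computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 5.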
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/021,563 US20230306027A1 (en) | 2021-06-04 | 2022-03-29 | Method and system for recommending indexes by cloud computation |
EP22814829.2A EP4191442A4 (en) | 2021-06-04 | 2022-03-29 | CLOUD CALCULATION INDEX RECOMMENDATION METHOD AND SYSTEM |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110624453.1A CN113407801B (zh) | 2021-06-04 | 2021-06-04 | 云计算索引推荐方法及系统 |
CN202110624453.1 | 2021-06-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022252782A1 (zh) | 2022-12-08 |
Family
ID=77676457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/083619 WO2022252782A1 (zh) | 2021-06-04 | 2022-03-29 | 云计算索引推荐方法及系统 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230306027A1 (zh) |
EP (1) | EP4191442A4 (zh) |
CN (1) | CN113407801B (zh) |
WO (1) | WO2022252782A1 (zh) |
-
2021
- 2021-06-04 CN CN202110624453.1A patent/CN113407801B/zh active Active
-
2022
- 2022-03-29 US US18/021,563 patent/US20230306027A1/en active Pending
- 2022-03-29 WO PCT/CN2022/083619 patent/WO2022252782A1/zh active Application Filing
- 2022-03-29 EP EP22814829.2A patent/EP4191442A4/en active Pending
Also Published As
Publication number | Publication date |
---|---|
EP4191442A4 (en) | 2024-03-13 |
CN113407801B (zh) | 2023-11-28 |
EP4191442A1 (en) | 2023-06-07 |
US20230306027A1 (en) | 2023-09-28 |
CN113407801A (zh) | 2021-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22814829 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022814829 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2022814829 Country of ref document: EP Effective date: 20230303 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |