WO2024108638A1 - 一种基于分片索引的自适应查询方法和装置 - Google Patents

一种基于分片索引的自适应查询方法和装置 Download PDF

Info

Publication number
WO2024108638A1
WO2024108638A1 PCT/CN2022/135893 CN2022135893W WO2024108638A1 WO 2024108638 A1 WO2024108638 A1 WO 2024108638A1 CN 2022135893 W CN2022135893 W CN 2022135893W WO 2024108638 A1 WO2024108638 A1 WO 2024108638A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
index
partitions
partition
data partitions
Prior art date
Application number
PCT/CN2022/135893
Other languages
English (en)
French (fr)
Inventor
陈志标
徐晓锋
化明虎
Original Assignee
深圳计算科学研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳计算科学研究院 filed Critical 深圳计算科学研究院
Publication of WO2024108638A1 publication Critical patent/WO2024108638A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of data query processing, and in particular to an adaptive query method and device based on shard indexing.
  • indexes cannot be used during the maintenance or creation process; and data is rarely updated and maintained, so materialized views cannot be used during the maintenance process; traditional databases cannot use indexes during the index creation process, and index creation is usually a very time-consuming operation; for real-time/quasi-real-time data warehouse systems with increasingly urgent application requirements, data loading and access need to be supported simultaneously. If data acceleration structures such as indexes cannot be used during the loading process, the query efficiency will be greatly reduced.
  • the present application is proposed to provide an adaptive query method and device based on shard indexes to overcome the above problems or at least partially solve the above problems, including:
  • An adaptive query method based on shard index is used to dynamically select indexes in a database for data query based on a preset structured query language statement.
  • the method comprises:
  • Data query is performed in the target data partition according to the execution plan.
  • the step of generating a plurality of data partitions according to the data table in the database includes:
  • a plurality of data partitions are generated according to the partition list.
  • the step of constructing indexes corresponding to the plurality of data partitions through the plurality of data partitions includes:
  • the corresponding index is constructed for each piece of data in all the data.
  • the step of determining a plurality of index status information corresponding to a plurality of the data partitions according to the index comprises:
  • a plurality of index status information corresponding to a plurality of the data partitions is generated according to the index partition status table.
  • the step of determining the target data partition to be accessed based on the preset structured query language statement, the plurality of index status information and the plurality of data partitions, wherein the index status information includes the steps of created, being created and not created comprises:
  • the target data partition is determined in all the data partitions by using the preset structured query language statement.
  • step of determining the target data partition in all the data partitions by using the preset structured query language statement includes:
  • the target data partition is determined among all the data partitions according to the characteristic information.
  • step of performing data query in the target data partition according to the execution plan includes:
  • Data query is performed according to the access path.
  • the embodiment of the present invention further provides an adaptive query optimization device based on shard index, which is used to dynamically select indexes in a database for data query based on a preset structured query language statement, and the device includes:
  • a construction module used to generate a plurality of data partitions according to a data table in the database, and to construct indexes corresponding to the plurality of data partitions through the plurality of data partitions;
  • a first determining module configured to determine a plurality of index status information corresponding to a plurality of the data partitions according to the index
  • a second determination module is used to determine the target data partition to be accessed according to the preset structured query language statement, a plurality of the index status information and a plurality of the data partitions, wherein the index status information includes created, being created and not created;
  • a generation module used to generate an execution plan according to the preset structured query language statement and the target data partition
  • a query module is used to perform data query in the target data partition according to the execution plan.
  • An embodiment of the present invention also provides a device, including a processor, a memory, and a computer program stored in the memory and capable of running on the processor, wherein when the computer program is executed by the processor, the steps of an adaptive query method based on shard index as described above are implemented.
  • An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored.
  • the computer program is executed by a processor, the steps of the above-mentioned adaptive query method based on shard index are implemented.
  • the present application provides a solution for determining a plurality of index status information corresponding to a plurality of the data partitions based on the indexes, determining a target data partition to be accessed based on the preset structured query language statement, the plurality of the index status information and the plurality of the data partitions, and generating an execution plan based on the preset structured query language statement and the target data partitions, specifically: generating a plurality of data partitions based on a data table in the database, and constructing indexes corresponding to the plurality of the data partitions through the plurality of the data partitions; determining a plurality of index status information corresponding to the plurality of the data partitions based on the indexes; determining a target data partition to be accessed based on the preset structured query language statement, the plurality of the index status information and the plurality of the data partitions, where
  • the execution plan solves the problem that data acceleration structures such as indexes cannot be used during the loading process.
  • the shard index technology records the index maintenance status of each data partition of the table while maintaining the index. For the data partitions whose indexes have been maintained, the index maintenance status is set to completed, and the maintenance status is set to unmaintained.
  • the dynamic scanning partition method is adopted to implement index and table partition scanning at the same time.
  • the index capacity is utilized to the greatest extent without affecting the progress of data loading index maintenance tasks; improve the query access efficiency when data is loaded and used simultaneously in real-time/quasi-real-time data warehouse systems; during the index creation process, the sharding index acceleration function is used to increase the speed of ordinary queries by several to tens of thousands of times; the index can be used during the index creation process without waiting for the index to be created, thereby maximizing the query speed; index multiple data partitions are indexed in sequence, and access paths are selected based on the index status information of the data partitions; use index queries to retrieve data for partitions that have completed index creation; do not use index retrieval for partitions that are being created or have not yet created an index; and dynamically select index queries for partitions.
  • FIG1 is a flowchart of a method for adaptive querying based on shard indexing according to an embodiment of the present application
  • FIG2 is a structural block diagram of an adaptive query optimization device based on shard indexes provided in an embodiment of the present application
  • FIG. 3 is a schematic diagram of the structure of a computer device provided by an embodiment of the present invention.
  • Index creation and maintenance are performed as an atomic operation. During the creation and maintenance process, the index cannot be used. In the traditional solution, the index does not exist during this process, and queries can only use non-indexed access paths.
  • Materialized view The acceleration method of materialized view also relies on the update of materialized view, which is generally used in scenarios that do not strictly require real-time data query. Materialized view is also unavailable during maintenance.
  • the inventor provides an adaptive query method based on sharded index.
  • the sharded index technology records the index maintenance status of each partition of the table while maintaining the index.
  • the index maintenance status is set to completed, maintained or not yet maintained.
  • a dynamic scanning partition method is adopted. By simultaneously realizing index and table partition scanning, the index capacity is utilized to the greatest extent for queries without affecting the progress of data loading index maintenance tasks.
  • An adaptive query method based on shard index is used to dynamically select indexes in a database for data query based on a preset structured query language statement.
  • the method comprises:
  • S150 Perform data query in the target data partition according to the execution plan.
  • the present application provides a solution for determining a plurality of index status information corresponding to a plurality of the data partitions based on the index, determining a target data partition to be accessed based on the preset structured query language statement, the plurality of the index status information and the plurality of the data partitions, and generating an execution plan based on the preset structured query language statement and the target data partition, specifically: generating a plurality of data partitions based on a data table in the database, and constructing indexes corresponding to the plurality of the data partitions through the plurality of the data partitions; determining a plurality of index status information corresponding to the plurality of the data partitions based on the index; determining a target data partition to be accessed based on the preset structured query language statement, the plurality of the index status information and the plurality of the data partitions, wherein the index status information
  • the execution plan solves the problem that data acceleration structures such as indexes cannot be used during the loading process.
  • the shard index technology records the index maintenance status of each data partition of the table while maintaining the index. For the data partitions whose indexes have been maintained, the index maintenance status is set to completed, and the maintenance status is set to unmaintained.
  • the dynamic scanning partition method is adopted to implement index and table partition scanning at the same time.
  • the index capacity is utilized to the greatest extent without affecting the progress of data loading index maintenance tasks; improve the query access efficiency when data is loaded and used simultaneously in real-time/quasi-real-time data warehouse systems; during the index creation process, the sharding index acceleration function is used to increase the speed of ordinary queries by several to tens of thousands of times; the index can be used during the index creation process without waiting for the index to be created, thereby maximizing the query speed; index multiple data partitions are indexed in sequence, and access paths are selected based on the index status information of the data partitions; use index queries to retrieve data for partitions that have completed index creation; do not use index retrieval for partitions that are being created or have not yet created an index; and dynamically select index queries for partitions.
  • a plurality of data partitions are generated based on a data table in the database; and indexes corresponding to the plurality of data partitions are constructed through the plurality of data partitions; wherein a specific index is created for the data table, and index information of these partitions is constructed respectively through the plurality of data partitions.
  • a plurality of index status information corresponding to a plurality of the data partitions are determined based on the index; wherein, during the construction of the index information of the partition, the index status information of the partition for the index is maintained.
  • the target data partition to be accessed is determined according to the preset structured query language statement, a number of the index status information and a number of the data partitions.
  • an execution plan is generated according to the preset structured query language statement and the target data partition, that is, an index access path is created for the data partition corresponding to the created index through the execution plan.
  • data query is performed in the target data partition according to the execution plan, that is, index access is performed on the data partition for which the index has been created.
  • a plurality of data partitions are generated according to the data table in the database, and indexes corresponding to the plurality of data partitions are constructed through the plurality of data partitions.
  • step S110 the specific process of "generating a plurality of data partitions according to the data table in the database, and constructing indexes corresponding to the plurality of data partitions through the plurality of data partitions" in step S110 can be further explained in combination with the following description.
  • Indexing is a fast data retrieval method that can directly locate specific data based on a specified value or a specified value range without scanning and screening the data.
  • a partition list is obtained through a data table; partitioning is performed according to the partition list to generate a plurality of data partitions.
  • the partition list in the data table obtain the partition information of each partition, select one partition or multiple partitions, that is, several data partitions (in parallel), and read one or more storage location information under several data partitions.
  • a sales information table that is, a data table
  • a sales information table can establish 12 data partitions according to months on a single-node database, or can establish 12 data partitions according to months in a multi-node database cluster.
  • a plurality of data partitions are generated according to the data table in the database, and indexes corresponding to the plurality of data partitions are constructed through the plurality of data partitions.
  • step S110 the specific process of "generating a plurality of data partitions according to the data table in the database, and constructing indexes corresponding to the plurality of data partitions through the plurality of data partitions" described in step S110 can be further explained in combination with the following description.
  • the partition information corresponding to the data partition will be stored in a linked list or array, and will be traversed one by one.
  • the traversal process will record the current traversal index value. If the traversal index value exceeds the number of elements in the linked list or array, then the data partition represented by the index is an invalid partition; if the traversal and loading have been completed, all the data of each data partition in the several data partitions will be obtained.
  • the current data partition is valid. If the current data partition is valid, all current data corresponding to the current data partition is obtained in the current data partition, and a corresponding index is established for each piece of data in all the current data.
  • data and data types are defined by the user in the data table. Similar to an Excel spreadsheet, the user defines the column names, writes data into each row, and the index creation process is to read the data row by row from the beginning.
  • a plurality of index status information corresponding to a plurality of the data partitions is determined according to the index.
  • step S120 the specific process of "determining a plurality of index status information corresponding to a plurality of the data partitions according to the index" in step S120 can be further explained in combination with the following description.
  • S1202 Generate a plurality of index status information corresponding to a plurality of data partitions according to the index partition status table.
  • the index status information corresponding to the certain data partition i.e., the maintenance status
  • the maintenance status is set to completed maintenance, i.e., created
  • the index status information corresponding to the certain data partition i.e., the maintenance status
  • the index status information corresponding to the certain data partition is updated in the index partition status table, and the maintenance status is set to undergoing maintenance, i.e., creating
  • the index status information corresponding to the certain data partition i.e., the maintenance status
  • the maintenance status is set to not yet maintained, i.e., not created.
  • each index in the index set has a corresponding index status information, and the specific index status information of the index can be obtained through the index partition status table; in the subsequent data query process, the index status information is used to determine whether the index can be used.
  • other methods are used to query data. In a specific embodiment, other methods include table scan queries.
  • the target data partition to be accessed is determined based on the preset structured query language statement, the plurality of index status information and the plurality of data partitions, wherein the index status information includes created, creating and not created.
  • step S130 the specific process of "determining the target data partition to be accessed based on the preset structured query language statement, the plurality of index status information and the plurality of data partitions, wherein the index status information includes created, being created and not created" in step S130 can be further explained in combination with the following description.
  • S1302 Determine the target data partition in all the data partitions by using the preset structured query language statement.
  • index status information mentioned in the present application includes but is not limited to created, creating and not created, and may also include other information to indicate the index status information according to specific embodiments.
  • the index status information is obtained by filtering all the data partitions that have been created in several data partitions, and then a preset structured query language statement is used to determine the target data partition in all the data partitions obtained through the filtering.
  • the target data partition is determined in all the data partitions by using the preset structured query language statement.
  • step S1302 the specific process of "determining the target data partition by using the preset structured query language statement in all the data partitions" in step S1302 can be further explained in combination with the following description.
  • semantic analysis is performed on the text features, i.e., feature information, and analysis information is obtained through the semantic analysis. Then, a target data partition is determined from all data partitions based on the analysis information.
  • a database is connected and a preset structured query language statement is input; the database receives the preset structured query language statement and determines whether there is a query plan cache for the preset structured query language statement (it is a standard practice for all databases to establish a cache for the preset structured query language statement. Generally, an ID value is calculated from the preset structured query language statement text, and the ID value is used to retrieve whether a syntax tree has been generated. If there is a ready-made syntax tree, there is no need to repeatedly parse the preset structured query language statement, thereby improving query performance).
  • step S150 data query is performed in the target data partition according to the execution plan.
  • step S150 the specific process of "querying data in the target data partition according to the execution plan" in step S150 may be further explained in combination with the following description.
  • the preset structured query language statement is started to be executed.
  • the access path is obtained through the execution plan in the target data partition, and data query is performed through the access path; after the preset structured query language statement query is completed, the execution is completed.
  • the preset structured query language statement is used to use the index query in the first data partition; when a query is required in the remaining data partitions, a table scan query is performed; for multiple data partitions for which indexes have been built, the corresponding target data partition is determined according to the preset structured query language statement, and the index query is used in the target data partition.
  • a table scan query is performed.
  • data in a sales data table, data can be divided into 12 data partitions according to the month.
  • the index of the January data partition is created first, and a status mark indicating that the current partition has been indexed is maintained after the index is created, i.e., the index status information.
  • the index status information can be used to access the data for the created index; when a user queries the data for February, a table scan query will be used to access the data because no index has been created; when a user queries the data for January and February at the same time, an index query is performed on the data for January, and a table scan query is performed on the data for February according to the query logic and the index maintenance information, i.e., the index status information.
  • the executor and optimizer can perceive that index creation has been completed for a specific partition, so that the preset structured query language statement uses the index to access data during execution.
  • the description is relatively simple, and the relevant parts can be referred to the partial description of the method embodiment.
  • FIG. 2 a block diagram of a structure of an adaptive query optimization device based on shard indexes provided by an embodiment of the present application is shown;
  • An adaptive query optimization device based on shard index is used to dynamically select indexes in a database for data query based on preset structured query language statements, and the device includes:
  • a construction module 210 configured to generate a plurality of data partitions according to a data table in the database, and to construct indexes corresponding to the plurality of data partitions through the plurality of data partitions;
  • a first determining module 220 configured to determine a plurality of index status information corresponding to a plurality of the data partitions according to the index
  • a second determination module 230 is used to determine a target data partition to be accessed according to the preset structured query language statement, a plurality of the index status information and a plurality of the data partitions, wherein the index status information includes created, being created and not created;
  • a generating module 240 configured to generate an execution plan according to the preset structured query language statement and the target data partition;
  • the query module 250 is used to perform data query in the target data partition according to the execution plan.
  • the construction module 210 includes:
  • a first acquisition submodule is used to acquire a partition list in the data table
  • the first generating submodule is used to generate a plurality of the data partitions according to the partition list.
  • the construction module 210 includes:
  • a first determination submodule used to determine whether a plurality of the data partitions are valid
  • a second acquisition submodule is used to acquire all data in the data partitions when the data partitions are all valid
  • the first construction submodule is used to construct the corresponding index for each piece of data in all the data.
  • the first determining module 220 includes:
  • a second determining submodule configured to determine an index partition status table corresponding to the index
  • the second generating submodule is used to generate a plurality of index status information corresponding to a plurality of data partitions according to the index partition status table.
  • the second determining module 230 includes:
  • a third determining submodule is used to determine, based on the plurality of index status information, in the plurality of data partitions, that the index status information is all the created data partitions;
  • the fourth determination submodule is used to determine the target data partition among all the data partitions by using the preset structured query language statement.
  • the fourth determining submodule includes:
  • a first determining unit configured to determine whether a query plan for the preset structured query language statement exists in the database
  • a first generating unit configured to parse the preset structured query language statement to generate feature information when the preset structured query language statement does not have the query plan in the database;
  • a second determining unit is configured to determine the target data partition among all the data partitions according to the characteristic information.
  • the query module 250 includes:
  • a fifth determination submodule configured to determine an access path according to the execution plan and the target data partition
  • the query submodule is used to perform data query according to the access path.
  • a computer device of an adaptive query method based on shard index of the present invention is shown, which may specifically include the following:
  • the computer device 12 is in the form of a general-purpose computing device, and the components of the computer device 12 may include but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
  • the bus 18 represents one or more of several types of bus 18 structures, including a memory bus 18 or memory controller, a peripheral bus 18, an accelerated graphics port, a processor or a local bus 18 using any of a variety of bus 18 structures.
  • These architectures include, by way of example, but are not limited to, an Industry Standard Architecture (ISA) bus 18, a Micro Channel Architecture (MAC) bus 18, an Enhanced ISA bus 18, an Audio Video Electronics Standards Association (VESA) local bus 18, and a Peripheral Component Interconnect (PCI) bus 18.
  • ISA Industry Standard Architecture
  • MAC Micro Channel Architecture
  • VESA Audio Video Electronics Standards Association
  • PCI Peripheral Component Interconnect
  • the computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, removable and non-removable media.
  • the system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
  • the computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (commonly referred to as "hard drives").
  • a disk drive for reading and writing removable non-volatile disks such as "floppy disks”
  • an optical disk drive for reading and writing removable non-volatile optical disks (such as CD-ROMs, DVD-ROMs or other optical media) may be provided.
  • each drive may be connected to the bus 18 via one or more data medium interfaces.
  • the memory may include at least one program product having a set (e.g., at least one) of program modules 42 that are configured to perform the functions of various embodiments of the present invention.
  • a program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, a memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each of which or some combination may include an implementation of a network environment.
  • the program modules 42 generally perform the functions and/or methods of the embodiments described herein.
  • the computer device 12 may also communicate with one or more external devices 14 (e.g., keyboards, pointing devices, displays 24, cameras, etc.), may communicate with one or more devices that enable an operator to interact with the computer device 12, and/or may communicate with any device that enables the computer device 12 to communicate with one or more other computing devices (e.g., network cards, modems, etc.). Such communication may be performed via an input/output (I/O) interface 22.
  • the computer device 12 may also communicate with one or more networks (e.g., local area networks (LANs)), wide area networks (WANs), and/or public networks (e.g., the Internet) via a network adapter 20. As shown, the network adapter 20 communicates with other modules of the computer device 12 via a bus 18.
  • LANs local area networks
  • WANs wide area networks
  • public networks e.g., the Internet
  • the processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, such as implementing an adaptive query method based on shard indexing provided in an embodiment of the present invention.
  • the processing unit 16 when the processing unit 16 executes the program, it realizes: generating a plurality of data partitions according to the data table in the database, and constructing indexes corresponding to the plurality of data partitions through the plurality of data partitions; determining a plurality of index status information corresponding to the plurality of data partitions according to the indexes; determining a target data partition to be accessed according to the preset structured query language statement, the plurality of index status information and the plurality of data partitions, wherein the index status information includes created, being created and not created; generating an execution plan according to the preset structured query language statement and the target data partition; and performing data query in the target data partition according to the execution plan.
  • the present invention further provides a computer-readable storage medium on which a computer program is stored.
  • a computer program is stored.
  • an adaptive query method based on a shard index as provided in all embodiments of the present application is implemented:
  • the program when executed by the processor, it is implemented as follows: generating a plurality of data partitions according to the data table in the database, and constructing indexes corresponding to the plurality of data partitions through the plurality of data partitions; determining a plurality of index status information corresponding to the plurality of data partitions according to the indexes; determining a target data partition to be accessed according to the preset structured query language statement, the plurality of index status information and the plurality of data partitions, wherein the index status information includes created, being created and not created; generating an execution plan according to the preset structured query language statement and the target data partition; and performing data query in the target data partition according to the execution plan.
  • a computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, device or device, or any combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, device or device.
  • Computer-readable signal media may include a data signal propagated in baseband or as part of a carrier wave, which carries a computer-readable program code. Such propagated data signals may take a variety of forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the above. Computer-readable signal media may also be any computer-readable medium other than a computer-readable storage medium, which may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the computer program code for performing the operation of the present invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" language or similar programming languages.
  • the program code can be executed entirely on the operator's computer, partially on the operator's computer, as an independent software package, partially on the operator's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the operator's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, using an Internet service provider to connect through the Internet).
  • LAN local area network
  • WAN wide area network
  • Internet service provider for example, using an Internet service provider to connect through the Internet.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请提供了一种基于分片索引的自适应查询方法和装置,用于预设结构化查询语言语句在数据库中动态选择索引进行数据查询,所述方法包括:依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区;依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;依据所述执行计划在所述目标数据分区中进行数据查询。提高实时/准实时数据仓库系统中数据加载和使用同时进行时的查询访问效率;最大程度提升查询速度。

Description

一种基于分片索引的自适应查询方法和装置 技术领域
本申请涉及数据查询处理领域,特别是一种基于分片索引的自适应查询方法和装置。
背景技术
传统型数据库/数据仓库中,数据大多采用非实时一次性ETL(Extract-Transform-Load,数据仓库技术)的数据加载方式,在加载过程中,In-Flight数据是不能访问的,要完全加载准备完毕后才可以访问,这包括后续查询会使用的索引。
现有技术方案包括传统索引方案、基于历史表的技术以及物化视图。
在现有技术中,维护或创建过程时,索引无法被使用;且数据较少更新和维护,物化视图在维护过程不可使用;传统数据库在索引创建过程中是无法使用索引的,通常索引创建是非常耗时的操作;对于应用要求越来越迫切的实时/准实时数据仓库系统,数据的加载和访问需要同时支持,如果加载过程中,索引等数据加速结构不可以使用,会极大的降低查询的效率。
发明内容
鉴于所述问题,提出了本申请以便提供克服所述问题或者至少部分地解决所述问题的一种基于分片索引的自适应查询方法和装置,包括:
一种基于分片索引的自适应查询方法,用于预设结构化查询语言语句在数据库中动态选择索引进行数据查询,所述方法包括:
依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;
依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;
依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;
依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;
依据所述执行计划在所述目标数据分区中进行数据查询。
进一步地,所述依据所述数据库内的数据表生成若干个数据分区的步骤,包括:
在所述数据表中获取分区列表;
依据所述分区列表生成若干个所述数据分区。
进一步地,所述通过若干个所述数据分区构建对应于若干个所述数据分区的索引的步骤,包括:
确定若干个所述数据分区是否有效;
当若干个所述数据分区均有效时,则获取若干个所述数据分区内的所有数据;
对所述所有数据中的每一条数据构建对应的所述索引。
进一步地,所述依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息的步骤,包括:
确定对应于所述索引的索引分区状态表;
依据所述索引分区状态表生成对应于若干个所述数据分区的若干个所述索引状态信息。
进一步地,所述依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建的步骤,包括:
依据若干个所述索引状态信息在若干个所述数据分区确定所述索引状态信息为所述已创建的所有数据分区;
在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区。
进一步地,所述在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区的步骤,包括:
确定所述预设结构化查询语言语句在所述数据库是否存在查询计划;
当所述预设结构化查询语言语句在所述数据库不存在所述查询计划时, 则对所述预设结构化查询语言语句进行解析生成特征信息;
在所述所有数据分区中依据所述特征信息确定所述目标数据分区。
进一步地,所述依据所述执行计划在所述目标数据分区中进行数据查询的步骤,包括:
依据所述执行计划和所述目标数据分区确定访问路径;
依据所述访问路径进行数据查询。
本发明实施例还提供了一种基于分片索引的自适应查询优化装置,用于预设结构化查询语言语句在数据库中动态选择索引进行数据查询,所述装置包括:
构建模块,用于依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;
第一确定模块,用于依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;
第二确定模块,用于依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;
生成模块,用于依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;
查询模块,用于依据所述执行计划在所述目标数据分区中进行数据查询。
本发明实施例还提供了一种设备,包括处理器、存储器及存储在所述存储器上并能够在所述处理器上运行的计算机程序,所述计算机程序被所述处理器执行时实现如上所述的一种基于分片索引的自适应查询方法的步骤。
本发明实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储计算机程序,所述计算机程序被处理器执行时实现如上所述的一种基于分片索引的自适应查询方法的步骤。
本申请具有以下优点:
在本申请的实施例中,相对于现有技术中的加载过程中,索引等数据加 速结构不可以使用的问题,本申请提供了依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息,依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划的解决方案,具体为:依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;依据所述执行计划在所述目标数据分区中进行数据查询。通过执行计划解决了加载过程中,索引等数据加速结构不可以使用的问题达到了分片索引技术会在维护索引的同时,记录表的每个数据分区上索引维护状态,对于已经维护完毕索引的数据分区,状态上设置索引维护状态为完成,进行维护和尚未维护的状态,扫描数据时对于正在维护或者尚未维护索引的数据分区,采用动态扫描分区的方式,通过同时实现索引和表分区扫描;
对于查询,最大程度上利用索引的能力,同时又不影响数据加载索引维护任务的进行;提高实时/准实时数据仓库系统中,数据加载和使用同时进行时的查询访问效率;在索引创建过程中,使用分片索引加速功能将普通查询速度提升几倍到几万倍不等;创建索引的过程中可以使用索引,不需要等待索引创建完毕,最大程度提升查询速度;依次对多个数据分区建立索引,根据数据分区的索引状态信息选择访问路径;对已完成索引创建的分区使用索引查询检索数据;对创建中或未创建索引的分区不使用索引检索;也可动态对分区选择索引查询。
附图说明
为了更清楚地说明本申请的技术方案,下面将对本申请的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提 下,还可以根据这些附图获得其他的附图。
图1是本申请一实施例提供的一种基于分片索引的自适应查询方法的步骤流程图;
图2是本申请一实施例提供的一种基于分片索引的自适应查询优化装置的结构框图;
图3是本发明一实施例提供的一种计算机设备的结构示意图。
具体实施方式
为使本申请的所述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请作进一步详细的说明。显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
发明人通过分析现有技术发现:
传统索引方案:索引创建和维护,做为一个原子操作进行,在创建和维护过程中,索引是无法被使用的,传统的方案,在这个过程中相当于没有索引的存在,查询只能采用无索引的访问路径。
基于历史表的技术:传统数据仓库系统中,采用历史表的方案,对于已经准备好数据的历史表,数据较少更新与维护,相对热的短时数据,放在不同的表中,通过业务层结构化查询语言语句联合的方式,将同时查询热数据分区与历史数据分区。来实现历史数据的查询,依赖业务层的改造来实现。
物化视图:物化视图的加速手段同样依赖物化视图的更新,一般针对不严格要求数据实时查询的场景。物化视图在维护期间,同样也是不可使用的状态。
发明人针对现有技术存在的问题提供了一种基于分片索引的自适应查询方法,其相较于传统索引,分片索引技术会在维护索引的同时,记录表的每个分区上索引维护状态,对于已经维护完毕索引的分区,状态上设置索引维护状态为完成,进行维护和尚未维护的状态,扫描数据时对于正在维护或者尚未维护索引的分区,采用动态扫描分区的方式,通过同时实现索引和表 分区扫描,对于查询,最大程度上利用索引的能力,同时又不影响数据加载索引维护任务的进行。
具体如下:
参照图1,示出了本申请一实施例提供的一种基于分片索引的自适应查询方法的步骤流程图;
一种基于分片索引的自适应查询方法,用于预设结构化查询语言语句在数据库中动态选择索引进行数据查询,所述方法包括:
S110、依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;
S120、依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;
S130、依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;
S140、依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;
S150、依据所述执行计划在所述目标数据分区中进行数据查询。
在本申请的实施例中,相对于现有技术中的加载过程中,索引等数据加速结构不可以使用的问题,本申请提供了依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息,依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划的解决方案,具体为:依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;依据 所述执行计划在所述目标数据分区中进行数据查询。通过执行计划解决了加载过程中,索引等数据加速结构不可以使用的问题达到了分片索引技术会在维护索引的同时,记录表的每个数据分区上索引维护状态,对于已经维护完毕索引的数据分区,状态上设置索引维护状态为完成,进行维护和尚未维护的状态,扫描数据时对于正在维护或者尚未维护索引的数据分区,采用动态扫描分区的方式,通过同时实现索引和表分区扫描;
对于查询,最大程度上利用索引的能力,同时又不影响数据加载索引维护任务的进行;提高实时/准实时数据仓库系统中,数据加载和使用同时进行时的查询访问效率;在索引创建过程中,使用分片索引加速功能将普通查询速度提升几倍到几万倍不等;创建索引的过程中可以使用索引,不需要等待索引创建完毕,最大程度提升查询速度;依次对多个数据分区建立索引,根据数据分区的索引状态信息选择访问路径;对已完成索引创建的分区使用索引查询检索数据;对创建中或未创建索引的分区不使用索引检索;也可动态对分区选择索引查询。
下面,将对本示例性实施例中一种基于分片索引的自适应查询方法作进一步地说明。
在本发明一实施例中,依据所述数据库内的数据表生成若干个数据分区;并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;其中,对数据表创建特定索引,以及通过若干个所述数据分区,分别构建这些分区的索引信息。
在本发明一实施例中,依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;其中,分区的索引信息构建过程中,维护该分区对于该索引的索引状态信息。
在本发明一实施例中,依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区。
在本发明一实施例中,依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划,即通过执行计划对已创建索引对应的数据分区创建索引访问路径。
在本发明一实施例中,依据所述执行计划在所述目标数据分区中进行数据查询,即对已创建索引的数据分区进行索引访问。
如所述步骤S110所述,依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引。
在本发明一实施例中,可以结合下列描述进一步说明步骤S110所述“依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引”的具体过程。
如下列步骤所述,
S11010、在所述数据表中获取分区列表;
S11011、依据所述分区列表生成若干个所述数据分区。
需要说明的是,数据分区:数据库根据特定的数据列,将数据切分在不同的存储区域内,或切分到不同的数据库节点上。索引:索引是一种快速的数据检索方法,根据指定的值,或者指定的值范围,能够直接定位到具体的数据,而不用把这个数据扫描一遍筛选。
需要说明的是,通过执行索引创建命令;更新SQL查询计划缓存中,所有含数据表的查询计划,标记数据表的索引可用。也可根据数据库配置,直接删除查询缓存中含数据表所有的查询计划。
作为一种示例,通过数据表得到分区列表;依据分区列表进行分区生成若干个数据分区。
作为一种示例,创建索引时,读取数据表中的分区列表,获取每个分区的分区信息,选择一个分区或多个分区即若干个数据分区(并行情况下),读取若干个数据分区下一个或多个的存储位置信息。
在一具体实现中,例如销售信息表即数据表,可在单节点数据库上根据月份建立12个数据分区,也可在多节点数据库集群中按照月份建立12个数据分区。
如所述步骤S110所述,依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引。
在本发明一实施例中,可以结合下列描述进一步说明步骤S110所述“依 据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引”的具体过程。
如下列步骤所述,
S11020、确定若干个所述数据分区是否有效;
S11021、当若干个所述数据分区均有效时,则获取若干个所述数据分区内的所有数据;
S11022、对所述所有数据中的每一条数据构建对应的所述索引。
需要说明的是,检查若干个数据分区是否有效,即所有数据分区是否已遍历加载完毕;其中,数据分区对应分区信息会存放在链表或数组内,逐个遍历,遍历过程会记录当前的遍历索引值,如果遍历索引值超出了链表或数组元素的个数,那么该索引代表的数据分区就是无效分区;如果已遍历加载完毕,则获取若干个数据分区内每一个数据分区的所有数据。
作为一种示例,确定当前数据分区是否有效,当前数据分区有效时,则在当前数据分区内获取对应于当前数据分区内的所有当前数据,对所有当前数据中的每一条数据建立对应的索引。
在一具体实现中,数据和数据类型是用户自定定义在数据表里面的。类似于excel表格,用户定义了列名,每行写入了数据,索引建立过程是从头逐行读取数据创建的。
如所述步骤S120所述,依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息。
在本发明一实施例中,可以结合下列描述进一步说明步骤S120所述“依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息”的具体过程。
如下列步骤所述,
S1201、确定对应于所述索引的索引分区状态表;
S1202、依据所述索引分区状态表生成对应于若干个所述数据分区的若干个所述索引状态信息。
需要说明的是,在某一数据分区内的所有数据都建立索引后,在索引分 区状态表里,更新对应于某一数据分区的索引状态信息即维护状态,将维护状态置为已完成维护即已创建;当某一数据分区内的所有数据还在建立索引时,在索引分区状态表里,更新对应于某一数据分区的索引状态信息即维护状态,将维护状态置为进行维护即正在创建;当某一数据分区内的所有数据未建立索引时,在索引分区状态表里,更新对应于某一数据分区的索引状态信息即维护状态,将维护状态置为尚未维护即未创建。
作为一种示例,索引集合中的每一个索引都有与之对应的一个索引状态信息,通过索引分区状态表可以得知索引的具体索引状态信息;在后续数据查询过程中,通过索引状态信息确定索引是否能够使用,对于不能使用的索引,使用其他方法进行数据查询,在一具体实施例中,其他方法包括表扫描查询。
如所述步骤S130所述,所述依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建。
在本发明一实施例中,可以结合下列描述进一步说明步骤S130所述“所述依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建”的具体过程。
如下列步骤所述,
S1301、依据若干个所述索引状态信息在若干个所述数据分区确定所述索引状态信息为所述已创建的所有数据分区;
S1302、在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区。
需要说明的是,本申请中提及的索引状态信息包括但不限于已创建、正在创建和未创建,根据具体实施例还可以包括其他信息用于表明索引状态信息。
作为一种示例,通过在若干个索引状态信息在若干个数据分区中将索引状态信息为已创建的所有数据分区筛选得到,再通过预设结构化查询语言语 句在通过筛选得到的所有数据分区中确定目标数据分区。
如所述步骤S1302所述,在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区。
在本发明一实施例中,可以结合下列描述进一步说明步骤S1302所述“在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区”的具体过程。
如下列步骤所述,
S13021、确定所述预设结构化查询语言语句在所述数据库是否存在查询计划;
S13022、当所述预设结构化查询语言语句在所述数据库不存在所述查询计划时,则对所述预设结构化查询语言语句进行解析生成特征信息;
S13023、在所述所有数据分区中依据所述特征信息确定所述目标数据分区。
需要说明的是,在数据库中确定预设结构化语言语句是否存在对应的查询计划,若不存在,则需要对预设结构化语言语句进行解析,提取预设结构化语言语句的文本特征即特征信息,在所有数据分区中依据特征信息确定目标数据分区。
作为一种示例,在提取预设结构化语言语句的文本特征即特征信息时,对文本特征即特征信息进行语义分析,通过语义分析得到分析信息,再由从所有数据分区中依据分析信息确定目标数据分区。
在一具体实现中,连接数据库,输入一条预设结构化查询语言语句;数据库接收到预设结构化查询语言语句,判断预设结构化查询语言语句是否有查询计划缓存(对预设结构化查询语言语句建立缓存是所有数据库的标准做法。一般是通过将预设结构化查询语言语句文本计算出一个ID值,通过ID值检索是否已产生了语法树,如果有现成的语法树,则不用重复解析预设结构化查询语言语句,从而提升查询性能)。
如所述步骤S150所述,依据所述执行计划在所述目标数据分区中进行数据查询。
在本发明一实施例中,可以结合下列描述进一步说明步骤S150所述“依据所述执行计划在所述目标数据分区中进行数据查询”的具体过程。
如下列步骤所述,
S1501、依据所述执行计划和所述目标数据分区确定访问路径;
S1502、依据所述访问路径进行数据查询。
需要说明的是,根据执行计划,开始执行预设结构化查询语言语句,在执行预设结构化查询语言语句时,在目标数据分区通过执行计划得到访问路径,通过访问路径进行数据查询;在完整预设结构化查询语言语句查询后,则执行完毕。
作为一种示例,设有12个数据分区,创建计划时候只有第一个数据分区对应的索引创建成功,其余数据分区正在创建中或未创建时,对于已构建索引的第一个数据分区,通过预设结构化查询语言语句在第一个数据分区内使用索引查询;对于其余数据分区内需要查询时,则通过表扫描查询;对于多个已构建索引的数据分区,依据预设结构化查询语言语句确定对应的目标数据分区,在目标数据分区使用索引查询,剩余数据分区处于正在构建和未构建时,则通过表扫描查询。
实施例一
例如销售数据表,可根据月份,将数据分为12个数据分区存放,在索引创建的时候先创建1月数据分区的索引,创建完维护一个状态标识当前分区已创建索引即索引状态信息。当用户查询1月份数据时候,就可以使用索引状态信息为已创建的索引访问数据;当用户查询2月份数据,因为未创建索引,将使用表扫描查询访问数据;当用户同时查询1月份和2月份数据时,根据查询逻辑和索引维护信息即索引状态信息,对1月份的数据进行索引查询,对2月份的数据进行表扫描查询。
而在传统数据库,数据分成12个数据分区,那么在12个分区未创建完成之前,是不能使用索引查询的。因此,即便是1月份的数据建立索引,查询时候也只能使用表扫描查询,这样极大降低了数据检索速度。
通过本发明提供的一种基于分片索引的自适应查询方法中,在索引维护 过程中,通过索引状态信息的维护,使得执行器和优化器能够感知特定的分区已完成索引创建,使得预设结构化查询语言语句在执行过程中使用索引来访问数据。
对于装置实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
参照图2,示出了本申请一实施例提供的一种基于分片索引的自适应查询优化装置的结构框图;
一种基于分片索引的自适应查询优化装置,用于预设结构化查询语言语句在数据库中动态选择索引进行数据查询,所述装置包括:
构建模块210,用于依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;
第一确定模块220,用于依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;
第二确定模块230,用于依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;
生成模块240,用于依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;
查询模块250,用于依据所述执行计划在所述目标数据分区中进行数据查询。
在本发明一实施例中,所述构建模块210,包括:
第一获取子模块,用于在所述数据表中获取分区列表;
第一生成子模块,用于依据所述分区列表生成若干个所述数据分区。
在本发明一实施例中,所述构建模块210,包括:
第一确定子模块,用于确定若干个所述数据分区是否有效;
第二获取子模块,用于当若干个所述数据分区均有效时,则获取若干个所述数据分区内的所有数据;
第一构建子模块,用于对所述所有数据中的每一条数据构建对应的所述 索引。
在本发明一实施例中,所述第一确定模块220,包括:
第二确定子模块,用于确定对应于所述索引的索引分区状态表;
第二生成子模块,用于依据所述索引分区状态表生成对应于若干个所述数据分区的若干个所述索引状态信息。
在本发明一实施例中,所述第二确定模块230,包括:
第三确定子模块,用于依据若干个所述索引状态信息在若干个所述数据分区确定所述索引状态信息为所述已创建的所有数据分区;
第四确定子模块,用于在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区。
在本发明一实施例中,所述第四确定子模块,包括:
第一确定单元,用于确定所述预设结构化查询语言语句在所述数据库是否存在查询计划;
第一生成单元,用于当所述预设结构化查询语言语句在所述数据库不存在所述查询计划时,则对所述预设结构化查询语言语句进行解析生成特征信息;
第二确定单元,用于在所述所有数据分区中依据所述特征信息确定所述目标数据分区。
在本发明一实施例中,所述查询模块250,包括:
第五确定子模块,用于依据所述执行计划和所述目标数据分区确定访问路径;
查询子模块,用于依据所述访问路径进行数据查询。
参照图3,示出了本发明的一种基于分片索引的自适应查询方法的计算机设备,具体可以包括如下:
上述计算机设备12以通用计算设备的形式表现,计算机设备12的组件可以包括但不限于:一个或者多个处理器或者处理单元16,系统存储器28,连接不同系统组件(包括系统存储器28和处理单元16)的总线18。
总线18表示几类总线18结构中的一种或多种,包括存储器总线18或 者存储器控制器,外围总线18,图形加速端口,处理器或者使用多种总线18结构中的任意总线18结构的局域总线18。举例来说,这些体系结构包括但不限于工业标准体系结构(ISA)总线18,微通道体系结构(MAC)总线18,增强型ISA总线18、音视频电子标准协会(VESA)局域总线18以及外围组件互连(PCI)总线18。
计算机设备12典型地包括多种计算机系统可读介质。这些介质可以是任何能够被计算机设备12访问的可用介质,包括易失性和非易失性介质,可移动的和不可移动的介质。
系统存储器28可以包括易失性存储器形式的计算机系统可读介质,例如随机存取存储器(RAM)30和/或高速缓存存储器32。计算机设备12可以进一步包括其他移动/不可移动的、易失性/非易失性计算机体统存储介质。仅作为举例,存储系统34可以用于读写不可移动的、非易失性磁介质(通常称为“硬盘驱动器”)。尽管图3中未示出,可以提供用于对可移动非易失性磁盘(如“软盘”)读写的磁盘驱动器,以及对可移动非易失性光盘(例如CD-ROM,DVD-ROM或者其他光介质)读写的光盘驱动器。在这些情况下,每个驱动器可以通过一个或者多个数据介质界面与总线18相连。存储器可以包括至少一个程序产品,该程序产品具有一组(例如至少一个)程序模块42,这些程序模块42被配置以执行本发明各实施例的功能。
具有一组(至少一个)程序模块42的程序/实用工具40,可以存储在例如存储器中,这样的程序模块42包括——但不限于——操作系统、一个或者多个应用程序、其他程序模块42以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块42通常执行本发明所描述的实施例中的功能和/或方法。
计算机设备12也可以与一个或多个外部设备14(例如键盘、指向设备、显示器24、摄像头等)通信,还可与一个或者多个使得操作人员能与该计算机设备12交互的设备通信,和/或与使得该计算机设备12能与一个或多个其他计算设备进行通信的任何设备(例如网卡,调制解调器等等)通信。这种通信可以通过输入/输出(I/O)界面22进行。并且,计算机设备12还可以 通过网络适配器20与一个或者多个网络(例如局域网(LAN)),广域网(WAN)和/或公共网络(例如因特网)通信。如图所示,网络适配器20通过总线18与计算机设备12的其他模块通信。应当明白,尽管图3中未示出,可以结合计算机设备12使用其他硬件和/或软件模块,包括但不限于:微代码、设备驱动器、冗余处理单元16、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统34等。
处理单元16通过运行存储在系统存储器28中的程序,从而执行各种功能应用以及数据处理,例如实现本发明实施例所提供的一种基于分片索引的自适应查询方法。
也即,上述处理单元16执行上述程序时实现:依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;依据所述执行计划在所述目标数据分区中进行数据查询。
在本发明实施例中,本发明还提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如本申请所有实施例提供的一种基于分片索引的自适应查询方法:
也即,给程序被处理器执行时实现:依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;依据所述执行计划在所述目标数据分区中进行数据查询。
可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可 以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件或者上述的任意合适的组合。在本文件中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括——但不限于——电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
可以以一种或多种程序设计语言或其组合来编写用于执行本发明操作的计算机程序代码,上述程序设计语言包括面向对象的程序设计语言——诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在操作人员计算机上执行、部分地在操作人员计算机上执行、作为一个独立的软件包执行、部分在操作人员计算机上部分在远程计算机上执行或者完全在远程计算机或者服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到操作人员计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦 得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。
以上对本申请所提供的一种基于分片索引的自适应查询方法和装置,进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (10)

  1. 一种基于分片索引的自适应查询方法,其特征在于,用于预设结构化查询语言语句在数据库中动态选择索引进行数据查询,所述方法包括:
    依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;
    依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;
    依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;
    依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;
    依据所述执行计划在所述目标数据分区中进行数据查询。
  2. 根据权利要求1所述的方法,其特征在于,所述依据所述数据库内的数据表生成若干个数据分区的步骤,包括:
    在所述数据表中获取分区列表;
    依据所述分区列表生成若干个所述数据分区。
  3. 根据权利要求1所述的方法,其特征在于,所述通过若干个所述数据分区构建对应于若干个所述数据分区的索引的步骤,包括:
    确定若干个所述数据分区是否有效;
    当若干个所述数据分区均有效时,则获取若干个所述数据分区内的所有数据;
    对所述所有数据中的每一条数据构建对应的所述索引。
  4. 根据权利要求1所述的方法,其特征在于,所述依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息的步骤,包括:
    确定对应于所述索引的索引分区状态表;
    依据所述索引分区状态表生成对应于若干个所述数据分区的若干个所述索引状态信息。
  5. 根据权利要求1所述的方法,其特征在于,所述依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建的步骤,包括:
    依据若干个所述索引状态信息在若干个所述数据分区确定所述索引状态信息为所述已创建的所有数据分区;
    在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区。
  6. 根据权利要求5所述的方法,其特征在于,所述在所述所有数据分区中通过所述预设结构化查询语言语句确定所述目标数据分区的步骤,包括:
    确定所述预设结构化查询语言语句在所述数据库是否存在查询计划;
    当所述预设结构化查询语言语句在所述数据库不存在所述查询计划时,则对所述预设结构化查询语言语句进行解析生成特征信息;
    在所述所有数据分区中依据所述特征信息确定所述目标数据分区。
  7. 根据权利要求1所述的方法,其特征在于,所述依据所述执行计划在所述目标数据分区中进行数据查询的步骤,包括:
    依据所述执行计划和所述目标数据分区确定访问路径;
    依据所述访问路径进行数据查询。
  8. 一种基于分片索引的自适应查询优化装置,其特征在于,用于预设结构化查询语言语句在数据库中动态选择索引进行数据查询,所述装置包括:
    构建模块,用于依据所述数据库内的数据表生成若干个数据分区,并通过若干个所述数据分区构建对应于若干个所述数据分区的索引;
    第一确定模块,用于依据所述索引确定对应于若干个所述数据分区的若干个索引状态信息;
    第二确定模块,用于依据所述预设结构化查询语言语句、若干个所述索引状态信息和若干个所述数据分区确定访问的目标数据分区,其中,所述索引状态信息包括已创建、正在创建和未创建;
    生成模块,用于依据所述预设结构化查询语言语句和所述目标数据分区生成执行计划;
    查询模块,用于依据所述执行计划在所述目标数据分区中进行数据查询。
  9. 一种计算机设备,其特征在于,包括处理器、存储器及存储在所述存储器上并能够在所述处理器上运行的计算机程序,所述计算机程序被所述处理器执行时实现如权利要求1至7中任一项所述的方法。
  10. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储计算机程序,所述计算机程序被处理器执行时实现如权利要求1至7中任一项所述的方法。
PCT/CN2022/135893 2022-11-23 2022-12-01 一种基于分片索引的自适应查询方法和装置 WO2024108638A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211475025.8 2022-11-23
CN202211475025.8A CN115809268B (zh) 2022-11-23 2022-11-23 一种基于分片索引的自适应查询方法和装置

Publications (1)

Publication Number Publication Date
WO2024108638A1 true WO2024108638A1 (zh) 2024-05-30

Family

ID=85483947

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135893 WO2024108638A1 (zh) 2022-11-23 2022-12-01 一种基于分片索引的自适应查询方法和装置

Country Status (2)

Country Link
CN (1) CN115809268B (zh)
WO (1) WO2024108638A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100235348A1 (en) * 2009-03-10 2010-09-16 Oracle International Corporation Loading an index with minimal effect on availability of applications using the corresponding table
US20150302035A1 (en) * 2014-04-17 2015-10-22 Oracle International Corporation Partial indexes for partitioned tables
CN109815231A (zh) * 2018-12-26 2019-05-28 亚信科技(中国)有限公司 数据分片方法、装置、计算机设备和存储介质
CN111221840A (zh) * 2018-11-23 2020-06-02 阿里巴巴集团控股有限公司 数据处理方法及装置、数据缓存方法、存储介质、系统

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216821B (zh) * 2007-01-05 2010-06-16 中兴通讯股份有限公司 数据采集系统的存储管理方法
CN102073697A (zh) * 2010-12-28 2011-05-25 中兴通讯股份有限公司 一种数据处理方法及装置
US8909615B2 (en) * 2011-08-30 2014-12-09 Open Text S.A. System and method of managing capacity of search index partitions
KR101546333B1 (ko) * 2014-02-20 2015-08-25 주식회사 티맥스데이터 복합 저장소를 가지는 데이터베이스에서 질의 처리 장치 및 방법
CN112162983A (zh) * 2020-09-22 2021-01-01 北京人大金仓信息技术股份有限公司 数据库索引建议处理方法、装置、介质和电子设备
CN114168594A (zh) * 2021-12-08 2022-03-11 上海达梦数据库有限公司 一种水平分区表的二级索引创建方法、装置、设备及存储介质
CN115129712A (zh) * 2022-06-09 2022-09-30 北京思特奇信息技术股份有限公司 基于Oracle数据库的分区快速创建方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100235348A1 (en) * 2009-03-10 2010-09-16 Oracle International Corporation Loading an index with minimal effect on availability of applications using the corresponding table
US20150302035A1 (en) * 2014-04-17 2015-10-22 Oracle International Corporation Partial indexes for partitioned tables
CN111221840A (zh) * 2018-11-23 2020-06-02 阿里巴巴集团控股有限公司 数据处理方法及装置、数据缓存方法、存储介质、系统
CN109815231A (zh) * 2018-12-26 2019-05-28 亚信科技(中国)有限公司 数据分片方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
CN115809268B (zh) 2024-01-16
CN115809268A (zh) 2023-03-17

Similar Documents

Publication Publication Date Title
CN107402990A (zh) 一种分布式NewSQL数据库系统和半结构化数据储存方法
CN109471851B (zh) 数据处理方法、装置、服务器和存储介质
US10783142B2 (en) Efficient data retrieval in staged use of in-memory cursor duration temporary tables
CN109947804B (zh) 数据集合查询的优化方法、装置、服务器和存储介质
CN112463800A (zh) 数据读取方法、装置、服务器及存储介质
US20060235820A1 (en) Relational query of a hierarchical database
CN103810219A (zh) 一种基于行存储数据库的数据处理方法及装置
US20230418824A1 (en) Workload-aware column inprints
WO2017107130A1 (zh) 数据查询方法和数据库系统
CN114168616A (zh) 数据采集方法、装置、电子设备及存储介质
CN113468209A (zh) 一种电网监控系统高速内存数据库访问方法
CN113254519A (zh) 多源异构数据库的访问方法、装置、设备和存储介质
CN111666302A (zh) 用户排名的查询方法、装置、设备及存储介质
CN109542912B (zh) 区间数据存储方法、装置、服务器及存储介质
WO2024108638A1 (zh) 一种基于分片索引的自适应查询方法和装置
CN111522820A (zh) 数据存储结构、存储检索方法、系统、设备及存储介质
CN112000666B (zh) 一种面向列的数据库管理系统
CN114064729A (zh) 一种数据检索方法、装置、设备及存储介质
US10762139B1 (en) Method and system for managing a document search index
CN109800233A (zh) 一种大数据融合搜索方法
CN115544074A (zh) 一种数据查询方法及装置
CN112818010B (zh) 数据库查询方法及装置
CN108984720B (zh) 基于列存储的数据查询方法、装置、服务器及存储介质
CN112800054A (zh) 数据模型的确定方法、装置、设备及存储介质
CN111723104A (zh) 一种数据处理系统中语法分析的方法、装置及系统