CN115964387A - Data query method and device, distributed database system and medium - Google Patents

Info

Publication number
CN115964387A
Authority
CN
China
Prior art keywords
data
node
index
distributed database
database system
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211597050.3A
Other languages
Chinese (zh)
Inventor
朱仲颖
万伟
孟正凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Application filed by Shanghai Dameng Database Co Ltd
Priority to CN202211597050.3A
Publication of CN115964387A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data query method, a data query device, a distributed database system and a medium, wherein the method comprises the following steps: receiving a data query request sent by a client; determining a first node based on the data query request and a global index of the distributed database system, the first node being a node in the distributed database system that queries for the first target data; and sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction. According to the method, the first node is determined based on the data query request and the global index of the distributed database system, the data query instruction is sent to the first node, and the first target data corresponding to the data query instruction can be obtained, so that the function of performing data query in the distributed database system based on the global index is realized, and the processing performance of the distributed database system is improved.

Description

Data query method and device, distributed database system and medium
Technical Field
The present invention relates to the field of distributed database technologies, and in particular, to a data query method, apparatus, distributed database system, and medium.
Background
Compared with a traditional centralized database, a distributed database has advantages such as a flexible architecture and high reliability, but also drawbacks such as high system overhead and a complex access structure. Because of this complex access structure, many features of centralized database environments, such as global indexing, are difficult to implement in distributed database environments.
Existing distributed databases generally cannot use some characteristics of the global index, resulting in lower processing performance.
Disclosure of Invention
The invention provides a data query method, a data query device, a distributed database system and a medium, which are used for improving the processing performance of the distributed database system.
According to an aspect of the present invention, there is provided a data query method applied to a distributed database system, including:
receiving a data query request sent by a client, wherein the data query request is used for requesting to query first target data;
determining a first node based on the data query request and a global index of the distributed database system, wherein the first node is a node in the distributed database system for querying the first target data;
and sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction.
According to another aspect of the present invention, there is provided a data query apparatus configured in a distributed database system, including:
the first receiving module is used for receiving a data query request sent by a client, wherein the data query request is used for requesting to query first target data;
a first determining module, configured to determine a first node based on the data query request and a global index of the distributed database system, where the first node is a node in the distributed database system that queries the first target data;
and the first sending module is used for sending a data query instruction to the first node and acquiring the first target data returned by the first node based on the data query instruction.
According to another aspect of the present invention, there is provided a distributed database system comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the data query method of any of the embodiments of the invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement a data query method according to any one of the embodiments of the present invention when the computer instructions are executed.
The embodiment of the invention provides a data query method, a data query device, a distributed database system and a medium, wherein the method is applied to the distributed database system and comprises the following steps: receiving a data query request sent by a client, wherein the data query request is used for requesting to query first target data; determining a first node based on the data query request and a global index of the distributed database system, the first node being a node in the distributed database system that queries for the first target data; and sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction. By means of the technical scheme, the first node is determined based on the data query request and the global index of the distributed database system, the data query instruction is sent to the first node, and the first target data corresponding to the data query instruction can be obtained, so that the function of performing data query in the distributed database system based on the global index is achieved, and the processing performance of the distributed database system is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a data query method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a data query method according to a second embodiment of the present invention;
FIG. 3 is a flowchart illustrating a data query method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a data insertion method according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating a data deletion method according to a second embodiment of the present invention;
FIG. 6 is a flowchart illustrating a data updating method according to a second embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a data query device according to a third embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a distributed database system according to a fourth embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment One
Fig. 1 is a flowchart of a data query method according to an embodiment of the present invention, where the present embodiment is applicable to a case of querying data, and the method may be executed by a data query device, where the data query device may be implemented in a form of hardware and/or software, and the data query device may be configured in a distributed database system.
A distributed database may be regarded as a unified whole logically, while its data is physically stored at different sites. An application may access the database distributed over different geographic locations through a network connection.
Compared with the traditional centralized database, the distributed database has the advantages of flexible architecture, high reliability, improved performance and the like, but also has the defects of high system overhead, complex access structure and the like. In the centralized database, the global index of the partition table has the following characteristics:
The global index does not depend on how the partitions are distributed; it stores the index column data of all partitions independently, so a lookup does not need to search for the index key separately in each partition of the original partition table. If the local indexes of the partition table do not completely contain the partition column, the constraint on each local index cannot guarantee that a key value is unique across the whole partition table, so a global index must be used when such a constraint is created on the partition table. This is because the index data of an ordinary secondary index is scattered across the local indexes, and a constraint can only guarantee that key values are unique within each local index, not across the index as a whole. The global index is stored separately and the unique constraint acts on the entire global index, so uniqueness across the whole index can be guaranteed.
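The uniqueness argument above can be illustrated with a minimal Python sketch. It assumes a hypothetical table partitioned by region with a unique constraint wanted on an email column that does not contain the partition column; the class names and sample keys are illustrative only and are not taken from the patent.

```python
class LocalIndexes:
    """One unique index per partition: duplicates are only detected inside a partition."""

    def __init__(self, partitions):
        self.keys = {p: set() for p in partitions}

    def insert(self, partition, key):
        if key in self.keys[partition]:
            return False  # duplicate found, but only within this partition
        self.keys[partition].add(key)
        return True


class GlobalIndex:
    """A single index stored apart from the partitions: duplicates are detected table-wide."""

    def __init__(self):
        self.keys = set()

    def insert(self, partition, key):
        if key in self.keys:
            return False  # duplicate found regardless of the partition
        self.keys.add(key)
        return True


# The same key is inserted into two different partitions (hypothetical data).
local = LocalIndexes(["east", "west"])
local_results = [local.insert("east", "a@x.com"), local.insert("west", "a@x.com")]

glob = GlobalIndex()
global_results = [glob.insert("east", "a@x.com"), glob.insert("west", "a@x.com")]
```

With local indexes both inserts succeed, so the duplicate goes unnoticed; with the single global index the second insert is rejected.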
However, a distributed database that does not support global indexes cannot use features like those the global index provides in a centralized database. For example: index data is distributed independently of the base table; and a constraint created on a table that does not completely contain the partition columns can still guarantee that the constraint columns are globally unique.
In order to solve the above problems, the present invention provides a method for data query based on global index in distributed database environment. As shown in fig. 1, the method includes:
S110, receiving a data query request sent by a client, wherein the data query request is used for requesting to query first target data.
The data query request may be regarded as a request from the client to query the first target data. The type of the data query request is not limited; for example, it may be a query statement, and the query statement may include the filter conditions of the query. The first target data may refer to the data that the client wants to query.
In this step, the data query request sent by the client may be received first, and then the subsequent query of the first target data is performed based on the received data query request, where the manner of receiving the data query request is not limited as long as the data query request sent by the client can be received.
S120, determining a first node based on the data query request and a global index of the distributed database system, wherein the first node is a node for querying the first target data in the distributed database system.
The first node may refer to the node in the distributed database system on which the first target data is queried; for example, the first node may be the node where the aggregation index (that is, the clustered index) is located. In this embodiment, the tables of the database (except column-store tables and heap tables) are mainly managed with a B-tree index structure. Each ordinary table corresponds to exactly one aggregation index, data is sorted by the aggregation index key, and any record can be quickly located by the aggregation index key.
In addition, each ordinary table may also have secondary indexes. The secondary index columns and the aggregation index columns can be stored together on the leaf nodes of the B-tree, so a search for a non-aggregation index key value or an aggregation index key value can be satisfied directly from the B-tree; a search for data other than the index key values has to go back to the first-level index (the aggregation index). When the data of the aggregation index on the base table is modified, the data on all secondary indexes needs to be maintained synchronously. For an ordinary secondary index, or for a global index of a centralized database, each record of the secondary index and the corresponding record of the aggregation index are located on the same site, so when the secondary index data is maintained, the data page storing the secondary index on that site can be modified directly and synchronously according to the change of the aggregation index record.
The global index may be regarded as one kind of secondary index; in this embodiment, an ordinary secondary index refers to any secondary index other than the global index.
In one embodiment, the global index is partitioned according to a preset distribution mode.
In this embodiment, the global index in the distributed database environment may be created in the form of a partitioned index, and the specific distribution mode is not limited. For example, the index may be partitioned according to a preset distribution mode, which may be a default distribution mode or a distribution mode specified by the user; that is, the global index may be distributed across the sites (i.e., nodes) in the default or user-specified mode. The partition columns may be the first several columns, or all columns, of the base table in sequence, and the distribution mode is independent of the base table itself.
Because the distribution of the global index is independent of the base table, the data in the global index and the data of the base table may reside on different sites, so when data is added to or deleted from the base table, the data needs to be maintained across sites.
In one embodiment, the global index stores global index data, which includes at least one of: an index key value, an aggregation key value, an identifier of the base table to which the data belongs, a transaction identifier, and a feature value identifier.
The global index data may be understood as the data stored in the global index. The index key value is the key value of the global index corresponding to the data; the aggregation key value is the key value of the aggregation index corresponding to the data; the base table identifier identifies the base table to which the data belongs; the transaction identifier uniquely identifies a transaction in the system; and the feature value identifier may be regarded as the detailed address of the data, which can be used to quickly locate the row number of the data.
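As an illustration of the fields listed above, the following Python sketch models one global index entry as a record. The field names, types, and sample values are assumptions made for the example; the patent only names the kinds of information an entry may carry (and requires at least one of them), not a concrete layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GlobalIndexEntry:
    """Hypothetical layout of one record stored in the global index."""
    index_key: Tuple          # key value of the global index
    aggregation_key: Tuple    # key value of the aggregation index on the base table
    base_table_id: int        # identifies the base table the data belongs to
    transaction_id: int       # uniquely identifies a transaction in the system
    feature_value_id: int     # detailed address, e.g. for locating the row number


def lookup_base_row(entry, aggregation_index):
    """Use the aggregation key carried by the entry to fetch the full base-table row."""
    return aggregation_index[entry.aggregation_key]


# Hypothetical data: a base table keyed by id, globally indexed on email.
aggregation_index = {(1001,): {"id": 1001, "email": "alice@example.com", "city": "Shanghai"}}
entry = GlobalIndexEntry(
    index_key=("alice@example.com",),
    aggregation_key=(1001,),
    base_table_id=7,
    transaction_id=42,
    feature_value_id=3,
)
row = lookup_base_row(entry, aggregation_index)
```

The entry itself holds only the index columns and addressing information; columns such as `city` are recovered from the aggregation index of the base table.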
In the embodiment, after the global index data in the global index is determined, the information of other columns in the data can be quickly found in the aggregation index of the base table based on the global index data.
Specifically, after the data query request is received, the first node may be determined based on the data query request and the global index of the distributed database system, so that the first target data can subsequently be obtained. The way the first node is determined is not limited; for example, the global index data may be determined first, and the first node may then be determined based on the global index data.
S130, sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction.
The data query instruction may be used to instruct the first node to perform a data query and return the first target data, e.g., the data query instruction may include information required for the query.
In this step, a data query instruction may be sent to the first node; after the first node queries the first target data based on the data query instruction, the first target data returned by the first node may be acquired. The specific acquisition process is not elaborated here.
The data query method provided by the embodiment of the invention receives a data query request sent by a client, wherein the data query request is used for requesting to query first target data; determining a first node based on the data query request and a global index of the distributed database system, the first node being a node in the distributed database system that queries for the first target data; and sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction. By using the method, the first node is determined based on the data query request and the global index of the distributed database system, and the data query instruction is sent to the first node, so that the first target data corresponding to the data query instruction can be obtained, the function of performing data query in the distributed database system based on the global index is realized, and the processing performance of the distributed database system is improved.
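Steps S110 to S130 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `Coordinator` and `Node` classes, the dictionary-based global index that maps an index key directly to a node name and an aggregation key, and the sample rows are all hypothetical and stand in for whatever structures a real system would use.

```python
class Node:
    """A storage node holding base-table rows keyed by aggregation key (hypothetical)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def execute(self, instruction):
        # The data query instruction carries the information needed for the query.
        return self.rows.get(instruction["aggregation_key"])


class Coordinator:
    """Receives client requests and routes them with the help of a global index."""

    def __init__(self, global_index, nodes):
        self.global_index = global_index  # index key -> (node name, aggregation key)
        self.nodes = nodes                # node name -> Node

    def handle_query(self, request):
        # S110: receive the data query request sent by the client.
        index_key = request["index_key"]
        # S120: determine the first node from the request and the global index.
        located = self.global_index.get(index_key)
        if located is None:
            return None
        node_name, aggregation_key = located
        first_node = self.nodes[node_name]
        # S130: send the data query instruction and collect the returned data.
        return first_node.execute({"aggregation_key": aggregation_key})


nodes = {
    "n1": Node("n1", {1001: {"id": 1001, "email": "a@example.com"}}),
    "n2": Node("n2", {2002: {"id": 2002, "email": "b@example.com"}}),
}
coordinator = Coordinator(
    {"a@example.com": ("n1", 1001), "b@example.com": ("n2", 2002)}, nodes
)
result = coordinator.handle_query({"index_key": "b@example.com"})
```

Only the node that holds the matching row is contacted, rather than every node in the system.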
In one embodiment, before said determining the first node based on the data query request and a global index of the distributed database system, further comprising:
judging whether the first target data needs to be inquired by using a global index or not according to a preset cost strategy and the data inquiry request;
the determining a first node based on the data query request and a global index of the distributed database system comprises:
if it is determined that a global index needs to be used, a first node is determined based on the data query request and the global index of the distributed database system.
The preset cost policy may be regarded as a predetermined policy for performing cost estimation on the query first target data.
Specifically, after the data query request is received, whether the global index needs to be used to query the first target data may be judged according to the preset cost policy and the data query request. The specific judgment process is not limited, and different first target data may correspond to different judgment processes. For example, it may first be judged whether the aggregation index is used to query the first target data; if the aggregation index is not used, it may then be judged whether the global index is used, for instance by estimating the cost of querying the first target data based on the global index.
Then, under the condition that the global index is determined to be needed, the first node can be determined based on the data query request and the global index of the distributed database system, so that the subsequent steps can be executed; on the other hand, in the event that it is determined that the global index need not be used, the first target data may be queried based on other secondary indexes.
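One possible shape of such a judgment is sketched below in Python. The patent does not specify a cost model, so the rule used here (prefer the aggregation index when the filter covers its key columns, otherwise pick the cheaper of the global index and another secondary index) and all cost numbers are assumptions for illustration only.

```python
def choose_access_path(filter_columns, aggregation_key_columns, costs):
    """Pick an access path for a query (hypothetical cost policy).

    costs maps an access path name to an estimated cost; a missing entry
    means that path is not available for this query.
    """
    # First judge whether the aggregation index can serve the query directly.
    if set(aggregation_key_columns) <= set(filter_columns):
        return "aggregation_index"
    # Otherwise compare the estimated cost of the global index with the alternative.
    global_cost = costs.get("global_index", float("inf"))
    other_cost = costs.get("secondary_index", float("inf"))
    if global_cost < float("inf") and global_cost <= other_cost:
        return "global_index"
    return "secondary_index"


by_primary_key = choose_access_path(["id"], ["id"], {"global_index": 5.0})
by_email = choose_access_path(
    ["email"], ["id"], {"global_index": 5.0, "secondary_index": 40.0}
)
by_city = choose_access_path(["city"], ["id"], {"secondary_index": 12.0})
```

Here `by_primary_key` uses the aggregation index, `by_email` uses the global index because its estimated cost is lower, and `by_city` falls back to another secondary index because no global index cost is available.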
In one embodiment, after the obtaining the first target data returned by the first node based on the data query instruction, the method further includes:
and sending the first target data to the client.
In one embodiment, after obtaining the first target data returned by the first node based on the data query instruction, the first target data may be sent to the client to complete the response to the client data query request.
In one embodiment, the method further comprises:
receiving a data processing request, wherein the data processing request is used for requesting processing of indexes to be processed in the distributed database system, the indexes to be processed comprise first secondary indexes, and the first secondary indexes are global indexes;
determining a third node based on the data processing request, and sending a target instruction containing processing information to the third node, where the target instruction is used to instruct the third node to process the processing information, the third node is a node in the distributed database system that performs data processing on the index to be processed, and the processing information includes data to be processed carried in the data processing request.
The data processing request may be a request for processing an index to be processed in the distributed database system, for example, inserting a row of data, that is, inserting data into the index to be processed in the distributed database system; the source of the data processing request is not limited, for example, the data processing request may be from a user or from a related technician, which is not limited in this embodiment. The index to be processed may be considered as an index that needs to be processed, for example, the index to be processed may include a global index, and may also include other indexes that need to be processed.
The third node may be a node that performs data processing on the index to be processed, such as a node that performs data processing on the global index; the target instruction may be used to instruct the third node to process processing information, where the processing information may include data to be processed carried in the data processing request, or may also include data hidden in the data processing request, such as a base table identifier and a feature value identifier.
In this embodiment, the data processing request may be received first, then the third node is determined based on the received data processing request, and after the third node is determined, a target instruction may be sent to the third node so that the third node processes the processing information, thereby completing the response to the data processing request. On the basis, the data processing of the index to be processed is realized, so that the processing performance of the distributed database system is further improved.
In one embodiment, the index to be processed further comprises an aggregation index and/or a secondary index.
In one embodiment, the index to be processed may further include an aggregation index and/or a secondary index, where the secondary index here may be understood as an ordinary secondary index other than the global index.
It is to be appreciated that the aggregation index and/or the secondary index may be processed while the global index is maintained; the processing steps may be similar to those for the global index, and the processing order is not limited. For example, the aggregation index may be processed first and the global index afterwards; the global index may be processed first and the aggregation index afterwards; or the global index and the aggregation index may be processed at the same time, which is not limited in this embodiment.
For example, after receiving the data insertion request, the aggregation node that performs data insertion on the aggregation index may also be determined first, and then a corresponding insertion instruction is sent to the determined aggregation node, so that the aggregation node inserts the data; and then, determining the third node based on the received data insertion request, and after the third node is determined, sending a target instruction to the third node so that the third node performs insertion processing on the processing information. On the basis, the comprehensive processing of the index to be processed is realized, and the processing accuracy is improved.
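The insertion example can be sketched in Python as follows. The routing rule, the `StorageNode` class, and the sample row are hypothetical; the point is only that the aggregation index entry and the global index entry for the same row may be routed to different nodes, so a single insert touches both.

```python
BASE_TABLE_ID = 7  # hypothetical identifier of the base table


class StorageNode:
    """A node that may hold aggregation index rows and global index entries (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.aggregation_rows = {}
        self.global_entries = {}


def route(key, nodes):
    """Hypothetical deterministic placement rule based on the characters of the key."""
    return nodes[sum(ord(ch) for ch in str(key)) % len(nodes)]


def handle_insert(row, nodes):
    # Determine the aggregation node and insert the row into the aggregation index.
    aggregation_node = route(row["id"], nodes)
    aggregation_node.aggregation_rows[row["id"]] = row
    # Determine the third node and send it the processing information,
    # including hidden data such as the base table identifier.
    third_node = route(row["email"], nodes)
    third_node.global_entries[row["email"]] = {
        "index_key": row["email"],
        "aggregation_key": row["id"],
        "base_table_id": BASE_TABLE_ID,
    }
    return aggregation_node, third_node


nodes = [StorageNode("n0"), StorageNode("n1")]
aggregation_node, third_node = handle_insert({"id": 2, "email": "ab"}, nodes)
```

In this run the row lands on `n0` while its global index entry lands on `n1`, which is the cross-site maintenance case described above.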
In one embodiment, when the data processing request is a data update request, the third node includes a fourth node and a fifth node, and determining the third node based on the data processing request and sending a target instruction containing processing information to the third node includes:
determining a fourth node based on first to-be-processed data in the data updating request, and sending a first target instruction containing first processing information to the fourth node, wherein the first processing information comprises the first to-be-processed data, and the first to-be-processed data is original data;
determining a fifth node based on second to-be-processed data in the data updating request, and sending a second target instruction containing second processing information to the fifth node, where the second processing information includes the second to-be-processed data, and the second to-be-processed data is updated data.
A data update request may be regarded as a request to change data, for example to change some original data into updated data, where the original data is the data before the change and the updated data is the data after the change. Handling a data update request requires deleting the original data and then inserting the updated data; because the original data and the updated data may be located at different positions, the nodes that process them are distinguished.
The fourth node may be a node that deletes original data in the global index, and the fifth node may be a node that inserts changed data in the global index; the first target instruction may be used to instruct the fourth node to process the first processed information, such as to delete the original data, and the second target instruction may be used to instruct the fifth node to process the second processed information, such as to insert the updated data. The first processing information may include original data, information related to the original data, and the like, and the second processing information may include updated data, information related to the updated data, and the like.
Specifically, when the data processing request is a data update request, the fourth node may be determined based on the original data in the data update request, and a first target instruction including the first processing information may be sent to the determined fourth node, so as to complete deletion of the original data; and then, determining the fifth node based on the updating data in the data updating request, and sending a second target instruction containing second processing information to the determined fifth node to complete the insertion of the updating data.
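The delete-then-insert handling of an update can be sketched in Python as below. The node representation (plain dictionaries), the modulo routing rule, and the sample keys are assumptions for illustration; the patent does not prescribe how the fourth and fifth nodes are located.

```python
def node_for_index_key(index_key, nodes):
    """Hypothetical placement rule for global index entries."""
    return nodes[index_key % len(nodes)]


def handle_update(original_entry, updated_entry, nodes):
    # Fourth node: located from the original data; the first target
    # instruction asks it to delete the original global index entry.
    fourth_node = node_for_index_key(original_entry["index_key"], nodes)
    fourth_node["entries"].pop(original_entry["index_key"], None)
    # Fifth node: located from the updated data; the second target
    # instruction asks it to insert the updated global index entry.
    fifth_node = node_for_index_key(updated_entry["index_key"], nodes)
    fifth_node["entries"][updated_entry["index_key"]] = updated_entry
    return fourth_node["name"], fifth_node["name"]


nodes = [
    {"name": "n0", "entries": {10: {"index_key": 10, "aggregation_key": 1}}},
    {"name": "n1", "entries": {}},
]
touched = handle_update(
    {"index_key": 10, "aggregation_key": 1},
    {"index_key": 11, "aggregation_key": 1},
    nodes,
)
```

Changing the index key from 10 to 11 moves the entry from `n0` to `n1`, so the delete and the insert are sent to different nodes.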
Embodiment Two
Fig. 2 is a flowchart of a data query method according to a second embodiment of the present invention, and the second embodiment is optimized based on the foregoing embodiments. In this embodiment, the determining the first node based on the data query request and the global index of the distributed database system is further embodied as: determining a second node based on the data query request and a global index of the distributed database system, wherein the second node is a node in which global index data is located in the distributed database system, and the global index data corresponds to the data query request; obtaining the global index data from the second node; and determining a first node based on the aggregation key value in the global index data.
For details not described in this embodiment, refer to the first embodiment.
As shown in fig. 2, the method includes:
S210, receiving a data query request sent by a client.
S220, determining a second node based on the data query request and a global index of the distributed database system, wherein the second node is a node where global index data are located in the distributed database system, and the global index data correspond to the data query request.
The global index data may be regarded as data corresponding to the data query request in the global index, and the second node may be regarded as a node where the global index data is located in the distributed database system.
In an embodiment, the second node may be determined based on the data query request and the global index of the distributed database system, and the step of determining the second node may be determined according to a situation of an actual global index, which is not limited in this embodiment.
In one embodiment, the determining a second node based on the data query request and a global index of the distributed database system comprises:
determining a first partition identification corresponding to an index key value based on the index key value in the data query request;
and determining a first target table space to which the first partition identifier belongs, and determining a node where the first target table space is located as a second node.
The first partition identifier may refer to an identifier of a partition in which the index key is located, and the first target tablespace may be understood as a tablespace to which the first partition identifier belongs.
Specifically, in the process of determining the second node, the first partition identifier corresponding to the index key value may be determined based on the index key value in the data query request. For example, the first partition identifier may be determined according to a first preset partition rule. The first preset partition rule may be, for instance, that when the index key value is smaller than 50, the corresponding partition identifier is P1, and when the index key value is greater than or equal to 50, the corresponding partition identifier is P2.
After the first partition identifier is determined, a first target tablespace to which the first partition identifier belongs may be determined, and then a node where the first target tablespace is located is determined as a second node, where the first target tablespace may be determined according to a rule for creating a global index, which is not further expanded in this embodiment.
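The lookup chain described above (index key value, then first partition identifier, then first target tablespace, then second node) can be sketched as follows. This is only an illustrative sketch in Python: the bounds follow the example rule in the text (key < 50 gives P1, otherwise P2), while the tablespace names `TS_A`/`TS_B`, the node names, and all function names are hypothetical and are not taken from the embodiment.

```python
from bisect import bisect_right

# First preset partition rule from the text: key < 50 -> P1, key >= 50 -> P2.
UPPER_BOUNDS = [50]            # exclusive upper bounds of all but the last partition
PARTITION_IDS = ["P1", "P2"]   # the last partition has no upper bound

# Hypothetical mappings; the text only says they follow from how the
# global index was created.
PARTITION_TO_TABLESPACE = {"P1": "TS_A", "P2": "TS_B"}
TABLESPACE_TO_NODE = {"TS_A": "node_x", "TS_B": "node_y"}


def partition_for_key(index_key: int) -> str:
    """Return the identifier of the range partition containing index_key."""
    return PARTITION_IDS[bisect_right(UPPER_BOUNDS, index_key)]


def second_node_for_key(index_key: int) -> str:
    """Index key -> first partition identifier -> first target tablespace -> node."""
    partition_id = partition_for_key(index_key)
    tablespace = PARTITION_TO_TABLESPACE[partition_id]
    return TABLESPACE_TO_NODE[tablespace]
```

Using `bisect_right` makes each upper bound exclusive, which matches the "smaller than 50" wording of the rule: a key equal to 50 falls into P2.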
S230, obtaining the global index data from the second node.
After the second node is determined based on the above steps, the corresponding global index data may be acquired from the second node, for example, the second node may be requested to return the corresponding global index data by sending a query request, and after the second node receives the query request, the global index data may be acquired by querying and fed back.
S240, determining a first node based on the aggregation key values in the global index data.
The first node may then be determined according to the aggregation key value in the obtained global index data; the process of determining the first node is not limited here.
In one embodiment, the determining the first node based on the aggregation key value in the global index data includes:
determining a second partition identification corresponding to an aggregation key value based on the aggregation key value in the global index data;
and determining a second target table space to which the second partition identifier belongs, and determining a node where the second target table space is located as a first node.
The second partition identifier may refer to an identifier of a partition in which the aggregation key is located, and the second target tablespace may be understood as a tablespace to which the second partition identifier belongs.
Specifically, in the process of determining the first node, the second partition identifier corresponding to the aggregation key value may be determined based on the aggregation key value in the global index data. For example, the second partition identifier may be determined according to a second preset partition rule, whose content may be the same as or different from that of the first preset partition rule, which is not limited in this embodiment.
After the second partition identifier is determined, the second target tablespace to which the second partition identifier belongs may be determined, and then the node where the second target tablespace is located is determined as the first node, and the second target tablespace may be determined according to a rule for creating an aggregation index, which is not further expanded in this embodiment.
S250, sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction.
The data query method provided by the second embodiment of the invention includes: receiving a data query request sent by a client; determining a second node based on the data query request and a global index of the distributed database system, wherein the second node is a node in which global index data is located in the distributed database system, and the global index data corresponds to the data query request; obtaining the global index data from the second node; determining a first node based on an aggregation key value in the global index data; and sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction. With this method, the global index data is obtained first and the first node is then determined from the aggregation key value it contains, which provides the basis for acquiring the first target data.
The following is an exemplary description of embodiments of the invention:
A DMDPC global index IDX1 with two partitions is created on a partitioned table named TABLE1.
CREATE TABLE TABLE1(C1 INT CLUSTER KEY,C2 INT)PARTITION BY RANGE(C1)
(
PARTITION P1 VALUES LESS THAN(100)STORAGE(ON TS1),
PARTITION P2 VALUES LESS THAN(200)STORAGE(ON TS2),
PARTITION P3 VALUES LESS THAN(MAXVALUE)STORAGE(ON TS3)
);
CREATE INDEX IDX1 ON TABLE1(C2)GLOBAL PARTITION BY RANGE(C2)
(
PARTITION P1 VALUES LESS THAN(50)STORAGE(ON TS4),
PARTITION P2 VALUES LESS THAN(MAXVALUE)STORAGE(ON TS5)
);
Here, TS1, TS2, TS3, TS4 and TS5 are tablespace names, and each tablespace is located at a specific site. For example, suppose the whole distributed environment has two data storage sites, with TS1/TS2/TS3 located at site 1 and TS4/TS5 located at site 2.
Fig. 3 is a schematic flowchart of a data query method according to the second embodiment of the present invention. As shown in Fig. 3, a query statement is first input: SELECT C1 FROM TABLE1 WHERE C2=100; Whether to use the aggregate index or the global index is then decided according to cost estimation. In this query, the filter condition C2=100 is a condition on the global index column, so it may be decided to perform the query using the global index (i.e., it is judged, according to the preset cost policy and the data query request, whether the global index needs to be used to query the first target data).
Subsequently, the target record (i.e., the global index data) may be searched at the site where the global index is located, that is, the site (i.e., the second node) where the global index data is located in the distributed database system is determined based on the data query request and the global index, and then the corresponding global index data is obtained from the node.
Illustratively, the process of determining the second node may be as follows: according to the index key value 100 (i.e., the index key value in the data query request), it may be determined that the corresponding partition identifier is IDX1_P2 (i.e., the first partition identifier); the partition identifier IDX1_P2 belongs to tablespace TS5 (i.e., the first target tablespace), and tablespace TS5 is located at site 2 (i.e., the second node).
Searching for the target record (i.e., the global index data) at the site where the global index is located means finding the record (100, 1, tab, trxid, rowid) that satisfies the condition C2=100, where 100 is the key value of the secondary index, 1 is the aggregation index key value corresponding to the record, and tab, trxid and rowid refer, respectively, to the ID of the table in which the record is located, the ID of the transaction that inserted the record, and the row number of the record (such as a unique identifier within the table).
Then, the site where the aggregate index is located is calculated according to the aggregate index key value of the record, and the relevant information is sent to it. That is, according to the aggregate index key value in the global index record (i.e., the global index data), it is calculated that the aggregate index record falls in partition TABLE1_P1 of the table (i.e., the second partition identifier); the partition identifier TABLE1_P1 belongs to tablespace TS1 (i.e., the second target tablespace), and tablespace TS1 is located at site 1 (i.e., the first node). The record-related information (i.e., the global index data) is then sent to site 1 in a single communication operation.
Then, the aggregated index record is obtained according to the aggregated index key value, and visibility judgment is performed, specifically, a corresponding record (i.e., the first target data) is found on the site 1 according to information such as the aggregated key value. Visibility decisions may also be made based on the status of the record, such as whether the record has been modified by other transactions during the query, resulting in a record that is not visible.
Finally, after the record is obtained, subsequent operations such as filtering, joining, grouping, projection and returning a result set can be performed according to the query statement; in this example, the first target data is returned to the client.
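The two-hop lookup of this example can be traced with a small Python sketch. The partition bounds, tablespaces and site assignments are transcribed from the CREATE TABLE and CREATE INDEX statements and the site layout given above; the in-memory dictionaries standing in for site storage, the function names, and the omission of the visibility check and network transfer are simplifying assumptions made purely for illustration.

```python
from bisect import bisect_right

# (exclusive upper bound or None for MAXVALUE, partition identifier, tablespace)
TABLE1_PARTS = [(100, "TABLE1_P1", "TS1"), (200, "TABLE1_P2", "TS2"),
                (None, "TABLE1_P3", "TS3")]
IDX1_PARTS = [(50, "IDX1_P1", "TS4"), (None, "IDX1_P2", "TS5")]
TABLESPACE_SITE = {"TS1": 1, "TS2": 1, "TS3": 1, "TS4": 2, "TS5": 2}

# Hypothetical stand-ins for what each site stores.
# Global index record: (C2, C1, table id, transaction id, rowid).
GLOBAL_INDEX = {2: [(100, 1, "tab", "trxid", "rowid")]}
# Aggregate index at each site: {C1: (C1, C2)}.
AGGREGATE_INDEX = {1: {1: (1, 100)}}


def locate(parts, key):
    """Return (partition identifier, tablespace, site) for a range-partitioned key."""
    bounds = [bound for bound, _, _ in parts if bound is not None]
    _, partition_id, tablespace = parts[bisect_right(bounds, key)]
    return partition_id, tablespace, TABLESPACE_SITE[tablespace]


def query_c1_where_c2(c2):
    """Sketch of SELECT C1 FROM TABLE1 WHERE C2 = c2 via the global index."""
    _, _, index_site = locate(IDX1_PARTS, c2)            # second node
    results = []
    for record in GLOBAL_INDEX.get(index_site, []):
        if record[0] != c2:
            continue
        aggregate_key = record[1]
        _, _, data_site = locate(TABLE1_PARTS, aggregate_key)  # first node
        row = AGGREGATE_INDEX.get(data_site, {}).get(aggregate_key)
        if row is not None:                              # visibility check omitted
            results.append(row[0])
    return results
```

For the key value 100, `locate(IDX1_PARTS, 100)` yields IDX1_P2 on TS5 at site 2, and the aggregate key 1 found there resolves to TABLE1_P1 on TS1 at site 1, matching the walk-through in the text.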
Fig. 4 is a schematic flowchart of a data insertion method according to the second embodiment of the present invention. As shown in Fig. 4, an INSERT statement may first be received: INSERT INTO TABLE1 VALUES(1, 100) (i.e., a data processing request is received). Inserting data into a table can be regarded as inserting data into all of the indexes on the table that need to be maintained (i.e., the first secondary index, the aggregate index, and the second secondary index).
Then, once it is determined that the statement is not a query insertion statement, the site where the inserted value belongs may be calculated, and the data is inserted into the aggregate index and the ordinary secondary index (i.e., the aggregate index and the second secondary index). For example, since the value of the table partition column C1 is 1, the record should be inserted into partition TABLE1_P1, and the site corresponding to tablespace TS1, to which TABLE1_P1 belongs, is site 1 (i.e., a third node is determined based on the data processing request); the data insertion for the aggregate index and the other secondary indexes may therefore be performed at site 1.
In addition, the global index key value corresponding to the record is 100, and it can be calculated from the index partition range values that the global index record should fall in partition IDX1_P2. The site corresponding to tablespace TS5, to which index partition IDX1_P2 belongs, is site 2, so a target instruction containing processing information can be sent to site 2, where the processing information may be the to-be-processed data carried in the data processing request and its associated data (such as the secondary key, aggregation key, table ID, transaction ID and rowid) (i.e., the target instruction containing the processing information is sent to the third node). Site 2 may perform the insertion of the global index data after receiving the instruction and the required information.
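The routing of this insertion can be sketched in Python as a small planner that returns one instruction per target site. The partition layout is transcribed from the example statements above; the instruction names, the placeholder values for the table ID, transaction ID and rowid, and the function names are hypothetical and only serve the illustration.

```python
from bisect import bisect_right

# (exclusive upper bound or None for MAXVALUE, partition identifier, tablespace)
TABLE1_PARTS = [(100, "TABLE1_P1", "TS1"), (200, "TABLE1_P2", "TS2"),
                (None, "TABLE1_P3", "TS3")]
IDX1_PARTS = [(50, "IDX1_P1", "TS4"), (None, "IDX1_P2", "TS5")]
TABLESPACE_SITE = {"TS1": 1, "TS2": 1, "TS3": 1, "TS4": 2, "TS5": 2}


def site_for(parts, key):
    """Return the site holding the range partition that contains key."""
    bounds = [bound for bound, _, _ in parts if bound is not None]
    _, _, tablespace = parts[bisect_right(bounds, key)]
    return TABLESPACE_SITE[tablespace]


def plan_insert(c1, c2, table_id="tab", trx_id="trxid", rowid="rowid"):
    """Plan the per-site work for inserting the row (c1, c2) into TABLE1."""
    data_site = site_for(TABLE1_PARTS, c1)    # aggregate + ordinary secondary indexes
    index_site = site_for(IDX1_PARTS, c2)     # global index IDX1
    return [
        (data_site, "insert_aggregate_and_secondary", (c1, c2)),
        # Processing information: secondary key, aggregation key, table ID,
        # transaction ID and rowid, as listed in the text.
        (index_site, "insert_global_index", (c2, c1, table_id, trx_id, rowid)),
    ]
```

For the row (1, 100), the plan targets site 1 for the aggregate index and site 2 for the global index record, as in the walk-through.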
FIG. 5 is a flowchart illustrating a data deleting method according to a second embodiment of the present invention, and as shown in FIG. 5, a DELETE statement (i.e., a data processing request) may be received first; then, calculating the site of the aggregation index of the record to be deleted according to the deletion statement, and deleting the aggregation index and the common secondary index record; then, calculating a global index site corresponding to the record, and sending related information to the global index site (namely, determining a third node based on the data processing request, and sending a target instruction containing processing information to the third node, wherein the target instruction is used for indicating the third node to process the processing information); and the global index site deletes the global index records.
Fig. 6 is a flowchart illustrating a data updating method according to the second embodiment of the present invention. As shown in Fig. 6, an UPDATE statement may first be received (i.e., a data update request is received), and data updating is then performed on the aggregate index and the ordinary secondary index. Next, the corresponding global index site is calculated according to the original value of the record (i.e., the original data), relevant information is sent to that target site, and the target site executes a global index record deletion instruction (i.e., a fourth node is determined based on the first to-be-processed data in the data update request, and a first target instruction containing first processing information is sent to the fourth node). Then, a global index site is calculated according to the updated value of the record (i.e., the updated data), relevant information is sent to that target site, and the target site executes a global index record insertion instruction (i.e., a fifth node is determined based on the second to-be-processed data in the data update request, and a second target instruction containing second processing information is sent to the fifth node).
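The treatment of an update as a deletion routed by the original value followed by an insertion routed by the updated value can be sketched as follows. The global index layout is the one from the example statements above; the instruction names and function names are hypothetical. Note that in this particular layout both index partitions reside at site 2, so the fourth and fifth nodes coincide even when the partitions differ.

```python
from bisect import bisect_right

# (exclusive upper bound or None for MAXVALUE, partition identifier, tablespace)
IDX1_PARTS = [(50, "IDX1_P1", "TS4"), (None, "IDX1_P2", "TS5")]
TABLESPACE_SITE = {"TS4": 2, "TS5": 2}


def locate_index(key):
    """Return (site, partition identifier) of the IDX1 partition containing key."""
    bounds = [bound for bound, _, _ in IDX1_PARTS if bound is not None]
    _, partition_id, tablespace = IDX1_PARTS[bisect_right(bounds, key)]
    return TABLESPACE_SITE[tablespace], partition_id


def plan_global_index_update(c1, old_c2, new_c2):
    """Plan the global index maintenance when column C2 of row c1 is updated."""
    fourth_site, old_partition = locate_index(old_c2)   # routed by original data
    fifth_site, new_partition = locate_index(new_c2)    # routed by updated data
    return [
        (fourth_site, old_partition, "delete_global_index", (old_c2, c1)),
        (fifth_site, new_partition, "insert_global_index", (new_c2, c1)),
    ]
```

Routing the two instructions independently is what allows the old and new index records to live on different nodes when the key moves across partitions.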
Example three
Fig. 7 is a schematic structural diagram of a data query device according to a third embodiment of the present invention. As shown in fig. 7, the apparatus includes:
a first receiving module 310, configured to receive a data query request sent by a client, where the data query request is used to request to query first target data;
a first determining module 320, configured to determine a first node based on the data query request and a global index of the distributed database system, where the first node is a node in the distributed database system that queries the first target data;
a first sending module 330, configured to send a data query instruction to the first node, and obtain the first target data returned by the first node based on the data query instruction.
In the data query device provided in the third embodiment of the present invention, the first receiving module 310 receives a data query request sent by a client, where the data query request is used to request to query first target data; the first determining module 320 determines a first node based on the data query request and a global index of the distributed database system, the first node being a node in the distributed database system that queries the first target data; and the first sending module 330 sends a data query instruction to the first node and acquires the first target data returned by the first node based on the data query instruction. With this device, the first node is determined based on the data query request and the global index of the distributed database system, and the data query instruction is sent to the first node, so that the first target data corresponding to the data query instruction can be obtained. Data query based on the global index is thereby realized in the distributed database system, and the processing performance of the distributed database system is improved.
Optionally, the first determining module 320 includes:
a first determining unit, configured to determine a second node based on the data query request and a global index of the distributed database system, where the second node is a node in which global index data is located in the distributed database system, and the global index data corresponds to the data query request;
an obtaining unit, configured to obtain the global index data from the second node;
and the second determining unit is used for determining the first node based on the aggregation key values in the global index data.
Optionally, the first determining unit is specifically configured to:
determining a first partition identification corresponding to an index key value based on the index key value in the data query request;
and determining a first target table space to which the first partition identifier belongs, and determining a node where the first target table space is located as a second node.
Optionally, the second determining unit is specifically configured to:
determining a second partition identification corresponding to an aggregation key value based on the aggregation key value in the global index data;
and determining a second target table space to which the second partition identifier belongs, and determining a node where the second target table space is located as a first node.
Optionally, the data query apparatus provided in this embodiment further includes:
the judging module is used for judging whether the first target data needs to be queried by using the global index according to a preset cost strategy and the data query request before the first node is determined based on the data query request and the global index of the distributed database system;
the first determining module 320 is specifically configured to:
if it is determined that a global index needs to be used, a first node is determined based on the data query request and the global index of the distributed database system.
Optionally, the data query apparatus provided in this embodiment further includes:
and the second sending module is used for sending the first target data to the client after the first target data returned by the first node based on the data query instruction is obtained.
Optionally, the data query apparatus provided in this embodiment further includes:
a second receiving module, configured to receive a data processing request, where the data processing request is used to request processing of to-be-processed indexes in the distributed database system, where the to-be-processed indexes include a first secondary index, and the first secondary index is a global index;
a second determining module, configured to determine a third node based on the data processing request, and send a target instruction including processing information to the third node, where the target instruction is used to instruct the third node to process the processing information, the third node is a node in the distributed database system that performs data processing on the index to be processed, and the processing information includes data to be processed that is carried in the data processing request.
Optionally, the to-be-processed index further includes an aggregation index and/or a second secondary index.
Optionally, when the data processing request is a data update request, the third node includes a fourth node and a fifth node, and the second determining module is specifically configured to:
determining a fourth node based on first to-be-processed data in the data updating request, and sending a first target instruction containing first processing information to the fourth node, wherein the first processing information comprises the first to-be-processed data, and the first to-be-processed data is original data;
determining a fifth node based on second to-be-processed data in the data updating request, and sending a second target instruction containing second processing information to the fifth node, where the second processing information includes the second to-be-processed data, and the second to-be-processed data is updated data.
The data query device provided by the embodiment of the invention can execute the data query method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
FIG. 8 illustrates a block diagram of a distributed database system 10 that can be used to implement embodiments of the present invention. The distributed database system is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The distributed database system may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 8, the distributed database system 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the distributed database system 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the distributed database system 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the distributed database system 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. Processor 11 performs the various methods and processes described above, such as the data query method.
In some embodiments, the data query method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program can be loaded and/or installed onto the distributed database system 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data query method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data query method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with users, the systems and techniques described here can be implemented on a distributed database system having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the distributed database system. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A data query method, applied to a distributed database system, the method comprising:
receiving a data query request sent by a client, wherein the data query request is used for requesting to query first target data;
determining a first node based on the data query request and a global index of the distributed database system, the first node being a node in the distributed database system that queries for the first target data;
and sending a data query instruction to the first node, and acquiring the first target data returned by the first node based on the data query instruction.
2. The method of claim 1, wherein determining the first node based on the data query request and a global index of the distributed database system comprises:
determining a second node based on the data query request and a global index of the distributed database system, wherein the second node is a node in which global index data is located in the distributed database system, and the global index data corresponds to the data query request;
obtaining the global index data from the second node;
and determining a first node based on the aggregation key value in the global index data.
3. The method of claim 2, wherein determining the second node based on the data query request and a global index of the distributed database system comprises:
determining a first partition identification corresponding to an index key value based on the index key value in the data query request;
and determining a first target table space to which the first partition identifier belongs, and determining a node where the first target table space is located as a second node.
4. The method of claim 2, wherein determining the first node based on the aggregate key value in the global index data comprises:
determining a second partition identification corresponding to an aggregation key value based on the aggregation key value in the global index data;
and determining a second target table space to which the second partition identifier belongs, and determining a node where the second target table space is located as a first node.
5. The method of claim 1, further comprising, prior to said determining a first node based on the data query request and a global index of the distributed database system:
judging whether the first target data needs to be inquired by using a global index or not according to a preset cost strategy and the data inquiry request;
the determining a first node based on the data query request and a global index of the distributed database system comprises:
if it is determined that a global index needs to be used, a first node is determined based on the data query request and a global index of the distributed database system.
6. The method according to claim 1, further comprising, after the obtaining the first target data returned by the first node based on the data query instruction:
and sending the first target data to the client.
7. The method of claim 1, further comprising:
receiving a data processing request, wherein the data processing request is used for requesting processing of indexes to be processed in the distributed database system, the indexes to be processed comprise first secondary indexes, and the first secondary indexes are global indexes;
determining a third node based on the data processing request, and sending a target instruction containing processing information to the third node, where the target instruction is used to instruct the third node to process the processing information, the third node is a node in the distributed database system that performs data processing on the index to be processed, and the processing information includes data to be processed carried in the data processing request.
8. The method of claim 7, wherein the index to be processed further comprises an aggregate index and/or a second secondary index.
9. The method according to claim 7, wherein when the data processing request is a data update request, the third node comprises a fourth node and a fifth node, and the determining a third node based on the data processing request and sending a target instruction containing processing information to the third node comprises:
determining a fourth node based on first to-be-processed data in the data update request, and sending a first target instruction containing first processing information to the fourth node, wherein the first processing information comprises the first to-be-processed data, and the first to-be-processed data is original data;
determining a fifth node based on second to-be-processed data in the data update request, and sending a second target instruction containing second processing information to the fifth node, wherein the second processing information comprises the second to-be-processed data, and the second to-be-processed data is updated data.
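The update path in claim 9 can be sketched as a plan of two instructions, one routed by the original data and one routed by the updated data. The claim only says that each node processes the data it receives. Treating the first instruction as a delete of the old index entry and the second as an insert of the new one is an assumption made here for illustration, as are the names `route` and `plan_index_update` and the CRC32-based routing.

```python
import zlib

NODES = ["node_0", "node_1", "node_2"]  # hypothetical node names


def route(index_key: str) -> str:
    """Assumed routing of an index key value to the node that stores it."""
    return NODES[zlib.crc32(index_key.encode("utf-8")) % len(NODES)]


def plan_index_update(old_key: str, new_key: str, aggregate_key: str) -> list:
    """Build the two target instructions for a global index update."""
    return [
        # First target instruction: sent to the "fourth node", chosen from the
        # original data (assumed here to remove the stale index entry).
        {"node": route(old_key), "op": "delete",
         "index_key": old_key, "aggregate_key": aggregate_key},
        # Second target instruction: sent to the "fifth node", chosen from the
        # updated data (assumed here to add the new index entry).
        {"node": route(new_key), "op": "insert",
         "index_key": new_key, "aggregate_key": aggregate_key},
    ]
```

The two instructions may land on the same node or on different nodes, depending only on where the old and new key values route.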
10. A data query apparatus, configured in a distributed database system, the apparatus comprising:
a first receiving module, configured to receive a data query request sent by a client, where the data query request is used for requesting to query first target data;
a first determining module, configured to determine a first node based on the data query request and a global index of the distributed database system, where the first node is a node in the distributed database system that queries the first target data;
and a first sending module, configured to send a data query instruction to the first node and obtain the first target data returned by the first node based on the data query instruction.
11. A distributed database system, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data query method of any one of claims 1-9.
12. A computer-readable storage medium storing computer instructions for causing a processor to implement the data query method of any one of claims 1-9 when executed.
CN202211597050.3A 2022-12-12 2022-12-12 Data query method and device, distributed database system and medium Pending CN115964387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211597050.3A CN115964387A (en) 2022-12-12 2022-12-12 Data query method and device, distributed database system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211597050.3A CN115964387A (en) 2022-12-12 2022-12-12 Data query method and device, distributed database system and medium

Publications (1)

Publication Number Publication Date
CN115964387A (en) 2023-04-14

Family

ID=87362754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211597050.3A Pending CN115964387A (en) 2022-12-12 2022-12-12 Data query method and device, distributed database system and medium

Country Status (1)

Country Link
CN (1) CN115964387A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination