CN111782654A - Method for storing data in distributed database in partition mode - Google Patents
- Publication number
- CN111782654A (application CN202010617993.2A)
- Authority
- CN
- China
- Prior art keywords
- partition
- tables
- connection
- node
- relation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for storing data in a distributed database in a partitioned manner, and belongs to the field of distributed databases. The method introduces a structure, TABLE_RELATION, to record the user's query history, which is simple and effective. Using join frequency as the basis for relocating (dumping) tables, it further analyzes the connection graph between tables and stores all tables that are nodes of a strongly connected subgraph in the same partition or node. A few key intermediate nodes in the graph are stored redundantly, which preserves query efficiency and also protects the safety and reliability of key information. By relocating tables between partitions, many queries that previously had to span partitions or nodes can be completed within a single partition or node, which improves query efficiency and reduces query time. The method requires no substantial change to the existing database system and is easy to implement and deploy.
Description
Technical Field
The invention belongs to the technical field of distributed databases, and particularly relates to a method for storing data in a distributed database in a partitioned mode.
Background
Insert, delete, update, and query are the most common database operations. Executing them inevitably requires accessing table data, and in many cases more than one table is accessed. For example, querying information about a student and the school the student attends requires joining the two tables student and unity and returning the rows that satisfy the condition.
For a traditional database, perhaps the most important factor affecting the efficiency of a join between tables is the size of the Cartesian product of the two tables; for a distributed database, the communication time between partitions that are far apart also plays an important role.
A distributed database usually comprises multiple storage nodes, each storing different data (redundancy aside). A query often involves several tables at once rather than a single table. In that case, the partitions (nodes) on which the tables are stored are not known in advance. If the tables are stored in the same partition (node), the query is efficient; if they are stored in different partitions (nodes), the query must cross partitions (nodes), which makes it slow. This problem therefore needs to be optimized to improve the query efficiency of distributed databases.
Disclosure of Invention
The technical task of the invention is to overcome the shortcomings of the prior art and to provide a method for storing data in a distributed database in a partitioned manner, so that the distributed database executes queries more efficiently and query time is reduced.
The technical solution adopted by the invention to solve this technical problem is as follows:
A method for storing data in a distributed database in a partitioned manner, characterized in that the method measures the strength of the relation between tables based on the frequency with which tables are joined, and then introduces undirected-graph processing to further partition the tables.
Preferably, a structure is introduced to record the connections and relations between tables and partitions.
Preferably,
in one aspect, the frequency of table joins is used to represent the strength of the relation between a table and a storage partition (node), specifically:
a TABLE_RELATION structure is introduced, and the strength of the relation between each table and each partition is recorded by maintaining the relation table TABLE_RELATION between tables and partitions (nodes);
each operation that involves a table join changes the records of the joined tables in TABLE_RELATION; starting from its initial state, TABLE_RELATION is changed through DML operations;
the tables are then stored in different partitions (nodes) according to the relation strength, represented by join frequency, recorded in TABLE_RELATION;
in another aspect, a connection graph between tables is generated from the join relations between the tables, and all nodes (each representing a table) of a strongly connected subgraph of the connection graph are stored in the same partition.
Preferably, in the connection graph, a node represents a table, and an edge indicates whether two tables are connected (joined).
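The connection graph described above can be sketched as an undirected adjacency structure. This is a minimal illustration, not the patent's implementation: the function name `build_connection_graph` and the sample table names are hypothetical, and the join history is assumed to be available as pairs of table names.

```python
from collections import defaultdict

def build_connection_graph(joined_pairs):
    """Return an undirected graph as {table: set of tables it has been joined with}."""
    graph = defaultdict(set)
    for left, right in joined_pairs:
        # An edge is symmetric: a join of (left, right) connects both tables.
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)

# Hypothetical join history: each tuple is one pair of tables joined by a query.
history = [("orders", "customers"), ("orders", "items"), ("orders", "customers")]
graph = build_connection_graph(history)
print(sorted(graph["orders"]))
```

Repeated joins of the same pair collapse into a single edge here, since the edge only records whether two tables are connected; join counts are kept separately in TABLE_RELATION.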
Preferably, for a table to which both aspects apply, the latter takes priority.
Preferably,
the strength of the connection between tables and partitions is used as an indicator: if the partition with which a table has the strongest connection changes, relocating (dumping) the table to that partition is considered;
from the perspective of the connection graph, when a strongly connected subgraph appears in the graph, all tables that are nodes of the same strongly connected subgraph should be stored on the same partition (storage node).
Preferably, for the few tables that act as intermediate nodes between multiple strongly connected subgraphs, redundant storage on different partitions should be considered if storage capacity allows.
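The text does not define precisely which tables count as intermediate nodes. One possible reading, assumed here purely for illustration, is that they are cut vertices of the connection graph: tables whose removal disconnects the tables around them. The sketch below finds such tables by brute force; the function names and the sample graph are hypothetical.

```python
def reachable(graph, start, removed):
    """Tables reachable from start when the table `removed` is ignored."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def intermediate_tables(graph):
    """Tables whose removal splits their connected component (cut vertices)."""
    result = set()
    for table in graph:
        # Everything in the same component as `table`, excluding the table itself.
        others = reachable(graph, table, None) - {table}
        if len(others) < 2:
            continue
        start = next(iter(others))
        # If the rest is no longer fully reachable without `table`, it is a cut vertex.
        if reachable(graph, start, table) != others:
            result.add(table)
    return result

# Hypothetical graph: two triangles (a, b, c) and (d, e, f) linked by the edge c-d.
sample = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
candidates = intermediate_tables(sample)
```

Under this assumption, `c` and `d` are the candidates for redundant storage, because every path between the two groups passes through them.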
Preferably, when a table is found to need a partition dump, an appropriate time must be chosen for the dump so that normal production activity is unaffected or affected as little as possible.
Compared with the prior art, the method for storing data in a distributed database in a partitioned manner has the following beneficial effects:
1. The method introduces a structure, TABLE_RELATION, to record the user's query history, which is simple and effective.
2. Using join frequency as the basis for relocating tables, the method further analyzes the connection graph between tables and stores all tables that are nodes of a strongly connected subgraph in the same partition or node; a few key intermediate nodes in the graph are stored redundantly, which preserves query efficiency and also protects the safety and reliability of key information.
3. By relocating tables between partitions, many queries that previously had to span partitions or nodes can be completed within a single partition or node, which improves query efficiency and reduces query time.
4. The method requires no substantial change to the existing database system and is easy to implement and deploy.
Drawings
In order to more clearly describe the working principle of the method for partitioned storage of data in a distributed database according to the present invention, a schematic diagram is attached for further explanation.
FIG. 1 is a connection diagram of a first bid in accordance with an embodiment of the present invention;
FIG. 2 is a connection diagram of 7 tables in a database according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to fig. 1 and 2 in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention relates to a method for storing data in a distributed database in a partitioned manner. The method measures the strength of the relation between tables based on the frequency with which tables are joined, and then introduces undirected-graph processing to further partition the tables.
With reference to FIG. 1, a structure is introduced to record the connections and relations between tables and partitions, which have the following two main aspects:
in one aspect, the frequency of table joins is used to represent the strength of the relation between a table and a storage partition (node), specifically:
a TABLE_RELATION structure is introduced, and the strength of the relation between each table and each partition is recorded by maintaining the relation table TABLE_RELATION between tables and partitions (nodes);
each operation that involves a table join changes the records of the joined tables in TABLE_RELATION; starting from its initial state, TABLE_RELATION is changed through DML operations;
the tables are then stored in different partitions (nodes) according to the relation strength, represented by join frequency, recorded in TABLE_RELATION;
in another aspect, a connection graph between tables is generated from the join relations between the tables, and all nodes (each representing a table) of a strongly connected subgraph of the connection graph are stored in the same partition.
In the connection graph, a node represents a table, and an edge indicates whether two tables are connected (joined).
For a table to which both aspects apply, the latter takes priority.
The strength of the connection between tables and partitions is used as an indicator: if the partition with which a table has the strongest connection changes, relocating (dumping) the table to that partition is considered;
from the perspective of the connection graph, when a strongly connected subgraph appears in the graph, all tables that are nodes of the same strongly connected subgraph should be stored on the same partition (storage node).
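The frequency criterion can be illustrated with a short sketch that compares a table's current partition with the partition holding its highest join weight. The function names are hypothetical, and the tie-breaking rule (a tie that includes the current partition keeps the table in place, other ties pick the alphabetically first partition) is an assumption; the text does not say how ties are resolved.

```python
def strongest_partition(weights, current):
    """Partition with the highest join weight; ties that include `current` keep it."""
    best = max(weights.values())
    if weights.get(current, 0) == best:
        return current
    # Assumed tie-break among other partitions: alphabetical order.
    return min(p for p, w in weights.items() if w == best)

def needs_dump(weights, current):
    """True if the table's strongest partition differs from where it is stored now."""
    return strongest_partition(weights, current) != current

# A table stored in B that is mostly joined with tables in C becomes a dump candidate.
print(needs_dump({"A": 0, "B": 1, "C": 3}, "B"))
```

Keeping a table in place on ties avoids moving data when a relocation would bring no clear benefit.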
For the few tables that act as intermediate nodes between multiple strongly connected subgraphs, redundant storage on different partitions should be considered if storage capacity allows.
Using the data in the TABLE_RELATION table and the graph formed by the join relations between tables, data stored in the database can be moved between partitions. However, the database must keep serving external requests, so tables cannot be relocated between partitions (nodes) at arbitrary times. A period of low traffic, such as 12 am every day, can therefore be chosen for dumping tables between partitions (nodes).
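A scheduler could guard the dump with a simple time-window check. The sketch below is only an illustration: the function name `in_dump_window` is hypothetical, the window start of 12 am follows the example in the text, and the two-hour length is an assumed value.

```python
from datetime import time

def in_dump_window(now, start=time(0, 0), end=time(2, 0)):
    """True if `now` falls in [start, end); windows may wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Wrapping window, for example 23:00 to 01:00.
    return now >= start or now < end

print(in_dump_window(time(0, 30)))   # inside the assumed 00:00-02:00 window
print(in_dump_window(time(12, 0)))   # midday, outside the window
```

In practice the window would more likely be derived from observed traffic than fixed in code.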
Example one
Assume that the database has three partitions: A, B, and C. The database currently holds 8 tables: t1, t2, t3, t4, t5, t6, t7, and t8. Tables t1 and t2 are located in partition A; t3, t4, and t8 in partition B; and t5, t6, and t7 in partition C.
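The general structure of TABLE_RELATION is as follows:
CREATE TABLE table_relation (
    table_name TEXT,            -- name of the table
    current_partition TEXT,     -- partition the table is currently stored in
    weight_with_partA BIGINT,   -- join weight with partition A
    weight_with_partB BIGINT,   -- join weight with partition B
    weight_with_partC BIGINT    -- join weight with partition C
);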
For the statements
SELECT * FROM t1,t2;
SELECT * FROM t1,t3;
SELECT * FROM t2,t3;
SELECT * FROM t3,t4;
SELECT * FROM t4,t5;
SELECT * FROM t4,t6;
SELECT * FROM t5,t6;
SELECT * FROM t4,t7;
SELECT * FROM t8,t7;
Each time a table is joined with another table, its weight for the partition that holds the other table is incremented by 1. After the statements above are executed, the data in TABLE_RELATION is updated accordingly (the resulting table is not reproduced in this text).
The connection graph of the corresponding tables is shown in FIG. 2.
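The weight rule can be traced over the nine statements above with a short sketch. It applies a literal reading of the rule (each join adds 1 to a table's weight for the partition currently holding the other table). The totals it produces are only what this reading yields; they are not taken from the patent's own TABLE_RELATION data, which is not available in this text. The function name `tally_weights` is hypothetical.

```python
# Initial placement from the example: A holds t1, t2; B holds t3, t4, t8; C holds t5, t6, t7.
placement = {"t1": "A", "t2": "A", "t3": "B", "t4": "B", "t8": "B",
             "t5": "C", "t6": "C", "t7": "C"}

# Table pairs joined by the nine SELECT statements, in order.
joins = [("t1", "t2"), ("t1", "t3"), ("t2", "t3"), ("t3", "t4"), ("t4", "t5"),
         ("t4", "t6"), ("t5", "t6"), ("t4", "t7"), ("t8", "t7")]

def tally_weights(joins, placement):
    """Count, per table, how often it was joined with a table stored in each partition."""
    partitions = sorted(set(placement.values()))
    weights = {table: {p: 0 for p in partitions} for table in placement}
    for left, right in joins:
        weights[left][placement[right]] += 1
        weights[right][placement[left]] += 1
    return weights

weights = tally_weights(joins, placement)
for table in sorted(weights):
    print(table, weights[table])
```

Each row of the output corresponds to one row of table_relation, with the three counters playing the role of the weight_with_partA, weight_with_partB, and weight_with_partC columns.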
According to the data in TABLE_RELATION, the partitioning obtained by the frequency method alone would be A (t1, t2), B (t3, t7, t8), C (t4, t5, t6). According to the connection graph, however, t1, t2, and t3 should be placed in one partition, and t4, t5, and t6 in one partition. Combining the two methods, the final table partitioning is A (t1, t2, t3), B (t7, t8), C (t4, t5, t6).
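The graph-based grouping in this example can be reproduced with a small sketch. The text does not define "strongly connected subgraph" for an undirected graph, so the sketch assumes one plausible interpretation: a group of tables that stays connected after any single edge is removed. It therefore drops every bridge edge and reports the remaining connected components. The function names are hypothetical, and the brute-force bridge test is chosen for readability rather than speed.

```python
def connected(graph, src, dst, skip_edge):
    """True if dst is reachable from src without traversing skip_edge."""
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if {node, nxt} == skip_edge or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return dst in seen

def dense_groups(edges):
    """Connected components that remain after all bridge edges are removed."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    # An edge is a bridge if its endpoints are disconnected once it is removed.
    bridges = [(a, b) for a, b in edges if not connected(graph, a, b, {a, b})]
    for a, b in bridges:
        graph[a].discard(b)
        graph[b].discard(a)
    groups, seen = [], set()
    for table in sorted(graph):
        if table in seen:
            continue
        group, stack = {table}, [table]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in group:
                    group.add(nxt)
                    stack.append(nxt)
        seen |= group
        groups.append(sorted(group))
    return groups

# Edges of the example graph: one per pair of tables joined by the statements.
edges = [("t1", "t2"), ("t1", "t3"), ("t2", "t3"), ("t3", "t4"), ("t4", "t5"),
         ("t4", "t6"), ("t5", "t6"), ("t4", "t7"), ("t8", "t7")]
groups = dense_groups(edges)
print(groups)
```

Under this assumption the two triangles t1, t2, t3 and t4, t5, t6 come out as groups, matching the grouping stated in the text, while t7 and t8 remain single tables whose placement is left to the frequency method.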
If this partitioning result remains unchanged, the tables are dumped to their new partitions at an appropriate time. In subsequent queries, cross-partition lookups then occur less often.
It is to be understood that the phraseology and terminology employed herein are for the purpose of description, and the method is not to be regarded as limited by them. The use of such terms and expressions is not intended to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications may be made within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be regarded as covering all such equivalents.
Also, it should be noted that while the method has been described with reference to the specific embodiments, those skilled in the art will recognize that the above embodiments are merely illustrative of the method and that various changes or substitutions of equivalents may be made without departing from the spirit of the method, and therefore, it is intended that all changes and modifications to the above embodiments be within the scope of the appended claims.
Claims (8)
1. A method for storing data in a distributed database in a partitioned manner, characterized in that the method measures the strength of the relation between tables based on the frequency with which tables are joined, and then introduces undirected-graph processing to further partition the tables.
2. The method according to claim 1, wherein a structure is introduced to record the connection and association between tables and partitions.
3. The method according to claim 2, characterized in that:
in one aspect, the frequency of table joins is used to represent the strength of the relation between a table and a storage partition (node), specifically:
a TABLE_RELATION structure is introduced, and the strength of the relation between each table and each partition is recorded by maintaining the relation table TABLE_RELATION between tables and partitions (nodes);
each operation that involves a table join changes the records of the joined tables in TABLE_RELATION; starting from its initial state, TABLE_RELATION is changed through DML operations;
the tables are then stored in different partitions (nodes) according to the relation strength, represented by join frequency, recorded in TABLE_RELATION;
in another aspect, a connection graph between tables is generated from the join relations between the tables, and all nodes (each representing a table) of a strongly connected subgraph of the connection graph are stored in the same partition.
4. The method according to claim 3, wherein in the connection graph, a node represents a table, and an edge indicates whether two tables are connected (joined).
5. The method according to claim 3, wherein for a table to which both aspects apply, the latter takes priority.
6. The method according to claim 3, characterized in that:
the strength of the connection between tables and partitions is used as an indicator: if the partition with which a table has the strongest connection changes, relocating (dumping) the table to that partition is considered;
from the perspective of the connection graph, when a strongly connected subgraph appears in the graph, all tables that are nodes of the same strongly connected subgraph should be stored on the same partition (storage node).
7. The method according to claim 3, characterized in that, for the few tables that act as intermediate nodes between multiple strongly connected subgraphs, redundant storage on different partitions should be considered if storage capacity allows.
8. The method as claimed in claim 4, wherein when a table is found to need a partition dump, an appropriate time must be chosen for the dump so that normal production activity is unaffected or affected as little as possible.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010617993.2A CN111782654A (en) | 2020-07-01 | 2020-07-01 | Method for storing data in distributed database in partition mode |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111782654A (en) | 2020-10-16 |
Family
ID=72761612
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010617993.2A Pending CN111782654A (en) | 2020-07-01 | 2020-07-01 | Method for storing data in distributed database in partition mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111782654A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112434029A (en) * | 2020-11-02 | 2021-03-02 | 浙商银行股份有限公司 | Table storage structure construction method for efficiently supporting mixed distributed transaction and analytic query |
CN113254527A (en) * | 2021-04-22 | 2021-08-13 | 杭州欧若数网科技有限公司 | Optimization method of distributed storage map data, electronic device and storage medium |
EP4336373A1 (en) * | 2022-09-06 | 2024-03-13 | Sap Se | Configuring a distributed database |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112434029A (en) * | 2020-11-02 | 2021-03-02 | 浙商银行股份有限公司 | Table storage structure construction method for efficiently supporting mixed distributed transaction and analytic query |
CN113254527A (en) * | 2021-04-22 | 2021-08-13 | 杭州欧若数网科技有限公司 | Optimization method of distributed storage map data, electronic device and storage medium |
CN113254527B (en) * | 2021-04-22 | 2022-04-08 | 杭州欧若数网科技有限公司 | Optimization method of distributed storage map data, electronic device and storage medium |
EP4336373A1 (en) * | 2022-09-06 | 2024-03-13 | Sap Se | Configuring a distributed database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||