CN111782654A - Method for storing data in distributed database in partition mode - Google Patents

Info

Publication number
CN111782654A
Authority
CN
China
Prior art keywords
partition
tables
connection
node
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010617993.2A
Other languages
Chinese (zh)
Inventor
张豪
季业
刘阳
刘壮
王世航
陈明松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202010617993.2A
Publication of CN111782654A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for storing data in a distributed database in a partitioned manner, and belongs to the field of distributed databases. The method introduces a structure, TABLE_RELATION, to record the user's query history, which is convenient and effective. With connection frequency as the basis for table dumps, the connection graph between tables is analyzed further, and all node tables of a strongly connected subgraph are stored in the same partition or node. A small number of key intermediate nodes in the graph are stored redundantly, which preserves query efficiency and also protects the safety and reliability of key information. By dumping tables between partitions, many queries that originally had to run across partitions or nodes can be completed within a single partition or node, which improves query efficiency and reduces query time. The method requires no substantial change to the existing database system and is convenient to implement and deploy.

Description

Method for storing data in distributed database in partition mode
Technical Field
The invention belongs to the technical field of distributed databases, and particularly relates to a method for storing data in a distributed database in a partitioned mode.
Background
Inserting, deleting, updating and querying are the most common database operations. These operations inevitably access table data, and in many cases they access more than one table. For example, querying information about a student and the school the student attends requires connecting the two tables student and unity and returning the rows that satisfy the condition.
For a traditional database, perhaps the most important factor affecting the efficiency of connections between tables is the size of the Cartesian product of the two tables; for a distributed database, the communication time between partitions that are far apart also plays an important role.
A distributed database usually contains multiple storage nodes and, leaving redundancy aside, each node stores different data. Queries often touch several tables at once rather than a single table. In that case, the partitions (nodes) holding the tables are not known in advance. If the tables are stored in the same partition (node), the query is efficient; if they are stored in different partitions (nodes), the query must run across partitions (nodes), which makes it slow. This problem therefore needs to be optimized to improve the query efficiency of distributed databases.
Disclosure of Invention
The technical task of the invention is to overcome the defects of the prior art and provide a method for storing data in a distributed database in a partitioned manner, so that the distributed database executes queries more efficiently and query time is reduced.
The technical solution adopted by the invention to solve this technical problem is as follows:
A method for storing data in a distributed database in a partitioned manner, characterized in that the method measures the strength of the relation between tables based on the frequency of table-to-table connections (joins), and then introduces an undirected-graph processing method to further partition the tables.
Preferably, a structure is introduced to record the connection and relation between tables and partitions.
Preferably,
in one aspect, the frequency of table connections is used to represent the strength of the relation between a table and a storage partition (node), specifically:
a TABLE_RELATION structure is introduced, and the relation strength between each table and each partition is recorded by maintaining this connection relation table between tables and partitions (nodes);
every operation that connects tables updates the records of the tables involved in TABLE_RELATION; starting from the initial state, the contents of TABLE_RELATION change as such DML operations are executed;
then, based on the records in the TABLE_RELATION table, each table is stored in a partition (node) according to the association strength represented by the connection frequency;
in another aspect, a connection relation graph between tables is generated from the connection relations between the tables, and all nodes (each representing a table) of a strongly connected subgraph in the connection relation graph are stored in the same partition.
Preferably, in the connection relation graph, a node represents a table, and an edge represents whether two tables have been connected.
Preferably, for tables to which both aspects apply, the latter takes priority.
Preferably,
the strength of the connection between tables and partitions is used as an index, and if the partition with which a table is most strongly connected changes, a partition dump of that table is considered;
from the perspective of the connection relation graph, when a strongly connected subgraph appears in the graph, all node tables in the same strongly connected subgraph should be stored in the same partition (storage node).
Preferably, for the few tables that act as intermediate nodes between multiple strongly connected subgraphs, redundant storage in different partitions should be considered if storage capacity allows.
Preferably, when a table is found to need a partition dump, a suitable time must be chosen for the dump so that normal production activity is not affected, or is affected as little as possible.
Compared with the prior art, the method for storing data in a distributed database in a partitioned manner has the following beneficial effects:
1. The method introduces a structure, TABLE_RELATION, to record the user's query history, which is convenient and effective.
2. With connection frequency as the basis for table dumps, the connection graph between tables is analyzed further, and all node tables of a strongly connected subgraph are stored in the same partition or node. A small number of key intermediate nodes in the graph are stored redundantly, which preserves query efficiency and also protects the safety and reliability of key information.
3. By dumping tables between partitions, many queries that originally had to run across partitions or nodes can be completed within a single partition or node, which improves query efficiency and reduces query time.
4. The method requires no substantial change to the existing database system and is convenient to implement and deploy.
Drawings
To describe the working principle of the method for partitioned storage of data in a distributed database more clearly, schematic diagrams are attached for further explanation.
FIG. 1 is a connection diagram of a first bid in accordance with an embodiment of the present invention;
FIG. 2 is a connection diagram of 7 tables in a database according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to fig. 1 and 2 in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention relates to a method for storing data in a distributed database in a partitioned manner. The method measures the strength of the relation between tables based on the frequency of table-to-table connections (joins), and then introduces an undirected-graph processing method to further partition the tables.
With reference to FIG. 1, a structure is introduced to record the connection and relation between tables and partitions, which has the following two main aspects:
in one aspect, the frequency of table connections is used to represent the strength of the relation between a table and a storage partition (node), specifically:
a TABLE_RELATION structure is introduced, and the relation strength between each table and each partition is recorded by maintaining this connection relation table between tables and partitions (nodes);
every operation that connects tables updates the records of the tables involved in TABLE_RELATION; starting from the initial state, the contents of TABLE_RELATION change as such DML operations are executed;
then, based on the records in the TABLE_RELATION table, each table is stored in a partition (node) according to the association strength represented by the connection frequency;
in another aspect, a connection relation graph between tables is generated from the connection relations between the tables, and all nodes (each representing a table) of a strongly connected subgraph in the connection relation graph are stored in the same partition.
In the connection relation graph, a node represents a table, and an edge represents whether two tables have been connected.
For tables to which both aspects apply, the latter takes priority.
The strength of the connection between tables and partitions is used as an index, and if the partition with which a table is most strongly connected changes, a partition dump of that table is considered;
from the perspective of the connection relation graph, when a strongly connected subgraph appears in the graph, all node tables in the same strongly connected subgraph should be stored in the same partition (storage node).
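The first check above can be sketched as follows. This is only an illustration under stated assumptions: each TABLE_RELATION row is modeled as a dictionary, the function name `dump_candidates` and the sample rows are hypothetical, and the tie rule (a table stays where it is when its current partition is among the strongest) is an assumption, since the text does not say how ties are handled.

```python
def dump_candidates(rows):
    """Return {table: target_partition} for tables whose strongest partition
    is not the one they are currently stored in."""
    moves = {}
    for row in rows:
        weights = row["weights"]
        current = row["current_partition"]
        best = max(weights.values())
        # Assumption: on a tie that includes the current partition, stay put.
        if weights.get(current, 0) == best:
            continue
        # Otherwise pick the strongest partition (alphabetical order breaks ties).
        target = min(p for p, w in weights.items() if w == best)
        moves[row["table_name"]] = target
    return moves


# Hypothetical rows, not taken from the patent's figure.
rows = [
    {"table_name": "orders", "current_partition": "A", "weights": {"A": 5, "B": 2, "C": 0}},
    {"table_name": "items", "current_partition": "B", "weights": {"A": 7, "B": 1, "C": 0}},
    {"table_name": "users", "current_partition": "C", "weights": {"A": 3, "B": 0, "C": 3}},
]
print(dump_candidates(rows))  # {'items': 'A'}
```

Only `items` is reported, because it is the only table whose strongest partition differs from its current one.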
For the few tables that serve as intermediate nodes between multiple strongly connected subgraphs, redundant storage in different partitions should be considered if storage capacity allows.
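One possible way to find such tables is sketched below. It assumes that an "intermediate node" can be read as a table that belongs to more than one group of tightly connected tables; the text does not define the term precisely, so this reading, the function name `redundancy_candidates` and the sample groups are all hypothetical.

```python
from collections import Counter


def redundancy_candidates(groups):
    """Return tables that appear in more than one group, sorted by name."""
    counts = Counter(table for group in groups for table in set(group))
    return sorted(t for t, n in counts.items() if n > 1)


# Hypothetical groups: t3 belongs to both, so it links the two subgraphs.
groups = [{"t1", "t2", "t3"}, {"t3", "t4", "t5"}]
print(redundancy_candidates(groups))  # ['t3']
```

A table returned here would be a candidate for storing one copy in each partition that holds one of its groups, storage permitting.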
Using the data in the TABLE_RELATION table and the graph formed by the connection relations between tables, the data stored in the database can be transferred between partitions. However, the database must keep serving external requests, so table partitions (nodes) cannot be updated at arbitrary times. A period of low traffic, for example 12 a.m. every day, can be chosen for dumping tables between partitions (nodes).
Example one
Assume that the database has three partitions: A, B and C. There are currently 8 tables in the database: t1, t2, t3, t4, t5, t6, t7 and t8. Tables t1 and t2 are located in partition A; t3, t4 and t8 in partition B; and t5, t6 and t7 in partition C.
The general structure of TABLE _ RELATION is as follows:
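CREATE TABLE table_relation (
    table_name        TEXT,    -- table being tracked
    current_partition TEXT,    -- partition that currently stores the table
    weight_with_partA BIGINT,  -- connection weight toward partition A
    weight_with_partB BIGINT,  -- connection weight toward partition B
    weight_with_partC BIGINT   -- connection weight toward partition C
);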
Consider the following statements:
SELECT * FROM t1,t2;
SELECT * FROM t1,t3;
SELECT * FROM t2,t3;
SELECT * FROM t3,t4;
SELECT * FROM t4,t5;
SELECT * FROM t4,t6;
SELECT * FROM t5,t6;
SELECT * FROM t4,t7;
SELECT * FROM t8,t7;
Each time a table is connected with another table, its weight for the partition holding the other table is increased by 1. After the statements above are executed, the data in TABLE_RELATION is updated as follows:
[Figure: updated TABLE_RELATION data (image not reproduced in this text)]
The connection graph of the corresponding tables is shown in FIG. 2.
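The +1 rule can be sketched as follows. This is a minimal illustration that assumes the initial placement given in this example and treats each SELECT statement as one connection between two tables; the function name `update_weights` and the dictionary layout are hypothetical. The patent's own updated table is available only as an image, so the numbers printed here follow from the stated rule alone and are not claimed to match that figure.

```python
# Initial placement taken from the example (partition of each table).
placement = {
    "t1": "A", "t2": "A",
    "t3": "B", "t4": "B", "t8": "B",
    "t5": "C", "t6": "C", "t7": "C",
}

# Table pairs connected by the nine SELECT statements in the example.
joins = [
    ("t1", "t2"), ("t1", "t3"), ("t2", "t3"),
    ("t3", "t4"), ("t4", "t5"), ("t4", "t6"),
    ("t5", "t6"), ("t4", "t7"), ("t8", "t7"),
]


def update_weights(placement, joins):
    """Return {table: {partition: weight}} after applying the +1 rule."""
    partitions = sorted(set(placement.values()))
    weights = {t: {p: 0 for p in partitions} for t in placement}
    for left, right in joins:
        # Each table gains +1 toward the partition that holds its partner.
        weights[left][placement[right]] += 1
        weights[right][placement[left]] += 1
    return weights


weights = update_weights(placement, joins)
for table in sorted(weights):
    print(table, weights[table])
```

Under this reading, a table's weight toward a partition counts how often it was connected with tables stored there, which is the quantity the frequency method compares across partitions.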
According to the data in TABLE_RELATION, the frequency method alone yields the partitioning A (t1, t2), B (t3, t7, t8), C (t4, t5, t6). According to the graph, however, t1, t2 and t3 should be placed in one partition, and t4, t5 and t6 in another. Combining the two approaches, the final table partitioning is A (t1, t2, t3), B (t7, t8), C (t4, t5, t6).
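The graph step can be sketched as follows. The text does not define "strongly connected subgraph" for an undirected graph, so this sketch assumes it means a maximal group of three or more tables in which every pair has been connected (a maximal clique). The basic Bron-Kerbosch recursion used to enumerate such groups is a standard algorithm chosen for illustration, not one named by the patent, and the function names are hypothetical.

```python
# Undirected connection graph from the example: an edge means the two
# tables appeared together in one of the SELECT statements.
edges = [
    ("t1", "t2"), ("t1", "t3"), ("t2", "t3"),
    ("t3", "t4"), ("t4", "t5"), ("t4", "t6"),
    ("t5", "t6"), ("t4", "t7"), ("t8", "t7"),
]


def build_adjacency(edges):
    """Map each table to the set of tables it has been connected with."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            found.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return found


adjacency = build_adjacency(edges)
# Assumption: only groups of three or more tables count as "strongly connected".
groups = sorted(sorted(c) for c in maximal_cliques(adjacency) if len(c) >= 3)
print(groups)  # [['t1', 't2', 't3'], ['t4', 't5', 't6']]
```

Under this assumption the two groups found are (t1, t2, t3) and (t4, t5, t6), which agrees with the grouping the example derives from the graph; t7 and t8 belong to no such group and are left to the frequency method.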
If the partitioning result remains unchanged, the data in the database is dumped between partitions at a suitable time. Subsequent queries then need cross-partition lookups less often.
It is to be understood that the phraseology and terminology employed herein are for the purpose of description, and that the present method is not to be regarded as limited by such phraseology and terminology. The use of such terms and expressions is not intended to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications may be made within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be regarded as covering all such equivalents.
Also, it should be noted that while the method has been described with reference to the specific embodiments, those skilled in the art will recognize that the above embodiments are merely illustrative of the method and that various changes or substitutions of equivalents may be made without departing from the spirit of the method, and therefore, it is intended that all changes and modifications to the above embodiments be within the scope of the appended claims.

Claims (8)

1. A method for storing data in a distributed database in a partitioned manner, characterized in that the method measures the strength of the relation between tables based on the frequency of table-to-table connection, and then introduces an undirected-graph processing method to further partition the tables.
2. The method according to claim 1, wherein a structure is introduced to record the connection and relation between tables and partitions.
3. The method for storing data in a distributed database in a partitioned manner according to claim 2, characterized in that,
in one aspect, the frequency of table connection is used to represent the strength of the relation between a table and a storage partition (node), specifically:
a TABLE_RELATION structure is introduced, and the relation strength between each table and each partition is recorded by maintaining this connection relation table between tables and partitions (nodes);
every operation related to table connection changes the records of the tables involved in TABLE_RELATION; starting from the initial state, the contents of TABLE_RELATION change as such DML operations are executed;
then, based on the records in the TABLE_RELATION table, each table is stored in a partition (node) according to the association strength represented by the connection frequency;
in another aspect, a connection relation graph between tables is generated from the connection relations between the tables, and all nodes (each representing a table) of a strongly connected subgraph in the connection relation graph are stored in the same partition.
4. The method according to claim 3, wherein in the connection relation graph, a node represents a table, and an edge represents whether two tables have been connected.
5. The method according to claim 3, wherein for tables to which both aspects apply, the latter takes priority.
6. The method for storing data in a distributed database in a partitioned manner according to claim 3, characterized in that,
the strength of the connection between tables and partitions is used as an index, and if the partition with which a table is most strongly connected changes, a partition dump of that table is considered;
from the perspective of the connection relation graph, when a strongly connected subgraph appears in the graph, all node tables in the same strongly connected subgraph should be stored in the same partition (storage node).
7. The method according to claim 3, characterized in that, for the few tables acting as intermediate nodes between multiple strongly connected subgraphs, redundant storage in different partitions should be considered if storage capacity allows.
8. The method according to claim 4, wherein when a table is found to need a partition dump, a suitable time must be chosen for the dump so that normal production activity is not affected, or is affected as little as possible.
CN202010617993.2A 2020-07-01 2020-07-01 Method for storing data in distributed database in partition mode Pending CN111782654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010617993.2A CN111782654A (en) 2020-07-01 2020-07-01 Method for storing data in distributed database in partition mode

Publications (1)

Publication Number Publication Date
CN111782654A true CN111782654A (en) 2020-10-16

Family

ID=72761612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010617993.2A Pending CN111782654A (en) 2020-07-01 2020-07-01 Method for storing data in distributed database in partition mode

Country Status (1)

Country Link
CN (1) CN111782654A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434029A (en) * 2020-11-02 2021-03-02 浙商银行股份有限公司 Table storage structure construction method for efficiently supporting mixed distributed transaction and analytic query
CN113254527A (en) * 2021-04-22 2021-08-13 杭州欧若数网科技有限公司 Optimization method of distributed storage map data, electronic device and storage medium
CN113254527B (en) * 2021-04-22 2022-04-08 杭州欧若数网科技有限公司 Optimization method of distributed storage map data, electronic device and storage medium
EP4336373A1 (en) * 2022-09-06 2024-03-13 Sap Se Configuring a distributed database

Similar Documents

Publication Publication Date Title
CN111782654A (en) Method for storing data in distributed database in partition mode
CN112559554B (en) Query statement optimization method and device
CN106547796B (en) Database execution method and device
US7886028B2 (en) Method and system for system migration
US8332366B2 (en) System and method for automatic weight generation for probabilistic matching
US7856436B2 (en) Dynamic holds of record dispositions during record management
US20150088857A1 (en) Method and system for performing query optimization using a hybrid execution plan
CN103646111A (en) System and method for realizing real-time data association in big data environment
CN104424287B (en) Data query method and apparatus
WO2007139751A2 (en) Method and system for indexing information about entities with respect to hierarchies
US20150154194A1 (en) Non-exclusionary search within in-memory databases
CN109344157A (en) Read and write abruption method, apparatus, computer equipment and storage medium
CN107783980A (en) Index data generates and data query method and device, storage and inquiry system
US11150996B2 (en) Method for optimizing index, master database node and subscriber database node
US11782894B2 (en) User connection degree measurement
CN102654863A (en) Real-time database history data organizational management method
CN110263104A (en) JSON character string processing method and device
CN104598652B (en) A kind of data base query method and device
Mukherjee Synthesis of non-replicated dynamic fragment allocation algorithm in distributed database systems
CN116521956A (en) Graph database query method and device, electronic equipment and storage medium
US20170270149A1 (en) Database systems with re-ordered replicas and methods of accessing and backing up databases
CN108334565A (en) A kind of data mixing storage organization, data store query method, terminal and medium
Batini et al. A survey of data quality issues in cooperative information systems
CN106649584A (en) Index processing method and device in master-slave database system
Mukherjee Non-replicated dynamic fragment allocation in distributed database systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination