CN112115146A - Data redistribution method, device, equipment and storage medium of database - Google Patents

Data redistribution method, device, equipment and storage medium of database

Info

Publication number
CN112115146A
Authority
CN
China
Prior art keywords
data
redistribution
data table
database
incremental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010969854.6A
Other languages
Chinese (zh)
Other versions
CN112115146B (en)
Inventor
王鸿翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingbase Information Technologies Co Ltd
Original Assignee
Beijing Kingbase Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingbase Information Technologies Co Ltd
Priority to CN202010969854.6A
Publication of CN112115146A
Application granted
Publication of CN112115146B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data redistribution method, apparatus, device and storage medium for a database. The method comprises: receiving a redistribution command sent by a client, wherein the redistribution command instructs redistribution of basic data stored in a first data table to generate a second data table, the first data table storing data according to a first preset rule and the second data table storing data according to a second preset rule; redistributing the basic data and writing it into the second data table; during the period in which the basic data is redistributed, receiving a write instruction for incremental data sent by the client and writing the incremental data into the first data table; after the redistribution of the basic data is finished, redistributing the incremental data and writing it into the second data table, writes to the first data table being prohibited during the period in which the incremental data is redistributed; and, after the incremental data has been redistributed, pointing the pointer of the first data table to the second data table.

Description

Data redistribution method, device, equipment and storage medium of database
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to a method, an apparatus, a device, and a storage medium for redistributing data in a database.
Background
When service requirements change, a shared-nothing distributed database needs to be expanded or shrunk, that is, nodes are added or removed, so that its storage or computing capacity changes to match the new requirements.
Taking capacity expansion as an example, the conventional method shuts down the database, adds nodes, and redistributes table data across the nodes of the new database. The database cannot be accessed while the data is being redistributed, so the services that depend on it fail.
To solve this problem, an online capacity expansion method may be adopted: without shutting down, new nodes are added online and each data table is redistributed online. Any operation, such as normal reads and writes, may be performed on a data table that is not currently being redistributed, but no operation may be performed on the table being redistributed.
However, services that need to access the table being redistributed are then interrupted or unresponsive for a long time, which reduces the availability of the database.
Disclosure of Invention
To solve the technical problem or at least partially solve the technical problem, the present disclosure provides a data redistribution method, apparatus, device and storage medium for a database.
In a first aspect, the present disclosure provides a data redistribution method for a database, including:
receiving a redistribution command sent by a client, wherein the redistribution command is used for indicating to redistribute basic data stored in a first data table to generate a second data table, the first data table stores data according to a first preset rule, and the second data table stores data according to a second preset rule;
redistributing the basic data and writing the basic data into the second data table;
in the time period of redistributing the basic data, receiving a writing instruction of incremental data sent by the client, and writing the incremental data into the first data table;
after the redistribution of the basic data is finished, redistributing the incremental data and writing the incremental data into the second data table;
prohibiting writes to the first data table during the time period of redistributing the incremental data;
and after the incremental data is redistributed, pointing the pointer of the first data table to the second data table.
Optionally, after pointing the pointer of the first data table to the second data table, the method further includes:
receiving a write instruction sent by the client, and writing the data corresponding to the write instruction into the second data table.
Optionally, after the redistribution of the basic data is finished and before writes to the first data table are prohibited, the method further includes:
interrupting any uncompleted write transaction.
Optionally, before receiving the redistribution command sent by the client, the method further includes:
receiving the second preset rule sent by the client.
Optionally, before receiving the second preset rule sent by the client, the method further includes:
receiving a node change instruction sent by the client, wherein the node change instruction indicates that storage nodes are to be added or removed.
In a second aspect, the present disclosure provides a data redistribution device for a database, including:
a first receiving module, configured to receive a redistribution command sent by a client, wherein the redistribution command instructs redistribution of basic data stored in a first data table to generate a second data table, the first data table stores data according to a first preset rule, and the second data table stores data according to a second preset rule;
a basic data redistribution module, configured to redistribute the basic data and write the basic data into the second data table;
a second receiving module, configured to receive, during the time period in which the basic data redistribution module redistributes the basic data, a write instruction for incremental data sent by the client, and to write the incremental data into the first data table;
an incremental data redistribution module, configured to redistribute the incremental data after the basic data redistribution module has finished redistributing the basic data, and to write the incremental data into the second data table;
a prohibiting module, configured to prohibit writes to the first data table during the time period in which the incremental data redistribution module redistributes the incremental data;
and a pointing module, configured to point the pointer of the first data table to the second data table after the incremental data redistribution module has redistributed the incremental data.
Optionally, the apparatus further comprises:
a writing module, configured to receive a write instruction sent by the client and write the data corresponding to the write instruction into the second data table.
Optionally, the apparatus further comprises:
an interruption module, configured to interrupt any uncompleted write transaction.
Optionally, the apparatus further comprises:
a third receiving module, configured to receive a node change instruction sent by the client, wherein the node change instruction indicates that storage nodes are to be added or removed.
In a third aspect, the present disclosure provides a data redistribution device for a database, including:
a memory for storing processor-executable instructions;
a processor for implementing the method according to the first aspect as described above when the instructions are executed.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the data redistribution method for a database as described in the first aspect above when the computer-executable instructions are executed by a processor.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantage: the redistribution of the database proceeds in two stages. The first stage redistributes the basic data while reading and/or writing of data is allowed; the second stage redistributes the incremental data while reading of data is allowed and writing of data is prohibited.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic diagram of a database system framework;
Fig. 2 is a schematic flowchart of a data redistribution method for a database according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of another data redistribution method for a database according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a data redistribution method for a database according to an embodiment of the present disclosure;
Fig. 5 is a schematic flowchart of a data redistribution method for a database according to an embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of a data redistribution method for a database according to an embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of a data redistribution device for a database according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of a data redistribution device of a database according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The terms used in the present disclosure are explained first:
A distributed database is a logically unified database formed by connecting a plurality of physically dispersed database units over a computer network. Each connected database unit is referred to as a node. A distributed database includes at least two nodes. The nodes may be physical nodes located in different places, or logical nodes located in the same physical database.
Fig. 1 is a schematic diagram of a database system framework. As shown in Fig. 1, the database system includes at least one client and a shared-nothing distributed database. The shared-nothing distributed database may include a master node and at least one node, for example the three nodes shown in Fig. 1: node 1, node 2 and node 3. It is understood that the number of nodes in Fig. 1 is only an example and does not limit the present disclosure. The at least one client is connected to the master node of the database. The master node is deployed on a server or a terminal device. One node may be deployed on one server or terminal device, or several nodes may be deployed on the same server or terminal device.
The master node stores system tables that include, but are not limited to, node information, the metadata of the data tables, and the distribution of each data table's data across the nodes. The master node does not itself store the data of the data tables; that data is stored in the corresponding data tables on the respective nodes according to the distribution rule.
When the service requirement of the existing shared-nothing distributed database changes, the capacity of the database needs to be expanded or reduced, that is, nodes are increased or decreased, so that the storage or calculation capacity of the database is changed to adapt to the requirement of the service change. An application scenario of the present invention is described below by taking database capacity expansion as an example.
With the rapid development of Internet and Internet of Things technology, more and more data needs to be stored, and distributed databases based on shared-nothing Massively Parallel Processing (MPP) architectures are increasingly widely used. When the volume of application data grows beyond a certain scale, the originally designed database cluster can no longer meet read and write demands, so hosts must be added to expand the storage or computing capacity of the distributed cluster. The traditional capacity expansion method shuts down the database, adds the new nodes, and redistributes the table data across the nodes of the new database. The database cannot be accessed during data redistribution, so the corresponding database services cannot run.
To solve this problem, an online capacity expansion method may be adopted: new nodes are added to the database online without a shutdown, and each data table is redistributed online. Any operation, such as normal reads and writes, may be performed on a table that is not being redistributed, but operations by other tasks on the table being redistributed are not supported; that is, a lock prohibiting reads and writes is placed on the table being redistributed. For example, an MPP distributed database may distribute data with a consistent hashing algorithm or the like to implement the redistribution operation of online capacity expansion.
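As an illustration of the consistent hashing mentioned above, the following is a minimal sketch and not the method of any particular MPP product. The `HashRing` class, the node names, the number of virtual nodes and the use of SHA-256 are all assumptions made for the example. It shows the property that makes consistent hashing attractive for expansion: when a node is added, a key either stays where it was or moves to the new node.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    """Map a string to a point on the ring (SHA-256 is used only to spread keys)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in ring]
        self._owners = [owner for _, owner in ring]

    def node_for(self, key: str) -> str:
        # Owner is the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]


keys = [f"row-{i}" for i in range(1000)]
before = HashRing(["node1", "node2"])
after = HashRing(["node1", "node2", "node3"])

# Keys whose owning node changes when node3 joins the ring.
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
# Every key that changes owner lands on the newly added node.
moved_to_new = all(after.node_for(k) == "node3" for k in moved)
```

Only the rows in `moved` would have to be copied during expansion; the rest stay on their original node.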
Compared with the traditional capacity expansion method, the online method keeps most tables accessible and therefore avoids service interruption for most business applications. However, applications that need to access the table being redistributed cannot do so, so their service is interrupted or unresponsive for a long time, which reduces the availability of the database. For example, some analytical service scenarios query and analyze the database in real time and load data periodically; if data cannot be queried or loaded, the service becomes unusable and normal business operation is affected.
To solve the above technical problems in the prior art, the present disclosure provides a database capacity expansion method for scenarios in which the number of database nodes changes. After the database receives the redistribution command sent by the client, read and write operations on the first data table remain allowed. A second data table is created and the basic data in the first data table is redistributed into it; any data read or write operation arriving during this process operates directly on the first data table. After the redistribution of the basic data is finished, write operations on the first data table are prohibited, the incremental data written into the first data table during the redistribution of the basic data is identified, and that incremental data is redistributed into the second data table. After the redistribution of the incremental data is finished, the second data table is set as the table stored in the database. With this method, data redistribution under a change in the number of database nodes does not affect read operations on the database. Because the amount of incremental data written into the first data table during the redistribution of the basic data is limited, the time for which write operations are restricted while that data is redistributed is also limited, so the service is not greatly affected.
The following describes in detail how the technical solution of the present invention solves the above technical problems with specific examples.
Fig. 2 is a schematic flowchart of a data redistribution method for a database provided in an embodiment of the present disclosure. As shown in Fig. 2, the method of this embodiment is executed by a distributed database and proceeds as follows:
S201, receiving a redistribution command sent by a client.
The redistribution command is used for indicating that basic data stored in the first data table is redistributed to generate a second data table, the first data table stores data according to a first preset rule, and the second data table stores data according to a second preset rule.
The database stores the first data table, which holds the basic data, across its nodes in a distributed manner according to the first preset rule. When the preset rule changes to the second preset rule, the data in the first data table is redistributed to generate a second data table stored according to the second preset rule, and the second data table is stored in a distributed manner across the nodes of the new database. The first preset rule and the second preset rule are distribution rules that determine on which node each piece of data is stored. The first data table is the table that needs to be redistributed according to the second preset rule, that is, the data table originally stored in the database. The second data table has the same table definition as the first data table; in other words, the metadata of the two tables is identical. The first preset rule and the second preset rule may each be any existing distribution rule, such as hash distribution or random distribution; the present disclosure does not limit the type of distribution rule.
After determining the second preset rule, the client generates a redistribution command according to it and sends the command to the master node of the database. The redistribution command instructs redistribution of the basic data stored in the first data table to generate the second data table.
Optionally, the redistribution command may include table information of the first data table. There may be one first data table or several. If there are several, the redistribution command may further include the execution order of their redistribution operations. For example, with 10 first data tables, the redistribution operations may be set to run all 10 in parallel, to run 5 in parallel at a time, or to run all 10 serially.
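The three execution orders in the example above (all in parallel, five at a time, serial) can be sketched with a single concurrency limit. This is only an illustration: `redistribute_table` is a hypothetical placeholder for the real per-table work, and the table names are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def redistribute_table(table_name: str) -> str:
    """Hypothetical stand-in for redistributing one first data table."""
    return f"{table_name}: redistributed"


def run_redistribution(tables: List[str], max_parallel: int) -> List[str]:
    """Redistribute tables with at most `max_parallel` running at once.

    max_parallel == len(tables) runs everything in parallel;
    max_parallel == 1 runs the tables one after another.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in the order of the input tables.
        return list(pool.map(redistribute_table, tables))


tables = [f"t{i}" for i in range(10)]
all_parallel = run_redistribution(tables, max_parallel=10)
five_at_a_time = run_redistribution(tables, max_parallel=5)
serial = run_redistribution(tables, max_parallel=1)
```

All three calls produce the same result list; only the degree of concurrency differs.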
S202, the basic data are redistributed, and the basic data are written into a second data table.
S203, in the time period of redistributing the basic data, receiving a writing instruction of the incremental data sent by the client, and writing the incremental data into the first data table.
The basic data is redistributed into the second data table according to the second preset rule. While the basic data is being redistributed, the first data table can still be read and written; data written into the first data table during this process is the incremental data. Because writes continue to go to the first data table, data read from the first data table during this period is always the latest, which ensures that writes proceed normally and reads remain accurate during redistribution. By contrast, redistribution of a table generally applies a lock that prohibits operations on that table, that is, the first data table would be locked until the redistribution is completed.
Optionally, during the redistribution of the basic data, any database operation may be performed on the data tables stored in the database other than the first data table, that is, on any data table in the database that is not being redistributed.
S204, after the basic data are redistributed, the first data table is prohibited from writing data, incremental data are redistributed, and the incremental data are written into the second data table.
The incremental data written into the first data table during the redistribution of the basic data is recorded; after the basic data has been redistributed, the incremental data is redistributed and written into the second data table.
Optionally, the redistribution may be completed quickly and conveniently by recording a snapshot. The specific method is as follows:
Before the basic data is redistributed, a snapshot of the first data table is obtained, and the maximum position of the data written in the first data table, namely its maximum file number and maximum row number, is queried from the snapshot.
While the basic data of the first data table is being redistributed, the write mode of the first data table is set to append mode; that is, every write to the first data table must land after the maximum file number and maximum row number recorded in the snapshot.
The data located before the maximum position recorded in the snapshot is the basic data; it is scanned and redistributed into the second data table.
After the redistribution of the basic data is finished, the lock on the table is upgraded so that any write to the table is prohibited. The data located after the maximum position recorded in the snapshot is then scanned; this is the incremental data, which is redistributed into the second data table.
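The snapshot procedure above can be sketched as follows. This is a simplified model under stated assumptions: the maximum file number and maximum row number are collapsed into a single list index, the second data table is a dictionary keyed by node, and `AppendOnlyTable`, `redistribute` and the modulo rule are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Rule = Callable[[Row], int]  # distribution rule: row -> node id


class AppendOnlyTable:
    """Hypothetical first data table that only accepts appended rows."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)
        self.write_locked = False

    def snapshot_max_position(self) -> int:
        # Simplification: one index stands in for (file number, row number).
        return len(self.rows)

    def append(self, row: Row) -> None:
        if self.write_locked:
            raise RuntimeError("writes to the first data table are prohibited")
        self.rows.append(row)


def redistribute(rows: List[Row], rule: Rule, target: Dict[int, List[Row]]) -> None:
    """Place each row on the node chosen by the distribution rule."""
    for row in rows:
        target.setdefault(rule(row), []).append(row)


first = AppendOnlyTable([{"id": i} for i in range(6)])
second: Dict[int, List[Row]] = {}
second_rule: Rule = lambda row: row["id"] % 3  # assumed second preset rule

max_pos = first.snapshot_max_position()  # taken before stage one

# Writes arriving during stage one are appended after the snapshot position.
first.append({"id": 6})
first.append({"id": 7})

# Stage one: only rows before the snapshot position are basic data.
redistribute(first.rows[:max_pos], second_rule, second)

# Stage two: prohibit writes, then move the rows after the snapshot position.
first.write_locked = True
redistribute(first.rows[max_pos:], second_rule, second)

ids_by_node = {node: [r["id"] for r in rows] for node, rows in second.items()}
```

Because every write lands after the recorded position, the split between basic and incremental data needs no per-row bookkeeping; a single comparison against `max_pos` is enough.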
S205, after the incremental data are redistributed, the pointer of the first data table points to the second data table.
After both stages are finished, the data of the first data table has been fully redistributed into the second data table. The two tables must then be switched: the second data table becomes the table stored in the database, and the pointer in the database system table that pointed to the first data table is pointed to the redistributed second data table. Read and write operations on the first data table must be prohibited during the switch. This period is short, because switching tables only requires modifying pointers in the system table.
Optionally, after the pointer of the first data table has been pointed to the second data table, the first data table may be deleted. Further, the pointer of the second data table may be pointed to the first data table. The second data table is a temporary table created within a transaction before redistribution, and a temporary table is deleted when its transaction commits, so exchanging the pointers causes the first data table to be deleted.
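The pointer switch and the deletion of the old table can be pictured with a small catalog model. This is a sketch under assumptions, not the system table layout of any real database: `SystemTable`, the table names and the storage identifiers are hypothetical, and dropping the temporary entry stands in for the cleanup that happens when the transaction commits.

```python
from typing import Dict


class SystemTable:
    """Hypothetical catalog: table name -> identifier of the stored data."""

    def __init__(self) -> None:
        self._pointers: Dict[str, str] = {}

    def register(self, name: str, storage_id: str) -> None:
        self._pointers[name] = storage_id

    def resolve(self, name: str) -> str:
        return self._pointers[name]

    def swap(self, name_a: str, name_b: str) -> None:
        # Only two catalog entries change; no row data is copied.
        p = self._pointers
        p[name_a], p[name_b] = p[name_b], p[name_a]

    def drop(self, name: str) -> str:
        """Remove an entry and return the storage it pointed to."""
        return self._pointers.pop(name)


catalog = SystemTable()
catalog.register("orders", "storage_old")      # first data table
catalog.register("orders_tmp", "storage_new")  # second (temporary) data table

# Switch: the user-visible name now resolves to the redistributed data.
catalog.swap("orders", "orders_tmp")

# Dropping the temporary entry discards the old storage.
dropped = catalog.drop("orders_tmp")
current = catalog.resolve("orders")
```

Since the switch touches only catalog entries, the window in which reads and writes must be blocked stays short regardless of table size.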
Optionally, the distribution rule recorded in the database is changed to the second preset rule corresponding to the second data table. If the nodes of the database have changed, information such as the number of nodes in the database must also be updated to the new value.
In this embodiment, the first data table in the database stores data according to the first preset rule. After the database receives the redistribution command sent by the client, it redistributes the basic data stored in the first data table to generate a second data table whose data is stored according to the second preset rule. The redistribution is divided into two stages. In the first stage, the basic data in the first data table is redistributed and written into the second data table; during this period, read and/or write operations may be performed on the first data table, and the incremental data of any write instruction received by the database is written into the first data table. After the redistribution of the basic data is finished, the second stage redistributes the incremental data and writes it into the second data table; during this period, writes to the first data table are prohibited. After the incremental data has been redistributed, the pointer of the first data table is pointed to the second data table, which completes the redistribution operation. In short, the first stage redistributes the basic data while allowing reads and/or writes, and the second stage redistributes the incremental data while allowing reads and prohibiting writes.
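The operations permitted on the table in each stage can be summarized as a small lookup. The sketch below is illustrative only; the `Phase` names and the `is_allowed` helper are hypothetical, and the permissions are the ones stated in this description (reads and writes in the first stage, reads only in the second, neither during the pointer switch, both again afterwards).

```python
from enum import Enum


class Phase(Enum):
    BASE = "redistributing basic data"
    INCREMENTAL = "redistributing incremental data"
    SWITCH = "switching the table pointer"
    DONE = "redistribution finished"


# Operations a client may perform on the table in each phase.
ALLOWED = {
    Phase.BASE: {"read", "write"},   # writes go to the first data table
    Phase.INCREMENTAL: {"read"},     # writes to the first data table prohibited
    Phase.SWITCH: set(),             # brief window: reads and writes prohibited
    Phase.DONE: {"read", "write"},   # operations now hit the second data table
}


def is_allowed(phase: Phase, operation: str) -> bool:
    """Return True if `operation` ("read" or "write") is permitted in `phase`."""
    return operation in ALLOWED[phase]


write_during_base = is_allowed(Phase.BASE, "write")
write_during_incremental = is_allowed(Phase.INCREMENTAL, "write")
```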
Fig. 3 is a schematic flowchart of another data redistribution method for a database according to an embodiment of the present disclosure. Fig. 3 builds on the embodiment shown in Fig. 2; as shown in Fig. 3, S205 is followed by S206:
S206, receiving a write instruction sent by the client, and writing the data corresponding to the write instruction into the second data table.
After step S205, the table actually stored in the database is the redistributed second data table, so all database operations can again be performed, and they are performed on the second data table.
In this embodiment, after the redistribution is finished, any database operation, including write operations, can be performed on the database, which preserves the availability of the database.
In some scenarios, data is still being written into the first data table just before the incremental data is to be redistributed. The incremental redistribution then cannot acquire its lock, a deadlock occurs, and the redistribution enters transaction rollback: the current redistribution operation is cancelled and retried after a preset time interval. Cancelling the redistribution wastes the resources already consumed by the redistribution of the basic data and degrades the overall performance of the database. The following specific example describes how the present disclosure solves this problem.
Fig. 4 is a schematic flowchart of a data redistribution method for a database according to an embodiment of the present disclosure. Fig. 4 builds on the embodiment shown in Fig. 2 or Fig. 3; as shown in Fig. 4, after the redistribution of the basic data is finished and before S204, the method further includes S204a:
S204a, interrupting any uncompleted write transaction.
Before the incremental data is redistributed, a lock prohibiting writes to the first data table must be acquired. If an uncompleted write transaction is in progress at that moment, it is interrupted so that the redistribution of the incremental data can continue. The interrupted write transaction may be rolled back and retried after a preset time interval.
In this embodiment, write transactions still in progress before the incremental data is redistributed are interrupted, which ensures that the redistribution proceeds smoothly, saves database system resources, and improves the performance of the database system.
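Step S204a can be sketched as follows. The classes and the function are hypothetical and model only the ordering described above: uncompleted write transactions are interrupted and rolled back first, the write prohibition is applied second, and the interrupted transactions are handed back so that the caller can retry them later.

```python
from typing import List


class WriteTransaction:
    """Hypothetical in-flight write against the first data table."""

    def __init__(self, txn_id: int) -> None:
        self.txn_id = txn_id
        self.state = "active"

    def interrupt(self) -> None:
        # Roll back so the transaction no longer blocks the write lock.
        self.state = "rolled_back"


class FirstDataTable:
    def __init__(self) -> None:
        self.active_writes: List[WriteTransaction] = []
        self.write_locked = False


def lock_for_incremental_stage(table: FirstDataTable) -> List[WriteTransaction]:
    """Interrupt uncompleted writes, then prohibit further writes.

    Returns the interrupted transactions so they can be retried after
    a preset interval.
    """
    interrupted = list(table.active_writes)
    for txn in interrupted:
        txn.interrupt()
    table.active_writes.clear()
    table.write_locked = True
    return interrupted


table = FirstDataTable()
table.active_writes = [WriteTransaction(1), WriteTransaction(2)]
to_retry = lock_for_incremental_stage(table)
retry_ids = [txn.txn_id for txn in to_retry]
```

Interrupting the writers, rather than cancelling the redistribution, keeps the work already done in the first stage.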
Fig. 5 is a schematic flow chart of a further data redistribution method for a database according to an embodiment of the present disclosure, and fig. 5 is based on the embodiments shown in fig. 2 to fig. 4, and further, as shown in fig. 5, before S201, the method further includes S200:
S200, receiving a second preset rule sent by the client.
A second preset rule may be needed when the database is expanded or reduced: the user adds or deletes a node in the database through the client, and the client needs to re-specify the rule for distributing data in the database, as the second preset rule, according to the change of the nodes. For example, an original database has 2 nodes, and its data is distributed to these 2 nodes according to a certain distribution rule. If 1 node is added when the database is expanded, the data in the original database needs to be distributed to 3 nodes according to a new distribution rule.
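The 2-node to 3-node example can be made concrete with a short Python sketch. The disclosure does not specify the distribution rule, so the modulo rule in `node_for` is purely an assumption chosen for illustration, as are the twelve sample keys.

```python
from collections import Counter


def node_for(key: int, node_count: int) -> int:
    """Hypothetical distribution rule: place a row by key modulo node count."""
    return key % node_count


keys = range(1, 13)                                  # twelve illustrative keys
first_rule = {k: node_for(k, 2) for k in keys}       # first preset rule: 2 nodes
second_rule = {k: node_for(k, 3) for k in keys}      # second preset rule: 3 nodes

# Keys whose node changes under the new rule must be physically moved.
moved = [k for k in keys if first_rule[k] != second_rule[k]]
rows_per_node = Counter(second_rule.values())
print(rows_per_node)
print(moved)
```

Under this assumed rule, 8 of the 12 keys land on a different node after expansion, which is why changing the rule requires a redistribution of the stored data rather than only a configuration change.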
The client may send the second preset rule to the master node of the database, so that the database may perform subsequent redistribution operations according to the second preset rule.
In this embodiment, the database receives the second preset rule sent by the client, and the database can perform redistribution operation according to the second preset rule, thereby ensuring accurate redistribution operation.
In some scenarios, the number of nodes in the database changes, for example, when the database is expanded or reduced and a user adds or deletes a node in the database through the client. In this case, the method of this embodiment further includes, before S200: the client generates new node configuration information according to the received node change instruction.
A user may add or delete a node of the database through the client, and the client generates new node configuration information according to the node change instruction. For example, to add a node to the original database nodes, the user inputs the host name of the newly added node in the client, and the client generates the node configuration information of the new node according to the node change instruction. The node configuration information includes but is not limited to: host name, port, data directory, unique node identifier, and the like.
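As a rough illustration of what such node configuration information might look like, the following Python sketch builds a record from a host name. The `NodeConfig` type, the `build_node_config` function, the default port `5432`, the `/data` root directory, and the use of a UUID as the unique identifier are all assumptions made for the example; the disclosure only lists the kinds of fields.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical node configuration record with the fields named in the text."""
    host_name: str
    port: int
    data_directory: str
    node_id: str


def build_node_config(host_name: str, port: int = 5432,
                      data_root: str = "/data") -> NodeConfig:
    """Derive the configuration of a new node from the host name the user entered."""
    return NodeConfig(
        host_name=host_name,
        port=port,                                   # assumed default port
        data_directory=f"{data_root}/{host_name}",   # assumed directory layout
        node_id=str(uuid.uuid4()),                   # assumed identifier scheme
    )


cfg = build_node_config("node3")
print(cfg)
```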
Further, before S200, the method also includes S200a:
S200a, the database receives a node change instruction sent by the client, where the node change instruction is used to indicate that storage nodes are to be added or removed.
The client sends the node change instruction to the database. Taking capacity expansion of the database as an example, the node change instruction includes the node configuration information of the new node, that is, the client sends the node configuration information of the new node to the master node of the database.
In this embodiment, the storage node is updated in the database by receiving the node change instruction sent by the client.
Based on the above embodiments, a specific embodiment of an implementation manner of the database receiving the node change instruction sent by the client is further described below.
Fig. 6 is a schematic flowchart of a data redistribution method for a database according to an embodiment of the present disclosure, where fig. 6 is based on any one of the embodiments shown in fig. 2 to fig. 5. As shown in fig. 6, S601 and S602 are further included before S200a, S200a includes S603, and S604 and S605 are further included after S200a:
S601, receiving a system table locking command sent by a client.
The system table locking command is used to instruct the database to set the system table to a read-only state.
The client can start the capacity expansion operation according to the new node configuration information. First, a read-only lock is added to the system table of the database master node, which prevents other sessions of the database cluster from executing operations that modify the system table, such as Data Definition Language (DDL) statements. The system table is stored in the database master node and includes but is not limited to the following information: directories, metadata of tables, node configuration information, and the like.
S602, according to the system table locking command, setting a read-only lock for the system table.
In analytical service scenarios, the main operations on the database cluster master node are reading and writing data in data tables, and DDL is rarely executed. In addition, the blocking time is very short, so services are hardly affected.
S603, receiving a node adding command sent by the client.
And the client generates the metadata of the directory and the table of the new node according to the metadata of the directory and the table of the database master node.
The client can obtain the metadata of the directory and the table of the main node, and the data which are the same as the metadata of the directory and the table of the main node are generated on the new node, so that the consistency of the metadata of the directory and the table of the new node and the database main node is ensured.
Optionally, the client may invoke a backup process to perform physical backup on the database master node, send backup data to each new node in parallel, and perform physical recovery on the new node in parallel, that is, generate metadata of the data directory and table on the new node.
Through the backup and recovery operation, the consistency of the metadata of the directory and the table of the new node and the database main node is ensured, and meanwhile, the backup and recovery time is very short because the main node stores the metadata and does not store the actual data of the table.
The client starts the new nodes in parallel and sends a node adding command to the database master node. The node adding command may contain the node configuration information of the new node and is used to instruct the database master node to add the new node to the database.
S604, determining a new system table according to the node adding command.
After receiving the command of adding the node, the database master node writes the node configuration information of the new node into the node information system table to obtain a new system table, so that the registration of the new node to the database is completed, and the construction of the new database is completed.
S605, canceling the read-only lock of the system table.
Optionally, after the new database is built, the client may send a command to the database master node to release the read-only lock on the system table.
The database master node releases the read-only lock on the system table according to the received command. After the lock is released, all operations on the system table of the database can be performed normally, for example, DDL operations.
In the embodiment, when the nodes are added to the database, the read-only lock is added to the system table of the database, so that the addition of the nodes to the database can be smoothly performed, the time required by the operation of the nodes is short, the operation of the database is hardly influenced, and the usability of the database is ensured.
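The order of steps S601 to S605 can be sketched in Python as follows. This is a single-process illustration under assumptions: `MasterNode`, its methods, and the `expand` driver are hypothetical names, the system table is reduced to a list of node names, and the backup and recovery of metadata onto the new node is omitted.

```python
class MasterNode:
    """Hypothetical database master node that owns the system table."""

    def __init__(self, nodes):
        self.node_table = list(nodes)          # node information system table
        self.system_table_read_only = False    # read-only lock flag

    def lock_system_table(self):               # S601, S602
        self.system_table_read_only = True

    def add_node(self, node_name):             # S603, S604
        # Only the expansion itself registers nodes while other sessions are locked out.
        self.node_table.append(node_name)

    def unlock_system_table(self):             # S605
        self.system_table_read_only = False

    def ddl_allowed(self) -> bool:
        """Whether another session may currently modify the system table."""
        return not self.system_table_read_only


def expand(master: MasterNode, new_nodes) -> bool:
    """Client-side driver; returns whether DDL was allowed during the expansion."""
    master.lock_system_table()
    try:
        ddl_during_expansion = master.ddl_allowed()
        for node in new_nodes:
            master.add_node(node)
    finally:
        master.unlock_system_table()           # always release the lock
    return ddl_during_expansion


master = MasterNode(["node1", "node2"])
ddl_during = expand(master, ["node3"])
print(ddl_during, master.ddl_allowed(), master.node_table)
```

Releasing the lock in a `finally` clause is a design choice of the sketch: it keeps a failed node registration from leaving the system table read-only indefinitely.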
The following describes, by way of specific examples, the influence of the existing online capacity expansion method and the online capacity expansion method of the present disclosure on the service during redistribution.
Assume that an append-only table is created and 100 million rows of data are inserted before capacity expansion. The specific code is as follows:
create table test_ao(a int, b int, c varchar(1010)) with (appendonly=true);
insert into test_ao select generate_series(1,100000000), generate_series(1,10000000), repeat('1234567890',100);
The capacity expansion expands the original 2 nodes to 3 nodes, and a data redistribution operation is then performed on the table data using alter table test_ao expand table.
The following applications are executed in the process of data redistribution:
Application 1: cyclically execute a conditional query: select * from test_ao where a = 100;
Application 2: cyclically insert 1000 rows of data: insert into test_ao select generate_series(1,1000), generate_series(1,1000);
With the existing online capacity expansion method, the execution result of application 1 is: the first query is blocked until the data redistribution operation is complete, with a blocking time of about 400 seconds. The execution result of application 2 is: the first insert is blocked until the data redistribution operation is complete, with a blocking time of about 400 seconds.
With the database capacity expansion method provided by the present disclosure, the execution result of application 1 is: the query is not blocked and can be executed at all times. The execution result of application 2 is: data can be inserted most of the time, and inserts are blocked only for a short period of about 5 seconds while the incremental data is redistributed; the incremental data amounts to about 1 million rows.
The results show the following. When data is redistributed with the capacity expansion method of the present disclosure, read (query) operations can be executed at all times and write (insert) operations can be executed most of the time; writes are blocked only briefly, during the final redistribution of the incremental data. This blocking time is determined by the amount of data written during redistribution: the more incremental data, the longer the blocking time. Because the amount of incremental data is typically orders of magnitude smaller than the amount of data in the first data table, the blocking time is relatively short. With the existing online capacity expansion method, by contrast, query and insert operations are blocked throughout the data redistribution. That blocking time is determined by the amount of data in the first data table, which is generally large, so the blocking time is long. Compared with the existing method, redistributing data with the capacity expansion method of the present disclosure therefore greatly reduces the impact of blocking on data query and data insert services and improves the availability of the system.
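The reported figures are roughly consistent with a blocking time that scales with the number of rows being moved. The following back-of-the-envelope calculation in Python uses only the numbers from the example; the assumption that redistribution throughput is the same for basic and incremental data is ours and is not stated in the disclosure.

```python
base_rows = 100_000_000        # rows in test_ao before expansion (from the example)
base_block_seconds = 400       # blocking time of the existing method (from the example)
incremental_rows = 1_000_000   # rows written during redistribution (from the example)

# Assumption: rows are redistributed at the same rate in both phases.
rows_per_second = base_rows / base_block_seconds
estimated_block_seconds = incremental_rows / rows_per_second
print(rows_per_second, estimated_block_seconds)
```

Under this assumption the estimate comes to 4 seconds, the same order of magnitude as the blocking time of about 5 seconds reported above; the difference could plausibly come from lock acquisition and the table pointer switch, although the disclosure does not break the time down.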
Fig. 7 is a schematic structural diagram of a data redistribution apparatus for a database according to an embodiment of the present disclosure, and as shown in fig. 7, the apparatus of this embodiment includes:
the first receiving module 701 is configured to receive a redistribution command sent by a client, where the redistribution command is used to instruct to redistribute basic data stored in a first data table to generate a second data table, the first data table stores data according to a first preset rule, and the second data table stores data according to a second preset rule;
a basic data redistribution module 702, configured to redistribute the basic data and write the basic data into the second data table;
the second receiving module 703 is configured to receive a write instruction of the incremental data sent by the client in the time period when the basic data redistribution module redistributes the basic data, and write the incremental data into the first data table;
the incremental data redistribution module 704 is used for redistributing the incremental data after the basic data redistribution module redistributes the basic data, and writing the incremental data into the second data table;
the forbidding module 705 is configured to forbid the first data table from writing data in a time period when the incremental data redistribution module redistributes the incremental data;
and a pointing module 706, configured to point the pointer of the first data table to the second data table after the incremental data redistribution module redistributes the incremental data.
Optionally, the apparatus further comprises:
and the writing module is used for receiving a writing instruction sent by the client and writing data corresponding to the writing instruction into the second data table.
Optionally, the apparatus further comprises:
and the interruption module is used for interrupting the uncompleted write transaction.
Optionally, the apparatus further comprises:
and the third receiving module is used for receiving a node change instruction sent by the client, wherein the node change instruction is used for indicating that the storage nodes are increased or decreased.
The apparatus of the foregoing embodiment may be configured to implement the technical solution of the foregoing method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
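How the modules cooperate can be sketched end to end in Python. This is a toy, single-threaded model under stated assumptions and not the implementation of the disclosure: the `Table` class and the `redistribute` function are hypothetical, rows are reduced to integer keys, the preset rules are assumed to be key modulo node count, and writes that arrive during the basic data redistribution are simulated by the `writes_during_base` argument.

```python
class Table:
    """Toy distributed table: maps node index -> list of integer keys."""

    def __init__(self, keys, node_count):
        self.node_count = node_count
        self.nodes = {n: [] for n in range(node_count)}
        self.write_blocked = False
        self.incremental = []            # keys written since the last snapshot
        for key in keys:
            self.write(key)
        self.incremental.clear()         # initial rows count as basic data

    def write(self, key):
        if self.write_blocked:
            raise RuntimeError("writes to the first data table are prohibited")
        self.nodes[key % self.node_count].append(key)
        self.incremental.append(key)


def redistribute(table, new_node_count, writes_during_base=()):
    second = {n: [] for n in range(new_node_count)}     # second data table
    base = [k for rows in table.nodes.values() for k in rows]
    table.incremental.clear()

    # Basic data redistribution: the first table still accepts writes.
    for key in base:
        second[key % new_node_count].append(key)
    for key in writes_during_base:                      # simulated client writes
        table.write(key)

    # Incremental data redistribution: writes are prohibited for this short phase.
    table.write_blocked = True
    for key in table.incremental:
        second[key % new_node_count].append(key)

    # Point the table at the second data table, then accept writes again.
    table.nodes, table.node_count = second, new_node_count
    table.incremental.clear()
    table.write_blocked = False


t = Table(range(1, 7), node_count=2)
redistribute(t, 3, writes_during_base=[7, 8])
t.write(9)                                              # lands in the second table
print({n: sorted(rows) for n, rows in t.nodes.items()})
```

The sketch makes the trade-off visible: only the loop over `table.incremental` runs while writes are prohibited, so the blocked period depends on the amount of incremental data rather than on the size of the whole table.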
Fig. 8 is a schematic structural diagram of a data redistribution device of a database according to an embodiment of the present disclosure, and as shown in fig. 8, the device according to this embodiment includes:
a memory 801 for storing processor-executable instructions;
a processor 802 for implementing the method of any one of the embodiments of fig. 1 to 6 as described above when the executable instructions are executed.
The apparatus of the foregoing embodiment may be configured to implement the technical solution of the foregoing method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
The present disclosure provides a computer-readable storage medium having stored therein computer-executable instructions for implementing a data redistribution method for a database as in any one of the embodiments of fig. 1 to 6 described above when executed by a processor.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data redistribution method for a database is characterized by comprising the following steps:
receiving a redistribution command sent by a client, wherein the redistribution command is used for indicating to redistribute basic data stored in a first data table to generate a second data table, the first data table stores data according to a first preset rule, and the second data table stores data according to a second preset rule;
redistributing the basic data and writing the basic data into the second data table;
in the time period of redistributing the basic data, receiving a writing instruction of incremental data sent by the client, and writing the incremental data into the first data table;
after the redistribution of the basic data is finished, the incremental data are redistributed and written into a second data table;
forbidding the first data table to write data in the time period for redistributing the incremental data;
and after the incremental data are redistributed, pointing the pointer of the first data table to a second data table.
2. The method of claim 1, wherein after pointing the pointer of the first data table to the second data table, further comprising:
and receiving a write-in instruction sent by the client, and writing data corresponding to the write-in instruction into the second data table.
3. The method according to claim 1 or 2, further comprising, after the redistribution of the base data is finished, before prohibiting the first data table from writing data:
interrupting the uncompleted write transaction.
4. The method of claim 3, wherein before the receiving the redistribution command sent by the client, the method further comprises:
and receiving a second preset rule sent by the client.
5. The method according to claim 4, wherein before the receiving the second preset rule sent by the client, the method further comprises:
and receiving a node change instruction sent by a client, wherein the node change instruction is used for indicating that storage nodes are increased or decreased.
6. A data redistribution device for a database, comprising:
a first receiving module, configured to receive a redistribution command sent by a client, wherein the redistribution command is used for indicating that basic data stored in a first data table is redistributed to generate a second data table, the first data table stores data according to a first preset rule, and the second data table stores data according to a second preset rule;
the basic data redistribution module is used for redistributing the basic data and writing the basic data into the second data table;
the second receiving module is used for receiving a writing instruction of the incremental data sent by the client and writing the incremental data into the first data table in the time period when the basic data redistribution module redistributes the basic data;
the incremental data redistribution module is used for redistributing the incremental data after the basic data redistribution module redistributes the basic data and writing the incremental data into a second data table;
the forbidding module is used for forbidding the first data table to write data in the time period when the incremental data redistribution module redistributes the incremental data;
and the pointing module is used for pointing the pointer of the first data table to a second data table after the incremental data redistribution module redistributes the incremental data.
7. The apparatus of claim 6, further comprising:
and the writing module is used for receiving a writing instruction sent by the client and writing data corresponding to the writing instruction into the second data table.
8. The apparatus of claim 6 or 7, further comprising:
and the interruption module is used for interrupting the uncompleted write transaction.
9. A data redistribution device for a database, comprising:
a memory for storing processor-executable instructions;
a processor for implementing the method of any one of claims 1 to 5 when the computer program is executed.
10. A computer-readable storage medium having stored therein computer-executable instructions for implementing a method of data redistribution for a database as claimed in any one of claims 1 to 5 when executed by a processor.
CN202010969854.6A 2020-09-15 2020-09-15 Data redistribution method, device, equipment and storage medium of database Active CN112115146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010969854.6A CN112115146B (en) 2020-09-15 2020-09-15 Data redistribution method, device, equipment and storage medium of database

Publications (2)

Publication Number Publication Date
CN112115146A true CN112115146A (en) 2020-12-22
CN112115146B CN112115146B (en) 2023-09-15

Family

ID=73802759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010969854.6A Active CN112115146B (en) 2020-09-15 2020-09-15 Data redistribution method, device, equipment and storage medium of database

Country Status (1)

Country Link
CN (1) CN112115146B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486029A (en) * 2021-06-28 2021-10-08 上海万物新生环保科技集团有限公司 Data compensation method, system and equipment for service degradation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040186832A1 (en) * 2003-01-16 2004-09-23 Jardin Cary A. System and method for controlling processing in a distributed system
GB201322049D0 (en) * 2013-12-13 2014-01-29 Ibm Incremental and collocated redistribution for expansion of an online shared nothing database
CN110427383A (en) * 2019-05-13 2019-11-08 国网冀北电力有限公司 Mysql cluster online data redistribution method and relevant device based on data redundancy
CN111291112A (en) * 2018-12-07 2020-06-16 阿里巴巴集团控股有限公司 Read-write control method and device for distributed database and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张莹;郑学智;: "基于Mycat的大数据存储研究", 电子设计工程, no. 05 *

Also Published As

Publication number Publication date
CN112115146B (en) 2023-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant