CN115794823A - Operation method of distributed database, server and storage medium

Publication number: CN115794823A
Authority: CN (China)
Prior art keywords: page number, server, sub, data, data node
Legal status: Pending (an assumption, not a legal conclusion)
Application number: CN202211511184.9A
Other languages: Chinese (zh)
Inventors: 邱炜伟, 张珂杰, 郑柏川, 黄方蕾, 胡麦芳
Assignee: Hangzhou Qulian Technology Co Ltd
Application filed by Hangzhou Qulian Technology Co Ltd
Priority to CN202211511184.9A
Publication of CN115794823A

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to the technical field of databases and provides an operation method of a distributed database, a server, and a storage medium. The method sets up a main server and at least one sub-server, and partitions the data of the data nodes into shard areas on the sub-servers for storage. When a key-value pair needs to be added to the database, the main server determines the data node into which the key-value pair must be inserted, and then sends the page number corresponding to that data node, together with the key-value pair, to the sub-server where the data node's data is located. After receiving the page number and key-value pair, the sub-server performs key-value-pair merging and data-node splitting on the data node and returns the resulting data-node split result to the main server. Finally, the main server performs the index-node splitting according to the data-node split result, completing the operation of adding the key-value pair to the database. With this arrangement, the data transmission pressure on each server can be reduced.

Description

Operation method of distributed database, server and storage medium
Technical Field
The present application relates to the field of database technologies, and in particular to an operation method for a distributed database, a server, and a storage medium.
Background
The Merkle B+ tree is a tree structure improved on the basis of the B+ tree. It comprises two different types of nodes, index nodes and data nodes: index nodes store the minimum keys and hash values of their child nodes, while data nodes store data in the form of key-value pairs. The data of each node is stored on the server's disk as a data page, so each node corresponds to a page number, which is the number of the data page holding its data. At present, performing operations such as adding key-value pairs on a database based on a Merkle B+ tree puts high data transmission pressure on the server. For example, if N key-value pairs are inserted into the database at once, they may fall into N data nodes and touch a further N to 2N index nodes, which means reads and writes of 2N to 3N data pages and hence high data transmission pressure.
Disclosure of Invention
In view of this, embodiments of the present application provide an operation method of a distributed database, a server, and a storage medium, which can reduce the data transmission pressure placed on the servers when operating the database.
A first aspect of an embodiment of the present application provides an operation method for a distributed database, applied to a main server, including:
obtaining a key-value pair to be added to a distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with zero or more shard areas, the total number of shard areas across all sub-servers is at least 1, each shard area stores a set number of data pages, and the data pages are used for storing the data of the data nodes of the Merkle B+ tree;
determining, according to the key of the key-value pair, a target data node in the Merkle B+ tree into which the key-value pair needs to be inserted;
determining, from among the at least one sub-server and according to the target page number corresponding to the target data node, a first sub-server where the data of the target data node is located; the target page number is the number of a target data page, the target data page stores the data of the target data node, and a first shard area of the first sub-server stores the target data page;
sending the key-value pair and the target page number to the first sub-server, so as to instruct the first sub-server to perform key-value-pair merging and data-node splitting on the target data node according to the key-value pair and the target page number, obtain a data-node split result, and return the data-node split result to the main server; and
after the data-node split result is received, performing the index-node splitting of the Merkle B+ tree according to the data-node split result, so as to complete the operation of adding the key-value pair to the distributed database.
In the embodiment of the application, a main server and at least one sub-server are provided. Considering that the data nodes of the Merkle B+ tree store key-value-pair data of large volume while the index nodes only store index information of small volume, the data of the data nodes is partitioned across the sub-servers for storage, where each sub-server is provided with zero or more shard areas and each shard area stores a set number of data pages holding the data of the corresponding data nodes. When a key-value pair needs to be added to the database, the main server determines the data node into which the key-value pair must be inserted, and then sends the page number corresponding to that data node, together with the key-value pair, to the sub-server where the data node's data is located. After receiving the page number and key-value pair, the sub-server performs key-value-pair merging and data-node splitting on the data node and returns the resulting data-node split result to the main server. Finally, the main server performs the index-node splitting according to the data-node split result, completing the operation of adding the key-value pair to the database. With this arrangement, the data transmission pressure of the main server can be distributed across the sub-servers when the database is operated, reducing the data transmission pressure placed on each server.
A second aspect of the embodiments of the present application provides an operation method for a distributed database, applied to a first sub-server, including:
receiving a target page number sent by a main server and a key-value pair to be added to a distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with zero or more shard areas, the total number of shard areas across all sub-servers is at least 1, each shard area stores a set number of data pages, and the data pages are used for storing the data of the data nodes of the Merkle B+ tree; the first sub-server is the sub-server, determined by the main server from among the at least one sub-server, where the data of a target data node is located; the target page number is the number of a target data page, the target data page stores the data of the target data node, a first shard area of the first sub-server stores the target data page, and the target data node is the data node in the Merkle B+ tree into which the key-value pair needs to be inserted, as determined by the main server according to the key of the key-value pair;
performing key-value-pair merging and data-node splitting on the target data node according to the key-value pair and the target page number, to obtain a data-node split result; and
returning the data-node split result to the main server, so as to instruct the main server, after receiving the data-node split result, to perform the index-node splitting of the Merkle B+ tree according to the data-node split result, so as to complete the operation of adding the key-value pair to the distributed database.
A third aspect of the embodiments of the present application provides an operating apparatus for a distributed database, applied to a main server, including:
a key-value-pair obtaining module, configured to obtain a key-value pair to be added to the distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with zero or more shard areas, the total number of shard areas across all sub-servers is at least 1, each shard area stores a set number of data pages, and the data pages are used for storing the data of the data nodes of the Merkle B+ tree;
a data-node determining module, configured to determine, according to the key of the key-value pair, a target data node in the Merkle B+ tree into which the key-value pair needs to be inserted;
a sub-server determining module, configured to determine, according to a target page number corresponding to the target data node, a first sub-server where the data of the target data node is located from among the at least one sub-server; the target page number is the number of a target data page, the target data page stores the data of the target data node, and a first shard area of the first sub-server stores the target data page;
a key-value-pair sending module, configured to send the key-value pair and the target page number to the first sub-server, so as to instruct the first sub-server to perform key-value-pair merging and data-node splitting on the target data node according to the key-value pair and the target page number, obtain a data-node split result, and return the data-node split result to the main server; and
an index-node splitting module, configured to perform, after the data-node split result is received, the index-node splitting of the Merkle B+ tree according to the data-node split result, so as to complete the operation of adding the key-value pair to the distributed database.
A fourth aspect of the embodiments of the present application provides an operating apparatus for a distributed database, applied to a first sub-server, including:
a key-value-pair receiving module, configured to receive a target page number sent by the main server and a key-value pair to be added to the distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with zero or more shard areas, the total number of shard areas across all sub-servers is at least 1, each shard area stores a set number of data pages, and the data pages are used for storing the data of the data nodes of the Merkle B+ tree; the first sub-server is the sub-server, determined by the main server from among the at least one sub-server, where the data of a target data node is located; the target page number is the number of a target data page, the target data page stores the data of the target data node, a first shard area of the first sub-server stores the target data page, and the target data node is the data node in the Merkle B+ tree into which the key-value pair needs to be inserted, as determined by the main server according to the key of the key-value pair;
a data-node splitting module, configured to perform key-value-pair merging and data-node splitting on the target data node according to the key-value pair and the target page number, to obtain a data-node split result; and
a split-result returning module, configured to return the data-node split result to the main server, so as to instruct the main server, after receiving the data-node split result, to perform the index-node splitting of the Merkle B+ tree according to the data-node split result, so as to complete the operation of adding the key-value pair to the distributed database.
A fifth aspect of the embodiments of the present application provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the operation method provided by the first aspect of the embodiments of the present application or the operation method provided by the second aspect of the embodiments of the present application.
A sixth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the operation method provided by the first aspect of the embodiments of the present application or the operation method provided by the second aspect of the embodiments of the present application.
A seventh aspect of the embodiments of the present application provides a computer program product which, when run on a server, causes the server to perform the operation method provided by the first aspect of the embodiments of the present application or the operation method provided by the second aspect of the embodiments of the present application.
It is to be understood that the beneficial effects of the second to seventh aspects may be found in the relevant description of the first aspect and are not repeated herein.
Drawings
FIG. 1 is a schematic representation of a Merkle B+ tree provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a server system provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the shard areas provided for each sub-server according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for operating a distributed database according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the operation of determining the forwarding shard area according to the global maximum page number, according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the operations of updating the global maximum page number and creating a new shard area, according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the operations of adding key-value pairs to a distributed database according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an initial state of the Merkle B+ tree provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the key-value-pair distribution and merging operations performed on the Merkle B+ tree of FIG. 8, according to an embodiment of the present application;
FIG. 10 is a schematic diagram of the node splitting operations performed on the Merkle B+ tree of FIG. 9, according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a data rollback operation on the database according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an operating apparatus for a distributed database applied to a main server, according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an operating apparatus for a distributed database applied to a first sub-server, according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a server according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail. Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The Merkle B+ tree is based on the B+ tree: its last level consists of data nodes (dataNode) and every level above consists of index nodes (indexNode); the index node at the top level is called the root node (rootNode), and the index nodes at the lowest index level (i.e., the leaf level) are called leaf index nodes. Each node of the Merkle B+ tree (data node or index node) has an ID. The child-node list of an index node stores the minimum key and hash value of each child node, the child-node list of a data node stores key-value pairs [key, value], and the hash value of each node is computed from the hashes over its child-node list.
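As a minimal sketch of this layout (the patent itself gives no code; the class and field names here, and the use of SHA-256, are illustrative assumptions), the two node types might be modeled as:

```python
from dataclasses import dataclass, field
from hashlib import sha256

@dataclass
class IndexNode:
    node_id: int
    # Child-node list: one (min_key, child_hash, child_id) entry per child.
    children: list = field(default_factory=list)

    def node_hash(self) -> bytes:
        # An index node's hash is computed over its child-node list.
        h = sha256()
        for min_key, child_hash, _child_id in self.children:
            h.update(min_key.encode())
            h.update(child_hash)
        return h.digest()

@dataclass
class DataNode:
    node_id: int
    items: dict = field(default_factory=dict)  # the [key, value] pairs

    def node_hash(self) -> bytes:
        # A data node's hash is computed over its key-value pairs.
        h = sha256()
        for key in sorted(self.items):
            h.update(key.encode())
            h.update(self.items[key].encode())
        return h.digest()
```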
FIG. 1 shows a schematic representation of a Merkle B+ tree according to an embodiment of the present application. In FIG. 1, nodes n1, n2, and n3 are index nodes, and nodes n4, n5, and n6 are data nodes. The data stored in the database resides in the data nodes in the form of key-value pairs, and each node (data node or index node) is stored on the server's disk in the form of a data page. To better fit the read-write pattern of the disk, the size of each data page may be fixed at 4 KB (a disk page is usually 4 KB), so that the data of a node can be read from disk by computing an offset of page number (page ID) times page size (4 KB). The data stored on disk comprises an index file (the data of the index nodes) and a data file (the data of the data nodes).
Taking FIG. 1 as an example, suppose a query with key c is required. Starting from node n1 (the root node), the minimum keys of its two child nodes are compared with c; c is found to fall within the range of the first child node of n1, so node n2 is visited next. This process repeats until a data node is reached, finally locating data node n5, in which the position of c and the corresponding value are looked up.
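Reusing the sketch classes above, the descent just described can be written as follows; `load_node` is a hypothetical helper that reads a child node by its ID or page number:

```python
def lookup(node, key, load_node):
    """Walk from the root down to a data node, as in the FIG. 1 example."""
    while isinstance(node, IndexNode):
        chosen = node.children[0]
        for min_key, child_hash, child_id in node.children:
            if min_key <= key:
                chosen = (min_key, child_hash, child_id)  # key is in this child's range
            else:
                break
        node = load_node(chosen[2])
    return node.items.get(key)  # node is now a DataNode
```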
For page number management, the server maintains a maximum page ID and a free page number list. When a node (index node or data node) is created or updated in a database based on a Merkle B+ tree, a new node is created and must be assigned a page ID. The server first checks the free page number list for a free page number (a page number not currently assigned to any node); if one exists, it is taken out and assigned to the new node; otherwise, the maximum page ID is assigned to the new node and then incremented. Page ID allocation naturally comes with page ID recycling: when node A is split into node B and node C, the page ID of node A is recycled and placed on the free page number list. Likewise, when a node is found to have become empty, its page ID is recycled to the free page number list.
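A sketch of this allocation scheme as a simple in-memory allocator (names are illustrative):

```python
class PageAllocator:
    """Maximum-page-ID plus free-page-number-list allocation, as described above."""

    def __init__(self):
        self.max_page_id = 0   # smallest never-allocated page number
        self.free_pages = []   # recycled page numbers

    def allocate(self) -> int:
        if self.free_pages:            # prefer a recycled free page number
            return self.free_pages.pop()
        page_id = self.max_page_id     # otherwise hand out the maximum page ID
        self.max_page_id += 1          # and increment it
        return page_id

    def recycle(self, page_id: int):
        # Called when a node is split into two new nodes, or becomes empty.
        self.free_pages.append(page_id)
```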
At present, performing operations such as data queries or key-value insertions on a database based on a Merkle B+ tree usually involves reading and writing the data of a large number of nodes, which puts great data transmission pressure on the server. In view of this, the embodiments of the present application provide an operation method for a distributed database that can reduce the data transmission pressure placed on the server when the database is operated. More detailed technical implementation details of the embodiments of the present application are given in the following embodiments.
Please refer to FIG. 2, which is a schematic structural diagram of a server system according to an embodiment of the present application.
FIG. 2 shows a main server and N (N ≥ 1) sub-servers, with data interaction possible between the servers. Considering that the data nodes of the Merkle B+ tree store key-value-pair data of large volume while the index nodes only store index information of small volume, the embodiments of the present application mainly shard the data files of the data nodes: the data of the data nodes is stored across the N sub-servers, and the main server stores only the data of the index nodes. With this arrangement, most of the main server's data transmission pressure can be distributed across the sub-servers when the database is operated, so that on the one hand the main server's data transmission pressure is reduced, and on the other hand no single sub-server comes under excessive pressure.
The sub-servers may also be referred to as shard servers. Each shard server may be provided with zero or more shard areas, the total number of shard areas across all shard servers is at least 1, and each shard area stores a set number of data pages, this number being referred to as the storage capacity of the shard area; the data pages are used for storing the data of the data nodes of the Merkle B+ tree. In addition, each shard area stores the data pages of its own page number range. As shown in FIG. 3, each shard area may be specified to store 100 data pages (storage capacity = 100, configurable as needed), and each sub-server may host several shard areas. The main server stores the index files and related metadata (e.g., the maximum page ID and free page number list of the index file), related sub-server information (e.g., each sub-server's IP address, port number, and the page number ranges of its stored data pages), the global maximum page number MaxDataID (its role is described later), and mapping information (its role is also described later). Each sub-server maintains the information of its own shard areas (mainly the page number range of the data pages stored in each shard area), the data of the data nodes in each shard area (including each area's maximum page ID and free page number list), the global maximum page number MaxDataID, and the mapping information.
In FIG. 3, the shard areas are numbered in ascending order of the page number ranges of the data pages they store, the sub-servers are likewise numbered, and the shard areas are distributed, in ascending order of their numbers, to the sub-servers arranged in ascending order of their numbers. For example, shard area 1 (page number range 0-99) is placed on sub-server 1, shard area 2 (page number range 100-199) on sub-server 2, and shard area 3 (page number range 200-299) on sub-server 3; for shard area 4 (page number range 300-399), since only 3 sub-servers are currently provided, placement wraps around to sub-server 1, and so on, so that no two shard areas store data pages with the same page number. It should be noted that FIG. 3 is only an example provided by the embodiment of the present application; in actual operation there may be more (e.g., more than 4) or fewer (e.g., 1) sub-servers, the number of shard areas and the storage capacities of the shard areas on different sub-servers may differ, and the shard areas need not be distributed to the sub-servers strictly in the order of their page number ranges.
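The FIG. 3 placement rule can be expressed as two small functions; this is a sketch under the assumption that shard areas and sub-servers are both numbered from 1:

```python
def shard_area_of_page(page_number: int, capacity: int = 100) -> int:
    """Shard areas are numbered from 1 in ascending page-number-range order."""
    return page_number // capacity + 1

def server_of_shard_area(area_number: int, num_servers: int = 3) -> int:
    """Round-robin placement of shard areas onto sub-servers, as in FIG. 3."""
    return (area_number - 1) % num_servers + 1

# Page 150 -> shard area 2 -> sub-server 2; page 350 -> shard area 4 -> sub-server 1.
assert server_of_shard_area(shard_area_of_page(150)) == 2
assert server_of_shard_area(shard_area_of_page(350)) == 1
```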
Referring to FIG. 4, an operation method of a distributed database provided by an embodiment of the present application is shown, including:
401. The main server obtains a key-value pair to be added to the distributed database.
the embodiment of the present application is based on the server system structure shown in fig. 2, and implements a distributed database based on the merkel B + tree. When key value pairs need to be added to the distributed database, the main server first obtains the key value pairs to be added, and the number of the key value pairs can be 1 or more, namely, batch insertion of the key value pairs is supported.
402. The main server determines, according to the key of the key-value pair, the target data node in the Merkle B+ tree into which the key-value pair needs to be inserted.
After obtaining the key-value pair to be added, the main server determines, according to the key of the key-value pair, the data node in the Merkle B+ tree into which it needs to be inserted (referred to as the target data node). Since the main server stores the data of the index nodes of the Merkle B+ tree, the target data node can be found according to the key and the key range of each index node.
Specifically, determining, according to the key of the key-value pair, the target data node in the Merkle B+ tree into which the key-value pair needs to be inserted may include:
(1) Distributing the key-value pair downward from the root node of the Merkle B+ tree, level by level according to the key and the minimum keys of the index nodes, until it reaches a leaf index node of the Merkle B+ tree;
(2) Determining the target data node, according to the key and the minimum keys recorded in the leaf index node, from among the data nodes of the Merkle B+ tree connected to that leaf index node.
In concrete operation, the key-value pair is distributed from the root node of the Merkle B+ tree down to a leaf index node (an index node at the last index level) by comparing the key with the minimum keys of the index nodes. The target data node can then be found among the data nodes connected to that leaf index node according to the key and the leaf index node's minimum keys. For example, in FIG. 1, suppose the key-value pair has been distributed to leaf index node n2, the key is b, and the data nodes connected to n2 are n4 and n5. Then b is found to fall within the range of the first child of n2 (whose minimum key is a, i.e., key ≥ a), so data node n4, connected as the first child of leaf index node n2, is determined to be the target data node.
403. The main server determines, from among the at least one sub-server and according to the target page number corresponding to the target data node, the first sub-server where the data of the target data node is located.
After determining the target data node, the main server obtains the page number corresponding to it (referred to as the target page number). The data of each data node is stored in a data page, and each data page has a number (its page number); here the data page storing the data file of the target data node is referred to as the target data page, and the target page number is the number of the target data page. The shard area where the target data page is located is referred to as the first shard area, and the sub-server provided with the first shard area is referred to as the first sub-server.
The main server needs to find, according to the target page number, the first sub-server where the data of the target data node is located. In one implementation, the main server stores information about the shard areas provided on each sub-server (mainly the page number ranges of the stored data pages), so that after obtaining the target page number it can find the shard area (the first shard area) whose page number range contains the target page number, and hence the corresponding sub-server (the first sub-server).
In another implementation, for a shard-area distribution like that of FIG. 3, the sub-server can instead be indexed directly from the page number. The indexing method comprises:
(1) Performing a modulo operation on the target page number and the target number (the per-shard-area storage capacity) to obtain a first value;
(2) Performing a modulo operation on the first value and the number of sub-servers to obtain a second value;
(3) Determining the sub-server numbered with the second value as the first sub-server.
When the number of sub-servers is unchanged and every shard area has the same storage capacity (the target number), the number of the sub-server can be computed as (target page number % target number) % number of sub-servers, where % denotes the modulo operation, and the sub-server with that number is determined to be the first sub-server. For example, in FIG. 3, suppose the target page number is 150, the target number is 100, and the number of sub-servers is 3; then (150 % 100) % 3 = 50 % 3 = 2, so sub-server 2 is determined to be the first sub-server, which matches the actual placement. With this calculation the main server does not need to store information about the shard areas provided on each sub-server, which reduces to some extent the amount of data stored on the main server.
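A literal transcription of steps (1) to (3), reproducing the worked example above; this sketch only restates the rule as the patent states it, and assumes the sub-server numbering of FIG. 3:

```python
def index_sub_server(target_page_number: int, target_number: int,
                     num_sub_servers: int) -> int:
    first = target_page_number % target_number  # step (1)
    second = first % num_sub_servers            # step (2)
    return second                               # step (3): sub-server number

# Worked example from the text: (150 % 100) % 3 = 50 % 3 = 2 -> sub-server 2.
assert index_sub_server(150, 100, 3) == 2
```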
Further, before performing the modulo operation on the first value and the number of sub-servers to obtain the second value, the method may further include:
(1) Obtaining preset mapping information, where the mapping information records the correspondence between preset page number ranges and numbers of servers;
(2) Determining the target page number range, among the preset page number ranges, in which the target page number falls;
(3) Looking up, in the mapping information, the number of servers corresponding to the target page number range, to be used as the number of sub-servers.
Since the distributed database of the embodiment of the present application supports dynamically adding sub-servers, the main server needs to obtain the number of sub-servers currently in effect before performing the modulo operation. This is where the mapping information mentioned earlier comes in: it records the correspondence between preset page number ranges and numbers of servers. The number of servers corresponding to the target page number range (the preset range containing the target page number) is looked up in the mapping information and used as the current number of sub-servers. For example, suppose the mapping information records that page number range 0-499 corresponds to 3 servers, range 500-699 to 4 servers, range 700-899 to 5 servers, and so on; then a target page number of 150 gives 3 sub-servers, a target page number of 550 gives 4 sub-servers, and so on.
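A sketch of this lookup, using the example ranges above as hypothetical data:

```python
# Hypothetical mapping information: (inclusive page number range) -> server count.
MAPPING_INFO = [((0, 499), 3), ((500, 699), 4), ((700, 899), 5)]

def current_num_servers(target_page_number: int) -> int:
    """Return the number of sub-servers in effect for this page number range."""
    for (low, high), count in MAPPING_INFO:
        if low <= target_page_number <= high:
            return count
    raise KeyError(f"no mapping entry covers page {target_page_number}")

assert current_num_servers(150) == 3
assert current_num_servers(550) == 4
```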
404. The main server sends the key-value pair and the target page number to the first sub-server.
After the main server determines the first sub-server, it sends the key-value pair and the target page number to the first sub-server, and the first sub-server then carries out the key-value-pair merging, the data-node splitting, and related processing. Specifically, the main server's sending of the key-value pair and the target page number to the first sub-server may include:
(1) Generating a gRPC request carrying the key-value pair and the target page number;
(2) Sending the gRPC request to the first sub-server.
The main server may package the key-value pair and the target page number into a gRPC request and then send the gRPC request to the first sub-server. Using gRPC for the data interaction between the servers has the advantages of simplicity and ease of use, high data transmission efficiency, and strong compatibility.
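The patent does not specify the protobuf schema, so the service, message, and module names below are hypothetical stand-ins; only the channel/stub pattern of the grpc library itself is standard:

```python
import grpc
import insert_pb2, insert_pb2_grpc  # hypothetical generated protobuf modules

def send_insert(address: str, target_page_number: int, pairs: dict):
    """Package the key-value pairs and target page number into a gRPC request."""
    with grpc.insecure_channel(address) as channel:
        stub = insert_pb2_grpc.InsertServiceStub(channel)  # hypothetical service
        request = insert_pb2.InsertRequest(
            target_page_number=target_page_number,
            pairs=[insert_pb2.KeyValue(key=k, value=v) for k, v in pairs.items()],
        )
        return stub.MergeAndSplit(request)  # returns the data-node split result
```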
405. The first sub-server performs key-value-pair merging and data-node splitting on the target data node according to the key-value pair and the target page number, to obtain a data-node split result.
After receiving the key-value pair and the target page number sent by the main server, the first sub-server performs the corresponding key-value-pair merging and data-node splitting on the target data node, obtaining a data-node split result.
In an implementation of the embodiment of the present application, performing key-value-pair merging and data-node splitting on the target data node according to the key-value pair and the target page number to obtain the data-node split result may include:
(1) Determining the first shard area from among all shard areas of the first sub-server, according to the target page number and the first sub-server's shard-area information; the shard-area information records the page number range corresponding to each shard area provided on the first sub-server;
(2) Distributing the key-value pair to the target data node stored in the first shard area, and performing the key-value-pair merging on the target data node;
(3) If the data volume of the target data node after the key-value-pair merging exceeds a set threshold, splitting the target data node into at least two new data nodes, so that the data of each new data node can be written into one data page;
(4) Allocating a corresponding page number to each new data node, and writing the data of each new data node into the data page corresponding to its page number;
(5) Determining the data-node split result from the key, hash value, and corresponding page number of each new data node.
For step (1) above, the first sub-server determines the first shard area from among all its shard areas according to the target page number and its shard-area information. For example, in FIG. 3, if the target page number is 150, it falls within the page number range (100-199) corresponding to shard area 2, so shard area 2 is determined to be the first shard area.
For step (2), the first sub-server distributes the key-value pair to the target data node stored in the first shard area and performs the key-value-pair merging on the target data node, i.e., merges the newly inserted key-value pair with the target data node's existing key-value pairs.
For step (3), if the first sub-server detects that the data volume of the target data node after the key-value-pair merging exceeds a set threshold (for example, one 4 KB data page), the data-node splitting must be performed: the target data node is split, according to the set splitting threshold, into at least two new data nodes so that the data of each new data node can be written into one data page. Conversely, if the data volume of the target data node after the key-value-pair merging does not exceed the set threshold, the target data node does not need to be split.
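Steps (2) and (3) together amount to a merge followed by a greedy page-sized split; a sketch, where `size_of` is an assumed helper estimating the serialized size of one key-value pair:

```python
PAGE_SIZE = 4096  # one 4 KB data page per node, per the splitting threshold

def merge_and_split(node_items: dict, new_pairs: dict, size_of) -> list:
    """Merge incoming pairs into the target data node, then split the node
    into page-sized chunks if its data volume exceeds one data page."""
    node_items.update(new_pairs)          # step (2): key-value-pair merging
    nodes, current, current_size = [], {}, 0
    for key in sorted(node_items):        # step (3): split so each chunk fits a page
        item_size = size_of(key, node_items[key])
        if current and current_size + item_size > PAGE_SIZE:
            nodes.append(current)         # close off one new data node
            current, current_size = {}, 0
        current[key] = node_items[key]
        current_size += item_size
    nodes.append(current)
    return nodes                          # one dict of pairs per new data node
```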
For step (4), the first sub-server must allocate to each new data node a corresponding page number (the number of the data page that will store the node's data), and then write the data of each new data node into the data page corresponding to its page number.
Specifically, the first sub-server may record the minimum available page number and the free page number list of each shard area it provides. The minimum available page number equals one plus the largest page number of the corresponding shard area already allocated to a data node, and the free page number list records the shard area's page numbers currently free to be allocated to data nodes. The minimum available page number here is the same concept as the maximum page ID described earlier under page number management: a page number larger than all allocated page numbers and itself unallocated, i.e., the smallest of the available page numbers. For example, if page numbers 1-60 are currently assigned to nodes, then 61 is larger than all assigned page numbers and unassigned, and 61 is also the smallest of the available page numbers (61, 62, 63, ...). Each shard area has its own free page number list recording the page numbers currently free to be allocated to data nodes, generally page numbers that were once allocated to nodes and have since been recycled. Step (4) may include:
(4.1) Detecting whether the number of free page numbers recorded in the free page number list of the first shard area is greater than or equal to the number of new data nodes;
(4.2) If the number of free page numbers recorded in the free page number list of the first shard area is greater than or equal to the number of new data nodes, allocating one free page number to each new data node;
(4.3) If the number of free page numbers recorded in the free page number list of the first shard area is less than the number of new data nodes, dividing the new data nodes into a first data-node set and a second data-node set, where the number of data nodes in the first data-node set equals the number of free page numbers; allocating one free page number to each data node in the first data-node set; and, for each data node in the second data-node set, detecting whether the minimum available page number of the first shard area exceeds the upper limit of the first shard area's page number range. If the minimum available page number of the first shard area does not exceed that upper limit, it is allocated to that data node and then incremented. If the minimum available page number of the first shard area exceeds that upper limit, a second shard area is determined, the data of that data node is sent to the second sub-server where the second shard area is located, and the second sub-server is instructed to allocate a corresponding page number for that data node and return the allocated page number to the first sub-server.
The first sub-server first detects whether the number of free page numbers recorded in the free page number list of the first shard area is greater than or equal to the number of new data nodes. If so, the free page numbers in the list are sufficient, and one free page number from the list is allocated to each new data node. If not, the free page numbers recorded in the list are insufficient, and the new data nodes are divided, according to the number of free page numbers, into a first data-node set and a second data-node set, the first set containing as many data nodes as there are free page numbers, so that each data node in the first set receives one free page number. Then, for each data node in the second set, it is checked whether the minimum available page number of the first shard area has exceeded the upper limit of the first shard area's page number range; if not, the minimum available page number is allocated to that data node and then incremented. For example, suppose the first shard area's stored minimum available page number is 198 and its page number range is 100-199. For the first data node in the second set, since the current minimum available page number 198 does not exceed the range's upper limit 199, page number 198 is allocated to it and the minimum available page number is incremented to 199. For the second data node in the second set, since the current minimum available page number 199 still does not exceed the upper limit 199, page number 199 is allocated to it and the minimum available page number is incremented to 200. For the third data node in the second set, the current minimum available page number 200 now exceeds the upper limit 199, so it cannot be allocated; and since the first shard area's free page number list also has no allocatable free page numbers, the first shard area has no way to allocate a page number.
If the first shard area has no way to allocate a page number, the first sub-server may select, in a predetermined manner (e.g., randomly, or incrementally by shard-area number), another shard area provided in the server system and use it as the second shard area. The data of the corresponding data node is then sent to the sub-server where the second shard area is located, referred to here as the second sub-server. For example, in FIG. 3, if shard area 2 cannot allocate a page number, sub-server 2 may determine a new shard area (say, shard area 3) and send the data of the corresponding data node to the sub-server where shard area 3 is located, namely sub-server 3. Note that the newly determined shard area may also be one provided by the first sub-server itself, in which case no data is sent to another sub-server, and the first sub-server judges, in the same way as for the first shard area, whether the new shard area (which likewise has its own minimum available page number and free page number list) can allocate a page number. After the first sub-server sends the data of the corresponding data node to the second sub-server where the second shard area is located, the second sub-server judges in the same way whether the second shard area can allocate a page number; if allocation succeeds, the second sub-server returns the allocated page number to the first sub-server. If the second sub-server's second shard area cannot allocate a page number either, the second sub-server in turn sends the data of the corresponding data node onward to the sub-server where yet another shard area is located (e.g., a third sub-server), and so on, until a page number is successfully allocated for the data node; the successfully allocated page number is then passed back through the chain of sub-servers to the first sub-server.
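A sketch of the local part of this allocation cascade; `area` is an assumed object carrying the shard area's free_pages list, next_page (minimum available page number), and upper (upper limit of its page number range):

```python
def allocate_in_shard_area(area, count: int):
    """Allocate up to `count` page numbers from one shard area; return the
    allocated page numbers and how many nodes still need a page number."""
    allocated = []
    while len(allocated) < count and area.free_pages:
        allocated.append(area.free_pages.pop())  # (4.2)/(4.3): free list first
    while len(allocated) < count and area.next_page <= area.upper:
        allocated.append(area.next_page)         # then the minimum available number
        area.next_page += 1                      # increment after allocating
    remaining = count - len(allocated)
    return allocated, remaining  # any remainder is forwarded to a second shard area
```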
Further, determining the second shard area may include:
(1) Obtaining the global maximum page number, which is the maximum among the upper limits of the page number ranges corresponding to all the shard areas;
(2) Finding, among the shard areas, the shard area whose page number range has an upper limit equal to the global maximum page number, to serve as the second shard area.
As mentioned earlier, the main server and every sub-server maintain a global maximum page number MaxDataID; its role is described here. The global maximum page number is the maximum among the upper limits of the page number ranges of all shard areas. When one shard area cannot allocate a page number, the data of the data node is forwarded to another shard area, which allocates a page number instead. It may be specified that forwarding goes to the shard area holding the global maximum page number (i.e., the one whose page-number-range upper limit equals the global maximum page number), since the shard area with the global maximum page number generally stores the least data and is therefore the most likely to be able to allocate a page number.
FIG. 5 illustrates the operation of determining the forwarding shard area according to the global maximum page number. In FIG. 5, the storage capacity of each shard area is 10, the global maximum page number MaxDataID = 59, and every sub-server maintains the global maximum page number. When shard area 2 on sub-server 2 cannot allocate a page number, sub-server 2 learns, through interaction with the other sub-servers, that the shard area holding the global maximum page number 59 is shard area 6, which is located on sub-server 1; sub-server 2 therefore forwards the data of the corresponding data node to sub-server 1, and sub-server 1 allocates a page number from shard area 6.
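Choosing the forwarding target then reduces to finding the shard area whose range ends at MaxDataID; a sketch continuing the assumptions above:

```python
def second_shard_area(all_areas, max_data_id: int):
    """Forward to the shard area whose page-number-range upper limit equals
    the global maximum page number MaxDataID; None means one must be created."""
    for area in all_areas:  # covers the shard areas of every sub-server
        if area.upper == max_data_id:
            return area
    return None
```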
Furthermore, suppose each shard area stores a target number of data pages, the shard areas are numbered in ascending order of the page number ranges of the data pages they store, the sub-servers are numbered, and the shard areas are distributed, in ascending order of their numbers, to the sub-servers arranged in ascending order of their numbers. Then, after sending the data of the data node in question to the second sub-server where the second shard area is located, the method may further include:
(1) If the free page number list of the second shard area has no free page number, and the minimum available page number of the second shard area exceeds the global maximum page number, the second sub-server adds the target number to the global maximum page number to obtain an updated global maximum page number;
(2) The second sub-server determines the third sub-server corresponding to the updated global maximum page number and sends the updated global maximum page number to the third sub-server, instructing the third sub-server to create a new shard area whose page-number-range upper limit equals the updated global maximum page number, use the new shard area to allocate a corresponding page number for the data node in question, and return the allocated page number to the second sub-server.
If the second shard area holding the global maximum page number cannot allocate a page number (i.e., its free page number list has no free page number and its minimum available page number exceeds the global maximum page number), a new shard area must be created. The second sub-server hosting the second shard area adds the target number (the storage capacity of each shard area) to the global maximum page number to obtain an updated global maximum page number. Following the shard-area distribution rule of FIG. 3, the second sub-server can calculate the sub-server corresponding to the updated global maximum page number (referred to as the third sub-server), and then sends the data of the corresponding data node together with the updated global maximum page number to the third sub-server. On receiving the updated global maximum page number and finding it larger than the global maximum page number stored locally, the third sub-server knows that a new shard area must be created; it therefore creates a new shard area whose page-number-range upper limit equals the updated global maximum page number, uses the new shard area to allocate a corresponding page number for the data node, and returns the allocated page number to the second sub-server, which in turn returns it to the first sub-server.
FIG. 6 illustrates the operations of updating the global maximum page number and creating a new shard area. FIG. 6 shows a main server and 5 sub-servers (sub-server 1 through sub-server 5); the initial global maximum page number is 29, and the main server and all 5 sub-servers maintain it. The upper part of FIG. 6 shows the state before a shard area is created: the system has 3 shard areas, shard area 1 (page number range 0-9), shard area 2 (page number range 10-19), and shard area 3 (page number range 20-29), each with storage capacity 10. When shard area 1 or shard area 2 cannot allocate a page number, the data of the corresponding data node is forwarded to shard area 3, which holds the global maximum page number 29. The lower part of FIG. 6 shows the state after a shard area is created: when shard area 3 also cannot allocate a page number, sub-server 3 adds the storage capacity 10 to the global maximum page number 29 to obtain the updated global maximum page number 39, replacing its local copy of 29. In addition, the page-number-range distribution rule places the updated global maximum page number 39 on sub-server 4, so sub-server 3 sends the data of the corresponding data node and the updated global maximum page number 39 to sub-server 4. On receiving 39 and finding it larger than its own maintained global maximum page number 29, sub-server 4 creates shard area 4 with page number range 30-39 and updates its locally maintained global maximum page number to 39, completing the automatic creation of the shard area and the update of the global maximum page number.
At this point, however, the global maximum page number maintained by sub-server 1, sub-server 2, and sub-server 5 is still 29; this is where the copy maintained on the main server comes into play. When the main server receives a page-number result allocated from shard area 4 (returned via sub-server 1 or sub-server 2) and finds it larger than its locally maintained global maximum page number 29, it knows that a shard area has been newly created and updates its local global maximum page number to 39. The main server then sends the updated global maximum page number 39 to every sub-server, and sub-server 1, sub-server 2, and sub-server 5, on receiving it, update their locally stored copies to 39, achieving synchronization of the global maximum page number across the system.
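The bookkeeping in the FIG. 6 example can be sketched as follows (capacity 10 and the 1-based numbering of FIG. 3 are assumptions taken from the figures):

```python
CAPACITY = 10  # per-shard-area storage capacity in the FIG. 6 example

def create_next_shard_area(max_data_id: int, num_servers: int):
    """Advance MaxDataID by one shard-area capacity and locate the sub-server
    that must host the newly created shard area."""
    new_max = max_data_id + CAPACITY             # 29 + 10 = 39
    area_number = new_max // CAPACITY + 1        # pages 30-39 -> shard area 4
    host = (area_number - 1) % num_servers + 1   # round robin -> sub-server 4
    return new_max, area_number, host

assert create_next_shard_area(29, 5) == (39, 4, 4)
```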
For step (5), the first sub-server assembles the key, hash value, and corresponding page number of each new data node into the data-node split result.
406. The first sub-server returns the data-node split result to the main server.
The first sub-server replies to the upper layer with the data-node split result it obtained, i.e., sends it to the main server, and the main server then performs the corresponding index-node splitting.
407. After receiving the data-node split result, the main server performs the index-node splitting of the Merkle B+ tree according to the data-node split result, completing the operation of adding the key-value pair to the distributed database.
After receiving the data-node split result returned by the first sub-server (mainly the keys, page numbers, and hash values of the split data nodes), the main server performs the corresponding splitting of the index nodes according to the data-node split result. Once the index-node splitting is finished, the operation of adding the key-value pair to the distributed database is complete.
FIG. 7 illustrates the operations of adding key-value pairs to the distributed database according to an embodiment of the present application. In FIG. 7, a batch of key-value pairs is first inserted at the root node of the Merkle B+ tree and then distributed according to the ranges of the index nodes until the pairs reach the index nodes of the last (leaf) level; these operations are performed by the main server. Next, the main server sends the key-value pairs and the page numbers of the corresponding data nodes to the remote corresponding sub-server 1, and sub-server 1 performs the key-value-pair merging and data-node splitting. During the data-node splitting, if sub-server 1 finds that shard area 1, corresponding to the page number of the data node in question, cannot allocate a page number for a new data node, it may determine another shard area 2 (chosen randomly, incrementally by shard-area number, or as the shard area holding the global maximum page number, as described above) and send the data of the new data node to sub-server 2 where shard area 2 is located; sub-server 2 completes the page-number allocation using shard area 2 and returns the resulting page number to sub-server 1. Sub-server 1 then returns the data-node split result to the main server, the main server performs the corresponding index-node splitting, and the addition of the key-value pairs to the distributed database is finally complete.
The following describes the process of adding key-value pairs to the database with an example. As shown in FIG. 8, the initial state of the Merkle B+ tree comprises the root node [a1, b1, c2], index node [a1], index node [b1], index node [c2], data node [a1, a2], data node [b1], and data node [c2].
Now 3 key-value pairs [a3], [b2] and [c1] need to be added to the Merkle B+ tree shown in fig. 8. First, as shown in fig. 9, the distribution and merging operations of the key-value pairs are performed. In fig. 9, starting from the root node [a1, b1, c2], the key-value pairs are distributed downward level by level: if the receiving node is an index node, distribution continues; if it is a data node, the key-value pair merging operation is performed. For example, at the far right of FIG. 9, the inserted [a3] is merged into data node [a1, a2], the inserted [b2] is merged into data node [b1], and the inserted [c1] is merged into data node [c2].
After the key-value pair distribution and merging operations are complete, the node splitting operations (comprising data node splitting and index node splitting) are performed, as shown in FIG. 10. In fig. 10, splitting proceeds upward from the data nodes at the bottom layer. If the data size of a data node exceeds the threshold, the data node is split so that the data of each node can be written into one data page, and each new node obtained from the split must be allocated a corresponding page number (see the foregoing description for the specific page number allocation method). As can be seen, the data node [a1, a2, a3] in fig. 10 is split into the two data nodes [a1, a2] and [a3]; the splitting then continues upward, and the index node [a1] is correspondingly split into [a1, a3], until the root node is reached, where the root node changes from [a1, b1, c2] to [a1, b1, c1], that is, the index is updated. In the embodiment of the application, the key-value pair distribution operation and the index node splitting operation are performed by the main server, while the key-value pair merging operation and the data node splitting operation are performed by the sub-servers, thereby sharing the data transmission pressure generated by the key-value pair adding operation.
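The bottom-up data node split of fig. 10 can be sketched in Python as follows. This is a minimal sketch under assumptions: PAGE_SIZE, the allocator, and all names are hypothetical, and entries are modeled as fixed-size for simplicity.

PAGE_SIZE = 4096  # assumed capacity of one data page, in bytes

def split_data_node(keys, entry_size, alloc_page_number):
    # Split a sorted key list into chunks that each fit into one data page,
    # and give every resulting node its own page number.
    per_page = max(1, PAGE_SIZE // entry_size)
    chunks = [keys[i:i + per_page] for i in range(0, len(keys), per_page)]
    return [(alloc_page_number(), chunk) for chunk in chunks]

# Mirrors fig. 10: [a1, a2, a3] splits into [a1, a2] and [a3] when only
# two entries fit into a page.
pages = iter(range(100, 110))
print(split_data_node(["a1", "a2", "a3"], PAGE_SIZE // 2, lambda: next(pages)))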
In the server system architecture of the embodiment of the application, if data in the distributed database needs to be queried, the main server receives a data query request. Assuming the value to be queried is the key value corresponding to a target key, the main server can determine the data node to be queried according to the target key; this node is referred to as the specified data node. The main server obtains the page number corresponding to the specified data node, referred to as the specified page number, and can then find, among the connected sub-servers and according to the page-number-based sub-server indexing method described above, the sub-server where the data of the specified data node is located; this sub-server is referred to as the fourth sub-server. Next, the main server sends a key value query request carrying the target key and the specified page number to the fourth sub-server. After receiving the request, the fourth sub-server locates the corresponding memory slice region according to the specified page number and the maintained memory slice region information, queries the data page whose page number equals the specified page number in that region, retrieves the target key value corresponding to the target key from the data page, and returns it to the main server, which can then output the target key value to complete the data query operation.
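A minimal Python sketch of this query path follows. The names are hypothetical, and reducing the page number to a region index by integer division is an assumed reading of the indexing scheme.

def route_query(target_key, tree, sub_servers, pages_per_region):
    node = tree.find_data_node(target_key)        # the specified data node
    page_number = node.page_number                # the specified page number
    region = page_number // pages_per_region      # assumed region indexing
    fourth_sub = sub_servers[region % len(sub_servers)]
    return fourth_sub.query(target_key, page_number)

class SubServer:
    def __init__(self, regions):
        # memory slice region info: a list of {page_number: data_page} maps,
        # where each data page is itself a {key: value} map
        self.regions = regions

    def query(self, target_key, page_number):
        for pages in self.regions:
            if page_number in pages:
                # The data page whose number equals the specified page number
                # holds the target key's value.
                return pages[page_number].get(target_key)
        return None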
In terms of multi-version control and data rollback, since the Merkle B+ tree is in effect a plurality of logical trees stored on disk, rolling data back to a given version only requires pointing the tree root at the stored root of the corresponding tree. To keep multiple logical trees, a free page number list, a to-be-released page number list, and a maximum page ID must be maintained. Assuming the database supports rollback across only two versions of data, then when transaction 4 completes it can be rolled back at most to the state of transaction 3, which is equivalent to the database keeping only the two trees of transactions 3 and 4. Fig. 11 is a schematic diagram of a data rollback operation performed on the database. In fig. 11, the initial page numbers of the 3 nodes corresponding to transaction 1 are 1, 2, and 3; at this point the maximum page ID is 4, and both the free page number list and the to-be-released page number list are empty. Transaction 2 executes after transaction 1 finishes; after transaction 2 updates key values, the nodes with page numbers 2 and 3 become dirty, and the node with page number 2 is split into two nodes. Because the free page number list and the to-be-released page number list are empty, page numbers can only be allocated by increasing the maximum page ID: the allocated page numbers are 4, 5, 6, and 7, and the maximum page ID becomes 8. The recycled page numbers 1-3 are not released immediately (that is, moved into the free page number list) but are first placed in the to-be-released page number list. When transaction 3 starts, because only two versions of data rollback are supported, the to-be-released page numbers of transaction 1 must now be released, so page numbers 1, 2, and 3 are moved into the free page number list. After transaction 3's update completes, the node with page number 6 becomes dirty (and correspondingly the root node with page number 7 becomes dirty as well); page numbers 1 and 2 can now be taken from the free page number list for allocation, and page number 6 is placed in the to-be-released page number list. The final result is: the free page number list holds page number 3, and the to-be-released page number list holds page numbers 6 and 7 corresponding to transaction 2. At this point, if a rollback to transaction 2 is needed, since the root node page number of transaction 2 (page number 7), its free page number list, and its to-be-released page number list have all been recorded, it suffices to point the root node at page number 7, replace the current free page number list with the one recorded for transaction 2, and replace the current to-be-released page number list with the one recorded for transaction 2. In theory, data rollback across any number of versions can be supported; the to-be-released page number list merely grows longer. Through these technical means, multi-version control and data rollback of the database are achieved.
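A minimal Python sketch of this version bookkeeping follows; the names and structure are hypothetical, and only the list handling from the fig. 11 walk-through is modeled.

import copy

class VersionedStore:
    def __init__(self):
        self.root_page = None
        self.free_pages = []        # free page number list
        self.to_release = []        # to-be-released page number list
        self.max_page_id = 1
        self.snapshots = {}         # transaction ID -> saved state

    def allocate_page(self):
        # Prefer recycled page numbers; otherwise grow the maximum page ID,
        # exactly as in the fig. 11 walk-through.
        if self.free_pages:
            return self.free_pages.pop(0)
        page, self.max_page_id = self.max_page_id, self.max_page_id + 1
        return page

    def commit(self, txn_id, new_root_page, recycled_pages):
        # Recycled pages are not freed immediately; they enter the
        # to-be-released page number list first.
        self.to_release.extend(recycled_pages)
        self.root_page = new_root_page
        self.snapshots[txn_id] = (new_root_page,
                                  copy.deepcopy(self.free_pages),
                                  copy.deepcopy(self.to_release))

    def rollback(self, txn_id):
        # Point the root at the stored tree and swap in that version's lists.
        root, free, pending = self.snapshots[txn_id]
        self.root_page, self.free_pages, self.to_release = root, free, pending

The step that moves an expired version's to-be-released page numbers into the free page number list when a new transaction starts (page numbers 1-3 in fig. 11) is omitted here for brevity.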
When multi-version control and data rollback are applied to the server system provided in the embodiment of the present application, the main server and each sub-server each store a data file or an index file and can each record their own maximum page ID, free page number list, and to-be-released page number list. When a data rollback occurs, the main server therefore only needs to point the tree root at the stored root of the corresponding tree and send a data rollback request (containing the rollback transaction ID) to each underlying sub-server; after receiving the request, each sub-server replaces its local current free page number list and to-be-released page number list with the ones corresponding to the rollback transaction ID. Note that each memory slice region provided on a sub-server has its own free page number list and to-be-released page number list, so each memory slice region performs its own list replacement operation.
In addition, the server system provided by the embodiment of the application supports horizontal expansion of the sub-servers, that is, adding connected sub-servers on top of the existing ones. In actual operation, the main server only needs to be configured with the relevant information of the sub-server to be added, such as its IP address and port number, then update the mapping information described above and synchronize the updated mapping information to each sub-server.
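A sketch of this expansion step (hypothetical names throughout; the mapping is assumed here to pair each page number range with the sub-server count in effect when that range was created, so existing ranges keep their original layout):

def add_sub_server(main_server, ip, port):
    # Register the new sub-server; only the main server needs its address.
    main_server.sub_servers.append(main_server.connect(ip, port))
    # Page ranges created from now on are spread over the enlarged server
    # set; None marks an open-ended upper bound for the new range.
    start = main_server.global_max_page + 1
    main_server.mapping.append(((start, None), len(main_server.sub_servers)))
    # Synchronize the updated mapping to every sub-server.
    for sub in main_server.sub_servers:
        sub.sync_mapping(main_server.mapping)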
In the embodiment of the application, a main server and at least one sub-server are provided. Considering that the data nodes of the Merkle B+ tree store key-value pair data of large volume while the index nodes store only index information of small volume, the data of the data nodes is partitioned across the sub-servers for storage, where each sub-server is provided with at least zero memory slice regions and each memory slice region stores a set number of data pages used to store the data of the corresponding data nodes. When a key-value pair needs to be added to the database, the main server determines the data node into which the key-value pair needs to be inserted, and then sends the page number corresponding to that data node together with the key-value pair to the sub-server where the data node's data is located. After receiving the page number and the key-value pair, the sub-server performs the key-value pair merging and data node splitting operations on the data node and returns the resulting data node splitting result to the main server. Finally, the main server performs the index node splitting operation according to the data node splitting result, thereby completing the operation of adding the key-value pair to the database. With this arrangement, the data transmission pressure on the main server is distributed to the sub-servers when the database is operated, reducing the data transmission pressure borne by each server.
To sum up, the data transmission pressure of the main server is distributed across the sub-servers, and the data of the data nodes is spread relatively evenly among them, so the data read/write volume of each sub-server is fairly uniform. When the index nodes on the main server are updated, key-value pairs are distributed to the layer below, and since the data nodes are dispersed across the sub-servers, the data node splitting operations performed by the sub-servers run concurrently, giving the whole database a high degree of update concurrency. In addition, the server system provided by the embodiment of the application supports horizontal expansion, that is, sub-servers can be added conveniently.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The above mainly describes an operation method of a distributed database, and an operation apparatus of a distributed database will be described below.
Referring to fig. 12, an embodiment of an operation apparatus of a distributed database applied to a main server in an embodiment of the present application includes:
a key-value pair obtaining module 1201, configured to obtain a key-value pair to be added to the distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with at least zero memory slice regions, the total number of the memory slice regions of all the sub-servers is at least 1, each memory slice region stores a set number of data pages, and the data pages are used for storing data of data nodes of the Merkle B+ tree;
a data node determining module 1202, configured to determine, according to the key of the key-value pair, a target data node in the Merkle B+ tree into which the key-value pair needs to be inserted;
a sub-server determining module 1203, configured to determine, according to a target page number corresponding to the target data node, a first sub-server where data of the target data node is located from the at least one sub-server; the target page number is the number of a target data page, the target data page stores the data of the target data node, and a first memory slice region of the first sub-server stores the target data page;
a key-value pair sending module 1204, configured to send the key-value pair and the target page number to the first sub-server, so as to instruct the first sub-server to perform key-value pair merging and data node splitting operations on the target data node according to the key-value pair and the target page number, obtain a data node splitting result, and return the data node splitting result to the main server;
an index node splitting module 1205, configured to, after receiving the data node splitting result, perform the index node splitting operation of the Merkle B+ tree according to the data node splitting result, so as to complete the operation of adding the key-value pair to the distributed database.
In an implementation manner of the embodiment of the application, each memory slice region stores a target number of data pages; the memory slice regions are numbered in ascending order of the page number ranges of the data pages they store; the sub-servers are also numbered; and the memory slice regions, in ascending order of their numbers, are distributed in sequence for storage to the sub-servers arranged in ascending order of their numbers. The sub-server determining module may include:
the first modular operation unit is used for carrying out modular operation on the target page number and the target number to obtain a first numerical value;
the second modular arithmetic unit is used for carrying out modular arithmetic on the first numerical value and the quantity of each sub-server to obtain a second numerical value;
and the sub-server determining unit is used for determining the sub-servers which are numbered as the second numerical value in each sub-server as the first sub-server.
Further, the sub-server determining module may further include:
the mapping information acquisition unit is used for acquiring preset mapping information which records the corresponding relation between each preset page number range and the number of the servers;
a page number range determining unit, configured to determine, among the preset page number ranges, the target page number range in which the target page number is located;
and a server number searching unit, configured to look up, from the mapping information, the number of servers corresponding to the target page number range as the number of sub-servers.
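A minimal sketch of these routing units in Python follows. The exact arithmetic is an assumption: the description gives two modular operations, and the first step is rendered here as integer division to recover the memory slice region index, which is one plausible reading; the mapping is likewise assumed to pair each preset page number range with the sub-server count in effect when that range was created.

PAGES_PER_REGION = 10                         # the "target number"
MAPPING = [((0, 99), 2),                      # pages 0-99: 2 sub-servers
           ((100, 199), 3)]                   # pages 100-199: 3 sub-servers

def first_sub_server(target_page):
    region = target_page // PAGES_PER_REGION  # first numerical value
    for (low, high), server_count in MAPPING:
        if low <= target_page <= high:
            return region % server_count      # second numerical value
    raise KeyError("no preset page number range covers %d" % target_page)

# Page 35 lies in region 3 of the 2-server layout -> sub-server number 1.
print(first_sub_server(35))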
Referring to fig. 13, an embodiment of an operation apparatus of a distributed database applied to a first sub-server in an embodiment of the present application includes:
a key-value pair receiving module 1301, configured to receive a target page number sent by a main server and a key-value pair to be added to a distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with at least zero memory slice regions, the total number of the memory slice regions of all the sub-servers is at least 1, each memory slice region stores a set number of data pages, and the data pages are used for storing data of data nodes of the Merkle B+ tree; the first sub-server is the sub-server, determined by the main server from the at least one sub-server, where the data of a target data node is located; the target page number is the number of a target data page, the target data page stores the data of the target data node, a first memory slice region of the first sub-server stores the target data page, and the target data node is the data node in the Merkle B+ tree into which the key-value pair needs to be inserted, as determined by the main server according to the key of the key-value pair;
a data node splitting module 1302, configured to perform key-value pair merging and data node splitting operations on the target data node according to the key-value pair and the target page number, so as to obtain a data node splitting result;
a splitting result returning module 1303, configured to return the data node splitting result to the main server, so as to instruct the main server, after receiving the data node splitting result, to perform the index node splitting operation of the Merkle B+ tree according to the data node splitting result, so as to complete the operation of adding the key-value pair to the distributed database.
In an implementation manner of the embodiment of the present application, the data node splitting module may include:
a memory slice region determining unit, configured to determine the first memory slice region from all memory slice regions of the first sub-server according to the target page number and the memory slice region information of the first sub-server; the memory slice region information records the page number range corresponding to each memory slice region provided in the first sub-server;
a key-value pair merging unit, configured to distribute the key-value pair to the target data node stored in the first memory slice region, and perform a key-value pair merging operation on the target data node;
the data node splitting unit is used for splitting the target data node into at least two new data nodes if the data volume of the target data node after the key-value pair merging operation is executed exceeds a set threshold value, so that the data of each new data node can be written into one data page;
a page number allocation unit, configured to allocate a corresponding page number to each new data node, and write data of each new data node into the data page corresponding to the respective page number;
and the splitting result determining unit is used for determining the splitting result of the data node according to the key value, the hash value and the corresponding page number of each new data node.
Further, the first sub-server records a minimum available page number and a free page number list for each memory slice region provided; the minimum available page number equals the largest page number currently allocated to data nodes in the corresponding memory slice region plus one, and the free page number list records the free page numbers currently available for allocation to data nodes in the corresponding memory slice region. The page number allocation unit may include:
a free page number detection subunit, configured to detect whether the number of free page numbers recorded in the free page number list of the first memory slice region is greater than or equal to the number of new data nodes;
a first page number allocation subunit, configured to allocate one free page number to each new data node if the number of free page numbers recorded in the free page number list of the first memory slice region is greater than or equal to the number of new data nodes;
a second page number allocation subunit, configured to divide the new data nodes into a first data node set and a second data node set if the number of free page numbers recorded in the free page number list of the first memory slice region is smaller than the number of new data nodes, where the number of data nodes contained in the first data node set is equal to the number of free page numbers; allocate one free page number to each data node contained in the first data node set; and, for any data node contained in the second data node set, detect whether the minimum available page number of the first memory slice region exceeds the upper limit of the page number range corresponding to the first memory slice region; if the minimum available page number of the first memory slice region does not exceed that upper limit, allocate the minimum available page number of the first memory slice region to the data node and increment the minimum available page number of the first memory slice region; if it does exceed that upper limit, determine a second memory slice region, send the data of the data node to the second sub-server where the second memory slice region is located, instruct the second sub-server to allocate a corresponding page number to the data node, and return the page number allocated to the data node to the first sub-server.
Still further, the second page number allocation subunit may include:
a global maximum page number obtaining subunit, configured to obtain a global maximum page number, which is the maximum among the page number upper limits of the page number ranges corresponding to all memory slice regions;
and a memory slice region determining subunit, configured to find, among the memory slice regions, the memory slice region whose page number range has an upper limit equal to the global maximum page number, and use it as the second memory slice region.
Furthermore, each memory slice region stores a target number of data pages; the memory slice regions are numbered in ascending order of the page number ranges of the data pages they store; the sub-servers are also numbered; and the memory slice regions, in ascending order of their numbers, are distributed in sequence for storage to the sub-servers arranged in ascending order of their numbers. The second page number allocation subunit may further include:
a global maximum page number updating subunit, configured to add the global maximum page number and the target number to obtain an updated global maximum page number if no free page number exists in the free page number list of the second memory slice region and the minimum available page number of the second memory slice region exceeds the global maximum page number;
and a memory slice region creating subunit, configured to determine a third sub-server corresponding to the updated global maximum page number, and send the updated global maximum page number to the third sub-server to instruct the third sub-server to create a new memory slice region whose page number range has an upper limit equal to the updated global maximum page number, to use the new memory slice region to allocate a corresponding page number to the data node, and to return the page number allocated to the data node to the second sub-server.
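Taken together, these subunits describe an allocation cascade that might be sketched as follows in Python; this is a sketch under hypothetical names, not the claimed implementation.

class Region:
    def __init__(self, low, high):
        self.free_pages = []        # free page number list
        self.min_available = low    # minimum available page number
        self.upper_limit = high     # page number upper limit of the range

def take_page(region):
    # Free page numbers first, then the minimum available page number.
    if region.free_pages:
        return region.free_pages.pop(0)
    if region.min_available <= region.upper_limit:
        page = region.min_available
        region.min_available += 1
        return page
    return None                     # region exhausted

def allocate(first_region, new_nodes, regions, pages_per_region):
    results = []
    for node in new_nodes:
        page = take_page(first_region)
        if page is None:
            # Overflow to the region holding the global maximum page number.
            global_max = max(r.upper_limit for r in regions)
            second = next(r for r in regions if r.upper_limit == global_max)
            page = take_page(second)
            if page is None:
                # That region is full as well: create a new region whose
                # upper limit is the updated global maximum page number.
                second = Region(global_max + 1, global_max + pages_per_region)
                regions.append(second)
                page = take_page(second)
        results.append((node, page))
    return results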
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the operation method of the distributed database described in any of the above embodiments.
The embodiments of the present application further provide a computer program product, when the computer program product runs on a server, the server is caused to execute the operation method of the distributed database described in any of the above embodiments.
Fig. 14 is a schematic diagram of a server according to an embodiment of the present application. As shown in fig. 14, the server 14 of this embodiment includes: a processor 140, a memory 141, and a computer program 142 stored in the memory 141 and executable on the processor 140. The processor 140, when executing the computer program 142, implements the steps in the embodiments of the operation method of each distributed database described above, such as the steps 401 to 407 shown in fig. 4. Alternatively, the processor 140 implements the functions of the modules/units in the above device embodiments when executing the computer program 142, for example, the functions of the modules 1201 to 1205 shown in fig. 12 or the functions of the modules 1301 to 1303 shown in fig. 13.
The computer program 142 may be partitioned into one or more modules/units that are stored in the memory 141 and executed by the processor 140 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 142 in the server 14.
The Processor 140 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 141 may be an internal storage unit of the server 14, such as a hard disk or memory of the server 14. The memory 141 may also be an external storage device of the server 14, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the server 14. Further, the memory 141 may include both an internal storage unit and an external storage device of the server 14. The memory 141 is used for storing the computer program and other programs and data required by the server, and may also be used to temporarily store data that has been or will be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals and telecommunications signals according to legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (10)

1. An operation method of a distributed database, applied to a main server, is characterized in that the operation method comprises:
obtaining a key-value pair to be added to a distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with at least zero memory slice regions, the total number of the memory slice regions of all the sub-servers is at least 1, each memory slice region stores a set number of data pages, and the data pages are used for storing data of data nodes of the Merkle B+ tree;
determining, according to the key of the key-value pair, a target data node in the Merkle B+ tree into which the key-value pair needs to be inserted;
determining, according to a target page number corresponding to the target data node, a first sub-server where the data of the target data node is located from the at least one sub-server; the target page number is the number of a target data page, the target data page stores the data of the target data node, and a first memory slice region of the first sub-server stores the target data page;
sending the key-value pair and the target page number to the first sub-server to instruct the first sub-server to perform key-value pair merging and data node splitting operations on the target data node according to the key-value pair and the target page number to obtain a data node splitting result, and to return the data node splitting result to the main server;
and after receiving the data node splitting result, performing the index node splitting operation of the Merkle B+ tree according to the data node splitting result, so as to complete the operation of adding the key-value pair to the distributed database.
2. The operation method according to claim 1, wherein each memory slice region stores a target number of data pages; the memory slice regions are numbered in ascending order of the page number ranges of the data pages they store; the sub-servers are also numbered; and the memory slice regions, in ascending order of their numbers, are distributed in sequence for storage to the sub-servers arranged in ascending order of their numbers; the determining, according to the target page number corresponding to the target data node, a first sub-server where the data of the target data node is located from the at least one sub-server comprises:
performing a modular operation on the target page number and the target number to obtain a first numerical value;
performing a modular operation on the first numerical value and the number of sub-servers to obtain a second numerical value;
and determining the sub-server numbered with the second numerical value among the sub-servers as the first sub-server.
3. The operation method of claim 2, wherein before the performing a modular operation on the first numerical value and the number of sub-servers to obtain a second numerical value, the method further comprises:
acquiring preset mapping information, wherein the mapping information records the corresponding relation between each preset page number range and the number of servers;
determining, among the preset page number ranges, a target page number range in which the target page number is located;
and looking up, from the mapping information, the number of servers corresponding to the target page number range as the number of sub-servers.
4. An operation method of a distributed database, applied to a first sub-server, is characterized in that the operation method includes:
receiving a target page number sent by a main server and a key-value pair to be added to a distributed database; the distributed database is based on a Merkle B+ tree, the main server is connected with at least one sub-server, each sub-server is provided with at least zero memory slice regions, the total number of the memory slice regions of all the sub-servers is at least 1, each memory slice region stores a set number of data pages, and the data pages are used for storing data of data nodes of the Merkle B+ tree; the first sub-server is the sub-server, determined by the main server from the at least one sub-server, where the data of a target data node is located; the target page number is the number of a target data page, the target data page stores the data of the target data node, a first memory slice region of the first sub-server stores the target data page, and the target data node is the data node in the Merkle B+ tree into which the key-value pair needs to be inserted, as determined by the main server according to the key of the key-value pair;
performing key-value pair merging and data node splitting operations on the target data node according to the key-value pair and the target page number to obtain a data node splitting result;
and returning the data node splitting result to the main server to instruct the main server, after receiving the data node splitting result, to perform the index node splitting operation of the Merkle B+ tree according to the data node splitting result, so as to complete the operation of adding the key-value pair to the distributed database.
5. The operation method of claim 4, wherein the performing key-value pair merging and data node splitting operations on the target data node according to the key-value pair and the target page number to obtain a data node splitting result comprises:
determining the first memory slice region from all memory slice regions of the first sub-server according to the target page number and the memory slice region information of the first sub-server; the memory slice region information records the page number range corresponding to each memory slice region provided in the first sub-server;
distributing the key-value pair to the target data node stored in the first memory slice region, and performing a key-value pair merging operation on the target data node;
if the data volume of the target data node after the key-value pair merging operation exceeds a set threshold, splitting the target data node into at least two new data nodes, so that the data of each new data node can be written into one data page;
allocating a corresponding page number to each new data node, and writing the data of each new data node into the data page corresponding to the respective page number;
and determining the data node splitting result according to the key value, the hash value, and the corresponding page number of each new data node.
6. The operation method according to claim 5, wherein the first sub-server records a minimum available page number and a free page number list for each memory slice region provided; the minimum available page number equals the largest page number currently allocated to data nodes in the corresponding memory slice region plus one, and the free page number list records the free page numbers currently available for allocation to data nodes in the corresponding memory slice region; the allocating a corresponding page number to each new data node comprises:
detecting whether the number of free page numbers recorded in the free page number list of the first memory slice region is greater than or equal to the number of new data nodes;
if the number of free page numbers recorded in the free page number list of the first memory slice region is greater than or equal to the number of new data nodes, allocating one free page number to each new data node;
if the number of free page numbers recorded in the free page number list of the first memory slice region is smaller than the number of new data nodes, dividing the new data nodes into a first data node set and a second data node set, wherein the number of data nodes contained in the first data node set is equal to the number of free page numbers; allocating one free page number to each data node contained in the first data node set; for any data node contained in the second data node set, detecting whether the minimum available page number of the first memory slice region exceeds the upper limit of the page number range corresponding to the first memory slice region; if the minimum available page number of the first memory slice region does not exceed that upper limit, allocating the minimum available page number of the first memory slice region to the data node, and incrementing the minimum available page number of the first memory slice region; and if the minimum available page number of the first memory slice region exceeds that upper limit, determining a second memory slice region, sending the data of the data node to a second sub-server where the second memory slice region is located, instructing the second sub-server to allocate a corresponding page number to the data node, and returning the page number allocated to the data node to the first sub-server.
7. The operation method of claim 6, wherein the determining a second memory slice region comprises:
obtaining a global maximum page number, which is the maximum among the page number upper limits of the page number ranges corresponding to all memory slice regions;
and finding, among the memory slice regions, the memory slice region whose page number range has an upper limit equal to the global maximum page number, as the second memory slice region.
8. The operation method according to claim 7, wherein each memory slice region stores a target number of data pages; the memory slice regions are numbered in ascending order of the page number ranges of the data pages they store; the sub-servers are also numbered; and the memory slice regions, in ascending order of their numbers, are distributed in sequence for storage to the sub-servers arranged in ascending order of their numbers; after the sending the data of the data node to the second sub-server where the second memory slice region is located, the method further comprises:
if no free page number exists in the free page number list of the second memory slice region and the minimum available page number of the second memory slice region exceeds the global maximum page number, adding, by the second sub-server, the global maximum page number and the target number to obtain an updated global maximum page number;
and determining, by the second sub-server, a third sub-server corresponding to the updated global maximum page number, and sending the updated global maximum page number to the third sub-server to instruct the third sub-server to create a new memory slice region whose page number range has an upper limit equal to the updated global maximum page number, to use the new memory slice region to allocate a corresponding page number to the data node, and to return the page number allocated to the data node to the second sub-server.
9. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of operation of any one of claims 1 to 3 or the method of operation of any one of claims 4 to 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of operation of one of the claims 1 to 3 or carries out the method of operation of one of the claims 4 to 8.


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination