CN116932655B - Distributed key value database operation method and computer readable storage medium - Google Patents


Info

Publication number: CN116932655B
Authority: CN (China)
Application number: CN202311195808.5A
Other versions: CN116932655A (Chinese)
Inventors: 陈远润, 徐向阳, 陈坚
Assignee: Chengdu Shanyan Technology Co ltd
Legal status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing

Abstract

A distributed key-value database operation method and a computer-readable storage medium. The method comprises: creating a single-row multi-column transaction operation, in which a single issued database operation request carries a single key and a plurality of columns associated with that key, each column storing a value associated with the key and each value corresponding to one operation on the key, so that a plurality of operations are executed simultaneously through the single database operation request; setting a plurality of column indexes in the distributed key-value database according to the distinct characteristics of each value, each column index storing the value corresponding to one operation; and establishing a mapping relation among the plurality of column indexes, the mapping relation constraining the key-value data of the plurality of column indexes to always reside on the same data shard. A one-phase commit of the database transaction is thereby achieved, avoiding the performance loss previously incurred by two-phase commit when committing transactions to a distributed key-value database, and greatly reducing the number of network interactions.

Description

Distributed key value database operation method and computer readable storage medium
Technical Field
The present application relates to database technologies, and in particular, to a method for operating a distributed key-value database.
Background
Object storage is a data storage method in which data is organized by object, the object being the smallest unit of operation. Distributed object storage provides a number of data storage nodes that form a data storage cluster and together provide the object storage service. A data storage cluster generally contains data nodes and metadata nodes: the data nodes store the actual data of the objects, while the metadata nodes are usually backed by a distributed key-value database that stores metadata of the objects such as attributes, relationships, and histories.
A distributed key-value database is a kind of database that uses keys as indexes and stores each key together with its corresponding value. It is generally composed of multiple database nodes: the stored keys are sorted according to some rule, the key-value data is divided into groups of a certain size, each group forms a data shard, and the data is then distributed across the database nodes shard by shard. Keys are stored in order inside each data shard, and each shard has an upper bound and a lower bound on its keys; when new key-value data is written, these bounds determine which shard the data falls on. Since the data inside a shard is stored in order, multiple key-value entries with the same prefix are very likely to fall on the same shard, while entries with different prefixes may fall on different shards.
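The range-based routing described above can be sketched as follows. This is an illustrative toy, not the patent's implementation; the shard bounds, class names, and key formats are all assumptions made for the example.

```python
# Hypothetical sketch: routing a key to a data shard by its [lower, upper) key range.
class Shard:
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper   # key range [lower, upper)
        self.data = {}                          # keys kept sorted inside a real shard

def route(shards, key):
    """Find the shard whose [lower, upper) range contains the key."""
    for s in shards:
        if s.lower <= key < s.upper:
            return s
    raise KeyError(key)

shards = [Shard("a", "m"), Shard("m", "z")]
# Keys sharing a prefix tend to land on the same shard:
assert route(shards, "obj1/meta") is route(shards, "obj1/data")
# Keys with different prefixes may land on different shards:
assert route(shards, "bucket") is not route(shards, "obj1/meta")
```

Because keys are ordered within a shard, entries with a common prefix sort adjacently and usually fall inside one range, which is exactly why same-prefix data tends to co-locate.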
When a distributed object storage system stores data in a distributed key-value database, the data of each object is generally held by multiple key-value pairs. To guarantee the atomicity of operations on an object, the object's data generally has to be manipulated through a distributed key-value database transaction: the object's multiple key-value entries are each operated on within the same transaction and then committed together.
A single data shard of a distributed key-value database can be operated on independently, and multiple operations on a single shard can be aggregated into one atomic write. If, however, the data touched by an operation spans multiple shards, the atomicity of those operations cannot be guaranteed. Therefore, when a transaction operates on multiple keys, the operations may cross multiple data shards and cannot be made atomic directly; to satisfy the ACID constraints of a database transaction, the commit must proceed in two phases, i.e., two-phase commit, whose flow is shown in FIG. 1:
Phase one, pre-write: the client of the distributed key-value database writes all key-value operations involved in the transaction to each data shard in turn, checking during the write whether another transaction is operating on the same key-value data at the same time. If so, a transaction conflict has occurred and a database rollback must be executed; if not, the pre-write phase is complete.
Phase two, commit: the client notifies all data shards that completed the pre-write phase and marks the pre-written key-value data as committed. The transaction commit is then complete and the modifications to the key-value data take effect.
It can be seen that committing a transaction requires at least two network interactions with each data shard, which adds considerable latency to transaction commit.
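The two-phase flow above can be condensed into a minimal sketch. Everything here is illustrative: the class, method names, and the in-memory lock/pre-write bookkeeping are assumptions standing in for real shard nodes and network calls.

```python
# Illustrative two-phase commit: phase 1 pre-writes each shard and checks for
# conflicts; phase 2 marks the pre-written data as committed.
class ShardNode:
    def __init__(self):
        self.locks, self.data, self.prewrites = set(), {}, {}

    def prewrite(self, key, value, txn_id):
        if key in self.locks:              # another transaction holds the key
            return False                   # conflict: caller must roll back
        self.locks.add(key)
        self.prewrites[key] = (txn_id, value)
        return True

    def commit(self, key, txn_id):
        tid, value = self.prewrites.pop(key)
        assert tid == txn_id
        self.data[key] = value             # pre-written data becomes visible
        self.locks.discard(key)

def two_phase_commit(ops, txn_id):
    """ops: list of (shard, key, value). Costs two round trips per shard."""
    for shard, key, value in ops:          # phase 1: pre-write every key
        if not shard.prewrite(key, value, txn_id):
            for s, k, _ in ops:            # roll back on conflict
                s.prewrites.pop(k, None); s.locks.discard(k)
            return False
    for shard, key, _ in ops:              # phase 2: notify every shard
        shard.commit(key, txn_id)
    return True

s1, s2 = ShardNode(), ShardNode()
assert two_phase_commit([(s1, "k1", "v1"), (s2, "k2", "v2")], txn_id=1)
assert s1.data["k1"] == "v1" and s2.data["k2"] == "v2"
```

Each shard is contacted once in phase one and once in phase two, which is where the "at least two interactions per shard" cost comes from.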
If all the key-value data of a transaction resides on the same data shard, then, because multiple operations on a single shard can be aggregated for atomic execution, the pre-write and the commit can be combined into a single request issued directly to that shard. Only one phase of operation, i.e., a one-phase commit, is then needed to complete the database transaction, as shown in FIG. 2.
Comparing the number of network interactions in the commit phase: with two-phase commit the transaction interacts with multiple data shards and the client interacts with each shard twice, whereas with one-phase commit the client interacts with a single data shard only once. The number of network interactions is thus greatly reduced, and so is the latency.
To ensure the reliability, accessibility, and high availability of data, distributed object storage typically has to provide multiple data storage clusters with data synchronization between them. One way to synchronize objects across multiple clusters is that, whenever an operation is issued on an object, the operation itself is recorded in time order in addition to actually being applied to the object; the records of an object over a period of time thus capture all operations on that object.
The flow of object data synchronization between multiple distributed object storage clusters is shown in FIG. 3. During synchronization, the target cluster scans the operation records of the objects on the source cluster in time order and replays each record on the target cluster in the same order; after all records have been replayed, the target cluster holds data consistent with the source cluster and the synchronization is complete. Operation records that have been successfully replayed on the target cluster then need to be removed to free storage space. To ensure that the records to be replayed can be scanned quickly and that synchronized records can be deleted quickly, the operation records must be stored contiguously in time order.
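The scan-replay-delete cycle can be sketched as below. The record format (timestamp plus a put/delete triple) and the dict standing in for the target cluster are assumptions made for illustration, not the patent's actual wire format.

```python
# Illustrative synchronization: the target cluster replays the source cluster's
# time-ordered operation records, then the replayed records are removed.
source_records = [                 # (timestamp, operation) kept in time order
    (1, ("put", "obj", "v1")),
    (2, ("put", "obj", "v2")),
    (3, ("delete", "obj", None)),
]

def replay(records, target):
    for _, (op, key, value) in sorted(records):   # scan in time order
        if op == "put":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    records.clear()                # replayed records are removed to free space

target_cluster = {}
replay(source_records, target_cluster)
assert "obj" not in target_cluster   # final state matches the source
assert source_records == []          # records freed after synchronization
```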
Both the object data and the object's operation records are stored in the distributed key-value database. Since the operation records are stored in time order while the object data is stored in some other, non-temporal order, it cannot be guaranteed that an object's data and its operation records reside on a single data shard. Consequently, a database transaction that operates on both the object data and the operation records must use two-phase commit and cannot adopt the optimized one-phase commit.
As shown in FIG. 4, one approach to creating an operation record for an object operation is: each time a user issues an operation on an object in the distributed object store, a transaction of the distributed key-value database is created, the object data is manipulated and the operation record is created within that same transaction, and the database transaction is then committed. Once the transaction has committed, the operation on the object is complete and the result is returned to the user. Because the operated object and the created operation record are not guaranteed to lie on the same data shard of the distributed key-value database, the transaction must be committed in two phases, i.e., two-phase commit. Since two-phase commit involves more network interactions than one-phase commit, the performance of the object operation suffers.
In summary, when an operation on an object of a distributed object storage system has to issue multiple database operations, committing the corresponding transaction to the distributed key-value database with two-phase commit introduces a performance loss.
It should be noted that the information disclosed in the above background section is only for understanding the background of the application and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
The present application is directed to overcoming the above-mentioned drawbacks of the related art, and providing a method for operating a distributed key-value database.
In order to achieve the above purpose, the present application adopts the following technical scheme:
a method of distributed key-value database operation, comprising:
creating a single-row multi-column transaction operation, in which a single issued database operation request carries a single key and a plurality of columns associated with that key, each column storing a value associated with the key and each value corresponding to one operation on the key, so that a plurality of operations are executed simultaneously through the single database operation request;
setting a plurality of column indexes in the distributed key-value database according to the distinct characteristics of each value, each column index storing the value corresponding to one operation;
establishing a mapping relation among the plurality of column indexes, the mapping relation constraining the key-value data of the plurality of column indexes to always reside on the same data shard;
thus, a one-phase commit of the database transaction is achieved.
Further, when some key-value data is migrated from one data shard to another, the key-value data that must satisfy the same-shard constraint is, according to the mapping relation, migrated to the new data shard together, and the mapping relation is established and maintained on the new shard.
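A minimal sketch of this co-migration follows. The dict-of-dicts shard layout and the mapping structure are assumptions for illustration; a real database would move sorted ranges, not single keys.

```python
# Illustrative co-migration under a column-index mapping: when a key's data
# moves to a new shard, every mapped column-index entry moves with it, so the
# same-shard constraint keeps holding after the migration.
def migrate(key, src, dst, mapping):
    """Move `key` and all mapped column-index entries from src to dst shard."""
    for index in mapping[key]:                 # e.g. ["data", "op_record"]
        dst.setdefault(index, {})[key] = src[index].pop(key)

src_shard = {"data": {"obj1": b"bytes"}, "op_record": {"obj1": [("put", 1)]}}
dst_shard = {}
mapping = {"obj1": ["data", "op_record"]}      # indexes constrained together

migrate("obj1", src_shard, dst_shard, mapping)
assert "obj1" in dst_shard["data"] and "obj1" in dst_shard["op_record"]
assert "obj1" not in src_shard["data"]
```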
Further, the method is used in a distributed object storage system, and the value corresponding to an operation comprises object data;
the single-row multi-column transaction operation further uses a column associated with the key to store an operation record of the object, the distributed key-value database further uses a column index to store the operation record, and a mapping relation is established between the column indexes of the object data and of the operation record;
when the distributed key-value database performs data balancing, the object data and its corresponding operation records are migrated to the new shard together according to the mapping relation between the object data column index and the operation record column index.
Further, the operation records are stored in an independent column index, ordered by time. When data synchronization is performed among a plurality of distributed object storage clusters, the target cluster scans the operation record column index in time order, reads the operation records from the source cluster sequentially, and replays them on the target cluster; the records that have been synchronized are then deleted directly and quickly as an ordered range.
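Why time ordering enables fast deletion can be shown in a few lines: the synchronized records form a contiguous prefix of the sorted index, so they can be dropped with one range operation instead of per-record deletes. The record layout here is an assumption for illustration.

```python
# Illustrative time-ordered operation records with fast range deletion.
import bisect

records = [(1, "put a"), (2, "put b"), (3, "del a"), (4, "put c")]  # sorted by ts

def delete_range(recs, up_to_ts):
    """Drop every record with timestamp <= up_to_ts in one range operation."""
    cut = bisect.bisect_right([ts for ts, _ in recs], up_to_ts)
    del recs[:cut]      # a single contiguous deletion, no per-key scan

delete_range(records, 3)           # records 1..3 have been replayed
assert records == [(4, "put c")]
```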
Further, the single-row multi-column transaction operation further uses a column associated with the key to store object statistics, the distributed key-value database further uses a column index to store the object statistics, and a mapping relation is established between the column indexes of the object data and of the object statistics;
when the distributed key-value database performs data balancing, the object data and the object statistics are migrated together according to the mapping relation, so that the object statistics and the corresponding object data reside on the same data shard.
Further, the object statistics summarize the states of the multiple objects in the current data shard. When data migration causes objects originally on the same shard to be stored on different shards, the original statistics are split and merged according to the mapping relation between the column indexes and the resulting distribution of the objects, so that the recalculated statistics accurately reflect the states of the objects now in each shard.
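The split-and-merge of statistics can be sketched as recomputing per-shard aggregates from the post-migration placement. The object sizes, shard names, and statistic fields below are invented for the example.

```python
# Illustrative split/merge of per-shard object statistics after migration:
# each shard's statistics are recomputed from the objects it now holds.
from collections import defaultdict

objects = {"obj1": 100, "obj2": 50, "obj3": 25}   # object -> size, one old shard
old_stats = {"count": 3, "bytes": 175}            # statistics of that old shard

# After migration the objects are distributed across two shards:
placement = {"obj1": "shardA", "obj2": "shardA", "obj3": "shardB"}

new_stats = defaultdict(lambda: {"count": 0, "bytes": 0})
for obj, size in objects.items():                 # split/merge by new placement
    s = new_stats[placement[obj]]
    s["count"] += 1
    s["bytes"] += size

assert new_stats["shardA"] == {"count": 2, "bytes": 150}
assert new_stats["shardB"] == {"count": 1, "bytes": 25}
# Recomputed statistics still sum to the original totals:
assert sum(s["bytes"] for s in new_stats.values()) == old_stats["bytes"]
```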
Further, when data within a column index needs to be deleted, then for the column indexes that respectively store the object data, the object operation records, and the object statistics, the object operation records are deleted independently of the object data, while the object statistics are updated or deleted together with the object data.
Further, different column indexes have independent key-value data namespaces, while configuration or deletion operations can be performed on a single column index individually.
Further, for a single data shard, the data may be distributed and stored on multiple database nodes using one or more of Raft replication, multiple replicas, and erasure coding (EC).
A computer readable storage medium storing a computer program which, when executed by a processor, implements the distributed key value database operating method.
The application has the following beneficial effects:
a single key and a plurality of columns associated with the key are arranged in a single database operation request, each column stores a value associated with the key, each value corresponds to a certain operation on the key, so that key value data related to a plurality of operations in a single database transaction are located on a single data partition.
Compared with the prior art, the embodiment of the application has the main advantages that:
the method overcomes the defect that in the prior art, under the distributed key value database scene, two-stage transaction submission is required to be used in a single transaction due to multi-data slicing operation, and optimizes the single transaction multi-slicing operation into single transaction to only operate one data slicing through innovation of technical means such as single-row multi-column transaction, multi-column index of the database, column index mapping and the like, so that one-stage transaction submission can be used. Compared with two-stage transaction submission, the one-stage transaction submission has fewer network interaction times and data fragmentation of operation, can reduce the time delay of data operation, and improves the throughput, thereby improving the overall performance of the database.
Other advantages of embodiments of the present application are further described below.
Drawings
FIG. 1 is a schematic diagram of a two-phase commit of a distributed key-value database transaction.
FIG. 2 is a schematic diagram illustrating a one-phase commit of a database transaction.
FIG. 3 is a flow chart for object data synchronization among multiple clusters of distributed object storage.
FIG. 4 is a schematic diagram of an object operation creation operation record employing two phase commit.
FIG. 5 is a schematic diagram of a single row, multi-column transaction operation according to an embodiment of the present application.
FIG. 6 is a diagram of a distributed key-value database based on multiple column indexes according to an embodiment of the present application.
FIG. 7 is a diagram illustrating mapping and co-migration between column indices according to an embodiment of the present application.
FIG. 8 is a schematic diagram of one-phase commit of a database transaction according to an embodiment of the present application.
FIG. 9 illustrates operation optimization of a distributed object storage system with respect to object operation records in accordance with an embodiment of the present application.
FIG. 10 illustrates data migration according to a mapping relationship between an object data column index and an operation record column index according to an embodiment of the present application.
FIG. 11 illustrates object operation records being stored sequentially in time order, replayed on the target cluster, and quickly deleted, according to an embodiment of the present application.
FIG. 12 illustrates operational optimization of a distributed object storage system with respect to object statistics in accordance with an embodiment of the present application.
FIG. 13 illustrates object statistics that are migrated and recalculated together according to a mapping relationship in accordance with an embodiment of the present application.
FIG. 14 illustrates an embodiment of the application that obtains object statistics from different data slices to obtain overall object statistics for a bucket.
FIG. 15 illustrates database operations using single row, multi-column transactions in an object store business scenario in accordance with an embodiment of the present application.
Detailed Description
The following describes embodiments of the present application in detail. It should be emphasized that the following description is merely exemplary in nature and is in no way intended to limit the scope of the application or its applications.
It will be understood that when an element is referred to as being "mounted" or "disposed" on another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. In addition, the connection may be for both a fixing action and a coupling or communication action.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are merely for convenience in describing embodiments of the application and to simplify the description by referring to the figures, rather than to indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus are not to be construed as limiting the application.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, the meaning of "plurality" is two or more, unless explicitly defined otherwise.
Referring to the drawings, an embodiment of the present application provides a method for operating a distributed key-value database, comprising:
creating a single-row multi-column transaction operation, in which a single issued database operation request carries a single key and a plurality of columns associated with that key, each column storing a value associated with the key and each value corresponding to one operation on the key, so that a plurality of operations are executed simultaneously through the single database operation request;
setting a plurality of column indexes in the distributed key-value database according to the distinct characteristics of each value, each column index storing the value corresponding to one operation;
establishing a mapping relation among the plurality of column indexes, the mapping relation constraining the key-value data of the plurality of column indexes to always reside on the same data shard;
thus, a one-phase commit of the database transaction is achieved.
In the embodiment of the application, a single key and a plurality of columns associated with the key are set in a single database operation request, each column storing a value associated with the key and each value corresponding to one operation on the key. By means of the multiple column indexes set in the distributed key-value database and the mapping relations among them, the key-value data involved in the multiple operations of a single database transaction resides on a single data shard, so that multiple operations can be executed simultaneously by issuing a single request. This reduces the performance cost previously incurred by sending multiple database requests when operating on multiple pieces of key-value data within the same transaction, and greatly reduces the number of network interactions.
In some embodiments, different column indexes have independent key-value data namespaces, while configuration or deletion operations can be performed on a single column index individually.
For a single data shard, the data may be distributed and stored on multiple database nodes using one or more of Raft replication, multiple replicas, and erasure coding (EC).
In a preferred embodiment, when some key-value data is migrated from one data shard to another, the key-value data that must satisfy the same-shard constraint is, according to the mapping relation, migrated to the new data shard together, and the mapping relation is established and maintained on the new shard.
In a preferred embodiment, the method is used in a distributed object storage system, and the value corresponding to an operation comprises object data; the single-row multi-column transaction operation further uses a column associated with the key to store an operation record of the object, the distributed key-value database further uses a column index to store the operation record, and a mapping relation is established between the column indexes of the object data and of the operation record; when the distributed key-value database performs data balancing, the object data and its corresponding operation records are migrated to the new shard together according to this mapping relation.
In a preferred embodiment, the operation records are stored in an independent column index, ordered by time. When data synchronization is performed among a plurality of distributed object storage clusters, the target cluster scans the operation record column index in time order, reads the operation records from the source cluster sequentially, and replays them on the target cluster; the records that have been synchronized are then deleted directly and quickly as an ordered range.
In a preferred embodiment, the single-row multi-column transaction operation further uses a column associated with the key to store object statistics, the distributed key-value database further uses a column index to store the object statistics, and a mapping relation is established between the column indexes of the object data and of the object statistics; when the distributed key-value database performs data balancing, the object data and the object statistics are migrated together according to the mapping relation, so that the object statistics and the corresponding object data reside on the same data shard.
In a more preferred embodiment, the object statistics summarize the states of the multiple objects in the current data shard. When data migration causes objects originally on the same shard to be stored on different shards, the original statistics are split and merged according to the mapping relation between the column indexes and the resulting distribution of the objects, so that the recalculated statistics correctly reflect the states of the objects now in each shard.
In a preferred embodiment, when data within a column index needs to be deleted, then for the column indexes that respectively store the object data, the object operation records, and the object statistics, the object operation records are deleted independently of the object data, while the object statistics are updated or deleted together with the object data.
The embodiment of the application also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the distributed key value database operating method of the previous embodiment.
The implementation and operation principles of the present application are further described below in connection with specific embodiments.
Single row, multi-column transaction operation
A single key and a plurality of columns associated with the key are set within a single database operation request, each column storing a value associated with the key and each value corresponding to one operation on the key, as shown in FIG. 5. On this basis, multiple operations can be executed simultaneously by issuing a single request, reducing the number of network interactions.
Distributed key value database based on multiple column indexes
A plurality of column indexes is set in the distributed key-value database according to the distinct characteristics of the multiple values carried by the operations issued in a single-row multi-column transaction, each column index storing the value corresponding to one operation, as shown in FIG. 6. Different column indexes have independent key-value data namespaces, while configuration or deletion operations can be performed on a single column index individually.
Mapping, co-migration, and deletion between multiple column indices
For business reasons, the data in different column indexes may have certain mapping relations with each other; by establishing and using these mapping relations, the key-value data of the multiple column indexes can be constrained to always reside on the same data shard. As shown in FIG. 7, when the distributed key-value database needs to perform data balancing and some key-value data has to be migrated from one data shard to another, the key-value data across the column indexes that must satisfy the same-shard constraint is determined from the mapping relations, migrated to the new shard together, and the mapping relations are established and maintained on the new shard. In this way, both before and after the migration, the key-value data mapped across the column families satisfies the constraint of always residing on the same data shard.
When data within a column index needs to be deleted, the data in the multiple column indexes associated by the mapping may, depending on the requirements of different services, follow different deletion strategies, such as being deleted individually or deleted together through the mapping. For example, in an object storage system where object data, object operation records, and object statistics are stored in separate column indexes, the operation records may be deleted independently of the object data, while the statistics are updated or deleted together with the object data.
Database transaction one-phase commit
Based on the foregoing techniques of single-row multi-column transactions, database multi-column indexes, and mapping with co-migration between column indexes, all the key-value data involved in the multiple operations of a single database transaction is guaranteed to reside on a single data shard, and the multiple operations can be issued simultaneously in a single database request. The conditions for a one-phase commit are thus met, and the database transaction can be committed in one phase, as shown in FIG. 8. The number of network interactions needed to commit the transaction is thereby greatly reduced.
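On the shard side, the one-phase commit collapses pre-write and commit into one atomic apply. The sketch below is illustrative only: the class and the lock standing in for per-shard atomicity are assumptions, not the patent's implementation.

```python
# Illustrative one-phase commit on a single shard: because every column of the
# single-row request lives on this one shard, all columns are applied in one
# atomic step, i.e. one network round trip from the client.
import threading

class SingleShard:
    def __init__(self):
        self.indexes = {}                   # column index name -> {key: value}
        self._lock = threading.Lock()       # stands in for per-shard atomicity

    def one_phase_commit(self, key, columns):
        """Apply every column of the single-row request atomically."""
        with self._lock:                    # all-or-nothing on this shard
            for index_name, value in columns.items():
                self.indexes.setdefault(index_name, {})[key] = value
        return True                         # acknowledged in a single round trip

shard = SingleShard()
assert shard.one_phase_commit("obj1", {"data": b"x", "op_record": ("put", 1)})
assert shard.indexes["data"]["obj1"] == b"x"
```

Contrast with the earlier two-phase flow: here there is no separate pre-write lock round and no second notification round, which is exactly the saved network interaction.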
The following describes object operations in a distributed object storage system as an example.
Distributed object storage system object operation record optimization
In a distributed object storage system, an operation on an object needs to modify the object data and create an operation record at the same time. As shown in fig. 9, when an object storage client needs to modify an object, it creates a single-row multi-column transaction operation in which the object data and the operation record itself are stored in separate columns. The request, carrying both the object data and the object operation record, is then issued to a distributed key-value data shard through a one-phase database commit. On the data shard, multiple column indexes are set up (a column index can be implemented with the column families of a common key-value database) to store the object data and the object operation records separately; at the same time, a mapping exists between the object data and the object operation records, so the operation records of an object can be obtained directly from its data through the mapping. After the single-row multi-column transaction operation is issued, the distributed key-value database parses the object data and the object operation out of the request, stores each in its corresponding column index, and the transaction commit is complete.
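On the client side, the single-row multi-column request can be pictured as below; the column names `object_data` and `op_record` and the function `build_object_write` are illustrative assumptions, not part of any actual client library:

```python
import json
import time

def build_object_write(object_key, payload):
    """Pack the object payload and its operation record into one request.

    The request carries a single key with multiple columns, so both writes
    can be issued to one data shard and committed in one phase.
    """
    return {
        "key": object_key,
        "columns": {
            "object_data": payload,
            "op_record": json.dumps({"op": "put", "ts": int(time.time())}),
        },
    }
```

The shard receiving this request would parse out each column and store it in the matching column index, as the description above explains.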
As shown in fig. 10, when the distributed key-value database needs to perform data balancing, the object data and the corresponding operation records are migrated to a new shard together according to the mapping relation between the object data column index and the operation record column index, ensuring that an object's operation records always reside on the same data shard as the object data.
As shown in fig. 11, the object operation records are stored in a separate column index and, to speed up their access and deletion, are stored sequentially in time order. When data needs to be synchronized between multiple distributed object storage clusters, the destination cluster reads the operation records from the source cluster in order and replays them locally. Operation records that have been synchronized can then be deleted quickly by their ordered range.
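The time-ordered storage, sequential scan, and range deletion of operation records can be sketched with an in-memory stand-in for a sorted key-value store; the `OpLog` class here is an assumption for illustration, not a real database interface:

```python
import bisect

class OpLog:
    """Operation records under sortable keys, so lexicographic order == time order."""

    def __init__(self):
        self._keys, self._vals = [], []

    @staticmethod
    def _key(ts):
        return f"{ts:020d}"  # zero-padded timestamp makes string order match time order

    def append(self, ts, record):
        i = bisect.bisect(self._keys, self._key(ts))
        self._keys.insert(i, self._key(ts))
        self._vals.insert(i, record)

    def scan(self, start_ts, end_ts):
        """Sequential read of records in [start_ts, end_ts], as a sync reader would."""
        lo = bisect.bisect_left(self._keys, self._key(start_ts))
        hi = bisect.bisect_right(self._keys, self._key(end_ts))
        return self._vals[lo:hi]

    def range_delete(self, start_ts, end_ts):
        """Fast deletion of an already-replayed, contiguous range of records."""
        lo = bisect.bisect_left(self._keys, self._key(start_ts))
        hi = bisect.bisect_right(self._keys, self._key(end_ts))
        del self._keys[lo:hi]
        del self._vals[lo:hi]
```

Because the keys sort by time, both the replay scan and the post-replay cleanup are single range operations rather than per-record lookups.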
For a single data shard, to improve data availability, the data may be stored distributed across multiple other database nodes using replicas, erasure coding (EC), or similar mechanisms. The deletion of replayed object operation records is independent of the object data; that is, deleting an object operation record does not delete the corresponding object data.
Distributed object storage system object statistics optimization
In a distributed object storage system, operating on object data typically updates object statistics in addition to creating operation records. The object statistics reflect certain states of the bucket to which the object belongs, such as the bucket's object count, occupied size, and actual size. Object statistics are typically stored in the distributed key-value database under keys different from, and independent of, the objects themselves.
Previously, if modifying object data involved a change to the object statistics, the key-value data of the object and of the statistics had to be operated on within one distributed key-value database transaction; since the object data and the object statistics could not be guaranteed to always reside on the same data shard, the transaction had to use a two-phase commit.
As shown in fig. 12, according to an embodiment of the present application, the statistics of an object can be issued together with the object data to a single data shard through a single-row multi-column operation; a column index is created inside the data shard to store the object statistics, and a mapping relation is established between the object data column index and the object statistics column index. As shown in fig. 13, when database data balancing occurs, the object data and the object statistics are migrated together according to the mapping relation, ensuring that they always reside on the same data shard. Unlike the migration of object operation records, the object statistics aggregate the states of multiple objects on the current data shard. When data migration causes objects that were originally on the same shard to end up on different shards, the original object statistics must be split and merged according to the resulting object distribution and the mapping between column indexes, so that the recalculated statistics correctly reflect the states of the objects on each shard.
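The splitting of per-shard object statistics when objects end up on two different shards can be sketched as follows, assuming the statistics consist of an object count and a byte total; the function `split_statistics` and its split-key convention are illustrative assumptions:

```python
def split_statistics(objects, split_key):
    """Recompute statistics for the two shards produced by a split.

    objects:   {object_key: size_in_bytes} held by the original shard
    split_key: keys < split_key stay on the left shard, the rest go right
    Returns (left_stats, right_stats), each with an object count and byte total.
    """
    left = {k: v for k, v in objects.items() if k < split_key}
    right = {k: v for k, v in objects.items() if k >= split_key}

    def stats(part):
        return {"count": len(part), "bytes": sum(part.values())}

    return stats(left), stats(right)
```

The merge case is the inverse: the per-shard counts and byte totals are simply added when shards combine.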
As shown in fig. 14, since the objects in one bucket may be distributed over multiple data shards, and the object statistics on a single shard only reflect the objects stored on that shard, obtaining the total statistics of a bucket requires accessing all the data shards, reading the object statistics from each, and summing them to obtain the bucket's total object statistics.
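Summing the per-shard statistics into bucket totals is then a straightforward aggregation; this sketch assumes the same count/bytes statistics shape as above:

```python
def total_bucket_stats(per_shard_stats):
    """Sum the statistics collected from every shard holding the bucket's objects."""
    total = {"count": 0, "bytes": 0}
    for s in per_shard_stats:
        total["count"] += s["count"]
        total["bytes"] += s["bytes"]
    return total
```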
Since the object statistics change as the objects change, they are not deleted proactively; the object statistics are deleted only after all the objects they cover have been deleted.
Application example
The following is an exemplary description of database operations in an object store business scenario.
In the object storage business scenario, an operation log, namely the bilog, is recorded whenever an operation is executed on an object. The bilog can be used to replay operations when synchronizing multiple clusters. Bilogs are created in the time order of the operations performed on objects (create, read/write, delete, and so on); during cross-site synchronization they are replayed in the same time order, and replayed bilogs must then be deleted from the cluster. To ensure that the bilog can be accessed quickly in time order, it must be stored in time order.
As shown in FIG. 15, a single-row multi-column transaction is used so that multiple columns within a single database request carry both the object operation and the bilog creation; the request is issued to the same database node, at which point a one-phase commit can be used. Based on multiple column families, the object operation and the bilog operation in the request are stored independently in separate column families of the same node. Based on the mapping between the object data and bilog column families, when database region migration, splitting, or merging occurs, the bilog always migrates with its corresponding object data, so the object data and the bilog remain on the same node. This supports fast access to and fast deletion of the bilog.
Multi-column family storage of the bilog:
in the distributed key-value database, data is stored in multiple independent column families; the bilog is stored in its own column family, isolated from normal object data, which makes it convenient to configure and operate the column families independently according to the different characteristics of the bilog and the object data.
Mapping of objects and bilogs:
to ensure that the object data and the bilog remain on the same node when data migration occurs in the distributed key-value database, a mapping is created in the database; through this mapping, the bilog records associated with a given piece of object data can be retrieved quickly, and when the object data is migrated, the associated bilog records are migrated along with it;
Bilog access and deletion:
when the bilog needs to be accessed, because it is stored in order, a range scan can be issued directly against the database and the scan result returned to the client;
similarly, because the bilogs are stored in order, fast deletion can be achieved by deleting the bilogs within a designated range.
In summary, in the embodiments of the present application, single-row multi-column transaction operations are used to issue the multiple operations of a single transaction to a single shard of the distributed key-value database, enabling a one-phase commit of the database transaction. Multiple column indexes are set up on the data shard, the data of the transaction's multiple operations is stored separately on different column indexes, and mappings are created between the data in different column indexes, ensuring that when database data migration occurs, the column index data subject to the same-shard constraint is migrated cooperatively according to those mappings.
For a distributed object storage system, the actual object data and the operation records produced when operating on an object are issued through single-row multi-column transactions, stored separately on multiple column indexes of a data shard, and committed with a one-phase transaction. Mappings established across the shard's column indexes guarantee the cooperative migration of object data and operation records during database data balancing, so they reside on the same data shard both before and after migration.
In addition, when synchronizing object data among multiple distributed object storage systems, operation records can be replayed by scanning the operation record column index in time order, while records that have already been replayed are deleted quickly by range.
Compared with the prior art, the main advantages of the embodiments of the present application are:
the application overcomes the drawback that, in existing distributed key-value database scenarios, a two-phase transaction commit is required because a single transaction operates on multiple data shards. Through technical means such as single-row multi-column transactions, database multi-column indexes, and column index mappings, the key-value data involved in the multiple operations of a single database transaction is kept on a single data shard, so the operations can be issued simultaneously in one database request and the transaction touches only one data shard, allowing a one-phase transaction commit to be used. Compared with a two-phase commit, a one-phase commit involves fewer network interactions and fewer data shards per operation, reducing data operation latency and increasing throughput, thereby improving the overall performance of the database.
The embodiments of the present application also provide a storage medium storing a computer program which, when executed, performs at least the method as described above.
The embodiment of the application also provides a control device, which comprises a processor and a storage medium for storing a computer program; wherein the processor is adapted to perform at least the method as described above when executing said computer program.
The embodiments of the present application also provide a processor executing a computer program, at least performing the method as described above.
The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The storage media described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
In the several embodiments provided by the present application, it should be understood that the disclosed systems and methods may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The methods disclosed in the method embodiments provided by the application can be arbitrarily combined under the condition of no conflict to obtain a new method embodiment.
The features disclosed in the several product embodiments provided by the application can be combined arbitrarily under the condition of no conflict to obtain new product embodiments.
The features disclosed in the embodiments of the method or the apparatus provided by the application can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.
The foregoing is a further detailed description of the application in connection with the preferred embodiments, and it is not intended that the application be limited to the specific embodiments described. It will be apparent to those skilled in the art that several equivalent substitutions and obvious modifications can be made without departing from the spirit of the application, and the same should be considered to be within the scope of the application.

Claims (10)

1. A method of operating a distributed key-value database, comprising:
creating a single-row multi-column transaction operation, setting, in a single issued database operation request, a single key and a plurality of columns associated with the key, wherein each column stores a value associated with the key and each value corresponds to an operation on the key, so that a plurality of operations are executed simultaneously through the single database operation request;
setting a plurality of column indexes in the distributed key-value database according to the different characteristics of each value, wherein each column index stores the value corresponding to one operation;
establishing a mapping relation among the plurality of column indexes, the mapping relation being used to constrain the key-value data among the plurality of column indexes to always reside on the same data shard;
thereby achieving a one-phase commit of the database transaction.
2. The method of claim 1, wherein when some key-value data is migrated from one data shard to another, the key-value data that must satisfy the same-shard constraint is migrated to the new data shard together according to the mapping relation, and the mapping relation is re-established and maintained on the new data shard.
3. The method of claim 2, wherein the method is used in a distributed object storage system, and the value corresponding to the operation includes object data;
the single-row multi-column transaction operation also uses a column associated with the key to store an operation record of an object, the distributed key-value database also uses a column index to store the operation record of the object, and a mapping relation is established between the column indexes of the object data and the operation record;
when the distributed key-value database performs data balancing, the object data and the corresponding operation record are migrated to a new shard together according to the mapping relation between the column indexes of the object data and the operation record.
4. The method of operating a distributed key-value database according to claim 3, wherein the operation records are stored in a separate column index, sequentially in time order; when data synchronization is performed between a plurality of distributed object storage clusters, a destination cluster reads the operation records on a source cluster in order by scanning the operation record column index in time order and replays them on the destination cluster; and the synchronized operation records are deleted quickly and directly according to the ordered range.
5. The method of claim 3 or 4, wherein the single-row multi-column transaction operation further uses a column associated with the key to store object statistics, the distributed key-value database further uses a column index to store the object statistics, and a mapping relation is established between the column indexes of the object data and the object statistics;
when the distributed key-value database performs data balancing, the object data and the object statistics are migrated together according to the mapping relation, so that the object statistics and the corresponding object data reside on the same data shard.
6. The method of operating a distributed key-value database according to claim 5, wherein the object statistics aggregate the states of a plurality of objects on the current data shard; when data migration causes objects originally on the same data shard to be stored on different data shards after migration, the original object statistics are split and merged according to the resulting object distribution and the mapping relation between column indexes, so that the recalculated object statistics correctly reflect the states of the objects on each shard.
7. The method of claim 5, wherein, when data in the column indexes needs to be deleted, for the plurality of column indexes storing the object data, the object operation records, and the object statistics respectively, the object operation records are deleted independently of the object data, and the object statistics records are updated or deleted together with the object data.
8. A method of operating a distributed key-value database as claimed in any one of claims 1 to 4, wherein different column indexes have independent key-value data namespaces, while configuration or deletion operations are allowed to be performed on a single column index independently.
9. A method of operating a distributed key-value database as claimed in any one of claims 1 to 4, wherein, for a single data shard, the data is stored distributed over a plurality of database nodes using one or more of replication (replica) and erasure coding (EC).
10. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of distributed key value database operation of any one of claims 1 to 9.
CN202311195808.5A 2023-09-18 2023-09-18 Distributed key value database operation method and computer readable storage medium Active CN116932655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311195808.5A CN116932655B (en) 2023-09-18 2023-09-18 Distributed key value database operation method and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN116932655A CN116932655A (en) 2023-10-24
CN116932655B true CN116932655B (en) 2023-11-24

Family

ID=88388244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311195808.5A Active CN116932655B (en) 2023-09-18 2023-09-18 Distributed key value database operation method and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116932655B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105357311A (en) * 2015-11-23 2016-02-24 中国南方电网有限责任公司 Secondary equipment big data storage and processing method by utilizing cloud computing technology
US9600500B1 (en) * 2013-06-21 2017-03-21 Amazon Technologies, Inc. Single phase transaction commits for distributed database transactions
CN108228709A (en) * 2017-11-29 2018-06-29 北京市商汤科技开发有限公司 Date storage method and system, electronic equipment, program and medium
CN110008225A (en) * 2019-03-19 2019-07-12 阿里巴巴集团控股有限公司 The treating method and apparatus of distributed transaction
CN112241354A (en) * 2019-08-28 2021-01-19 华东师范大学 Application-oriented transaction load generation system and transaction load generation method
CN113535666A (en) * 2020-04-15 2021-10-22 华为技术有限公司 Data writing method and device, database system and storage medium
CN113868192A (en) * 2021-12-03 2021-12-31 深圳市杉岩数据技术有限公司 Data storage device and method and distributed data storage system
CN115114374A (en) * 2022-06-27 2022-09-27 腾讯科技(深圳)有限公司 Transaction execution method and device, computing equipment and storage medium
CN115454656A (en) * 2022-08-09 2022-12-09 阿里云计算有限公司 Transaction processing method, device and storage medium
CN116578558A (en) * 2023-03-01 2023-08-11 深圳前海微众银行股份有限公司 Data processing method, device, equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8892509B2 (en) * 2006-03-28 2014-11-18 Oracle America, Inc. Systems and methods for a distributed in-memory database
US10565070B2 (en) * 2017-11-29 2020-02-18 Bmc Software, Inc. Systems and methods for recovery of consistent database indexes
US10936445B2 (en) * 2018-11-28 2021-03-02 International Business Machines Corporation Resource management

Non-Patent Citations (2)

Title
Zhenghua Lyu et al. Greenplum: A Hybrid Database for Transactional and Analytical Workloads. SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, 2021, pp. 2530-2542. *
Yu Tengqiu. Research and Implementation of a Distributed Transaction Processing Model. China Master's Theses Full-text Database, Information Science and Technology, No. 1, pp. I138-971. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant