CN116383209A - Concurrency control method and concurrency control device for column-store database


Info

Publication number
CN116383209A
Authority
CN
China
Legal status
Pending
Application number
CN202310365749.5A
Other languages
Chinese (zh)
Inventor
陈志标
谢锐
Current Assignee
Shenzhen Institute of Computing Sciences
Original Assignee
Shenzhen Institute of Computing Sciences
Application filed by Shenzhen Institute of Computing Sciences
Priority to CN202310365749.5A
Publication of CN116383209A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a concurrency control method and device for a column-store database. Whereas prior-art column-store database systems rely on weak, row-based concurrency control, the application performs concurrency control with column locks and group-level MVCC. Specifically: when a pending data request is received, the transaction type and request information corresponding to the request are determined, the transaction type being either modification or read; the data group corresponding to the column data to be processed is determined according to the request information; and data processing is performed on the data group according to the transaction type. Concurrency control through column locks reduces transaction conflicts; group-level MVCC keeps queries and modifications from blocking each other and improves batch query performance; managing lock resources by group improves lock utilization; and configuring the number of locks in each group as needed balances concurrency against access efficiency.

Description

Concurrency control method and concurrency control device for column-store database
Technical Field
The application relates to the field of concurrency control, in particular to a concurrency control method and device for a column-store database.
Background
In database systems, concurrency control is typically row-based. Row-level concurrency control enlarges the conflict range, because different columns of the same row often do not actually conflict. For example, when two transactions update different columns of the same record by primary key, the updates do not conflict in practice, yet row-level concurrency control escalates them to a row-level conflict.
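The conflict-range argument can be made concrete with a small Python sketch. It is only an illustration, not the application's implementation: `LockTable`, its `granularity` parameter, and the column names `price` and `stock` are hypothetical. Two transactions update different columns of the same record; keying exclusive locks by row makes the second one conflict, while keying them by (row, column) does not.

```python
class LockTable:
    """Toy exclusive-lock table; `granularity` selects row or (row, column) keys."""

    def __init__(self, granularity: str) -> None:
        assert granularity in ("row", "column")
        self.granularity = granularity
        self.owners: dict[tuple, str] = {}  # lock key -> owning transaction id

    def _key(self, row: int, column: str) -> tuple:
        return (row,) if self.granularity == "row" else (row, column)

    def try_lock(self, txn: str, row: int, column: str) -> bool:
        """Return True if `txn` now holds the lock, False on a write-write conflict."""
        key = self._key(row, column)
        owner = self.owners.setdefault(key, txn)  # claim the key if nobody holds it
        return owner == txn


def concurrent_updates(granularity: str) -> tuple[bool, bool]:
    table = LockTable(granularity)
    # T1 and T2 update different columns of the record with primary key 42.
    return table.try_lock("T1", 42, "price"), table.try_lock("T2", 42, "stock")


print(concurrent_updates("row"))     # (True, False): T2 conflicts although the columns differ
print(concurrent_updates("column"))  # (True, True): no conflict
```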
In a column-store database system, each column is stored independently. If row-based concurrency control is used, the transaction control information is stored as a separate column in which adjacent rows share a transaction control block, which introduces additional write-write conflicts on that block.
In addition, column storage must also deliver good batch query performance. Under row-level concurrency control, the lock state of every record must be checked when a batch of data is read, which lowers query performance.
Disclosure of Invention
In view of the above problems, the present application provides a concurrency control method and device for a column-store database that overcome, or at least partially solve, these problems, including:
a concurrency control method for a column-store database, where each column of the column-store database includes a plurality of data blocks and each data block includes a plurality of data groups, the method including:
when a pending data request is received, determining the transaction type and request information corresponding to the pending data request, where the transaction type includes modification and read;
determining, according to the request information, the data group corresponding to the column data to be processed;
and performing data processing on the data group according to the transaction type.
Further, the request information includes the row number of the data to be processed and the requested column number, and the step of determining, according to the request information, the data group corresponding to the column data to be processed includes:
determining, according to the row number and the requested column number, the data block of the requested column that contains the row;
and determining, according to the data block, the data group and data row corresponding to the requested column number.
Further, each data group includes at least one transaction lock shared by the data in the group, and the step of performing data processing on the data group according to the transaction type includes:
when the transaction type is modification, determining, according to the transaction lock, whether a modification conflict exists on the data row of the requested column;
when no modification conflict exists on that data row, holding the transaction lock and executing the pending data request;
and releasing the transaction lock after the pending data request has been executed.
Further, the step of performing data processing on the data group according to the transaction type includes:
when the transaction type is read, determining, according to the transaction lock, all visible historical versions of the data row with the row number to be processed;
and determining, according to the visible historical versions, the visible version corresponding to that row number.
Further, the step of determining, according to the transaction lock, whether a modification conflict exists on the data row of the requested column further includes:
when a modification conflict exists on that data row, determining whether the conflicting modification has finished;
if the conflicting modification has been committed, terminating the pending data request and rolling back its modification operations;
if the conflicting modification has been rolled back, continuing to execute the pending data request;
and if the conflicting modification has not finished, waiting for it to finish.
Further, the step of holding the transaction lock and executing the pending data request includes:
determining whether an idle transaction lock exists in the data group corresponding to the requested column number;
if so, executing the pending data request under that transaction lock;
and if not, waiting until a transaction lock in the data group becomes idle.
Further, the step of executing the pending data request under the transaction lock, if an idle lock exists, includes:
executing the pending data request under the transaction lock to generate execution data;
and generating historical data from the execution data and the historical version data of the data row of the requested column before modification, and storing the address of the historical data in the transaction lock.
A concurrency control device for a column-store database, where each column of the column-store database includes a plurality of data blocks and each data block includes a plurality of data groups, the device including:
an information determining module, configured to determine, when a pending data request is received, the transaction type and request information corresponding to the pending data request, where the transaction type includes modification and read;
a group number determining module, configured to determine, according to the request information, the data group corresponding to the column data to be processed;
and a concurrency control module, configured to perform data processing on the data group according to the transaction type.
A computer device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the concurrency control method for a column-store database described above.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the concurrency control method for a column-store database described above.
The application has the following advantages:
In the embodiments of the present application, whereas prior-art column-store database systems provide only weak concurrency control, the present application performs concurrency control with column locks and group-level MVCC. Specifically: when a pending data request is received, the transaction type and request information corresponding to the request are determined, the transaction type being either modification or read; the data group corresponding to the column data to be processed is determined according to the request information; and data processing is performed on the data group according to the transaction type. Column locks reduce transaction conflicts; group-level MVCC keeps queries and modifications from blocking each other and improves batch query performance; managing lock resources by group improves lock utilization; and configuring the number of locks in each group as needed balances concurrency against access efficiency.
Drawings
To illustrate the technical solutions of the present application more clearly, the drawings used in the description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic flowchart of the first prior-art solution according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of the second prior-art solution according to an embodiment of the present application;
FIG. 3 is a flowchart of a concurrency control method for a column-store database according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a column-store database according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a data block according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a column-store database according to Embodiment 2 of the present application;
FIG. 7 is a structural block diagram of a concurrency control device for a column-store database according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and detailed description. It will be apparent that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
By analyzing the prior art, the inventors found the following. In a database, concurrent modification of the same record raises concurrency control problems, and common concurrency control methods are locking, TIMESTAMP (timestamp ordering), OCC (Optimistic Concurrency Control), and MVCC (Multi-Version Concurrency Control). Lock-based concurrency control is pessimistic about conflicts: it locks records on read and write, blocks other concurrent modifications, and releases the locks after the transaction commits. TIMESTAMP concurrency control maintains the latest read and write timestamps on each record and, when a transaction reads or writes the record, checks the transaction's timestamp against the record's timestamps for conflicts. OCC records the version numbers of the records read and written during the execution phase, then checks for conflicts in the commit phase and aborts the transaction if a conflict is found. OCC is more optimistic than TIMESTAMP.
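OCC's execute-then-validate cycle can be illustrated with a minimal single-threaded Python sketch. `Store` and `OccTransaction` are hypothetical names, each write simply increments a per-record version number, and in a real system the validation and write-back in `commit` would have to run atomically.

```python
class Store:
    """Toy record store in which every write bumps the record's version number."""

    def __init__(self) -> None:
        self.values: dict[str, int] = {}
        self.versions: dict[str, int] = {}

    def read(self, key: str) -> tuple[int, int]:
        return self.values.get(key, 0), self.versions.get(key, 0)

    def write(self, key: str, value: int) -> None:
        self.values[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1


class OccTransaction:
    """Execution phase buffers writes and remembers read versions; commit validates."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.read_versions: dict[str, int] = {}
        self.writes: dict[str, int] = {}

    def read(self, key: str) -> int:
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)  # remember the first version seen
        return self.writes.get(key, value)           # see our own buffered write

    def write(self, key: str, value: int) -> None:
        self.writes[key] = value  # buffered until commit

    def commit(self) -> bool:
        # Validation phase: abort if any record we read has since changed version.
        for key, version in self.read_versions.items():
            if self.store.read(key)[1] != version:
                return False
        # Write phase: install the buffered writes.
        for key, value in self.writes.items():
            self.store.write(key, value)
        return True
```

For example, if two transactions both read `x` and then write it, the first to commit succeeds and the second fails validation because the version of `x` it read is no longer current.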
MySQL's InnoDB storage engine is commonly described as implementing MVCC by storing two hidden columns with each row: one holding the row's creation time and one its expiration time, where the stored values are not actual times but system version numbers. MVCC is the most widely used of these methods in database systems. Its principle is to maintain a version chain for each record to handle read-write conflicts: a write generates a new version, and a read returns a committed historical version. Unlike the three protocols described above, MVCC resolves only read-write conflicts; write-write conflicts are still resolved with locks. Under row-based concurrency control, MVCC has two implementations, row-level and block-level. For example, MySQL and PostgreSQL both implement row-level MVCC, in which each record has its own lock and history version chain. Oracle uses block-level MVCC: the records in a data block share a lock and history version chain, and on read the whole data block is first restored to its visible version, after which each record in the block is read.
Column-store databases support weaker concurrency control. A common approach is to add auxiliary columns for transaction processing; these support mark-deletion of records and record visibility checks, and an update is implemented as a mark-deletion followed by insertion of a new version of the record.
The following describes the first and second prior art, respectively:
The first prior art is as follows:
MySQL InnoDB MVCC: referring to FIG. 1, InnoDB implements MVCC at the row level. The versions of a record form a chain in which each newer version points to the older one through ROLL_PTR. On modification, the old version is moved to the undo tablespace, and the new version is generated pointing to the old version in the undo tablespace; on read, the version to return is determined from the transaction information on the record. Old versions are purged after a period of time.
A record must be locked before it is modified, and other modifications must wait for the transaction currently modifying it to end. If a query occurs before that transaction commits, the version being modified is not visible; the query does not wait for the modifying transaction but follows ROLL_PTR to find a visible version (possibly none of the versions is visible). If the query occurs after the commit, it reads the current version directly.
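The row-level version chain and the lock-then-read behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not InnoDB code: `Version`, `Record`, and the `committed` set are hypothetical, a version counts as visible once its writer has committed (roughly read-committed behavior rather than InnoDB's read views), and a blocked writer raises an error instead of waiting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    trx_id: int                              # transaction that created this version
    roll_ptr: Optional["Version"] = None     # older version, kept in the undo space


class Record:
    def __init__(self, value: str, trx_id: int) -> None:
        self.latest = Version(value, trx_id)
        self.locked_by: Optional[int] = None  # row lock held by the modifying transaction

    def modify(self, trx_id: int, value: str) -> None:
        if self.locked_by not in (None, trx_id):
            raise RuntimeError("must wait: record locked by another transaction")
        self.locked_by = trx_id
        # The new version points back to the old one through roll_ptr.
        self.latest = Version(value, trx_id, roll_ptr=self.latest)

    def release(self) -> None:
        """Called when the modifying transaction ends."""
        self.locked_by = None

    def read(self, committed: set[int]) -> Optional[str]:
        """Walk the chain from newest to oldest until a committed version is found."""
        version = self.latest
        while version is not None and version.trx_id not in committed:
            version = version.roll_ptr
        return None if version is None else version.value
```

A reader never takes the row lock: while transaction 2 is still modifying the record, `read` skips its uncommitted version and returns the older committed one.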
The disadvantages of this solution are:
1. Row-level concurrency control enlarges the conflict range.
2. Every record maintains its own lock information, so the storage overhead is high.
3. Batch queries perform poorly, because the transaction status of every record must be checked during the query.
The second prior art is as follows:
Greenplum AOCO (append-optimized, column-oriented): referring to FIG. 2, an AOCO table divides data by rows into segments, and the data within each segment is organized by column. Segments support only append-only writes, and each batch of written data can be encoded and compressed independently. In-place updates are not supported: an update marks the old row as deleted in the visibility bitmap and then inserts a new version into a writable segment. A query scans all segments column by column.
The visibility bitmap is managed as tuples, each covering multiple rows. These tuples support MVCC: on modification, the transaction state on the tuple determines whether a conflict exists; on read, only the visible version of the tuple is read, and the rows it marks as deleted are filtered out.
The disadvantages of this solution are:
1. Although storage is columnar, concurrency control is still row-based, which enlarges the conflict range.
2. Each visibility bitmap tuple is shared by multiple rows, so transactions modifying different rows conflict on the visibility bitmap.
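The second disadvantage can be illustrated with a toy model in Python, assuming (hypothetically) that each visibility-bitmap tuple covers `ROWS_PER_TUPLE = 4` rows and that a transaction marking a row deleted must own the tuple covering it. This is not Greenplum's implementation; it only shows why two transactions touching different rows can still collide.

```python
ROWS_PER_TUPLE = 4  # hypothetical: number of rows covered by one visibility-bitmap tuple


class VisibilityBitmap:
    def __init__(self) -> None:
        self.tuple_owner: dict[int, str] = {}  # tuple index -> transaction modifying it
        self.deleted: set[int] = set()         # rows marked as deleted

    def mark_deleted(self, txn: str, row: int) -> bool:
        """Mark `row` deleted; False means a write-write conflict on the shared tuple."""
        tuple_index = row // ROWS_PER_TUPLE
        if self.tuple_owner.setdefault(tuple_index, txn) != txn:
            return False
        self.deleted.add(row)
        return True
```

Rows 1 and 2 fall in the same tuple, so a second transaction deleting row 2 conflicts with the first transaction's deletion of row 1, while row 5 (in the next tuple) does not.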
By introducing column-level concurrency control and lock resource management configured as needed, the present application reduces concurrency conflicts in the database and improves concurrency performance.
Referring to FIG. 3, a concurrency control method for a column-store database according to an embodiment of the present application is shown;
the method includes the following steps:
S310: when a pending data request is received, determining the transaction type and request information corresponding to the pending data request, where the transaction type includes modification and read;
S320: determining, according to the request information, the data group corresponding to the column data to be processed;
S330: performing data processing on the data group according to the transaction type.
In the embodiments of the present application, whereas prior-art column-store database systems provide only weak concurrency control, the present application performs concurrency control with column locks and group-level MVCC. Specifically: when a pending data request is received, the transaction type and request information corresponding to the request are determined, the transaction type being either modification or read; the data group corresponding to the column data to be processed is determined according to the request information; and data processing is performed on the data group according to the transaction type. Column locks reduce transaction conflicts; group-level MVCC keeps queries and modifications from blocking each other and improves batch query performance; managing lock resources by group improves lock utilization; and configuring the number of locks in each group as needed balances concurrency against access efficiency.
Next, a concurrency control method for a column-store database in the present exemplary embodiment will be further described.
It should be noted that the present application performs concurrency control with column locks and group-level MVCC. As shown in FIG. 4, each column of the column-store database has its own data blocks, and each data block contains several data groups. The data within a data group share a number of locks. As shown in FIG. 5, each transaction lock records information about the transaction using it and the address of the old-version data generated by its modification; the old-version data is stored in the undo area (version chain) for convenient reclamation. Here, a data group is a certain number of data rows within one data block of a column, and group-level MVCC is a read-by-group mechanism: when a column is read in batch, all transaction locks within a data group are traversed and the rows in the group are restored to their visible versions.
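The storage layout above (column, data blocks, data groups, and a small pool of transaction locks per group) can be sketched with Python dataclasses. This is an illustrative sketch under assumed fixed sizes, not the application's data structure: the class names and the `locks_per_group` parameter are hypothetical, with `locks_per_group` standing in for the per-group lock count that the application configures as needed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TransactionLock:
    txn_id: Optional[int] = None                    # transaction using the lock (None = idle)
    rows: list[int] = field(default_factory=list)   # row offsets it has modified in the group
    undo_address: Optional[int] = None              # where the old-version data was moved

    def is_free(self) -> bool:
        return self.txn_id is None


@dataclass
class DataGroup:
    rows: list[object]
    locks: list[TransactionLock]  # shared by every row in the group


@dataclass
class DataBlock:
    groups: list[DataGroup]


def make_column(num_blocks: int, groups_per_block: int, rows_per_group: int,
                locks_per_group: int) -> list[DataBlock]:
    """Build an empty column: blocks -> groups -> rows, with a lock pool per group."""
    return [
        DataBlock([
            DataGroup(rows=[None] * rows_per_group,
                      locks=[TransactionLock() for _ in range(locks_per_group)])
            for _ in range(groups_per_block)
        ])
        for _ in range(num_blocks)
    ]
```

With two blocks of four 1024-row groups and two locks per group, the column holds 16 lock slots for 8,192 rows, which is the kind of saving that per-group lock management targets compared with per-record locks.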
MVCC in MySQL relies on the undo log version chain and the read view. The undo log version chain records multiple versions of a piece of data, and the read view determines whether the current version of the data is visible.
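The visibility check that a read view performs can be sketched as follows. This is a simplified Python model of the rules InnoDB is generally documented to apply, not InnoDB source code; the field names only loosely echo InnoDB's internal `m_up_limit_id` and `m_low_limit_id`, and the example transaction ids are made up. When a version is not visible, the reader would follow the undo log version chain to an older version and test again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int
    active_trx_ids: frozenset[int]  # transactions still active when the view was created
    up_limit_id: int                # smallest active id: anything below it had committed
    low_limit_id: int               # next id to be assigned: anything at/above it is invisible

    def is_visible(self, trx_id: int) -> bool:
        """Decide whether a version written by `trx_id` is visible to this view."""
        if trx_id == self.creator_trx_id:
            return True                       # a transaction sees its own changes
        if trx_id < self.up_limit_id:
            return True                       # committed before the view was created
        if trx_id >= self.low_limit_id:
            return False                      # started after the view was created
        return trx_id not in self.active_trx_ids  # in between: visible only if committed
```

For a view created by transaction 7 while transactions 5 and 7 were active and 9 was the next id, versions from transactions 3 and 6 are visible, while versions from 5 (still active) and 9 (started later) are not.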
As described in step S310, when a pending data request is received, the transaction type and request information corresponding to the request are determined, where the transaction type includes modification and read.
As an example, a transaction, in computing terms, is a program execution unit that accesses and possibly updates data items in a database. A transaction consists of all the operations performed between its start and its end.
In data synchronization, a synchronization system is deployed between a source database and a destination database: the source-side synchronization system reads logs from the source database, and the destination-side synchronization system applies the operations sent by the source to the destination database. A transaction is a set of operations forming an indivisible unit of work; it submits or cancels its operations to the system as a whole, i.e., the operations either all succeed or all fail.
The source database produces data changes by executing transactions; each transaction includes one or more database operations, and each operation produces a data change. Operations include reading, writing, updating, and deleting data, and in a specific implementation an operation may correspond to one SQL statement. A transaction is a set of statements operating on the database, and for a batch of transactions within a given time period, the database system produces a read set and a write set for each transaction.
As described in step S320, the data group corresponding to the column data to be processed is determined according to the request information.
In an embodiment of the present application, the specific process of "determining, according to the request information, the data group corresponding to the column data to be processed" in step S320 is further described below.
Determining, according to the row number of the data to be processed and the requested column number, the data block of the requested column that contains the row;
and determining, according to the data block, the data group and data row corresponding to the requested column number.
It should be noted that pending data requests include modification transaction requests and read transaction requests, and the request information includes the row number of the data to be processed and the requested column number.
As an example, each column of the column-store database has its own data blocks, and each data block contains several data groups. Therefore, the input row number and requested column number are enough to locate the specific data block within the requested column, and then the data group within that block and the data row within that group.
As an example, when the transaction type is read, the data group and data row corresponding to the requested column number are located in the same way as for a modification transaction request, that is, from the input row number and requested column number.
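Locating the data block, data group, and row offset from the row number reduces to integer division if every block and group holds a fixed number of rows. That fixed-size layout and the constants `ROWS_PER_GROUP` and `GROUPS_PER_BLOCK` are assumptions made for this Python sketch; the application does not specify them, and variable-sized blocks would need a lookup table instead.

```python
from typing import NamedTuple

# Hypothetical layout constants; the application leaves the sizes configurable.
ROWS_PER_GROUP = 1024
GROUPS_PER_BLOCK = 64
ROWS_PER_BLOCK = ROWS_PER_GROUP * GROUPS_PER_BLOCK


class Location(NamedTuple):
    column: int
    block: int          # index of the data block within the column
    group: int          # index of the data group within the block
    row_in_group: int   # offset of the data row within the group


def locate(row: int, column: int) -> Location:
    """Map (row number, requested column number) to block, group and row offset."""
    if row < 0:
        raise ValueError("row number must be non-negative")
    block, row_in_block = divmod(row, ROWS_PER_BLOCK)
    group, row_in_group = divmod(row_in_block, ROWS_PER_GROUP)
    return Location(column, block, group, row_in_group)
```

With these constants, row 70000 of column 3 falls in block 1, group 4, at offset 368 within the group. The same lookup serves both read and modification requests.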
As described in step S330, data processing is performed on the data group according to the transaction type.
In one embodiment of the present application, the specific process of "performing data processing on the data group according to the transaction type" in step S330 is further described below.
When the transaction type is modification, determining, according to the transaction lock, whether a modification conflict exists on the data row of the requested column;
when no modification conflict exists on that data row, holding the transaction lock and executing the pending data request;
and releasing the transaction lock after the pending data request has been executed.
As an example, the data in a data group share a certain number of locks. When the transaction type is modification, all transaction locks in the data group are traversed. Each transaction lock records information about the transaction using it and the address of the old-version data generated by its modification, so the transaction locks reveal whether the data row of the requested column is currently being modified. There are two cases: either the data row has no modification operation, or it has one. The case in which the data row has a modification operation is described first.
When a modification conflict exists on the data row of the requested column, determining whether the conflicting modification has finished;
if the conflicting modification has been committed, terminating the pending data request and rolling back its modification operations;
if the conflicting modification has been rolled back, continuing to execute the pending data request;
and if the conflicting modification has not finished, waiting for it to finish.
In a specific implementation, when a modification operation exists on the data row of the requested column and the modifying transaction has been committed, the pending data request is judged to be a conflicting transaction: it is terminated, its modification operations are rolled back, and the data is restored to its last correct state. If the conflicting transaction has been rolled back, the current transaction continues to execute.
In a specific implementation, when the modifying transaction has not yet finished, the request waits for it to finish or chooses to roll back; if the modifying transaction is eventually committed, the pending data request is likewise judged to be a conflicting transaction and is terminated.
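The three-way rule for a modification that finds another transaction already modifying the same row can be written as a small decision function. This Python sketch is illustrative only: the enum names are hypothetical, and the state of the conflicting transaction is assumed to have been read from the data group's transaction locks.

```python
from enum import Enum
from typing import Optional


class TxnState(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled_back"


class Decision(Enum):
    PROCEED = "proceed"  # take a transaction lock and modify the row
    ABORT = "abort"      # terminate the pending request and roll back its changes
    WAIT = "wait"        # re-check once the conflicting transaction ends


def decide(conflicting_state: Optional[TxnState]) -> Decision:
    """Choose the action for a modification request, given the state of any
    transaction found modifying the same row (None if there is none)."""
    if conflicting_state is None or conflicting_state is TxnState.ROLLED_BACK:
        return Decision.PROCEED
    if conflicting_state is TxnState.COMMITTED:
        return Decision.ABORT
    return Decision.WAIT
```

A request that gets `WAIT` would block and call `decide` again once the conflicting transaction ends, at which point it either proceeds (the other transaction rolled back) or aborts (it committed).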
The case in which no modification conflict exists on the data row of the requested column is further described below.
When no modification conflict exists on the data row of the requested column, holding the transaction lock and executing the pending data request;
and releasing the transaction lock after the pending data request has been executed.
In a specific implementation, when no modification operation exists on the data row of the requested column, the pending data request is judged not to be a conflicting transaction. The transaction lock is then held and the request is executed under it, and after the transaction ends the lock is released so that its resources are available to subsequent transactions.
In one embodiment of the present application, the specific process of holding the transaction lock and executing the pending data request is further described below.
Determining whether an idle transaction lock exists in the data group corresponding to the requested column number;
if so, executing the pending data request under that transaction lock;
if not, waiting until a transaction lock in the data group becomes idle.
As an example, once an idle lock in the data group corresponding to the requested column number is obtained and locking succeeds, the modification in the pending data request is executed. If the data group has no idle transaction lock, the request waits until one becomes idle and then executes the modification. These steps are repeated for each other column being updated, and the transaction locks are released after the transaction commits.
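Acquiring one of a group's shared transaction locks, and waiting when none is idle, maps naturally onto a condition variable. The Python sketch below uses `threading.Condition.wait_for`; the class `GroupLockPool` and its methods are hypothetical, and for brevity it does not let a transaction reuse a lock it already holds in the same group.

```python
import threading
from typing import Optional


class GroupLockPool:
    """A fixed number of transaction locks shared by all rows of one data group."""

    def __init__(self, num_locks: int) -> None:
        self.owners: list[Optional[int]] = [None] * num_locks  # None marks an idle lock
        self.cond = threading.Condition()

    def acquire(self, txn_id: int, timeout: Optional[float] = None) -> Optional[int]:
        """Return the index of an idle lock now held by `txn_id`, waiting if none
        is idle. Returns None if `timeout` seconds pass without one becoming idle."""
        with self.cond:
            if not self.cond.wait_for(lambda: None in self.owners, timeout=timeout):
                return None
            index = self.owners.index(None)
            self.owners[index] = txn_id
            return index

    def release(self, index: int) -> None:
        """Called when the holding transaction commits or rolls back."""
        with self.cond:
            self.owners[index] = None
            self.cond.notify_all()  # wake transactions waiting for an idle lock
```

With two locks per group, a third concurrent writer to the same group waits (or times out) until one of the first two transactions commits and releases its lock; raising the per-group lock count trades storage for concurrency.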
It should be noted that the batch modification flow is similar to the above steps and is not repeated here. During batch modification, multiple rows are recorded in the transaction lock and in the Undo log at one time, which improves the efficiency of modification and of version restoration during batch queries.
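The batch case can be illustrated with a sketch in which one lock slot registers every updated row and a single undo record captures all of their old values. The structures and `batch_modify` are hypothetical; the sketch only shows that a batch produces one record rather than one per row.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class UndoRecord:
    txn_id: int
    old_values: Dict[int, object]      # row offset -> value before this modification


@dataclass
class TxnLock:
    owner: Optional[int] = None
    rows: Set[int] = field(default_factory=set)
    undo: List[UndoRecord] = field(default_factory=list)


def batch_modify(column: List[object], lock: TxnLock, txn_id: int,
                 updates: Dict[int, object]) -> None:
    """Apply {row: new_value} under one lock, recording all rows in one undo record."""
    lock.owner = txn_id
    lock.rows.update(updates)                         # all rows registered at once
    lock.undo.append(UndoRecord(txn_id, {r: column[r] for r in updates}))
    for r, v in updates.items():
        column[r] = v
```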
In one embodiment of the present invention, the specific process of "executing the pending data request under the transaction lock if an idle transaction lock exists" is further described below.
The pending data request is executed under the transaction lock to generate execution data;
and historical data is generated from the execution data and the historical version data of the data row corresponding to the pending request column number as it was before modification, and the address of the historical data is stored in the transaction lock.
As an example, after the modification requested by the pending data request is completed, the execution data is recorded in the transaction lock, together with a reference to the historical version data from before the modification. Specifically, the pre-modification version of the data row corresponding to the pending request column number is moved into the historical data, and its address is kept on the transaction lock; during a query, the historical version is located through this address.
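The following sketch models the historical data as an append-only list in which an "address" is simply an index, and stores that address on the lock so that a query can find the old version. `HistorySpace`, `HistoryEntry` and `execute_modification` are illustrative names, and linking entries through `prev_addr` into a per-row chain is an assumption rather than a detail stated in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HistoryEntry:
    txn_id: int                  # transaction that overwrote this version
    row: int
    old_value: object            # version of the row before the modification
    prev_addr: Optional[int]     # address of the next-older history entry of the row


class HistorySpace:
    """Append-only history (undo) store; an address is just a list index here."""

    def __init__(self) -> None:
        self.entries: List[HistoryEntry] = []

    def append(self, entry: HistoryEntry) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1


@dataclass
class TxnLock:
    owner: Optional[int] = None
    history_addr: Dict[int, int] = field(default_factory=dict)   # row -> history address


def execute_modification(column: List[object], history: HistorySpace, lock: TxnLock,
                         latest: Dict[int, int], txn_id: int, row: int,
                         new_value: object) -> int:
    """Move the pre-modification version into history and keep its address on the lock."""
    addr = history.append(HistoryEntry(txn_id, row, column[row], latest.get(row)))
    lock.owner = txn_id
    lock.history_addr[row] = addr
    latest[row] = addr           # newest history entry of this row
    column[row] = new_value      # the column itself now holds the execution data
    return addr
```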
In one embodiment of the present invention, the specific process of "performing data processing on the data group according to the transaction type" described in step S330 is further described below.
When the transaction type is read, all visible historical versions of the pending data row number are determined according to the transaction locks;
and, as described in the following steps, the visible version corresponding to the pending data row number is determined from the visible historical versions.
As an example, all transaction locks in the data group corresponding to the pending data row number are traversed to find all visible historical versions of that row; the row is rolled back to the corresponding visible version; and these steps are repeated to read the other columns. Because transaction information is global, reads across multiple columns see records as of the same point in time. During batch reads, the visible version is restored for the whole group at once; this is not described in further detail here.
Example 1
Single-row read flow:
1. From the row number to be read, locate the data block and group number corresponding to the column.
2. Traverse the locks within the group to find all visible historical versions of the row to be read.
3. Roll the row back to the corresponding visible version.
4. Repeat the above steps to read the other columns.
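Step 1 reduces to integer arithmetic if blocks and groups have fixed sizes. That is an assumption (the description does not state the sizes), and `locate`, `rows_per_block` and `rows_per_group` are hypothetical names.

```python
from typing import NamedTuple


class Location(NamedTuple):
    block: int      # data block index within the column
    group: int      # data group index within the block
    offset: int     # row offset within the group


def locate(row: int, rows_per_block: int, rows_per_group: int) -> Location:
    """Map a row number to (block, group, offset), assuming fixed-size blocks and groups."""
    if rows_per_block % rows_per_group:
        raise ValueError("assumed: a block holds a whole number of groups")
    block, in_block = divmod(row, rows_per_block)
    group, offset = divmod(in_block, rows_per_group)
    return Location(block, group, offset)
```

For example, with 1024 rows per block and 128 rows per group, row 2500 falls in block 2, group 3, offset 68. The column number then selects which column's sequence of data blocks is consulted.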
Single-row modification flow:
1. Input the row to be updated and locate the data block and group number corresponding to each updated column.
2. Traverse the locks within the group and determine whether a modification operation exists on the row to be updated.
2.1 If such a modification exists and the modifying transaction has already committed, a conflict is determined and the current transaction ends.
2.2 If the modifying transaction has not yet ended, wait for it to commit or roll back. If it finally commits, a conflict is likewise determined and the current transaction ends.
3. Take an idle lock within the group; if there is no idle lock, wait.
4. After locking succeeds, perform the modification of the record.
5. Repeat the above steps to modify the other updated columns.
6. Release the locks after the transaction commits.
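Step 2 of the modification flow is a small decision on the state of the other writers found in the group's locks. The sketch below is illustrative: the enum names are hypothetical, and it assumes `other_writers` only contains transactions whose modification of the row is still recorded in the group's locks. A caller that receives `WAIT` would re-evaluate once it is woken.

```python
from enum import Enum
from typing import Iterable, Mapping


class TxnState(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled_back"


class Action(Enum):
    PROCEED = "proceed"   # go on to step 3 and take an idle lock
    WAIT = "wait"         # step 2.2: block until the other writer commits or rolls back
    ABORT = "abort"       # steps 2.1 / 2.2: conflict, end the current transaction


def decide(other_writers: Iterable[int], state_of: Mapping[int, TxnState]) -> Action:
    """Decide step 2 for one row from the states of the other writers recorded on it."""
    states = [state_of[t] for t in other_writers]
    if TxnState.COMMITTED in states:
        return Action.ABORT
    if TxnState.ACTIVE in states:
        return Action.WAIT
    return Action.PROCEED     # no other writer, or every other writer rolled back
```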
Example 2
The concurrency process is described here using a specific table and set of statements. As shown in fig. 6, assume the table has two columns, C1 and C2, with initial values x0 and y0 respectively, and that each data block contains only one data group, holding 4 records and two locks.
In this embodiment, 5 transactions are executed sequentially, and the operations are as follows:
T100:update table set c1=x1 where row=row1,row2
T101:update table set c2=y1 where row=row1,row2,row3
T102:update table set c1=x2 where row=row2
T103:update table set c2=y2 where row=row4
T104:select c2 from table
Here T denotes a transaction. Execution proceeds as follows:
1. T100 modifies R1 and R2 in column C1. Since there are currently no other transactions, T100 directly acquires lock C1.L1, modifies successfully, writes the old versions of R1 and R2 to the undo space, and records identifying transaction information in the headers of the lock and the undo records.
2. Likewise, T101 modifies column C2; there is no concurrency conflict, and it records its transaction information and undo.
3. T102 modifies C1.R2. It traverses the lock region and finds an uncommitted modification by T100 on R2, so T102 blocks waiting for T100.
4. T103 modifies C2.R4. It traverses the lock region and finds no uncommitted transaction on R4, but no lock is available, so it blocks waiting for T101.
5. T100 commits.
6. T102 stops waiting and finds that C1.R2, which it intended to modify, has been modified by the committed T100. It therefore aborts.
7. T104 scans column C2. It traverses the lock region, finds the active transaction T101 on C2.L1 and C2.L2, and restores R1, R2 and R3 to y0 through the Undo chain. The 4 rows of column C2 are returned, all with value y0.
8. T101 commits.
9. T103 takes lock C2.L2 and performs the modification of R4.
In the above process, both T102 and T103 wait: T102 because of a modification conflict, and T103 because of insufficient transaction-lock resources. Finally, T102 aborts, while T103 obtains a transaction lock and executes successfully.
With this concurrency control method, modification transactions on different columns of the same row do not block each other; because locks are kept per column, modifications of different columns do not touch the same data block, which reduces access conflicts; group-level MVCC is provided, which improves batch query performance; and the number of locks can be configured per group, balancing concurrency capability against access efficiency.
Specifically, the existing conflict relationship under a row lock is shown in the following table one:
                  | Same record | Different records
Same column       | Conflict    | No conflict
Table 1
By locking only the specific row within the updated column, as provided in this embodiment, the conflict relationships change to those shown in Table 2:
                  | Same record | Different records
Same column       | Conflict    | No conflict
Different columns | No conflict | No conflict
Table 2
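Tables 1 and 2 can be restated as a predicate. This only re-expresses the tables, with one labelled assumption: that a conventional row lock covers every column of the record, which is consistent with the statement that, under this method, modifications of different columns of the same row no longer block each other. The scope names `"row"` and `"column_row"` are made up for the example.

```python
def conflicts(lock_scope: str, same_record: bool, same_column: bool) -> bool:
    """Do two concurrent modifications conflict under the given lock scope?"""
    if lock_scope == "row":          # conventional row lock: covers the whole record
        return same_record
    if lock_scope == "column_row":   # this embodiment: one row within one column
        return same_record and same_column
    raise ValueError(f"unknown lock scope: {lock_scope}")
```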
In column storage, each column is stored independently and its column locks are stored with the column data, so modifications of different columns do not conflict at all.
Within the same column of a data block, every N rows form a group; the number of locks is configured per group, and the transactions in a group share that group's lock resources. For scenarios with high concurrency requirements, or for columns that are modified with high concurrency, more lock resources can be configured per group to increase concurrency capability. Otherwise, fewer lock resources can be configured to improve query and conflict-checking performance. This provides a flexible trade-off between concurrency capability on the one hand and access efficiency and resource overhead on the other.
The present application provides group-level MVCC capability: during a batch column scan, an entire group of a column can be restored to its visible version in one batch, which improves restoration performance.
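Group-level restoration can be sketched as a single pass over a group's locks, newest first, undoing the modifications of writers the reader cannot see. This is a sketch under assumptions: each lock is modelled as a hypothetical `(seq, owner, {row: old_value})` tuple, and `visible` is the set of transaction ids visible to the reader. Once a row's newest visible writer is reached, older undo entries for that row are ignored.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (seq, owner, {row offset: value before owner's modification}); larger seq = newer.
LockRecord = Tuple[int, int, Dict[int, object]]


def restore_group(current: Sequence[object], locks: List[LockRecord],
                  visible: Set[int]) -> List[object]:
    """Return every row of one group as the reader should see it, in one pass."""
    out = list(current)
    settled: Set[int] = set()     # rows already at their newest visible version
    for _seq, owner, old in sorted(locks, key=lambda rec: rec[0], reverse=True):
        if owner in visible:
            settled.update(old)   # this writer's values are visible; stop undoing them
            continue
        for row, old_value in old.items():
            if row not in settled:
                out[row] = old_value   # undo an invisible writer's change
    return out
```

In this sketch each lock is visited once per group, whereas restoring row by row would traverse the group's locks once for every row read.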
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Referring to fig. 7, a concurrency control device for a column-store database according to an embodiment of the present application is shown, which may specifically include the following modules:
the information determining module 710 is configured to determine, when a data pending request is received, a transaction type and request information corresponding to the data pending request; wherein the transaction type includes a modification and a read;
the group number determining module 720 is configured to determine a data group corresponding to the column data to be processed according to the request information;
and the concurrency control module 730 is configured to perform data processing on the data group according to the transaction type.
In an embodiment of the present invention, the request information includes a pending data row number and a pending request column number, and the group number determining module 720 includes:
a data block determining submodule, configured to determine the data block corresponding to the pending request column number according to the pending data row number and the pending request column number;
and a data determining submodule, configured to determine the data group and data row corresponding to the pending request column number according to the data block.
In one embodiment of the present invention, the data in the data group includes at least one transaction lock, and the concurrency control module 730 includes:
a conflict judging submodule, configured to determine, according to the transaction lock and when the transaction type is modification, whether a modification conflict exists on the data row corresponding to the pending request column number;
and a modification execution submodule, configured to hold the transaction lock and execute the pending data request when no such modification conflict exists.
In an embodiment of the present invention, the concurrency control module 730 includes:
a version acquisition submodule, configured to determine, according to the transaction lock and when the transaction type is read, all visible historical versions of the pending data row number;
and a reading submodule, configured to determine the visible version corresponding to the pending data row number according to the visible historical versions.
In an embodiment of the present invention, the conflict judging submodule includes:
an execution judging unit, configured to determine, when a modification conflict exists on the data row corresponding to the pending request column number, whether the conflicting modification has finished executing;
an execution ending unit, configured to terminate the pending data request and roll back its modification operation if the conflicting modification has been committed;
a rollback processing unit, configured to continue executing the pending data request if the conflicting modification has been rolled back;
and an execution-unfinished unit, configured to wait for the conflicting modification to finish if it has not yet finished.
In one embodiment of the present invention, the modification execution submodule includes:
an idle lock judging unit, configured to determine whether an idle transaction lock exists in the data group corresponding to the pending request column number;
an execution unit, configured to execute the pending data request under the transaction lock if one exists;
and a waiting unit, configured to wait for a transaction lock in the data group to become idle if none exists.
In one embodiment of the present invention, the execution unit includes:
an execution data generation subunit, configured to execute the pending data request under the transaction lock and generate execution data;
and a data storage subunit, configured to generate historical data according to the execution data and the historical version data of the data row corresponding to the pending request column number before modification, and to store the address of the historical data in the transaction lock.
Referring to fig. 8, a computer device of a concurrency control method for a column-store database according to the present invention may specifically include the following:
The computer device 12 described above is embodied in the form of a general purpose computing device, and the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (commonly referred to as a "hard disk drive"). Although not shown in fig. 8, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk such as a CD-ROM, DVD-ROM, or other optical media may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, the program modules 42 being configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, a memory, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules 42, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), one or more devices that enable an operator to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 8, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units 16, external disk drive arrays, RAID systems, tape drives, data backup storage systems 34, and the like.
The processing unit 16 executes programs stored in the system memory 28 to perform various functional applications and data processing, for example, to implement a concurrency control method for a columnar database according to an embodiment of the present invention.
That is, when executing the program, the processing unit 16 implements the following: when a pending data request is received, determining the transaction type and request information corresponding to the pending data request, wherein the transaction type includes modification and read; determining the data group corresponding to the column data to be processed according to the request information; and performing data processing on the data group according to the transaction type.
In an embodiment of the present invention, the present invention further provides a computer readable storage medium, where a computer program is stored, where the program when executed by a processor implements a concurrency control method for a column-store database provided in all embodiments of the present application:
That is, when executed by a processor, the program implements the following: when a pending data request is received, determining the transaction type and request information corresponding to the pending data request, wherein the transaction type includes modification and read; determining the data group corresponding to the column data to be processed according to the request information; and performing data processing on the data group according to the transaction type.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the operator's computer, partly on the operator's computer, as a stand-alone software package, partly on the operator's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the operator's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider). In this specification, each embodiment is described in a progressive manner, with each embodiment focusing on its differences from the others; for identical or similar parts, the embodiments may be referred to one another.
While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing describes in detail a concurrency control method and apparatus for a column-store database provided in the present application. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above examples are only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific embodiments and scope of application in accordance with the ideas of the present application. In summary, the content of this description should not be construed as limiting the present application.

Claims (10)

1. A concurrency control method for a column-store database, where each column of the column-store database includes a plurality of data blocks, and the data blocks include a plurality of data groups, and the concurrency control method is characterized in that the method includes:
when a data to-be-processed request is received, determining a transaction type and request information corresponding to the data to-be-processed request; wherein the transaction type includes a modification and a read;
determining a data group corresponding to the column data to be processed according to the request information;
and executing data processing on the data group according to the transaction type.
2. The method according to claim 1, wherein the request information includes a line number of the data to be processed and a column number of the request to be processed, and the step of determining the data group corresponding to the column data to be processed according to the request information includes:
Determining a data block corresponding to the request column number to be processed according to the data line number to be processed and the request column number to be processed;
and determining a data group and a data row corresponding to the column number of the request to be processed according to the data block.
3. The method of claim 2, wherein the data within the data group includes at least one transaction lock, and wherein the step of performing data processing on the data group according to the transaction type comprises:
when the transaction type is modification, determining, according to the transaction lock, whether a modification conflict exists on the data row corresponding to the column number of the request to be processed;
when no modification conflict exists on the data row corresponding to the column number of the request to be processed, holding the transaction lock and executing the pending data request;
and releasing the transaction lock after the pending data request has been executed.
4. A method according to claim 3, wherein said step of performing data processing on said data group according to said transaction type comprises:
when the transaction type is read, determining all visible historical versions of the data line number to be processed according to the transaction lock;
And determining the visible version corresponding to the line number of the data to be processed according to the visible history version.
5. A method according to claim 3, wherein said step of determining from said transaction lock whether a modification conflict exists on the data row corresponding to said pending request column number comprises:
when a modification conflict exists on the data row corresponding to the pending request column number, determining whether the conflicting modification has finished executing;
if the conflicting modification has been committed, terminating the pending data request and rolling back the modification operation of the pending data request;
if the conflicting modification has been rolled back, continuing to execute the pending data request;
and if the conflicting modification has not yet finished, waiting for the conflicting modification to finish.
6. A method according to claim 3, wherein the step of holding the transaction lock and executing the data pending request comprises:
determining whether the idle transaction lock exists in the data group corresponding to the column number of the request to be processed;
if yes, executing the data to-be-processed request according to the transaction lock;
and if not, waiting for the transaction lock in the data group to be idle.
7. The method of claim 6, wherein the step of executing the pending data request according to the transaction lock if an idle transaction lock exists comprises:
executing the data to-be-processed request according to the transaction lock to generate execution data;
and generating historical data according to the execution data and the historical version data before the data line of the column number of the request to be processed is modified, and storing the address of the historical data in the transaction lock.
8. A concurrency control device for a column-store database, where each column of the column-store database includes a plurality of data blocks, and the data blocks include a plurality of data groups, and the concurrency control device is characterized in that the concurrency control device includes:
the information determining module is used for determining the transaction type and the request information corresponding to the data to-be-processed request when the data to-be-processed request is received; wherein the transaction type includes a modification and a read;
the group number determining module is used for determining a data group corresponding to the column data to be processed according to the request information;
and the concurrency control module is used for executing data processing on the data group according to the transaction type.
9. A computer device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, which computer program, when executed by the processor, implements the method of any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1 to 7.
CN202310365749.5A 2023-03-30 2023-03-30 Concurrency control method and concurrency control device for column-store database Pending CN116383209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310365749.5A CN116383209A (en) 2023-03-30 2023-03-30 Concurrency control method and concurrency control device for column-store database

Publications (1)

Publication Number Publication Date
CN116383209A true CN116383209A (en) 2023-07-04

Family

ID=86978419

Country Status (1)

Country Link
CN (1) CN116383209A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination