WO2021139376A1 - 一种存储引擎中数据索引的方法以及相关装置 - Google Patents

一种存储引擎中数据索引的方法以及相关装置 Download PDF

Info

Publication number
WO2021139376A1
WO2021139376A1 PCT/CN2020/126397 CN2020126397W WO2021139376A1 WO 2021139376 A1 WO2021139376 A1 WO 2021139376A1 CN 2020126397 W CN2020126397 W CN 2020126397W WO 2021139376 A1 WO2021139376 A1 WO 2021139376A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
index
index table
storage engine
row
Prior art date
Application number
PCT/CN2020/126397
Other languages
English (en)
French (fr)
Inventor
龙剑
张文亮
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Priority to JP2022519003A priority Critical patent/JP7362190B2/ja
Priority to EP20912278.7A priority patent/EP4006740A4/en
Publication of WO2021139376A1 publication Critical patent/WO2021139376A1/zh
Priority to US17/671,578 priority patent/US11868330B2/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Definitions

  • This application relates to the field of computer technology, and in particular to data indexing in a storage engine.
  • the database in short, can be regarded as an electronic file cabinet-a place where electronic files are stored, and users can perform operations such as adding, querying, updating, and deleting data in the file.
  • MySQL a database management system that supports multiple storage engines, can be used to apply to different application scenarios.
  • different storage engines have different support capabilities for transactions.
  • the InnoDB storage engine implements complete transaction support and also supports the XA protocol of distributed transactions; while the MyISAM storage engine is easy to expand; and the NEWDB engine supports single-sentence transactions; thereby realizing the data indexing process in multiple scenarios.
  • the present application provides a data indexing method, which can effectively reduce the complexity of the storage engine application process and code redundancy, and improve the efficiency of the data indexing process.
  • an embodiment of the present application provides a data indexing method, which can be applied to a system or program that includes a data indexing function in a terminal device, and specifically includes: obtaining a target transaction, where the target transaction is used to indicate a data indexing process;
  • At least one second index table configured in the second storage engine is determined according to the first index table, and the second index table is obtained by adding row identifiers based on the first index table, and the row identifiers are used to indicate all Row data in the target data, the row data corresponds to an index column in the target data, the index column is obtained based on the target transaction, and the second storage engine supports the execution of the target transaction;
  • the index data corresponding to the data index condition in the first storage engine is determined from the second index table, and the index data is included in the target data.
  • an embodiment of the present application provides a data indexing device, including: an acquiring unit, configured to acquire a target transaction, the target transaction being used to indicate a data indexing process;
  • a determining unit configured to determine target data in the first storage engine according to the target transaction, where the target data is contained in at least one first index table;
  • the mapping unit is configured to determine at least one second index table configured in the second storage engine according to the first index table, where the second index table is obtained by adding a row identifier based on the first index table, and the row The identifier is used to indicate row data in the target data, the row data corresponds to an index column in the target data, the index column is obtained based on the target transaction, and the second storage engine supports the target The execution of the transaction;
  • the index unit is configured to determine index data corresponding to the data index condition in the first storage engine from the second index table, and the index data is included in the target data.
  • an embodiment of the present application provides a computer device, including: a memory, a processor, and a bus system; the memory is used for storing program code; the processor is used for executing the above-mentioned aspects according to instructions in the program code.
  • an embodiment of the present application provides a storage medium, where the storage medium is used to store a computer program, and the computer program is used to execute the data indexing method in the above aspect.
  • embodiments of the present application provide a computer program product including instructions, which when run on a computer, cause the computer to execute the data indexing method in the above aspects.
  • the target data is contained in at least one first index table; and then determining the configuration in the second storage engine according to the first index table
  • the second index table is obtained by adding row identifiers based on the first index table, the row identifiers are used to indicate the row data in the target data, and the second storage engine supports the execution of the target transaction; and then from the second index table Determine the index data corresponding to the data index condition in the first storage engine. In this way, the data indexing process across storage engines is realized.
  • the functional design enables the first storage engine to perform functions that cannot be carried by itself, which further improves the application scope and indexing efficiency of the database.
  • Figure 1 is a network architecture diagram of the operation of the data indexing system
  • FIG. 2 is a process architecture diagram of a data index provided by an embodiment of the application
  • FIG. 3 is a flowchart of a data indexing method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of an index table update process provided by an embodiment of the application.
  • FIG. 5 is a flowchart of another data indexing method provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a data insertion operation process provided by an embodiment of the application.
  • FIG. 7 is a flowchart of a data recovery method provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of a data indexing device provided by an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of another data indexing device provided by an embodiment of the application.
  • the embodiment of the present application provides a data indexing method and related devices, which can be applied to a system or program containing a data indexing function in a terminal device, by obtaining a target transaction indicating the data indexing process, and determining the first storage engine according to the target transaction
  • the target data in the target data the target data is contained in at least one first index table; then at least one second index table configured in the second storage engine is determined according to the first index table, and the second index table is added based on the first index table
  • the row identification is used to indicate the row data in the target data.
  • the row data corresponds to the index column in the target data.
  • the index column is obtained based on the target transaction.
  • the second storage engine supports the execution of the target transaction; and then from the second index
  • the index data corresponding to the data index condition in the first storage engine is determined in the table. In this way, the data indexing process across storage engines is realized. Because the development complexity introduced by the new storage engine is reduced, the code reuse of multiple storage engines is realized, and the indexing efficiency of the database is improved; further through the second storage engine
  • the functional design enables the first storage engine to perform functions that cannot be carried by itself, which further improves the application scope and indexing efficiency of the database.
  • MySQL A relational database management system that uses a plug-in architecture to support multiple storage engines at the same time.
  • Storage engine responsible for data storage and transaction management in the database management system, and is the core module of the database system. MySQL can support multiple different storage engines at the same time to provide data storage and transaction management capabilities optimized for different scenarios; common storage engines include InnoDB, MyISAM, etc.
  • Transaction is a sequence of database operations that access and possibly manipulate various data items.
  • a transaction consists of all database operations performed between the start of the transaction and the end of the transaction.
  • Transactions have ACID characteristics, namely Atomicity; Consistency; Isolation; Durability.
  • Database schema definition language That is, data can schema information, which is a language used to describe the structure of the library table to be stored in the database. Such as CREATE, ALTER, DROP, etc.
  • DML Data Manipulation Language
  • XA in database connection transaction refers to the specification of distributed transaction processing proposed by the X/Open organization.
  • Row ID (rowno) a function ID that can return the current row number of the current column data in the table, that is, the collection ID of the row data.
  • Row data The set of index values corresponding to the same key value in the index table.
  • Index table A table indicating the correspondence between logical records and physical records, which can be used to identify data in the database.
  • the data indexing method provided in this application can be applied to a system or program that contains a database or needs to perform data reading and writing, such as a MySQL database or related programs running based on MySQL.
  • the data indexing system can be run as shown in the figure In the system architecture shown in 1, as shown in Figure 1, it is a system architecture diagram of the operation of the data indexing system.
  • MySQL includes the access layer, the service layer, the storage engine layer, and the system file layer;
  • the entry layer is mainly responsible for connection processing, authorization and authentication, security and other matters; while the service layer is mainly responsible for query analysis, analysis, optimization, caching and all built-in functions.
  • All cross-storage engine functions are implemented in this layer: stored procedures, triggers , Views, binlog, table locks, etc.; the storage engine layer is mainly responsible for the storage and extraction of data in MySQL, and the service layer communicates with the storage engine through APIs.
  • the storage engine contains dozens of underlying function APIs, and each engine provides a set of specific implementations;
  • the system file layer is mainly responsible for the reading and writing of the underlying file system.
  • the data indexing method provided in this application can be applied to the interaction process of multiple storage engines set in a plug-in manner in the storage engine layer. By reusing the transactions of the first storage engine through the second storage engine, the unified application of the underlying functions is realized. .
  • first storage engine shows a first storage engine and a second storage engine, but in actual scenarios, there can be more first storage engines or more second storage engines.
  • the specific number depends on the actual scenario and is not limited here.
  • the type of storage engine can be InnoDB, MyISAM, or NEWDB, etc.
  • the above-mentioned data indexing system can be run on a server, for example: as an application for cloud data storage, it can also run on a terminal device, and can also be run on a third-party device to provide data indexing to obtain data indexed nodes Distribution results;
  • the specific data indexing system can be run in the form of a program in the above-mentioned equipment, or as a system component in the above-mentioned equipment, or as a kind of cloud service program.
  • the specific operation mode depends on the actual scenario. Depends, there is no limitation here.
  • Cloud technology refers to the integration of a series of resources such as hardware, software, and network within a wide area network or a local area network to realize the calculation, storage, processing, and sharing of data.
  • Kind of hosting technology
  • cloud technology is based on the general term of network technology, information technology, integration technology, management platform technology, application technology, etc., applied in the cloud computing business model, and can form a resource pool, which can be used as needed, which is flexible and convenient.
  • Cloud computing technology will become an important support.
  • the background service of the technical network system requires a large amount of computing and storage resources, such as video websites, image websites and more portal websites.
  • each item may have its own identification mark, which needs to be transmitted to the back-end system for logical processing. Data of different levels will be processed separately, and all types of industry data need to be powerful The backing of the system can only be achieved through cloud computing.
  • the database in short, can be regarded as an electronic file cabinet-a place where electronic files are stored, and users can perform operations such as adding, querying, updating, and deleting data in the file.
  • database is a collection of data that is stored together in a certain way, can be shared with multiple users, has as little redundancy as possible, and is independent of the application.
  • Database Management System is a computer software system designed to manage databases. It generally has basic functions such as storage, interception, security, and backup.
  • the database management system can be classified based on the database model it supports, such as relational, XML; or based on the type of computer supported, such as server clusters, mobile phones; or based on the query language used, such as SQL , XQuery; or classification based on performance impulse focus, such as maximum scale, maximum operating speed; or other classification methods. No matter which classification method is used, some DBMSs can cross categories, for example, support multiple query languages at the same time.
  • MySQL a database management system that supports multiple storage engines, can be used to apply to different application scenarios.
  • different storage engines have different support capabilities for transactions.
  • the InnoDB storage engine implements complete transaction support and also supports the XA protocol of distributed transactions; while the MyISAM storage engine is easy to expand; and the NEWDB engine supports single-sentence transactions; thereby realizing the data indexing process in multiple scenarios.
  • this application proposes a data indexing method, which is applied to the data indexing process framework shown in FIG. 2.
  • a data indexing method provided by an embodiment of this application Process architecture diagram, access the target transaction through the first storage engine, and map the index table in the second storage engine based on the target transaction, so as to realize the data indexing process across storage engines; generally, the second storage engine adopts a complete transaction
  • a functional storage engine such as InnoDB, makes it easier for the first storage engine to reuse the underlying functions in the storage engine layer to perform the data indexing process.
  • the method provided in this application can be a kind of program writing, as a processing logic in a hardware system, or as a data indexing device, which can be integrated or externally implemented to implement the aforementioned processing logic.
  • the data indexing device obtains the target transaction indicating the data indexing process, and determines the target data in the first storage engine according to the target transaction, and the target data is contained in at least one first index table; and then according to the first index
  • the table determines at least one second index table configured in the second storage engine.
  • the second index table is obtained by adding a row identifier based on the first index table.
  • the row identifier is used to indicate the row data in the target data.
  • the index column is based on the target transaction, and the second storage engine supports the execution of the target transaction; and then the index data corresponding to the data index condition in the first storage engine is determined from the second index table.
  • the data indexing process across storage engines is realized. Because the development complexity introduced by the new storage engine is reduced, the code reuse of multiple storage engines is realized, and the indexing efficiency of the database is improved; further through the second storage engine
  • the functional design enables the first storage engine to perform functions that cannot be carried by itself, which further improves the application scope and indexing efficiency of the database.
  • FIG. 3 is a flowchart of a method of data indexing provided by an embodiment of this application, which is applied to computer equipment.
  • the implementation of this application is Examples include at least the following steps:
  • the computer device obtains a target transaction.
  • the target transaction is used to indicate the data indexing process; among them, the target transaction can be a transaction that the current database has been executing; it can also be a newly issued transaction; and for the first storage engine, the underlying The function design can realize the data index function through the bottom function of the second storage engine.
  • the computer device determines the target data in the first storage engine according to the target transaction.
  • the target data is contained in at least one first index table; where the first index table records target data in different versions of the first storage engine; in a possible scenario, the first storage engine is NEWDB, the database storage engine only supports single-sentence transactions, and when the engine is newly added, it is necessary to design underlying functions for the single-sentence transaction to interact with data through an interface.
  • the first index table records target data in different versions of the first storage engine
  • the first storage engine is NEWDB
  • the database storage engine only supports single-sentence transactions, and when the engine is newly added, it is necessary to design underlying functions for the single-sentence transaction to interact with data through an interface.
  • the computer device determines at least one second index table configured in the second storage engine according to the first index table.
  • the second index table is obtained by adding a row identifier based on the first index table.
  • the row identifier is used to indicate the row data in the target data.
  • the row data corresponds to the index column in the target data, and the index column is based on the target transaction.
  • the second storage engine supports the execution of target transactions; since each inserted or deleted record has an independent row identifier (rowno), a row of data can be directly obtained through rowno for batch operations.
  • the first storage engine does not support multi-sentence concurrent transactions
  • the second storage engine supports multi-sentence concurrent transactions to illustrate the storage engine reuse process.
  • the functional feature in is the use of a full-featured storage engine to reuse a single-function storage engine; below, the first storage engine is NEWDB and the second storage engine is InnoDB as an example for description, which is not limited here.
  • the database will involve data interaction during use, its data may have changes, such as the implementation of DDL and DML statements; at this time, relevant data needs to be updated, based on the data indexing method in this application. Perform the following operations to achieve data update.
  • FIG. 4 it is a schematic diagram of an index table update process provided by an embodiment of the present application, that is, first access the transaction in the first storage engine, determine the DDL sentence, and then determine the corresponding first index table.
  • the second index table mapped in the second storage engine is searched based on the first index table, and then the index table creation process is performed.
  • the index table contains the same definition of the index column as the NEWDB table, and an additional column rowno, the data type of which is BIGINT, used to represent the row number of the record in the NEWDB engine, that is, the second index table is classified based on the index method, to The second index table is updated, and the index method includes a primary key index, a unique index, or a normal index.
  • the second index table t has multiple columns c1, c2, c3, c4, and c5, where the primary key index columns are c1, c2, the unique index columns are c3, c4, and the ordinary index column is c5 .
  • t_pk c1, c2, owno
  • t_unique1 c3, c4, rowno
  • t_index1 c5, rowno
  • the update process of the CREATE statement for the second index table can be realized through the above-mentioned implementation manner.
  • DML statements include INSERT, UPDATE, and DELETE; when these statements are executed, the corresponding index table needs to be maintained at the same time. Specifically, the DML statement in the target transaction is determined. The DML statement is used to indicate the insertion, update, or deletion of data; and then the second index table is updated according to the DML statement.
  • INSERT statement For the INSERT statement, first for each primary key index and unique index on the table, check whether the record to be inserted already exists in the index table; if the record already exists, end the transaction and return the INSERT failure; otherwise, insert in the NEWDB engine Record, and return the comparison table of index columns and rowno of all records; then, insert batches in the index table according to the comparison table returned in the previous step.
  • For the DELETE statement first find the records that meet the conditions in the NEWDB engine according to the WHERE condition in the DELETE statement; then delete the records that meet the conditions in the NEWDB engine and record their rowno; then perform the delete operation in the index table according to rowno. .
  • the WHERE condition of the DELETE statement has an index column, that is, the data index condition has an index column
  • the corresponding row identifier is determined by the data index condition, and then the deletion operation is performed, which reduces the amount of data processing and improves the process efficiency.
  • For the DELETE statement first search for records that meet the conditions in the NEWDB engine according to the WHERE condition in the UPDATE statement; then delete the records that meet the conditions in the NEWDB engine, and record its rowno; then in the NEWDB engine, calculate a new record based on the UPDATE statement Value and insert the record, record the rowno of the new record; and perform the update operation in the index table according to the new and old rowno of the record.
  • the WHERE condition of the UPDATE statement has an index column
  • the corresponding row identifier is determined by the data index condition, and then the update operation is performed; the amount of data processing can be reduced, and the process efficiency can be improved.
  • the execution efficiency of the operation can be greatly improved; and it has better results in certain business scenarios, such as the business needs to process multiple rows of data in batches, and specify the row number
  • the query can ensure that there will be no duplication or errors or omissions in the data during processing.
  • the computer device determines the index data corresponding to the data index condition in the first storage engine from the second index table.
  • the index data is included in the target data.
  • the indexing process can be performed based on the row identifier. Specifically, first obtain the data index condition of the target transaction in the first storage engine to determine the row identifier indicated by the data index condition; then determine the row identifier indicated by the data index condition The corresponding row identifier in the second index table is used to obtain the row data; and then the corresponding index data is determined according to the row data. Since the row ID corresponds to multiple row data, the record corresponding to the row ID position is directly searched in the NEWDB engine and the result is returned. It is not necessary to read all the data in order for searching, which greatly improves the query speed.
  • the corresponding second index table may be determined from the at least one second index table according to the target index value; then the index tables corresponding to the target index value are filtered The row data to get the corresponding index data. For example: if the WHERE condition specifies an index value, first query through the index table, query all the records in the index table that meet the conditions, and then quickly find the actual record value based on the rowno value recorded in the index table.
  • the efficiency of determining the index data corresponding to the data index condition in the first storage engine from the second index table can be improved.
  • the target transaction in the index data index process is obtained, and the target data in the first storage engine is determined according to the target transaction.
  • the target data is contained in at least one first index table; At least one second index table configured in the storage engine, the second index table is obtained by adding row identifiers based on the first index table, the index column is obtained based on the target transaction, and the second storage engine supports the execution of the target transaction; and then from the second index
  • the index data corresponding to the data index condition in the first storage engine is determined in the table. In this way, the data indexing process across storage engines is realized.
  • the functional design enables the first storage engine to perform functions that cannot be carried by itself, which further improves the application scope and indexing efficiency of the database.
  • FIG. 5 is another type of data provided by an embodiment of the application.
  • the flow chart of the indexing method the embodiment of this application at least includes the following steps:
  • the computer device obtains a target transaction.
  • the computer device determines the target data in the first storage engine according to the target transaction.
  • the computer device determines at least one second index table configured in the second storage engine according to the first index table.
  • steps 501-503 are similar to steps 301-303 of the embodiment shown in FIG. 3, and related feature descriptions can be referred to, which will not be repeated here.
  • the computer device executes the data manipulation information in the target transaction.
  • the computer device sets at least one failure detection point during execution.
  • each index table uses a version number to represent data changes; where the version number is an incremental integer, that is, the version is switched when the transaction is submitted, otherwise Use the old version.
  • the log table can adopt the following structure:
  • FIG. 6 is a schematic diagram of a data insertion operation process provided by an embodiment of the application. The process includes the following steps:
  • the computer equipment starts to be inserted.
  • the computer device judges whether the record exists.
  • the computer equipment reports an error and ends.
  • the computer equipment is inserted into the log table to record.
  • the version of the second index table in step 602 is recorded, and then the record (dbid, tableid, version) is inserted into the index table _newdb.wal, that is, the storage engine ID, the index table ID, and the version number.
  • the computer device submits the second storage engine transaction.
  • the computer device is inserted into the first storage engine log table.
  • the data consistency is judged by the method of log table comparison.
  • the log table insertion process is performed in the first storage engine; that is, the NEWDB table is inserted to record data, and the NEWDB transaction is submitted, and the submitted NEWDB table The version is switched to version version+1.
  • the computer device submits the first storage engine transaction.
  • the computer device inserts an index table record.
  • the record index column and rowno returned by inserting the NEWDB table are inserted into the corresponding index table.
  • the computer device deletes the log table record.
  • the record inserted in step 604 in the table _newdb.wal is deleted.
  • the computer device submits the second storage engine transaction.
  • the computer device is successfully inserted and the process ends.
  • step 601-step 605 are the detection process of failure point 1
  • step 606-step 607 are the detection process of failure point 2
  • step 608-step 610 are the detection process of failure point 3.
  • Detection process Step 611 is the detection process of failure point 4.
  • failure point 1 the possible cause of the failure is that no record is inserted in the _newdb.wal table, or a record is inserted but not submitted.
  • the _newdb.wal table is a storage engine supported by a complete transaction, after recovery, due to the guarantee of transaction atomicity, the inserted but uncommitted records will be rolled back. Therefore, the failure point does not require an additional failure recovery process to ensure that the data and index of NEWDB are consistent.
  • the _newdb.wal table has recorded new data at this time, but the transaction of the NEWDB table has not been committed. Although the data and the index are not inconsistent, the record of the _newdb.wal table may be incorrect and needs to be restored. .
  • the _newdb.wal table has recorded new data at this time, and the transaction of the NEWDB table has also been submitted, but the data in the index table has not been updated, and the record in the _newdb.wal table has not been deleted, and the data is inconsistent with the index. , Needs to be restored.
  • When restoring look up the version recorded in the _newdb.wal table according to the dbid and tableid fields, such as 1, and then determine the current version of the NEWDB table. If it is 2, it means this is the case and the restoration process needs to be redone. Since the _newdb.wal table and the index table are the same storage engine, and the engine supports complete transactions, the operations of the two tables in the third step are consistent. Query and compare the record changes between versions 1 and 2 in the NEWDB table, and write the rowno of these changes to the index table, then delete the records in the _newdb.wal table, and submit the transaction.
  • the WHERE statement specifies the index column, find the NEWDB table record rowno that needs to be deleted through the index table, and then find the NEWDB table record that needs to be deleted through the rowno; if the WHERE statement does not specify rowno and index columns, you need to search the NEWDB table in order To find the record that needs to be deleted. Then delete the found NEWDB record and submit it. The version of the NEWDB table after submission changes to version+1. Then according to the rowno of the row deleted in the NEWDB table, delete the corresponding record in the index table, and then delete the record inserted by _newdb.wal in the first step. And commit the transaction.
  • the log table in the above fault detection process can also use a log table that records the row numbers of all modified records, and then perform REDO or UNDO operations based on the data in the log table during the recovery phase, that is, based on row identification Perform detection and recovery for the recovery unit.
  • the computer equipment judges whether it is malfunctioning.
  • step 505 through the judgment process for different fault points introduced in step 505, the judgment result can be obtained. If it is judged to be a fault, data recovery is required to ensure data consistency; if it is judged to be normal, proceed The indexing process of step 507.
  • the computer device determines the index data corresponding to the data index condition in the first storage engine from the second index table.
  • step 507 is similar to step 304 in the embodiment described in FIG. 3, and related feature descriptions can be referred to, and details are not described here.
  • the index of the new engine is realized by reusing the storage engine with complete transaction function, which reduces the difficulty of implementing the index of the new engine and the risk of problems. Further, the index of the new engine is realized by using the storage engine with complete transaction function, which can reuse the existing mature modules and reduce the development workload, so that the new engine does not need to re-implement the B+ tree, nor does it need to implement complete transactions on it. stand by.
  • FIG. 7 is a flowchart of a data recovery method provided by an embodiment of the application. , The embodiment of this application at least includes the following steps:
  • the computer equipment checks the log table.
  • the log table is the log table of the second storage engine, and records related data operation records performed by the index table in the second storage engine according to the DML statement.
  • the computer equipment judges whether the record exists.
  • step 704 if the operation record exists, proceed to step 704; if not, proceed to step 711.
  • the computer device obtains any record in the log table.
  • the process of obtaining records in the log table may be random record selection, or record selection based on a certain rule order, for example: firstly obtain records for data insertion.
  • the computer device obtains the version in the log table.
  • the computer device obtains the actual version of the log table.
  • the actual version of the log table is the data state of the storage engine after the transaction corresponding to the log table is executed.
  • the computer equipment judges the difference.
  • the data corresponding to the version in the log table is compared with the data of the actual version of the log table to determine the data difference.
  • the computer device obtains the version difference.
  • the computer equipment updates the index table.
  • the corresponding difference data is determined according to the version difference, and the difference data is correspondingly modified in the index table to update the index table.
  • the index table can be updated to correspond to the actual version of the log table data.
  • the computer device deletes the corresponding record in the log table.
  • the recovery process of data failure is realized, the consistency of the data is ensured, and the accuracy of data processing of the storage engine is improved.
  • FIG. 8 is a schematic structural diagram of a data indexing device provided by an embodiment of the application.
  • the data indexing device 800 includes:
  • the obtaining unit 801 is configured to obtain a target transaction, and the target transaction is used to indicate a data indexing process;
  • the determining unit 802 is configured to determine target data in the first storage engine according to the target transaction, where the target data is included in at least one first index table;
  • the mapping unit 803 is configured to determine at least one second index table configured in the second storage engine according to the first index table, where the second index table is obtained by adding row identifiers based on the first index table, and The row identifier is used to indicate row data in the target data, and the second storage engine supports the execution of the target transaction;
  • the index unit 804 is configured to determine index data corresponding to the data index condition in the first storage engine from the second index table, the index data being included in the target data.
  • the indexing unit 804 is specifically configured to obtain the data index condition of the target transaction in the first storage engine to determine the data index condition indicated Line identification
  • the index unit 804 is specifically configured to determine the row data corresponding to the row ID in the second index table according to the row ID indicated by the data index condition;
  • the index unit 804 is specifically configured to determine the corresponding index data according to the row data identified by the corresponding row in the second index table.
  • the data index condition includes a target index value
  • the index unit 804 is specifically configured to determine from the at least one second index table according to the target index value The corresponding second index table
  • the index unit 804 is specifically configured to determine the row data corresponding to the target index value in the corresponding second index table to obtain corresponding index data.
  • the mapping unit 803 is further configured to determine data manipulation information in the target transaction, and the data manipulation information is used to indicate data insertion, update, or deletion;
  • the mapping unit 803 is further configured to update the second index table according to the data manipulation information.
  • the mapping unit 803 is specifically configured to determine the change information in the first index table according to the data manipulation information
  • the mapping unit 803 is specifically configured to determine the corresponding row identifier in the second index table according to the change information
  • the mapping unit 803 is specifically configured to determine the corresponding entry of the change information in the second index table based on the corresponding row identifier in the second index table, so as to perform a check on the second index table. Update.
  • the mapping unit 803 is specifically configured to determine the version number corresponding to the first storage engine according to the change information, so as to generate in the second storage engine Log table
  • the mapping unit 803 is specifically configured to determine the data correspondence situation of the adjacent first storage engine version numbers in the log table, so as to perform data recovery.
  • the mapping unit 803 is specifically configured to determine the processing flow corresponding to the change information
  • the mapping unit 803 is specifically configured to set at least one failure detection point for the processing flow, and the failure detection point is determined based on the interaction process between the first storage engine and the second storage engine;
  • the mapping unit 803 is specifically configured to determine the data correspondence situation of the adjacent first storage engine version numbers based on the failure detection point, so as to perform data recovery.
  • the indexing unit 804 is specifically configured to determine the version number recorded in the log table in the second storage engine
  • the indexing unit 804 is specifically configured to determine the version number of the first storage engine according to the change information
  • the index unit 804 is specifically configured to compare the version number recorded in the log table with the version number of the first storage engine to generate a difference record
  • the index unit 804 is specifically configured to update the first index table and the second index table according to the difference record.
  • mapping unit 803 is further configured to determine database mode information in the target transaction
  • the mapping unit 803 is further configured to update the second index table according to the database schema information.
  • the database schema information includes at least one indexing method
  • the mapping unit 803 is specifically configured to classify the second index table based on the indexing method
  • the index mode includes a primary key index, a unique index, or a common index.
  • the row data indicated by the row identifier of the second index table corresponds to an index column in the target data, and the index column is obtained based on the target transaction.
  • the mapping unit 803 is specifically configured to obtain processing information for the row data in the database schema information;
  • the mapping unit 803 is specifically configured to extract corresponding index column change data based on the processing information
  • the mapping unit 803 is specifically configured to update the second index table according to the index column change data.
  • the target data is contained in at least one first index table; and then determining the configuration in the second storage engine according to the first index table
  • the second index table is obtained by adding row identifiers based on the first index table, the row identifiers are used to indicate the row data in the target data, and the second storage engine supports the execution of the target transaction; and then from the second index table Determine the index data corresponding to the data index condition in the first storage engine. In this way, the data indexing process across storage engines is realized.
  • the functional design enables the first storage engine to perform functions that cannot be carried by itself, which further improves the application scope and indexing efficiency of the database.
  • FIG. 9 is a schematic structural diagram of another data indexing device provided by an embodiment of the present application.
  • the data indexing device 900 may vary depending on the configuration or performance. A relatively large difference may be generated, which may include one or more central processing units (CPU) 922 (for example, one or more processors) and memory 932, and one or more storage applications 942 or data 944. Medium 930 (for example, one or one storage device in a large amount). Among them, the memory 932 and the storage medium 930 may be short-term storage or persistent storage.
  • CPU central processing units
  • the memory 932 and the storage medium 930 may be short-term storage or persistent storage.
  • the program stored in the storage medium 930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data indexing device. Furthermore, the central processing unit 922 may be configured to communicate with the storage medium 930, and execute a series of instruction operations in the storage medium 930 on the data indexing device 900.
  • the data indexing device 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input and output interfaces 958, and/or one or more operating systems 941, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • operating systems 941 such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the steps performed by the data indexing device in the foregoing embodiment may be based on the structure of the data indexing device shown in FIG. 9.
  • An embodiment of the present application also provides a computer-readable storage medium, where the storage medium is used to store a computer program, and the computer program is used to execute the data index in the method described in the foregoing embodiments shown in FIGS. 2 to 7 The steps performed by the device.
  • the embodiment of the present application also provides a computer program product including data indexing instructions, which when running on a computer, causes the computer to execute the method performed by the data indexing apparatus in the method described in the embodiments shown in FIG. 2 to FIG. 7 step.
  • An embodiment of the present application also provides a data indexing system.
  • the data indexing system may include the data indexing device in the embodiment described in FIG. 8 or the data indexing device described in FIG. 8.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium.
  • a computer device which can be a personal computer, a data indexing device, or a network device, etc.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (read-only memory, ROM), random access memory (random access memory, RAM), magnetic disk or optical disk and other media that can store program code .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种存储引擎中数据索引的方法以及相关装置,通过获取指示数据来索引目标事务。根据目标事务确定第一存储引擎中的第一索引表;然后根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表;进而从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。从而实现了跨存储引擎的数据索引过程,由于减少了新的存储引擎引入的开发复杂度,实现了多存储引擎的代码复用,提高了数据库的索引效率。

Description

一种存储引擎中数据索引的方法以及相关装置
本申请要求于2020年01月08日提交中国专利局、申请号为202010018746.0、申请名称为“一种存储引擎中数据索引的方法以及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,尤其涉及存储引擎中数据索引。
背景技术
随着云技术的发展,越来越多的应用出现在人们生活中,而在云技术中,需要数据库的参与,以实现数据交互。数据库,简而言之可视为电子化的文件柜——存储电子文件的处所,用户可以对文件中的数据进行新增、查询、更新、删除等操作。
一般,可以采用MySQL这一支持多种存储引擎的数据库管理系统以适用于不同的应用场景中。其中,不同存储引擎对事务的支持能力可以不同。例如InnoDB存储引擎实现了完整的事务支持,也支持分布式事务的XA协议;而MyISAM存储引擎便于拓展;另外NEWDB引擎则支持单语句的事务;从而实现多种场景的数据索引过程。
发明内容
有鉴于此,本申请提供一种数据索引的方法,可以有效减少存储引擎应用过程的复杂度以及代码的冗余,提高数据索引过程的效率。
一方面,本申请实施例提供一种数据索引的方法,可以应用于终端设备中包含数据索引功能的系统或程序中,具体包括:获取目标事务,所述目标事务用于指示数据索引过程;
根据所述目标事务确定第一存储引擎中的目标数据,所述目标数据包含于至少一个第一索引表中;
根据所述第一索引表确定在第二存储引擎中配置的至少一个第二索引表,所述第二索引表为基于所述第一索引表增加行标识所得,所述行标识用于指示所述目标数据中的行数据,所述行数据与所述目标数据中的索引列相对应,所述索引列基于所述目标事务所得,所述第二存储引擎支持所述目标事务的执行;
从所述第二索引表中确定与所述第一存储引擎中的数据索引条件对应的索引数据,所述索引数据包含于所述目标数据。
另一方面,本申请实施例提供一种数据索引的装置,包括:获取单元,用于获取目标事务,所述目标事务用于指示数据索引过程;
确定单元,用于根据所述目标事务确定第一存储引擎中的目标数据,所述目标数据包含于至少一个第一索引表中;
映射单元,用于根据所述第一索引表确定在第二存储引擎中配置的至少一个第二索引表,所述第二索引表为基于所述第一索引表增加行标识所得,所述行标识用于指示所述目标数据中的行数据,所述行数据与所述目标数据中的索引列相对应,所述索引列基于所述目标事务所得,所述第二存储引擎支持所述目标事务的执行;
索引单元,用于从所述第二索引表中确定与所述第一存储引擎中的数据索引条件对应的索引数据,所述索引数据包含于所述目标数据。
另一方面,本申请实施例提供一种计算机设备,包括:存储器、处理器以及总线系统;所述存储器用于存储程序代码;所述处理器用于根据所述程序代码中的指令执行上述方面所述的数据索引的方法。
又一方面,本申请实施例提供一种存储介质,所述存储介质用于存储计算机程序,所述计算机程序用于执行以上方面的数据索引的方法。
又一方面,本申请实施例提供了一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行以上方面的数据索引的方法。
从以上技术方案可以看出,本申请实施例具有以下优点:
通过获取指示数据索引过程目标事务,并根据目标事务确定第一存储引擎中的目标数据,目标数据包含于至少一个第一索引表中;然后根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表,第二索引表为基于第一索引表增加行标识所得,行标识用于指示目标数据中的行数据,第二存储引擎支持目标事务的执行;进而从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。从而实现了跨存储引擎的数据索引过程,由于减少了新的存储引擎引入的开发复杂度,实现了多存储引擎的代码复用,提高了数据库的索引效率;进一步的通过对于第二存储引擎的功能设计使得第一存储引擎可以执行自身无法搭载的功能,进一步的提高了数据库的应用范围以及索引效率。
附图说明
图1为数据索引系统运行的网络架构图;
图2为本申请实施例提供的一种数据索引的流程架构图;
图3为本申请实施例提供的一种数据索引的方法的流程图;
图4为本申请实施例提供的一种索引表更新流程示意图;
图5为本申请实施例提供的另一种数据索引的方法的流程图;
图6为本申请实施例提供的一种数据插入操作流程示意图;
图7为本申请实施例提供的一种数据恢复的方法的流程图;
图8为本申请实施例提供的一种数据索引装置的结构示意图;
图9为本申请实施例提供的另一种数据索引装置的结构示意图。
具体实施方式
本申请实施例提供了一种数据索引的方法以及相关装置,可以应用于终端设备中包含数据索引功能的系统或程序中,通过获取指示数据索引过程目标事务,并根据目标事务确定第一存储引擎中的目标数据,目标数据包含于至少一个第一索引表中;然后根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表,第二索引表为基于第一索引表增加行标识所得,行标识用于指示目标数据中的行数据,行数据与目标数据中的索引列相对应,索引列基于目标事务所得,第二存储引擎支持目标事务的执行;进而从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。从而实现了跨存储引擎的数据索引过程,由于减少了新的存储引擎引入的开发复杂度,实现了多存储引擎的代码复用,提高了数据库的索引效率;进一步的通过对于第二存储引擎的功能设计使得第一存储引擎可以执行自身无法搭载的功能,进一步的提高了数据库的应用范围以及索引效率。
首先,对本申请实施例中可能出现的一些名词进行解释。
MySQL:一个关系型数据库管理系统,它采用插件式体系结构,可同时支持多个存储引擎。
存储引擎:负责数据库管理系统中数据存储和事务管理等功能,是数据库系统的核心模块。MySQL可以同时支持多个不同的存储引擎,以提供针对不同场景优化的数据存储和事务管理能力;常见的存储引擎有InnoDB、MyISAM等。
事务(transaction):是访问并可能操作各种数据项的一个数据库操作序列。事务由事务开始与事务结束之间执行的全部数据库操作组成。事务具有ACID特性,即原子性(Atomicity);一致性(Consistency);隔离性(Isolation);持久性(Durability)。
数据库模式定义语言(Data Definition Language,DDL):即数据可以模式信息,是用于描述数据库中要存储的库表结构的语言。比如CREATE、ALTER、DROP等。
数据操纵语言(Data Manipulation Language,DML):即数据操纵信息,用户通过它可以实现对数据库的基本操作。比如INSERT、DELETE、UPDATE等。
XA机制:数据库连接事务中的XA是指由X/Open组织提出的分布式事务处理的规范。
行标识(rowno):一种函数标识,可以返回表格中当前列段数据的当前行数,即行数据的集合标识。
行数据:索引表表中同一键值对应的索引值集合。
索引表:一种指示逻辑记录和物理记录之间对应关系的表,可以用于标识数据库中的数据。
应理解,本申请提供的数据索引方法可以应用于包含数据库或需要进行数据读写的系统或程序中,例如MySQL数据库或者基于MySQL运行的相关程序等,具体的,数据索引系统可以运行于如图1所示的系统架构中,如图1所示,是数据索引系统运行的系统架构图,下面以MySQL进行说明,MySQL包括接入层、服务层、存储引擎层和系统文件层;其中,接入层主要负责连接处理、授权认证、安全等事宜;而服务层主要负责查询解析、分析、优化、缓存及所有内置函数,所有跨存储引擎的功能都在这一层实现:存储过程、触发器、视图、binlog、表锁等;存储引擎层主要负责MySQL中数据的存储和提取,服务层通过API与存储引擎通信,存储引擎包含几十个底层函数API,每种引擎提供一套具体实现;系统文件层主要负责底层文件系统的读写。本申请中提供的数据索引方法可以应用于存储引擎层中插件式设置的多个存储引擎的交互过程中,通过将第一存储引擎的事务通过第二存储引擎复用,实现底层函数的统一应用。可以理解的是,图中示出了包含了1个第一存储引擎和1个第二存储引擎,但在实际场景中,还可以是更多的第一存储引擎或更多的第二存储引擎,具体数量因实际场景而定,此处不做限定,具体的,存储引擎的类型可以是InnoDB、MyISAM或NEWDB等。
可以理解的是,上述数据索引系统可以运行于服务器,例如:作为云端数据存储的应用,也可以运行于终端设备,还可以作为运行于第三方设备以提供数据索引,以得到数据索引后的节点分布结果;具体的数据索引系统可以是以一种程序的形式在上述设备中运行, 也可以作为上述设备中的系统部件进行运行,还可以作为云端服务程序的一种,具体运作模式因实际场景而定,此处不做限定。
本申请实施例可以是云技术(Cloud technology)的一种应用,云技术是指在广域网或局域网内将硬件、软件、网络等系列资源统一起来,实现数据的计算、储存、处理和共享的一种托管技术。
具体的,云技术基于云计算商业模式应用的网络技术、信息技术、整合技术、管理平台技术、应用技术等的总称,可以组成资源池,按需所用,灵活便利。云计算技术将变成重要支撑。技术网络系统的后台服务需要大量的计算、存储资源,如视频网站、图片类网站和更多的门户网站。伴随着互联网行业的高度发展和应用,将来每个物品都有可能存在自己的识别标志,都需要传输到后台系统进行逻辑处理,不同程度级别的数据将会分开处理,各类行业数据皆需要强大的系统后盾支撑,只能通过云计算来实现。
而在云技术中,需要数据库的参与。数据库,简而言之可视为电子化的文件柜——存储电子文件的处所,用户可以对文件中的数据进行新增、查询、更新、删除等操作。
所谓“数据库”是以一定方式储存在一起、能与多个用户共享、具有尽可能小的冗余度、与应用程序彼此独立的数据集合。
数据库管理系统(Database Management System,DBMS)是为管理数据库而设计的电脑软件系统,一般具有存储、截取、安全保障、备份等基础功能。数据库管理系统可以依据它所支持的数据库模型来作分类,例如关系式、XML;或依据所支持的计算机类型来作分类,例如服务器群集、移动电话;或依据所用查询语言来作分类,例如SQL、XQuery;或依据性能冲量重点来作分类,例如最大规模、最高运行速度;亦或其他的分类方式。不论使用哪种分类方式,一些DBMS能够跨类别,例如,同时支持多种查询语言。
而随着云技术的发展,越来越多的应用出现在人们生活中,而在云技术中,需要数据库的参与,以实现数据交互。
一般,可以采用MySQL这一支持多种存储引擎的数据库管理系统以适用于不同的应用场景中。其中,不同存储引擎对事务的支持能力可以不同。例如InnoDB存储引擎实现了完整的事务支持,也支持分布式事务的XA协议;而MyISAM存储引擎便于拓展;另外NEWDB引擎则支持单语句的事务;从而实现多种场景的数据索引过程。
但是,在MySQL的插件式体系架构中,每个存储引擎的代码都是独立的,每开发一个新的存储引擎,都需要单独开发完整的一套数据存储和事务机制;然而这个开发过程复杂度高,且容易造成代码的冗余,影响数据库的索引效率。
为了解决上述问题,本申请提出了一种数据索引的方法,该方法应用于图2所示的数据索引的流程框架中,如图2所示,为本申请实施例提供的一种数据索引的流程架构图,通过第一存储引擎接入目标事务,并基于目标事务在第二存储引擎中进行索引表的映射,从而实现跨存储引擎的数据索引过程;一般对于第二存储引擎采用具有完整事务功能的存储引擎,例如:InnoDB,从而使得第一存储引擎可以较为简单的进行复用存储引擎层中的底层函数,进行数据索引过程。
可以理解的是,本申请所提供的方法可以为一种程序的写入,以作为硬件系统中的一 种处理逻辑,也可以作为一种数据索引装置,采用集成或外接的方式实现上述处理逻辑。作为一种实现方式,该数据索引装置通过获取指示数据索引过程目标事务,并根据目标事务确定第一存储引擎中的目标数据,目标数据包含于至少一个第一索引表中;然后根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表,第二索引表为基于第一索引表增加行标识所得,行标识用于指示目标数据中的行数据,行数据与目标数据中的索引列相对应,索引列基于目标事务所得,第二存储引擎支持目标事务的执行;进而从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。从而实现了跨存储引擎的数据索引过程,由于减少了新的存储引擎引入的开发复杂度,实现了多存储引擎的代码复用,提高了数据库的索引效率;进一步的通过对于第二存储引擎的功能设计使得第一存储引擎可以执行自身无法搭载的功能,进一步的提高了数据库的应用范围以及索引效率。
结合上述流程架构,下面将对本申请中数据索引的方法进行介绍,请参阅图3,图3为本申请实施例提供的一种数据索引的方法的流程图,应用于计算机设备中,本申请实施例至少包括以下步骤:
301、计算机设备获取目标事务。
本实施例中,目标事务用于指示数据索引过程;其中,目标事务可以是当前数据库一直在执行的事务;也可以是新下发的事务;而对于第一存储引擎而言,可以不进行底层函数的设计,通过第二存储引擎的底层函数即可实现数据索引功能。
302、计算机设备根据目标事务确定第一存储引擎中的目标数据。
本实施例中,目标数据包含于至少一个第一索引表中;其中,第一索引表记录了不同版本的第一存储引擎中的目标数据;在一种可能的场景中,第一存储引擎为NEWDB,该数据库存储引擎仅支持单一语句的事务,且在新加入该引擎的情况下,需要对于该单一语句的事务设计底层函数,以通过接口进行数据交互。
303、计算机设备根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表。
本实施例中,第二索引表为基于第一索引表增加行标识所得,行标识用于指示目标数据中的行数据,行数据与目标数据中的索引列相对应,索引列基于目标事务所得,第二存储引擎支持目标事务的执行;由于每条插入或删除的记录都有独立的行标识(rowno),通过rowno可以直接获取到某一行数据,以进行批量操作。
可以理解的是,由于不同存储引擎功能的对应性,本实施例中以第一存储引擎不支持多语句并发事务,而第二存储引擎支持多语句并发事务进行说明,以指出存储引擎复用过程中的功能特征,即使用完整功能的存储引擎复用单一功能的存储引擎;下面,以第一存储引擎为NEWDB,第二存储引擎为InnoDB作为示例进行说明,此处不做限定。
可选的,由于数据库在使用过程中会涉及数据的交互,其数据可能存在变化的情况,比如DDL和DML语句的实现;此时需要对相关数据进行更新,基于本申请中的数据索引方法可以进行如下操作以实现数据更新。
一、对于DDL语句。
本实施例中,首先确定目标事务中的数据库模式信息,即DDL语句;然后根据DDL语 句对第二索引表进行更新。其中,DDL语句包括CREATE、ALTER、DROP等。具体的,如图4所示,是本申请实施例提供的一种索引表更新流程示意图,即首先在第一存储引擎中接入事务,并确定DDL语句,然后确定对应的第一索引表,基于第一索引表查找映射在第二存储引擎中的第二索引表,进而进行索引表的创建过程。
对于CREATE语句,判断是否在创建带索引的NEWDB表,如果是,则额外创建一个对应的索引表。该索引表包含与NEWDB表相同定义的索引列,以及一个额外的列rowno,其数据类型为BIGINT,用来表示NEWDB引擎中记录的行号,即基于索引方式对第二索引表进行分类,以对第二索引表进行更新,索引方式包括主键索引、唯一索引或普通索引。
在一种可能的场景中,假设第二索引表t拥有多个列c1、c2、c3、c4、c5,其中主键索引列为c1、c2,唯一索引列为c3、c4,普通索引列为c5。我们将在InnoDB中创建索引对应的三个索引表:t_pk(c1,c2,owno),t_unique1(c3,c4,rowno),t_index1(c5,rowno)。
具体的,可以通过如下语句执行上述过程:
对于主键索引;
CREATE TABLE_NDB_INDEX.t_pk(c1<Type>//确定主键索引指令;
,c2<Type>//输入索引条件;
,rowno BIGINT//输入行标识;
,PRIMARY KEY(c1,c2))//确定主键索引列;
ENGINE=InnoDB;//指示存储引擎;
对于唯一索引;
CREATE TABLE_NDB_INDEX.t_uniq1(c3<Type>//确定唯一索引指令;
,c4<Type>//输入索引条件;
,rowno BIGINT//输入行标识;
,UNIQUE INDEX(c3,c4))//确定唯一索引列;
ENGINE=InnoDB;//指示存储引擎;
对于普通索引;
CREATE TABLE_NDB_INDEX.t_index1(c5<Type>//确定普通索引指令
,rowno BIGINT//输入行标识;
,INDEX(c5))//确定普通索引列;
ENGINE=InnoDB;//指示存储引擎。
通过上述实现方式即可以实现CREATE语句对于第二索引表的更新过程。
另外,在对NEWDB引擎执行DROP TABLE语句时,如果表中有索引,则删除在_NDB_INDEX数据库中对应的索引表,即获取DDL语句中对于行数据的处理信息;然后基于处理信息提取对应的索引列变化数据;并根据索引列变化数据对第二索引表进行更新。
在执行ALTERE TABLE语句时。如果是ADD COLUMN,则在第二索引表中增加新列,不需要处理已有的索引表。如果是DROP COLUMN,则需要删除第二索引表中对应的列。如果是ADD INDEX、DROP INDEX等,则需要去增加或删除对应的索引表。
可以理解的是,上述语句为示例说明,其他DDL语句的执行也可以参照上述CREATE、 ALTER或DROP语句的执行过程进行,此处不做限定。
二、对于DML语句。
本实施例中,DML语句包括INSERT、UPDATE、DELETE;在执行这些语句的时候需要同时维护对应的索引表。具体的,即确定目标事务中的DML语句,DML语句用于指示数据的插入、更新或删除;然后根据DML语句对第二索引表进行更新。
对于INSERT语句,首先对于表上的每个主键索引和唯一索引,在索引表中检查待插入的记录是否已经存在;若记录已经存在,则结束事务,返回INSERT失败;否则,在NEWDB引擎中插入记录,并且返回所有记录的索引列和rowno的对照表;然后根据前一个步骤中返回的对照表,在索引表中批量插入。
对于DELETE语句,首先按照DELETE语句中的WHERE条件在NEWDB引擎中查找满足条件的记录;然后在NEWDB引擎中删除满足条件的记录,并记录其rowno;接下来按照rowno在索引表中依次执行删除操作。
可选的,若DELETE语句的WHERE条件带有索引列,即数据索引条件带有索引列,可以先在索引表中按照WHERE条件中的索引列条件找出匹配的记录的rowno。然后将查询条件中的索引列条件替换为找出的rowno,再进行删除操作。通过数据索引条件确定对应的行标识,进而进行删除操作,减少了数据处理量,提高了流程效率。
对于DELETE语句,首先按照UPDATE语句中的WHERE条件在NEWDB引擎中查找满足条件的记录;然后在NEWDB引擎中删除满足条件的记录,并记录其rowno;接下来在NEWDB引擎中根据UPDATE语句计算出新值并插入记录,记录新记录的rowno;并按照记录的新旧rowno在索引表中依次执行更新操作。
可选的,若UPDATE语句的WHERE条件带有索引列,通过数据索引条件确定对应的行标识,进而进行更新操作;可以减少了数据处理量,提高了流程效率。
可以理解的是,通过提供指定行标识的查询和修改,可以大大提高操作的执行效率;并且在某些业务场景下有更好的效果,比如业务需要分批处理多行数据,指定行号的查询可以保证处理时数据不会有重复或者错漏。
304、计算机设备从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。
本实施例中,索引数据包含于目标数据中。其中,对于索引的过程可以基于行标识进行,具体的,首先获取目标事务在第一存储引擎中的数据索引条件,以确定数据索引条件指示的行标识;然后根据数据索引条件指示的行标识确定在第二索引表中对应的行标识,以得到行数据;进而根据行数据确定对应的索引数据。由于行标识对应于多个行数据,直接在NEWDB引擎中查找对应行标识位置的记录并返回结果,不需要读取所有数据顺序进行查找,极大的提高查询速度。
可选的,若数据索引条件指示的是索引值,则可以根据目标索引值从所述至少一个第二索引表中确定对应的第二索引表;然后在这些索引表中筛选对应于目标索引值的行数据,以得到对应的索引数据。例如:如果WHERE条件指定了索引值,则先通过索引表进行查询,将符合条件的所有索引表中的记录查询出来,再根据索引表中记录的rowno值快速的查找到 实际的记录值。
在一种可能的场景中,可以通过下述语句实现上述数据查询的过程:
SELECT_rowno FROM t;//选择行标识;
SELECT_rowno FROM t WHERE a=12;//确定数据索引条件中的行标识;
SELECT*FROM t WHERE_rowno=12;//查询对应的行标识
DELETE FROM t WHERE_rowno=12;//提取查询结果
通过上述查询加速过程,可以提高从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据的效率。
结合上述实施例可知,通过获取指示数据索引过程目标事务,并根据目标事务确定第一存储引擎中的目标数据,目标数据包含于至少一个第一索引表中;然后根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表,第二索引表为基于第一索引表增加行标识所得,索引列基于目标事务所得,第二存储引擎支持目标事务的执行;进而从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。从而实现了跨存储引擎的数据索引过程,由于减少了新的存储引擎引入的开发复杂度,实现了多存储引擎的代码复用,提高了数据库的索引效率;进一步的通过对于第二存储引擎的功能设计使得第一存储引擎可以执行自身无法搭载的功能,进一步的提高了数据库的应用范围以及索引效率。
上述实施例介绍了数据索引的过程,在一种可能的场景中,还可以对数据更新过程进行检测,以保证数据一致,请参阅图5,图5为本申请实施例提供的另一种数据索引的方法的流程图,本申请实施例至少包括以下步骤:
501、计算机设备获取目标事务。
502、计算机设备根据目标事务确定第一存储引擎中的目标数据。
503、计算机设备根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表。
本实施例中,步骤501-503与图3所述实施例的步骤301-303相似,相关特征描述可以进行参考,此处不做赘述。
504、计算机设备执行目标事务中的数据操纵信息。
本实施例中,可以参考图3所述实施例的步骤303中对于DML语句的执行的相关描述,此处不做赘述。
505、计算机设备在执行过程中设置至少一个故障检测点。
本实施例中,在执行过程中对数据使用多个版本存储,每个索引表都用版本号来表示数据变化;其中,版本号是一个递增的整数,即事务提交的时候为切换版本,否则使用老的版本。具体的,首先根据变化信息确定第一存储引擎对应的版本号,以在第二存储引擎中生成日志表;然后确定变化信息对应的处理流程;并为处理流程设置至少一个故障检测点,故障检测点基于第一存储引擎和第二存储引擎的交互过程确定;进而基于故障检测点确定相邻的第一存储引擎版本号的数据对应情况。
可选的,日志表可以采用如下结构:
CREATE TABLE_newdb.wal(dbid BIGINT//创建日志表
,tableid BIGINT//记录索引表
,version BIGINT//记录版本号
,PRIMARY KEY(dbid,tableid))//记录关联关系
ENGINE=InnoDB;//数据库类型
具体的,对应于具体的DML语句的执行过程中。对于INSERT语句,可以参考如图6所示的过程,图6为本申请实施例提供的一种数据插入操作流程示意图。该过程包括如下步骤:
601、计算机设备开始插入。
602、计算机设备判断是否记录存在。
本实施例中,对于第二索引表上的每个主键索引和唯一索引,在索引表中检查带插入的记录是否已经存在,如果记录已经存在,则结束事务,返回INSERT失败。
603、计算机设备报错结束。
604、计算机设备插入日志表记录。
本实施例中,记录步骤602中第二索引表的版本,然后在索引表_newdb.wal中插入记录(dbid,tableid,version),即存储引擎ID、索引表ID以及版本号。
605、计算机设备提交第二存储引擎事务。
606、计算机设备插入第一存储引擎日志表。
本实施例中,通过日志表对比的方法来判断数据一致性,此时在第一存储引擎进行日志表的插入过程;即插入NEWDB表记录数据,并进行NEWDB的事务提交,提交后的NEWDB表版本切换为版本version+1。
607、计算机设备提交第一存储引擎事务。
608、计算机设备插入索引表记录。
本实施例中,将插入NEWDB表返回的记录索引列和rowno插入到对应的索引表中。
609、计算机设备删除日志表记录。
本实施例中,删除表_newdb.wal中在步骤604中插入的记录。
610、计算机设备提交第二存储引擎事务。
611、计算机设备插入成功并结束进程。
对于上述插入过程,可以设置4个故障点;其中,步骤601-步骤605为故障点1的检测过程;步骤606-步骤607为故障点2的检测过程;步骤608-步骤610为故障点3的检测过程;步骤611为故障点4的检测过程。
具体的,对于故障点1,其可能的故障原因为_newdb.wal表未插入记录,或插入了记录但是没有提交。此时由于_newdb.wal表是使用的完整事务支持的存储引擎,恢复后由于事务原子性的保证,插入但未提交的记录会被回滚掉。故该故障点可以不进行额外的故障恢复过程,就可以保证NEWDB的数据和索引是一致的。
对于故障点2,此时_newdb.wal表已经记录了新数据,但是NEWDB表的事务未提交,数据和索引虽然没有不一致,但是_newdb.wal表的记录可能是不正确的,需要进行恢复。恢复时可以根据dbid和tableid字段查找出_newdb.wal中记录的版本,比如为1,然后判断NEWDB表的当前版本,如果也为1则说明需要回滚_newdb.wal表中的记录。其中,回滚操作可以是直接删除_newdb.wal表中对应的记录即可。
对于故障点3,此时_newdb.wal表已经记录了新数据,NEWDB表的事务也已经提交,但是索引表的数据还没有进行更新,_newdb.wal表的记录也没有删除,数据与索引不一致,需要进行恢复。恢复时根据dbid和tableid字段查找出_newdb.wal表中记录的版本,比如为1,然后判断NEWDB表的当前版本,如果为2,则说明是这种情况,需要重做恢复过程。由于_newdb.wal表和索引表是同一个存储引擎,且该引擎支持完整的事务,所以这两个表在第三步的操作是保持一致的。在NEWDB表中查询对比版本1和2之间发生的记录变化,并将这些变化的记录rowno写入到索引表中,然后删除_newdb.wal表的记录,并提交事务。
对于故障点4,由于所有步骤的事务都已经提交,故数据都处于一致状态,可以不进行数据恢复。
另外,对于DELETE语句,首先在NEWDB中索引表删除操作之前,记录下NEWDB表的版本,对_newdb.wal进行插入记录(dbid,tableid,version),并进行事务提交;然后按照DELETE语句中的WHERE条件在NEWDB引擎中查找满足条件的记录。如果WHERE语句指定了rowno,直接通过rowno查找到需要删除的NEWDB表记录。其中,如果WHERE语句指定了索引列,则通过索引表查找到需要删除的NEWDB表记录rowno,然后通过rowno找到需要删除的NEWDB表记录;如果WHERE语句未指定rowno和索引列,需要顺序查找NEWDB表来找到需要删除的记录。进而删除查找到的NEWDB记录,并进行提交,提交后的NEWDB表版本变化为version+1。然后根据NEWDB表删除的行的rowno,在索引表中将对应记录进行删除,再删除_newdb.wal在第一步插入的记录。并进行事务提交。
对于UPDATE语句,首先NEWDB中索引表更新操作之前,记录下NEWDB表的版本,对_newdb.wal进行插入记录(dbid,tableid,version),并进行事务提交。然后按照UPDATE语句中的WHERE条件在NEWDB引擎中查找满足条件的记录;如果WHERE语句指定了rowno,直接通过rowno查找到需要删除的NEWDB表记录;如果WHERE语句指定了索引,通过索引表查找到需要删除的NEWDB表记录的rowno,然后找到需要删除的NEWDB表记录;如果WHERE语句未指定rowno和索引,需要顺序查找NEWDB表来找到需要删除的记录。接下来根据UPDATE语句计算出新值并插入记录,记录新记录的rowno并进行提交,提交后的NEWDB表版泵变化为version+1。进而根据NEWDB表新老行的rowno,在索引表中将对应记录进行更新,再删除_newdb.wal在第一步插入的记录。并进行事务提交。
可选的,上述故障检测过程中的日志表也可以采用一个记录所有修改过的记录的行号的日志表,然后在恢复阶段根据日志表中的数据来进行REDO或者UNDO操作,即基于行标识为恢复单元进行检测与恢复。
506、计算机设备判断是否故障。
本实施例中,通过步骤505中介绍的对于不同故障点的判断过程,可以得到判断的结果,若判断为故障,则需进行数据恢复,以保证数据的一致性;若判断为正常,则进行步骤507的索引过程。
507、计算机设备从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。
本实施例中,步骤507与图3所述实施例的步骤304相似,相关特征描述可以进行参考, 此处不做赘述。
结合上述实施例可见,通过复用具有完整事务功能的存储引擎,来实现新引擎的索引,降低了新引擎索引实现的难度以及出现问题的风险。进一步的通过使用完整事务功能的存储引擎来实现新引擎的索引,可以复用已有的成熟模块,降低开发工作量,使得新引擎不需要重新实现B+树,也不需要实现其上完整的事务支持。
上述实施例介绍了数据索引中故障检测的过程,下面对检测出故障后的数据恢复过程进行介绍,请参阅图7,图7为本申请实施例提供的一种数据恢复的方法的流程图,本申请实施例至少包括以下步骤:
701、计算机设备恢复开始。
702、计算机设备查看日志表。
本实施例中,日志表为第二存储引擎的日志表,其中记录了第二存储引擎中的索引表根据DML语句而执行的相关数据操作记录。
703、计算机设备判断记录是否存在。
本实施例中,若操作记录存在则进行步骤704;若不存在,则进行步骤711。
704、计算机设备获取日志表中的任一记录。
本实施例中,获取日志表中记录的过程可以是随机进行的记录挑选,也可以是基于一定规则顺序的记录挑选,例如:首选获取数据插入的记录。
705、计算机设备获取日志表中的版本。
706、计算机设备获取日志表的实际版本。
本实施例中,日志表的实际版本即为该日志表对应事务执行过后的存储引擎数据状态。
707、计算机设备判断差异。
本实施例中,通过对日志表中的版本对应的数据,以及日志表的实际版本的数据进行对比,以确定数据差异。
708、计算机设备获取版本差异。
709、计算机设备更新索引表。
本实施例中,根据版本差异确定对应的差异数据,并对差异数据在索引表中进行相应的修改,以对索引表进行更新,具体的,可以将索引表更新为日志表的实际版本对应的数据。
710、计算机设备删除日志表对应记录。
711、计算机设备结束恢复。
结合上述实施例可见,通过确定第二存储引擎中日志表中记录的版本号;
然后根据变化信息确定第一存储引擎的版本号;并比对日志表中记录的版本号和第一存储引擎的版本号,以生成差异记录;进而根据差异记录更新第一索引表和第二索引表。实现了数据故障的恢复过程,保证了数据的一致性,提高了存储引擎数据处理的准确性。
为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关装置。 请参阅图8,图8为本申请实施例提供的一种数据索引装置的结构示意图,数据索引装置800包括:
获取单元801,用于获取目标事务,所述目标事务用于指示数据索引过程;
确定单元802,用于根据所述目标事务确定第一存储引擎中的目标数据,所述目标数据包含于至少一个第一索引表中;
映射单元803,用于根据所述第一索引表确定在第二存储引擎中配置的至少一个第二索引表,所述第二索引表为基于所述第一索引表增加行标识所得,所述行标识用于指示所述目标数据中的行数据,所述第二存储引擎支持所述目标事务的执行;
索引单元804,用于从所述第二索引表中确定与所述第一存储引擎中的数据索引条件对应的索引数据,所述索引数据包含于所述目标数据。
可选的,在本申请一些可能的实现方式中,所述索引单元804,具体用于获取所述目标事务在所述第一存储引擎中的数据索引条件,以确定所述数据索引条件指示的行标识;
所述索引单元804,具体用于根据所述数据索引条件指示的行标识,确定在所述第二索引表中所对应行标识的行数据;
所述索引单元804,具体用于根据所述第二索引表中所对应行标识的行数据确定对应的索引数据。
可选的,在本申请一些可能的实现方式中,所述数据索引条件包括目标索引值,所述索引单元804,具体用于根据所述目标索引值从所述至少一个第二索引表中确定对应的第二索引表;
所述索引单元804,具体用于确定所述对应的第二索引表中对应于所述目标索引值的行数据,以得到对应的索引数据。
可选的,在本申请一些可能的实现方式中,所述映射单元803,还用于确定所述目标事务中的数据操纵信息,所述数据操纵信息用于指示数据的插入、更新或删除;
所述映射单元803,还用于根据所述数据操纵信息对所述第二索引表进行更新。
可选的,在本申请一些可能的实现方式中,所述映射单元803,具体用于根据所述数据操纵信息确定所述第一索引表中的变化信息;
所述映射单元803,具体用于根据所述变化信息确定在所述第二索引表中对应的行标识;
所述映射单元803,具体用于基于所述在所述第二索引表中对应的行标识确定所述变化信息在所述第二索引表中的对应项,以对所述第二索引表进行更新。
可选的,在本申请一些可能的实现方式中,所述映射单元803,具体用于根据所述变化信息确定所述第一存储引擎对应的版本号,以在所述第二存储引擎中生成日志表;
所述映射单元803,具体用于确定所述日志表中相邻的所述第一存储引擎版本号的数据对应情况,以进行数据恢复。
可选的,在本申请一些可能的实现方式中,所述映射单元803,具体用于确定所述变化信息对应的处理流程;
所述映射单元803,具体用于为所述处理流程设置至少一个故障检测点,所述故障检测点基于所述第一存储引擎和所述第二存储引擎的交互过程确定;
所述映射单元803,具体用于基于所述故障检测点确定相邻的所述第一存储引擎版本号的数据对应情况,以进行数据恢复。
可选的,在本申请一些可能的实现方式中,所述索引单元804,具体用于确定所述第二存储引擎中所述日志表中记录的版本号;
所述索引单元804,具体用于根据所述变化信息确定所述第一存储引擎的版本号;
所述索引单元804,具体用于比对所述日志表中记录的版本号和所述第一存储引擎的版本号,以生成差异记录;
所述索引单元804,具体用于根据所述差异记录更新所述第一索引表和所述第二索引表。
可选的,在本申请一些可能的实现方式中,所述映射单元803,还用于确定所述目标事务中的数据库模式信息;
所述映射单元803,还用于根据所述数据库模式信息对所述第二索引表进行更新。
可选的,在本申请一些可能的实现方式中,所述数据库模式信息包括至少一种索引方式,所述映射单元803,具体用于基于所述索引方式对所述第二索引表进行分类,以对所述第二索引表进行更新,所述索引方式包括主键索引、唯一索引或普通索引。
可选的,在本申请一些可能的实现方式中,所述第二索引表的行标识所指示的行数据与所述目标数据中的索引列相对应,所述索引列基于所述目标事务所得,所述映射单元803,具体用于获取所述数据库模式信息中对于所述行数据的处理信息;
所述映射单元803,具体用于基于所述处理信息提取对应的索引列变化数据;
所述映射单元803,具体用于根据所述索引列变化数据对所述第二索引表进行更新。
通过获取指示数据索引过程目标事务,并根据目标事务确定第一存储引擎中的目标数据,目标数据包含于至少一个第一索引表中;然后根据第一索引表确定在第二存储引擎中配置的至少一个第二索引表,第二索引表为基于第一索引表增加行标识所得,行标识用于指示目标数据中的行数据,第二存储引擎支持目标事务的执行;进而从第二索引表中确定与第一存储引擎中的数据索引条件对应的索引数据。从而实现了跨存储引擎的数据索引过程,由于减少了新的存储引擎引入的开发复杂度,实现了多存储引擎的代码复用,提高了数据库的索引效率;进一步的通过对于第二存储引擎的功能设计使得第一存储引擎可以执行自身无法搭载的功能,进一步的提高了数据库的应用范围以及索引效率。
本申请实施例还提供了另一种数据索引装置,请参阅图9,图9是本申请实施例提供的另一种数据索引装置的结构示意图,该数据索引装置900可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)922(例如,一个或一个以上处理器)和存储器932,一个或一个以上存储应用程序942或数据944的存储介质930(例如一个或一个以上海量存储设备)。其中,存储器932和存储介质930可以是短暂存储或持久存储。存储在存储介质930的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对数据索引装置中的一系列指令操作。更进一步地,中央处理器922可以设置为与存储介质930通信,在数据索引装置900上执行存储介质930中的一系列指令操作。
数据索引装置900还可以包括一个或一个以上电源926,一个或一个以上有线或无线网 络接口950,一个或一个以上输入输出接口958,和/或,一个或一个以上操作系统941,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
上述实施例中由数据索引装置所执行的步骤可以基于该图9所示的数据索引装置结构。
本申请实施例中还提供一种计算机可读存储介质,,所述存储介质用于存储计算机程序,所述计算机程序用于执行如前述图2至图7所示实施例描述的方法中数据索引装置所执行的步骤。
本申请实施例中还提供一种包括数据索引指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图2至图7所示实施例描述的方法中数据索引装置所执行的步骤。
本申请实施例还提供了一种数据索引系统,所述数据索引系统可以包含图8所描述实施例中的数据索引装置,或者图8所描述的数据索引装置。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,数据索引装置,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (16)

  1. 一种存储引擎中数据索引的方法,所述方法由计算机设备执行,所述方法包括:
    获取目标事务,所述目标事务用于指示数据索引过程;
    根据所述目标事务确定第一存储引擎中的目标数据,所述目标数据包含于至少一个第一索引表中;
    根据所述第一索引表确定在第二存储引擎中配置的至少一个第二索引表,所述第二索引表为基于所述第一索引表增加行标识所得,所述行标识用于指示所述目标数据中的行数据,所述第二存储引擎支持所述目标事务的执行;
    从所述第二索引表中确定与所述第一存储引擎中的数据索引条件对应的索引数据,所述索引数据包含于所述目标数据中。
  2. 根据权利要求1所述的方法,所述从所述第二索引表中确定与所述第一存储引擎中的数据索引条件对应的索引数据,包括:
    获取所述目标事务在所述第一存储引擎中的所述数据索引条件,以确定所述数据索引条件指示的行标识;
    根据所述数据索引条件指示的行标识,确定在所述第二索引表中所对应行标识的行数据;
    根据所述第二索引表中所对应行标识的行数据确定对应的索引数据。
  3. 根据权利要求1所述的方法,所述数据索引条件包括目标索引值,所述从所述第二索引表中确定与所述第一存储引擎中的数据索引条件对应的索引数据,包括:
    根据所述目标索引值从所述至少一个第二索引表中确定对应的第二索引表;
    确定所述对应的第二索引表中对应于所述目标索引值的行数据,以得到对应的索引数据。
  4. 根据权利要求1所述的方法,所述方法还包括:
    确定所述目标事务中的数据操纵信息,所述数据操纵信息用于指示数据的插入、更新或删除;
    根据所述数据操纵信息对所述第二索引表进行更新。
  5. 根据权利要求4所述的方法,所述根据所述数据操纵信息对所述第二索引表进行更新,包括:
    根据所述数据操纵信息确定所述第一索引表中的变化信息;
    根据所述变化信息确定在所述第二索引表中对应的行标识;
    基于所述在所述第二索引表中对应的行标识确定所述变化信息在所述第二索引表中的对应项,以对所述第二索引表进行更新。
  6. 根据权利要求5所述的方法,所述基于所述在所述第二索引表中对应的行标识确定所述变化信息在所述第二索引表中的对应项,以对所述第二索引表进行更新之后,所述方法还包括:
    根据所述变化信息确定所述第一存储引擎对应的版本号,以在所述第二存储引擎中生成日志表;
    确定所述日志表中相邻的所述第一存储引擎版本号的数据对应情况,以进行数据恢复。
  7. 根据权利要求6所述的方法,所述确定所述日志表中相邻的所述第一存储引擎版本号的数据对应情况,以进行数据恢复,包括:
    确定所述变化信息对应的处理流程;
    为所述处理流程设置至少一个故障检测点,所述故障检测点基于所述第一存储引擎和所述第二存储引擎的交互过程确定;
    基于所述故障检测点确定相邻的所述第一存储引擎版本号的数据对应情况,以进行数据恢复。
  8. 根据权利要求6所述的方法,所述进行数据恢复,包括:
    确定所述第二存储引擎中所述日志表中记录的版本号;
    根据所述变化信息确定所述第一存储引擎的版本号;
    比对所述日志表中记录的版本号和所述第一存储引擎的版本号,以生成差异记录;
    根据所述差异记录更新所述第一索引表和所述第二索引表。
  9. 根据权利要求1-8任一项所述的方法,所述方法还包括:
    确定所述目标事务中的数据库模式信息;
    根据所述数据库模式信息对所述第二索引表进行更新。
  10. 根据权利要求9所述的方法,所述数据库模式信息包括至少一种索引方式,所述根据所述数据库模式信息对所述第二索引表进行更新,包括:
    基于所述索引方式对所述第二索引表进行分类,以对所述第二索引表进行更新,所述索引方式包括主键索引、唯一索引或普通索引。
  11. 根据权利要求9所述的方法,所述第二索引表的行标识所指示的行数据与所述目标数据中的索引列相对应,所述索引列基于所述目标事务所得,所述根据所述数据库模式信息对所述第二索引表进行更新,包括:
    获取所述数据库模式信息中对于所述行数据的处理信息;
    基于所述处理信息提取对应的索引列变化数据;
    根据所述索引列变化数据对所述第二索引表进行更新。
  12. 根据权利要求1所述的方法,所述第一存储引擎不支持多语句并发事务,所述第二存储引擎支持所述多语句并发事务。
  13. 一种数据索引的装置,包括:
    获取单元,用于获取目标事务,所述目标事务用于指示数据索引过程;
    确定单元,用于根据所述目标事务确定第一存储引擎中的目标数据,所述目标数据包含于至少一个第一索引表中;
    映射单元,用于根据所述第一索引表确定在第二存储引擎中配置的至少一个第二索引表,所述第二索引表为基于所述第一索引表增加行标识所得,所述行标识用于指示所述目标数据中的行数据,所述第二存储引擎支持所述目标事务的执行;
    索引单元,用于从所述第二索引表中确定与所述第一存储引擎中的数据索引条件对应的索引数据,所述索引数据包含于所述目标数据中。
  14. 一种计算机设备,所述计算机设备包括处理器以及存储器:
    所述存储器用于存储程序代码;所述处理器用于根据所述程序代码中的指令执行权利要求1至12任一项所述的数据索引的方法。
  15. 一种计算机可读存储介质,所述存储介质用于存储计算机程序,所述计算机程序用于执行上述权利要求1至12任一项所述的数据索引的方法。
  16. 一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行权利要求1-12任意一项所述的数据索引的方法。
PCT/CN2020/126397 2020-01-08 2020-11-04 一种存储引擎中数据索引的方法以及相关装置 WO2021139376A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022519003A JP7362190B2 (ja) 2020-01-08 2020-11-04 ストレージエンジンにおけるデータインデックス付け方法、データインデックス付け装置、コンピュータ装置、及びコンピュータプログラム
EP20912278.7A EP4006740A4 (en) 2020-01-08 2020-11-04 METHOD FOR INDEXING DATA IN STORAGE ENGINES, AND RELATED DEVICE
US17/671,578 US11868330B2 (en) 2020-01-08 2022-02-14 Method for indexing data in storage engine and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010018746.0 2020-01-08
CN202010018746.0A CN111259004B (zh) 2020-01-08 2020-01-08 一种存储引擎中数据索引的方法以及相关装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/671,578 Continuation US11868330B2 (en) 2020-01-08 2022-02-14 Method for indexing data in storage engine and related apparatus

Publications (1)

Publication Number Publication Date
WO2021139376A1 true WO2021139376A1 (zh) 2021-07-15

Family

ID=70954139

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126397 WO2021139376A1 (zh) 2020-01-08 2020-11-04 一种存储引擎中数据索引的方法以及相关装置

Country Status (5)

Country Link
US (1) US11868330B2 (zh)
EP (1) EP4006740A4 (zh)
JP (1) JP7362190B2 (zh)
CN (1) CN111259004B (zh)
WO (1) WO2021139376A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11954345B2 (en) 2021-12-03 2024-04-09 Samsung Electronics Co., Ltd. Two-level indexing for key-value persistent storage device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259004B (zh) 2020-01-08 2023-04-14 腾讯科技(深圳)有限公司 一种存储引擎中数据索引的方法以及相关装置
CN111782589B (zh) * 2020-06-10 2022-06-10 厦门市美亚柏科信息股份有限公司 一种用于操作历史重现的数据模型的构建方法及系统
JP7424501B2 (ja) * 2020-09-02 2024-01-30 日本電気株式会社 結合テーブル特定システム、結合テーブル探索装置、方法およびプログラム
CN112835905B (zh) * 2021-02-05 2023-08-01 上海达梦数据库有限公司 一种数组类型列的索引方法、装置、设备以及存储介质
CN117555894A (zh) * 2022-08-05 2024-02-13 华为技术有限公司 一种分布式数据库中创建全局二级索引的方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750376A (zh) * 2012-06-25 2012-10-24 天津神舟通用数据技术有限公司 一种多版本数据库存储引擎系统及其相关处理的实现方法
CN102918529A (zh) * 2010-04-26 2013-02-06 诺基亚公司 索引生成和使用的方法和装置
CN106326381A (zh) * 2016-08-16 2017-01-11 梁猛 基于MapDB构建的HBase数据检索方法
CN106777027A (zh) * 2016-12-08 2017-05-31 北京国电通网络技术有限公司 大规模并行处理行列混合数据存储装置及存储、查询方法
CN111259004A (zh) * 2020-01-08 2020-06-09 腾讯科技(深圳)有限公司 一种存储引擎中数据索引的方法以及相关装置

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3269849B2 (ja) * 1992-05-29 2002-04-02 株式会社日立製作所 並列データベース処理システムとその検索方法
US20120143912A1 (en) * 2010-12-05 2012-06-07 Unisys Corp. Extending legacy database engines with object-based functionality
US10262050B2 (en) * 2015-09-25 2019-04-16 Mongodb, Inc. Distributed database systems and methods with pluggable storage engines
CN102955792A (zh) * 2011-08-23 2013-03-06 崔春明 一种实时全文搜索引擎事务处理的实现方法
JP5685213B2 (ja) * 2012-02-22 2015-03-18 日本電信電話株式会社 差分レプリケーションシステム、マスターデータベース装置、及びスレーブデータベース装置
CN104239357B (zh) * 2013-06-21 2019-01-18 Sap欧洲公司 用于数据库事务的并发请求处理
CN105488124A (zh) * 2015-11-24 2016-04-13 浪潮(北京)电子信息产业有限公司 一种创建索引文件的方法及装置
JP2020514935A (ja) * 2017-03-15 2020-05-21 ファウナ, インク.Fauna, Inc. データベース用の方法及びシステム
CN108062358B (zh) * 2017-11-28 2020-12-29 厦门市美亚柏科信息股份有限公司 innodb引擎删除记录的离线恢复方法、存储介质
CN108415982B (zh) * 2018-02-09 2021-07-06 上海商米科技集团股份有限公司 数据库的处理方法和装置
CN110287198A (zh) * 2019-07-01 2019-09-27 四川新网银行股份有限公司 基于HBase数据库的金融数据索引方法
CN110413568A (zh) * 2019-07-30 2019-11-05 北京百度网讯科技有限公司 一种数据复用方法、装置、电子设备及存储介质
CN110502506B (zh) * 2019-08-29 2023-12-15 北京博睿宏远数据科技股份有限公司 一种数据处理方法、装置、设备和存储介质
CN110609839B (zh) * 2019-09-17 2021-05-25 北京海益同展信息科技有限公司 区块链数据处理的方法、装置、设备及可读存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102918529A (zh) * 2010-04-26 2013-02-06 诺基亚公司 索引生成和使用的方法和装置
CN102750376A (zh) * 2012-06-25 2012-10-24 天津神舟通用数据技术有限公司 一种多版本数据库存储引擎系统及其相关处理的实现方法
CN106326381A (zh) * 2016-08-16 2017-01-11 梁猛 基于MapDB构建的HBase数据检索方法
CN106777027A (zh) * 2016-12-08 2017-05-31 北京国电通网络技术有限公司 大规模并行处理行列混合数据存储装置及存储、查询方法
CN111259004A (zh) * 2020-01-08 2020-06-09 腾讯科技(深圳)有限公司 一种存储引擎中数据索引的方法以及相关装置

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11954345B2 (en) 2021-12-03 2024-04-09 Samsung Electronics Co., Ltd. Two-level indexing for key-value persistent storage device

Also Published As

Publication number Publication date
US11868330B2 (en) 2024-01-09
EP4006740A1 (en) 2022-06-01
CN111259004B (zh) 2023-04-14
CN111259004A (zh) 2020-06-09
EP4006740A4 (en) 2022-11-23
JP7362190B2 (ja) 2023-10-17
US20220171754A1 (en) 2022-06-02
JP2022550049A (ja) 2022-11-30

Similar Documents

Publication Publication Date Title
WO2021139376A1 (zh) 一种存储引擎中数据索引的方法以及相关装置
US11429641B2 (en) Copying data changes to a target database
US10191932B2 (en) Dependency-aware transaction batching for data replication
US5625815A (en) Relational database system and method with high data availability during table data restructuring
US20150032695A1 (en) Client and server integration for replicating data
US8560500B2 (en) Method and system for removing rows from directory tables
US8296269B2 (en) Apparatus and method for read consistency in a log mining system
US10157211B2 (en) Method and system for scoring data in a database
US10866968B1 (en) Compact snapshots of journal-based storage systems
US9390111B2 (en) Database insert with deferred materialization
CN115552390A (zh) 无服务器数据湖索引子系统及应用编程接口
Lu et al. A lightweight and efficient temporal database management system in TDSQL
CN113934750A (zh) 基于编译方式的数据血缘关系分析方法
Kvet et al. Master Index Access as a Data Tuple and Block Locator
US20230229645A1 (en) Schema management for journal-based storage systems
CN112463447B (zh) 一种基于分布式数据库实现物理备份的优化方法
US11188228B1 (en) Graphing transaction operations for transaction compliance analysis
Zhao et al. T-SQL: A Lightweight Implementation to Enable Built-in Temporal Support in MVCC-Based RDBMSs
van Otterdijk et al. Succinct Data Structures and Delta Encoding for Modern Databases
US20240143594A1 (en) Offloading graph components to persistent storage for reducing resident memory in distributed graph processing
Jota et al. A physical design strategy on a nosql dbms
US11599520B1 (en) Consistency management using query restrictions in journal-based storage systems
Theodorakis et al. Aion: Efficient Temporal Graph Data Management.
CN115729930A (zh) 使用自维护的结构信息以进行更快的数据访问
Singh et al. DATA QUALITY TOOLS FOR DATAWAREHOUSE MODELS

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912278

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020912278

Country of ref document: EP

Effective date: 20220214

ENP Entry into the national phase

Ref document number: 2022519003

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE