WO2024022329A1 - Data management method based on a key-value storage system and related equipment - Google Patents

Data management method based on a key-value storage system and related equipment

Info

Publication number
WO2024022329A1
Authority
WO
WIPO (PCT)
Prior art keywords
keys
multiple keys
transaction
key
user
Prior art date
Application number
PCT/CN2023/109096
Inventor
姚婷
朱挺炜
苏晓航
王道辉
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2024022329A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to a data management method based on a key-value storage system and related equipment.
  • the key-value (KV) storage system is widely used in computers due to its advantages of high performance, high scalability, and simple and easy-to-use interfaces.
  • the key-value storage system can coordinate and manage user data in the form of key-value pairs, for example, user file data, user file metadata, etc.
  • the key-value storage system can be called to implement these operations on key-value pairs.
  • the key-value storage system can create a user's transaction under the user's call, and the transaction can include the user's operations on multiple key-value pairs. Since the system includes a database instance that stores key-value pairs, the system can complete operations on multiple key-value pairs in the instance to complete the transaction.
  • the embodiment of the present application provides a data management method and related equipment based on a key-value storage system, which can fully utilize the CPU resources and memory resources of the server where the key-value storage system is located, and can effectively exploit the concurrency capability of the key-value storage system.
  • the first aspect of the embodiment of the present application provides a data management method based on a key-value storage system.
  • the key-value storage system includes a cache and multiple storage instances.
  • the user data table of each storage instance stores key-value pairs. The method includes:
  • a user can call the key-value storage system, so that the key-value storage system can create a transaction unique to that user and assign a unique timestamp to the user's transaction.
  • the user can also call the key-value storage system, so that the key-value storage system obtains multiple keys that the user needs to read in his transaction.
  • the key-value storage system can first add the user's transaction foreground lock to these keys in the cache. After adding foreground locks to these keys, the key-value storage system can detect whether the cache stores the values of these keys. There are two situations:
  • the key-value storage system can successfully read the values of these multiple keys from the cache and return the values of these multiple keys to the user.
  • the key-value storage system cannot read the values of these multiple keys from the cache. Therefore, the key-value storage system can first determine, among the multiple storage instances, the one or several storage instances where these keys are located. Then, the key-value storage system reads the values of these multiple keys from the user data tables of these storage instances. The key-value storage system can then return the values of the multiple keys to the user and write the values of the multiple keys to the cache.
  • the key-value storage system can release the user's transaction foreground lock on these keys.
  • the user has successfully read the values of the multiple keys, so the user can call the key-value storage system, so that the key-value storage system ends the user's transaction.
  • the key-value storage system can obtain multiple keys that need to be read in the transaction and add the transaction's foreground lock to the multiple keys. If the values of the multiple keys are stored in the cache, the key-value storage system reads the values from the cache and provides them to the user. If the values of the multiple keys are not stored in the cache, the key-value storage system determines, from the multiple storage instances, the storage instance where the multiple keys are located, reads the values of the multiple keys from the user data table of that storage instance, provides the values to the user, and writes the values to the cache. The key-value storage system can then release the foreground lock on the multiple keys and end the user's transaction.
  • the key-value storage system contains multiple storage instances, which can make full use of the CPU resources and memory resources of the server where the key-value storage system is located. When the user needs to read the values of multiple keys, the key-value storage system can read the values one by one, or concurrently, from the storage instances where the keys are located and return them to the user, which effectively exploits the concurrency capability of the key-value storage system.
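  • The read path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the class `KVSystem`, its attributes, the integer keys, and the modulo placement rule in `_instance_for` are all hypothetical, and the cache is checked per key rather than for the whole key set.

```python
import threading


class KVSystem:
    """Hypothetical in-memory stand-in for the key-value storage system."""

    def __init__(self, num_instances):
        self.cache = {}      # key -> cached value
        self.locks = {}      # key -> foreground lock kept alongside the cache
        # One dict per storage instance plays the role of its user data table.
        self.instances = [dict() for _ in range(num_instances)]

    def _instance_for(self, key):
        # Assumed placement rule: integer key modulo the instance count.
        return self.instances[key % len(self.instances)]

    def read(self, keys):
        ordered = sorted(set(keys))  # fixed order so two readers cannot deadlock
        for k in ordered:
            self.locks.setdefault(k, threading.Lock()).acquire()
        try:
            values = {}
            for k in ordered:
                if k in self.cache:                      # cache hit
                    values[k] = self.cache[k]
                else:                                    # cache miss
                    values[k] = self._instance_for(k).get(k)
                    self.cache[k] = values[k]            # fill the cache
            return values
        finally:
            for k in ordered:                            # release foreground locks
                self.locks[k].release()
```

  • In this sketch a second reader that asks for the same keys simply waits on the lock until the first reader has released it.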
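  • before releasing the foreground lock on the multiple keys, the method also includes: obtaining new values of the multiple keys that need to be written in the transaction, where the new values are obtained by the user modifying the values of the multiple keys; if the multiple keys are stored in the same storage instance, adding the transaction's background lock to the multiple keys, writing the new values of the multiple keys into the user data table of the storage instance and the cache, and releasing the transaction's background lock on the multiple keys. After releasing the foreground lock on the multiple keys, the method also includes notifying the user that the new values of the multiple keys have been successfully written.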
  • after the user obtains the values of the multiple keys, the user can modify the values of the multiple keys to obtain new values of the multiple keys. Then, the user can also call the key-value storage system, so that the key-value storage system obtains the new values of the multiple keys that the user needs to write in the transaction.
  • the key-value storage system can add background locks for the user's transaction on multiple keys.
  • the key-value storage system can then write new values for multiple keys into the storage instance's user data table and cache.
  • the key-value storage system can then unlock the user's transaction background lock on multiple keys.
  • the key-value storage system can then release the foreground lock on the user's transaction on multiple keys and then notify the user that new values for the multiple keys have been successfully written.
  • the user has successfully read the values of the multiple keys and written the new values of the multiple keys. Therefore, the user can call the key-value storage system to cause the key-value storage system to end the user's transaction.
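  • The ordering of the same-instance write (background lock, write, background unlock, foreground unlock, notify) can be illustrated with a short sketch. The function name, its parameters, and the use of plain dictionaries for the user data table and the cache are assumptions made for illustration only; the foreground locks are assumed to have been taken by the transaction's earlier read.

```python
import threading


def write_same_instance(user_table, cache, bg_locks, fg_locks, new_values):
    """Hypothetical sketch of a write whose keys all live in one instance."""
    keys = sorted(new_values)
    for k in keys:                      # add the transaction's background locks
        bg_locks[k].acquire()
    try:
        user_table.update(new_values)   # stands in for an atomic multi-key write
        cache.update(new_values)        # keep the cache consistent
    finally:
        for k in keys:                  # release the background locks
            bg_locks[k].release()
    for k in keys:                      # then release the foreground locks
        fg_locks[k].release()
    return "written"                    # stands in for notifying the user
```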
  • each storage instance also contains a transaction status table.
  • before releasing the foreground lock on the multiple keys, the method also includes: obtaining new values of the multiple keys that need to be written in the transaction, where the new values are obtained by the user modifying the values of the multiple keys; and, if the multiple keys are stored in several storage instances, selecting a target key among the multiple keys and writing the new values of the multiple keys to the transaction status table of the storage instance where the target key is located and to the cache.
  • the user can modify the values of the multiple keys to obtain new values of the multiple keys.
  • the user can also call the key-value storage system, so that the key-value storage system obtains the new values of the multiple keys that the user needs to write in his transaction. If these multiple keys are stored in several storage instances, the key-value storage system can first select the target key among the multiple keys. Then, the key-value storage system writes the new values of these multiple keys into the transaction status table and cache of the storage instance where the target key is located. Then, the key-value storage system can release the foreground lock on multiple keys, then notify the user that new values for multiple keys have been successfully written, and the foreground operation ends. The key-value storage system can then add transactional background locks on multiple keys.
  • the key-value storage system can concurrently write new values for multiple keys into the user data tables of these several storage instances.
  • the key-value storage system can then unlock the transaction's background lock on multiple keys.
  • the background operation ends.
  • the user has successfully read the values of the multiple keys and written the new values of the multiple keys. Therefore, the user can call the key-value storage system to cause the key-value storage system to end the user's transaction.
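  • The split between the foreground and background operations of a cross-instance write can be sketched as below. Everything named here (`Instance`, `instance_of`, `foreground_commit`, `background_apply`, integer keys, modulo placement, choosing the smallest key as the target key) is a hypothetical choice for illustration; the application does not state how the target key is selected, and lock handling is omitted to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


class Instance:
    """Hypothetical storage instance with its two tables."""

    def __init__(self):
        self.txn_status = {}   # transaction timestamp -> pending key-value pairs
        self.user_data = {}    # user key -> value


def instance_of(key, instances):
    return instances[key % len(instances)]        # assumed placement rule


def foreground_commit(ts, new_values, instances, cache):
    target_key = min(new_values)                  # assumption: any fixed choice works
    target = instance_of(target_key, instances)
    target.txn_status[ts] = dict(new_values)      # record all new values in one place
    cache.update(new_values)
    return target    # the caller may now release foreground locks and notify the user


def background_apply(ts, new_values, instances, target):
    def apply(item):
        key, value = item
        instance_of(key, instances).user_data[key] = value

    with ThreadPoolExecutor() as pool:            # concurrent writes to the instances
        list(pool.map(apply, new_values.items()))
    del target.txn_status[ts]                     # pending record is no longer needed
```

  • The user is notified after `foreground_commit` only, so in this sketch the latency seen by the user does not include the writes to the individual user data tables.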
  • determining the storage instance where multiple keys are located from multiple storage instances includes: determining the storage instance numbers corresponding to the multiple keys, and determining the storage instance where the multiple keys are located based on the storage instance numbers corresponding to the multiple keys.
  • the key-value storage system sets a unique storage instance number for each storage instance among multiple storage instances, and maintains a mathematical relationship between the storage instance number and the key. Then, based on this mathematical relationship, the key-value storage system can determine the storage instance numbers corresponding to multiple keys, and based on the storage instance numbers corresponding to the multiple keys, determine the storage instance where the multiple keys are located.
  • the key-value storage system is set in a file system, each key contains metadata of a file or directory, and the metadata contains an index number of the file or directory. Determining the storage instance numbers corresponding to the multiple keys includes: dividing the index number contained in each key by the number of storage instances and using the remainder as the storage instance number corresponding to that key.
  • for any one of the multiple keys, the key-value storage system can divide the index number of the directory or file contained in the key by the total number of storage instances and use the remainder as the storage instance number corresponding to the key. After determining the storage instance number corresponding to the key, the storage instance where the key is located can be found based on that number.
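  • As a small worked example of the remainder rule, the sketch below maps index numbers to storage instance numbers and groups keys by instance. The function names are hypothetical, and instance numbers are assumed to start at 0.

```python
def storage_instance_number(inode: int, num_instances: int) -> int:
    """Remainder of the index number divided by the number of instances."""
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")
    return inode % num_instances


def group_by_instance(inodes, num_instances):
    """Group index numbers by the instance that would hold them."""
    groups = {}
    for ino in inodes:
        number = storage_instance_number(ino, num_instances)
        groups.setdefault(number, []).append(ino)
    return groups
```

  • With 4 storage instances, index numbers 10, 11 and 14 map to instances 2, 3 and 2 respectively, so 10 and 14 can be read from the same instance.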
  • the operation of adding or releasing foreground locks on multiple keys is implemented in the cache, and the operation of adding or releasing background locks on multiple keys is implemented in the cache.
  • the cache further stores at least one of the following: a timestamp associated with a plurality of keys, a timestamp associated with a transaction, and a state of the transaction.
  • the second aspect of the embodiment of the present application provides a data management device based on a key-value storage system.
  • the key-value storage system includes a cache and multiple storage instances.
  • the user data table of each storage instance stores key-value pairs.
  • the device includes: a creation module, used to create a user's transaction; a first acquisition module, used to obtain multiple keys that need to be read in the transaction and add the transaction's foreground lock to the multiple keys; a first reading module, used to read the values of the multiple keys from the cache if they are stored in the cache, and provide the values to the user; a second reading module, used to, if the values of the multiple keys are not stored in the cache, determine the storage instance where the multiple keys are located from the multiple storage instances, read the values of the multiple keys from the user data table of that storage instance, provide the values to the user, and write the values to the cache; a processing module, used to release the foreground lock on the multiple keys; and an end module, used to end the transaction.
  • the device further includes: a second acquisition module, used to acquire new values of the multiple keys that need to be written in the transaction, where the new values are obtained by the user modifying the values of the multiple keys; and a first writing module, used to, if the multiple keys are stored in the same storage instance, add the transaction's background lock to the multiple keys and write the new values of the multiple keys into the user data table of the storage instance.
  • the notification module is used to notify the user that new values for multiple keys have been successfully written.
  • each storage instance also includes a transaction status table, and the device further includes: a second acquisition module, used to acquire new values of the multiple keys that need to be written in the transaction, where the new values are obtained by the user modifying the values of the multiple keys; a second writing module, used to, if the multiple keys are stored in several storage instances, select a target key among the multiple keys and write the new values of the multiple keys into the transaction status table of the storage instance where the target key is located and into the cache; a notification module, used to notify the user that the new values of the multiple keys have been successfully written; and a third writing module, used to add the transaction's background lock to the multiple keys, concurrently write the new values of the multiple keys to the user data tables of the several storage instances, release the transaction's background lock on the multiple keys, and delete the new values of the multiple keys from the transaction status table of the storage instance where the target key is located.
  • the second reading module is used to determine the storage instance numbers corresponding to the multiple keys, and determine the storage instance where the multiple keys are located based on the storage instance numbers corresponding to the multiple keys.
  • the key-value storage system is set in a file system, each key contains metadata of a file or directory, and the metadata contains an index number of the file or directory. The second reading module is used to divide the index number contained in each key by the number of storage instances and use the remainder as the storage instance number corresponding to that key.
  • the operation of adding or releasing foreground locks on multiple keys is implemented in the cache, and the operation of adding or releasing background locks on multiple keys is implemented in the cache.
  • the cache further stores at least one of the following: a timestamp associated with a plurality of keys, a timestamp associated with a transaction, and a state of the transaction.
  • the third aspect of the embodiment of the present application provides a data management device based on a key-value storage system.
  • the device includes a memory and a processor; the memory stores code, and the processor is configured to execute the code.
  • the device performs the method described in the first aspect or any possible implementation manner in the first aspect.
  • a fourth aspect of the embodiments of the present application provides a computer storage medium.
  • the computer storage medium stores one or more instructions. When executed by one or more computers, the instructions cause the one or more computers to implement the method described in the first aspect or any possible implementation manner of the first aspect.
  • the fifth aspect of the embodiments of the present application provides a computer program product.
  • the computer program product stores instructions. When the instructions are executed by a computer, the computer implements the method described in the first aspect or any possible implementation manner of the first aspect.
  • the key-value storage system can obtain multiple keys that need to be read in the transaction and add the transaction's foreground lock to the multiple keys. If the values of the multiple keys are stored in the cache, the key-value storage system reads the values from the cache and provides them to the user. If the values of the multiple keys are not stored in the cache, the key-value storage system determines, from the multiple storage instances, the storage instance where the multiple keys are located, reads the values of the multiple keys from the user data table of that storage instance, provides the values to the user, and writes the values to the cache. The key-value storage system can then release the foreground lock on the multiple keys and end the user's transaction.
  • the key-value storage system contains multiple storage instances, which can make full use of the CPU resources and memory resources of the server where the key-value storage system is located. When the user needs to read the values of multiple keys, the key-value storage system can read the values one by one, or concurrently, from the storage instances where the keys are located and return them to the user, which effectively exploits the concurrency capability of the key-value storage system.
  • Figure 1 is a schematic structural diagram of a key-value storage system provided by an embodiment of the present application.
  • Figure 2 is a schematic structural diagram of the file system provided by the embodiment of the present application.
  • Figure 3 is a schematic flow chart of a data management method based on a key-value storage system provided by an embodiment of the present application
  • Figure 4 is a schematic diagram of an application example of the data management method based on the key-value storage system provided by the embodiment of the present application;
  • Figure 5 is a schematic diagram of another application example of the data management method based on the key-value storage system provided by the embodiment of the present application;
  • Figure 6 is a schematic structural diagram of a data management device based on a key-value storage system provided by an embodiment of the present application
  • FIG. 7 is another schematic structural diagram of a data management device based on a key-value storage system provided by an embodiment of the present application.
  • the embodiment of the present application provides a data management method and related equipment based on a key-value storage system, which can fully utilize the CPU resources and memory resources of the server where the key-value storage system is located, and can effectively exploit the concurrency capability of the key-value storage system.
  • the key-value storage system is widely used in computers due to its advantages of high performance, high scalability, and simple and easy-to-use interfaces.
  • the key-value storage system can coordinate and manage user data in the form of key-value pairs. For example, the data of the user's files, the metadata of the user's files, etc.
  • the key-value storage system can be called to implement these operations on key-value pairs.
  • the key-value storage system can create a user's transaction under the user's call, and the transaction can include the user's operations on multiple key-value pairs. Since the system includes a database instance that stores key-value pairs, the system can complete operations on multiple key-value pairs in the instance (for example, one or more of adding, reading, writing, and deleting) to complete the transaction.
  • this system only provides a single instance to manage key-value pairs, that is, user data is managed by a single instance.
  • a single instance can only manage one disk and cannot fully utilize the CPU resources and memory resources of the server where the system is located, resulting in limited overall performance of data management.
  • only the performance of a single instance can be exploited, so the concurrency of the system cannot be maximized.
  • FIG. 1 is a schematic structural diagram of a key-value storage system provided by an embodiment of the present application.
  • the system includes: a transaction interface layer, a transaction processing layer, a timestamp allocation module, a data partition module, and a data cache module (i.e. the aforementioned cache) and multiple storage instances.
  • the transaction interface layer can provide transaction interfaces to the outside of the system, including, among others:
  • create transaction interface (tnx_begin())
  • end transaction interface (tnx_end())
  • write transaction interface (tnx_put())
  • read transaction interface (tnx_get_for_update())
  • commit transaction interface (tnx_commit())
  • rollback transaction interface (tnx_rollback())
  • the system can obtain the key or keys whose values need to be read in the user's transaction (that is, the user adds an operation of reading a certain key or certain keys to the transaction).
  • the system can obtain the new value or values of the key or keys that need to be written in the user's transaction (that is, the user adds an operation of writing new values of a certain key or certain keys to the transaction).
  • the system can write the new value of a certain key or certain keys into the storage instance.
  • the system can end the user's transaction. At this point, the user's transaction has been completed.
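  • A typical call sequence over the transaction interfaces might look like the sketch below. Only the interface names (tnx_begin, tnx_get_for_update, tnx_put, tnx_commit, tnx_rollback, tnx_end) come from the text; the class `TxnInterface`, every parameter and return value, and the single in-memory store are assumptions made so that the example runs.

```python
import itertools


class TxnInterface:
    """Hypothetical single-store stand-in for the transaction interface layer."""

    def __init__(self):
        self._ts = itertools.count(1)   # stands in for the timestamp allocation module
        self.store = {}                 # committed key-value pairs
        self.pending = {}               # transaction timestamp -> staged writes

    def tnx_begin(self):
        ts = next(self._ts)             # the timestamp identifies the transaction
        self.pending[ts] = {}
        return ts

    def tnx_get_for_update(self, ts, keys):
        return {k: self.store.get(k) for k in keys}

    def tnx_put(self, ts, key, value):
        self.pending[ts][key] = value   # staged until commit

    def tnx_commit(self, ts):
        self.store.update(self.pending[ts])
        self.pending[ts] = {}

    def tnx_rollback(self, ts):
        self.pending[ts] = {}           # drop staged writes

    def tnx_end(self, ts):
        del self.pending[ts]
```

  • Under these assumptions a caller would begin a transaction, read with tnx_get_for_update, stage new values with tnx_put, make them visible with tnx_commit, and finish with tnx_end.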
  • the transaction processing layer is responsible for processing the user's transaction logic.
  • the timestamp allocation module is responsible for maintaining the allocation of global timestamps. For example, this module can allocate an exclusive timestamp to a user's transaction, and use the timestamp of the transaction (that is, the timestamp associated with the transaction mentioned above) as the user's transaction. unique identifier.
  • the data partition module is responsible for maintaining the mapping relationship between keys and storage instance numbers.
  • This mapping relationship can be understood as a mathematical relationship.
  • This module can determine the storage instance number corresponding to each key based on this mathematical relationship, and then determine the storage instance where each key is located.
  • the data caching module is responsible for maintaining some specific information involved in the user's transactions. Specifically, the data cache module can be divided into two parts, one part is the key cache (key cache), and the other part is the transaction cache (dtx cache).
  • the key cache stores key entries (key_entry), which are indexed by the key.
  • the content includes the transaction's foreground lock (rw_lock), the timestamp associated with the key (committable_ts and commit_ts, that is, the timestamp when the latest value of the key can be submitted and the timestamp when the latest value of the key has been submitted), the value of the key (committable_value).
  • the transaction cache stores transaction entries (dtx_entry), which are indexed by the transaction timestamp (ts).
  • the content includes the key-value pairs involved in the transaction, the status of the transaction, and the locking information of the key-value pairs.
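  • The two kinds of cache entries can be pictured as simple records. In the sketch below the field names rw_lock, committable_ts, commit_ts and committable_value follow the text; the class names, the `DtxEntry` field names, the default values, the status string, and the use of a plain `threading.Lock` in place of a read-write lock are assumptions.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class KeyEntry:
    """Entry of the key cache, indexed by key."""
    rw_lock: Any = field(default_factory=threading.Lock)  # transaction foreground lock
    committable_ts: Optional[int] = None   # timestamp at which the latest value can be committed
    commit_ts: Optional[int] = None        # timestamp at which the latest value was committed
    committable_value: Any = None          # value of the key


@dataclass
class DtxEntry:
    """Entry of the transaction cache, indexed by transaction timestamp."""
    kv_pairs: Dict[Any, Any] = field(default_factory=dict)   # pairs involved in the transaction
    status: str = "active"                                   # hypothetical state name
    locked_keys: Set[Any] = field(default_factory=set)       # locking information


key_cache: Dict[Any, KeyEntry] = {}
dtx_cache: Dict[int, DtxEntry] = {}
```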
  • Storage instances have the ability to persistently store key-value pairs and support the atomicity of writing multiple key-value pairs within the storage instance.
  • a transaction status table is set up in the storage instance, which can store the key-value pairs of the transaction.
  • the key is the timestamp of the transaction, and the value includes all the key-value pairs involved in the transaction and the status of the transaction.
  • the storage instance is also set up with a user data table, which can store the user's key-value pairs.
  • the key is the user's user_key, and its value is the user's user_value and the timestamp of the transaction corresponding to the value.
  • user_key and user_value can be constructed from the user's data.
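  • The layout of the two tables inside a storage instance can be sketched with two dictionaries. The class `StorageInstance`, its method names, and the tuple layout of the stored values are hypothetical; only the roles of the keys and values follow the description above.

```python
from dataclasses import dataclass, field


@dataclass
class StorageInstance:
    """Hypothetical storage instance holding the two tables."""
    # transaction timestamp -> (key-value pairs of the transaction, status)
    txn_status_table: dict = field(default_factory=dict)
    # user_key -> (user_value, timestamp of the transaction that wrote it)
    user_data_table: dict = field(default_factory=dict)

    def record_txn(self, ts, kv_pairs, status):
        self.txn_status_table[ts] = (dict(kv_pairs), status)

    def put_user(self, user_key, user_value, ts):
        self.user_data_table[user_key] = (user_value, ts)
```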
  • the user can directly call the key-value storage system provided by the embodiment of the present application.
  • the key-value storage system provided by the embodiment of the present application can be applied in the file system, as shown in Figure 2 ( Figure 2 is a structural schematic diagram of the file system provided by the embodiment of the present application).
  • the file system includes a metadata management module and key-value storage systems. In this case, the user no longer needs to directly call the key-value storage system.
  • the user can directly send operations on files or directories to the file system (for example, add a directory, add a file to the directory, delete a file in the directory, modify a file in the directory, obtain basic information of multiple files in the directory, obtain file attributes or content of a file in the directory, etc.). The metadata management module in the file system can parse the user's operation (also called the user's request) into the user's transaction and, based on the transaction, call each transaction interface in the key-value storage system to complete the user's transaction. It can be seen that in this case, the user indirectly calls the key-value storage system.
  • the user data table in the storage instance can include a dentry table, an inode table, etc.
  • for the key-value pairs stored in the dentry table, the key is the index number of the parent directory (pinode) together with the name (name) of a file or subdirectory under the parent directory, and the value is the index number (inode) of that file or subdirectory.
  • for the key-value pairs stored in the inode table, the key is the index number (inode) of a file or subdirectory, and the value is the attribute information of that file or subdirectory.
  • the parent directories, subdirectories and files mentioned above are all directories and files created by users, so their index numbers, names and attribute information can all be understood as metadata of directories and files, that is, data belonging to the user.
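  • To show how the dentry table and the inode table work together, the sketch below resolves a path to its attributes. The function `resolve`, the root index number 1, and the sample attribute dictionaries are assumptions for illustration; the application only specifies what the keys and values of the two tables are.

```python
ROOT_INODE = 1  # hypothetical index number of the root directory


def resolve(path, dentry_table, inode_table, root=ROOT_INODE):
    """Walk the dentry table from the root, then look up the attributes."""
    ino = root
    for name in filter(None, path.split("/")):
        ino = dentry_table[(ino, name)]   # key: (parent index number, name)
    return ino, inode_table[ino]          # key: index number


dentry_table = {(1, "docs"): 2, (2, "a.txt"): 3}
inode_table = {
    1: {"type": "dir"},
    2: {"type": "dir"},
    3: {"type": "file", "size": 12},
}
```

  • Here `resolve("/docs/a.txt", dentry_table, inode_table)` performs two dentry lookups and one inode lookup, each of which is an ordinary key read in the key-value storage system.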
  • Figure 3 is a schematic flow chart of the data management method based on the key-value storage system provided by the embodiment of the present application. As shown in Figure 3, the method includes:
  • a user can call the create transaction interface of the key-value storage system, so that the key-value storage system can create an exclusive transaction for the user and assign a unique timestamp to the user's transaction.
  • Step 1 The user calls tnx_begin(), and the key-value storage system can create a transaction for the user and assign a timestamp ts1 to the user's transaction.
  • If the values of the multiple keys are not stored in the cache, determine the storage instance where the multiple keys are located from the multiple storage instances, read the values of the multiple keys from the user data table of that storage instance, provide the values of the multiple keys to the user, and write the values of the multiple keys to the cache.
  • the user can also call the read transaction interface of the key-value storage system, so that the key-value storage system obtains the multiple keys that the user needs to read in the transaction.
  • the key-value storage system can first add the foreground lock of the user's transaction to the multiple keys in the cache. If the key-value storage system then needs to process another user's transaction that also needs to read these keys, it cannot serve that read, and the other transaction is suspended (the timestamp of the other user's transaction is usually greater than the timestamp of this user's transaction) until the foreground lock of this user's transaction on the multiple keys is released. This avoids conflicts between different transactions.
  • the key-value storage system can try to read the values of these keys from the cached entries for these keys. Then there are the following two situations:
  • the key-value storage system can successfully read the values of these multiple keys from the cache and return the values of these multiple keys to the user.
  • the key-value storage system cannot read the values of these multiple keys from the cache. Therefore, the key-value storage system can first store the values of these multiple keys in the cache. In the instance, determine one or several storage instances where these keys are located. Then, the key-value storage system reads the values of these multiple keys from the user data tables of these storage instances. The key-value storage system can then return the values of the multiple keys to the user and write the values of the multiple keys into the cached entries for the multiple keys.
  • the key-value storage system can determine the storage instance where multiple keys are located in the following manner: the key-value storage system sets a unique storage instance number for each of the multiple storage instances, and maintains the storage instance number. mathematical relationship between keys. Then, based on this mathematical relationship, the key-value storage system can determine the storage instance numbers corresponding to multiple keys, and based on the storage instance numbers corresponding to the multiple keys, determine the storage instance where the multiple keys are located.
  • It should be noted that the user may call the read-transaction interface only once, so that the key-value storage system determines all the keys that need to be read in the user's transaction at one time. In this case, the key-value storage system can read the values of these keys concurrently from the cache or the storage instances and return the values to the user at the same time.
  • The user may also call the read-transaction interface multiple times, where each call causes the key-value storage system to determine one of the keys that need to be read in the user's transaction. In this case, the key-value storage system can read the values of these keys sequentially (that is, one by one in order) from the cache or the storage instances and return the values to the user sequentially (that is, one by one in order).
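The difference between one batched read call and several single-key calls can be illustrated with a minimal sketch. The two in-memory tables, the function names and the use of a thread pool are assumptions made for illustration only; the source does not say how concurrent reads are implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the user data tables of two storage instances.
SHARDS = [{"key2": "b"}, {"key1": "a", "key3": "c"}]

def read_one(key):
    """Return the value of `key` from whichever table holds it (linear scan for brevity)."""
    for table in SHARDS:
        if key in table:
            return table[key]
    return None

def read_sequential(keys):
    # One read-interface call per key: values are read and returned one by one.
    return [read_one(k) for k in keys]

def read_concurrent(keys):
    # One call carrying all keys: lookups run in parallel and the results are
    # returned together; pool.map() keeps the order of `keys`.
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        return list(pool.map(read_one, keys))
```

Both functions return the same values; only the way the lookups are issued differs.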
  • Continuing the example above, Step 2: The user calls tnx_get_for_update(key1) so that the key-value storage system reads the value of key1 and returns it to the user. This process includes:
  • (2.3) Generate the entry of the user's transaction, dtx_entry. dtx_entry is indexed by the transaction timestamp ts1; the lock information of key1 is recorded in dtx_entry, and dtx_entry is added to the cache.
  • Step 3: The user calls tnx_get_for_update(key2), so that the system reads the value of key2 and returns it to the user. For this process, refer to (2.1)-(2.8) above; it is not described again here.
  • Step 4: The user calls tnx_get_for_update(key3), so that the system reads the value of key3 and returns it to the user. For this process, refer to (2.1)-(2.8) above; it is not described again here. In this way, the user successfully reads the value of key1, the value of key2 and the value of key3.
  • the new values of multiple keys are obtained by the user modifying the values of multiple keys.
  • After the user obtains the values of the multiple keys, the user can modify them to obtain new values of the multiple keys. The user can then call the write-transaction interface of the key-value storage system, so that the key-value storage system obtains the new values of the multiple keys that the user needs to write in the transaction.
  • the key-value storage system can determine whether these multiple keys are stored in the same storage instance. There are two situations:
  • If the multiple keys are stored in the same storage instance, the key-value storage system can first add the background lock of the user's transaction to the multiple keys in their cached entries. The key-value storage system can then write the new values of the keys into the user data table of the storage instance and into the cached entries of the keys. Next, the key-value storage system can release (delete) the background lock of the user's transaction on the keys in their cached entries. Subsequently, the key-value storage system can release the foreground lock of the user's transaction on the keys in their cached entries and then notify the user that the new values of the multiple keys have been written successfully.
  • The purpose of the background lock of the user's transaction is as follows: if the key-value storage system later needs to process other users' transactions that also need to write new values for these keys (usually different from the new values written by this user's transaction), the key-value storage system cannot satisfy those writes, and the other users' transactions are suspended until the background lock of this user's transaction on these keys is released. This also avoids conflicts between different transactions.
  • If the multiple keys are stored in several storage instances, the key-value storage system can first select a target key (also called the primary key) among the multiple keys. Then, the key-value storage system writes the new values of the multiple keys into the transaction status table of the storage instance where the target key is located and into the cached entries of the multiple keys. Next, the key-value storage system can release the foreground lock on the multiple keys in their cached entries and notify the user that the new values of the multiple keys have been written successfully. At this point, the foreground operation ends.
  • The key-value storage system can then add the background lock of the transaction to the multiple keys in their cached entries, concurrently write the new values of the multiple keys into the user data tables of the several storage instances, and release the background lock of the transaction on the multiple keys in their cached entries. Finally, the new values of the multiple keys are deleted from the transaction status table of the storage instance where the target key is located. At this point, the background operation ends.
  • It is worth noting that the key-value storage system performs the foreground operation first: the new values of the multiple keys are temporarily stored in the transaction status table of the storage instance where the primary key is located, and the user is notified in advance that the write has succeeded, which reduces the latency perceived by the user.
  • After the notification is completed, the key-value storage system can perform the background operation. If the server where the storage instances are located is powered off for any reason, so that the new values of these keys have not been successfully written to the storage instances, then after the server is powered on again the key-value storage system can obtain the new values of these keys from the transaction status table of the storage instance where the primary key is located and write them to the several storage instances concurrently, thereby successfully completing the user's transaction.
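The foreground/background split and the recovery after a power failure can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: locks and the cache are omitted, the background writes run sequentially rather than concurrently, and the names `Shard`, `foreground_commit`, `background_commit` and `recover`, as well as the rule for choosing the primary key, are hypothetical.

```python
class Shard:
    """Toy storage instance with one user data table and one transaction status table."""
    def __init__(self):
        self.user_data = {}   # key -> (value, ts)
        self.tx_status = {}   # ts -> {key: new value}

def foreground_commit(shards, shard_of, ts, new_values):
    # Pick a primary key (hypothetical rule: smallest key) and park all new
    # values in the status table of its shard; the user can be notified now.
    primary = min(new_values)
    shards[shard_of(primary)].tx_status[ts] = dict(new_values)
    return primary

def background_commit(shards, shard_of, ts, primary):
    # Copy the parked values into the user data tables, then drop the record.
    home = shards[shard_of(primary)]
    pending = home.tx_status.get(ts)
    if pending is None:
        return False
    for key, value in pending.items():
        shards[shard_of(key)].user_data[key] = (value, ts)
    del home.tx_status[ts]
    return True

def recover(shards, shard_of):
    # After a restart, replay every transaction still present in a status table.
    for shard in shards:
        for ts, pending in list(shard.tx_status.items()):
            for key, value in pending.items():
                shards[shard_of(key)].user_data[key] = (value, ts)
            del shard.tx_status[ts]
```

If the process stops between `foreground_commit` and `background_commit`, the parked record is still in the status table, so `recover` can finish the write.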
  • At this point, the user has successfully read the values of the multiple keys and written the new values of the multiple keys. The user can therefore call the end-transaction interface of the key-value storage system, so that the key-value storage system ends the user's transaction.
  • Step 5: The user changes the value of key1 to value1, the value of key2 to value2, and the value of key3 to value3.
  • Step 6: The user calls tnx_put(key1, value1), and the system writes key1 and value1 to the temporary cache of the user's transaction, which is equivalent to the system determining that the new value value1 of key1 needs to be written.
  • Step 7: The user calls tnx_put(key2, value2), and the system determines that the new value value2 of key2 needs to be written.
  • Step 8: The user calls tnx_put(key3, value3), and the system determines that the new value value3 of key3 needs to be written.
  • Step 9: The user calls tnx_commit(), and the system determines that the user's transaction needs to be committed.
  • The system's commit process is as follows:
  • In key1_entry, key2_entry and key3_entry respectively, the foreground lock rw_lock of key1, key2 and key3.
  • Key1, key2, and key3 are classified according to the storage instance, where key1 and key2 are located in the storage instance shard1, and key3 is located in the storage instance shard2.
  • Step 10: The user calls tnx_end(), and the system ends the user's transaction.
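The ten steps can be traced with a toy, in-memory stand-in. The interface names follow the text, but modelling them as methods of a `Txn` object, passing a `store` dictionary to tnx_begin, and omitting locks and storage instances are all assumptions made to keep the sketch short.

```python
class Txn:
    """Toy transaction handle; writes are buffered until tnx_commit()."""
    def __init__(self, store, ts):
        self.store, self.ts = store, ts
        self.write_buffer = {}   # the transaction's temporary cache
        self.open = True

    def tnx_get_for_update(self, key):
        return self.store.get(key)

    def tnx_put(self, key, value):
        self.write_buffer[key] = value   # recorded, not yet visible in the store

    def tnx_commit(self):
        self.store.update(self.write_buffer)
        self.write_buffer.clear()

    def tnx_end(self):
        self.open = False

_next_ts = 0

def tnx_begin(store):
    """Create a transaction and give it the next timestamp."""
    global _next_ts
    _next_ts += 1
    return Txn(store, _next_ts)

store = {"key1": "old1", "key2": "old2", "key3": "old3"}
txn = tnx_begin(store)                                                  # step 1
old = [txn.tnx_get_for_update(k) for k in ("key1", "key2", "key3")]     # steps 2-4
for k, v in (("key1", "value1"), ("key2", "value2"), ("key3", "value3")):
    txn.tnx_put(k, v)                                                   # steps 5-8
before_commit = dict(store)
txn.tnx_commit()                                                        # step 9
txn.tnx_end()                                                           # step 10
```

The snapshot `before_commit` still holds the old values, which mirrors the statement that tnx_put only records what needs to be written.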
  • In the embodiment of the present application, after creating the user's transaction, the key-value storage system can obtain the multiple keys that need to be read in the transaction and add the foreground lock of the transaction to the multiple keys. If the values of the multiple keys are stored in the cache, the key-value storage system reads the values from the cache and provides them to the user. If the values of the multiple keys are not stored in the cache, the key-value storage system determines the storage instances where the keys are located from the multiple storage instances, reads the values from the user data tables of those storage instances, provides the values to the user, and writes the values to the cache. The key-value storage system can then release the foreground lock on the multiple keys and end the user's transaction.
  • Because the key-value storage system contains multiple storage instances, it can make full use of the CPU resources and memory resources of the server where it is located. When the user needs to read the values of multiple keys, the key-value storage system can read the values one by one or concurrently from the storage instances where the keys are located and return them to the user, which effectively exploits the concurrency of the key-value storage system.
  • Further, when the multiple keys are located in several storage instances, the key-value storage system can write the new values of the multiple keys to these storage instances concurrently. This further exploits the concurrency of the key-value storage system, ensures the transactional nature of key-value operations across storage instances, minimizes transaction overhead, and improves the throughput of the key-value storage system.
  • FIG. 4 is a schematic diagram of an application example of the data management method based on the key-value storage system provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of another application example of the data management method based on the key-value storage system provided by the embodiment of the present application.
  • this application example is a transaction process for a single storage instance. The process includes:
  • the key-value storage system determines K1, K2, and K3 that need to be written in the user's transaction through the transaction interface layer;
  • write the KV data (that is, the new values of K1, K2 and K3 that need to be written) to the cf_data table in storage instance 1.
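For the single-storage-instance case, the order of operations described earlier (add background locks, write to the user data table and the cache, release background locks, release foreground locks) can be sketched as below. The class and attribute names are hypothetical, and a conflicting writer is rejected with an exception here, whereas the text says such a transaction is suspended.

```python
class SingleInstanceStore:
    """Toy model of one storage instance plus the lock state kept in the cache."""
    def __init__(self):
        self.user_data = {}          # user data table: key -> (value, ts)
        self.cache = {}              # key -> cached value
        self.foreground_locks = {}   # key -> ts of the holding transaction
        self.background_locks = {}   # key -> ts of the holding transaction

    def commit(self, ts, new_values):
        # Refuse if another transaction holds a background lock on any key.
        for key in new_values:
            if self.background_locks.get(key, ts) != ts:
                raise RuntimeError(f"{key} is being written by another transaction")
        for key in new_values:                      # add background locks
            self.background_locks[key] = ts
        for key, value in new_values.items():       # write table and cache
            self.user_data[key] = (value, ts)
            self.cache[key] = value
        for key in new_values:                      # release background, then foreground locks
            del self.background_locks[key]
            self.foreground_locks.pop(key, None)
        return True                                 # the user can now be notified
```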
  • this application example is a transaction process across multiple storage instances. The process includes:
  • the key-value storage system determines K1, K2, and K3 that need to be written in the user's transaction through the transaction interface layer;
  • K1, K2 and K3 are located in different storage instances. Select the leader from K1, K2 and K3. Assume that the leader is located in storage instance 1.
  • the system triggers the commit operation in the background, takes out K1, K2, and K3 from dtx_entry, and sorts them by storage instances.
  • the user data tables in the storage instance are the dentry table and the inode table.
  • the keys of the key-value pairs stored in these two tables contain either the index number of a directory (a parent directory or a subdirectory) or the index number of a file.
  • These index numbers are assigned to the file or directory by the file system when the file or directory is created.
  • When the key-value storage system needs to determine the storage instances where multiple keys are located, then for any one of these keys the key-value storage system can divide the index number of the directory or file contained in the key by the total number of storage instances and use the remainder as the storage instance number corresponding to the key. After the storage instance number corresponding to the key has been determined, the storage instance where the key is located can be found based on that number.
  • Example 3: Suppose the user needs to create the files /a.txt, /b/c.txt, /d/e.c and /f/g.h, where the inode of / is 1, the inode of /b/ is 5, the inode of /d/ is 7, and the inode of /f/ is 11.
  • The create operations belonging to the same storage instance are aggregated into one transaction. Therefore, the two operations on /a.txt and /b/c.txt are aggregated into one transaction tx1, while the two operations on /d/e.c and /f/g.h are aggregated into another transaction tx2. Both tx1 and tx2 are transactions of a single storage instance.
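The remainder rule and the grouping in Example 3 can be checked with a short sketch. The number of storage instances is not stated in this passage; 4 is assumed here because it reproduces the tx1/tx2 grouping given above (1 mod 4 = 5 mod 4 = 1, and 7 mod 4 = 11 mod 4 = 3). Partitioning by the parent directory's inode is likewise an assumption drawn from the inodes listed in the example.

```python
from collections import defaultdict

NUM_SHARDS = 4  # assumption: not given in the text

def shard_of(index_number, num_shards=NUM_SHARDS):
    """Storage instance number = index number mod number of storage instances."""
    return index_number % num_shards

# Parent-directory inodes taken from Example 3.
creates = {"/a.txt": 1, "/b/c.txt": 5, "/d/e.c": 7, "/f/g.h": 11}

def group_by_shard(ops):
    """Aggregate create operations that land on the same storage instance."""
    groups = defaultdict(list)
    for path, parent_inode in ops.items():
        groups[shard_of(parent_inode)].append(path)
    return dict(groups)

transactions = group_by_shard(creates)
```

Under these assumptions, `transactions` holds two groups, one per storage instance, matching tx1 and tx2.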
  • Figure 6 is a schematic structural diagram of a data management device based on a key-value storage system provided by an embodiment of the present application.
  • the key-value storage system includes a cache and multiple storage instances, and the user data table of each storage instance stores key-value pairs. As shown in Figure 6, the device includes:
  • the creation module 601 is used to create a user's transaction; for example, the creation module 601 can be used to implement step 301 in the embodiment shown in Figure 3 .
  • the first acquisition module 602 is used to acquire the multiple keys that need to be read in the transaction and add the foreground lock of the transaction to the multiple keys; for example, the first acquisition module 602 can be used to implement step 302 in the embodiment shown in Figure 3.
  • the first reading module 603 is used to read the values of the multiple keys from the cache if the values of the multiple keys are stored in the cache, and provide the values of the multiple keys to the user; for example, the first reading module 603 can be used to implement step 304 in the embodiment shown in Figure 3.
  • the second reading module 604 is used to, if the values of the multiple keys are not stored in the cache, determine the storage instances where the multiple keys are located from the multiple storage instances, read the values of the multiple keys from the user data tables of those storage instances, provide the values of the multiple keys to the user, and write the values of the multiple keys to the cache; for example, the second reading module 604 can be used to implement step 305 in the embodiment shown in Figure 3.
  • the processing module 605 is used to release the foreground lock on the multiple keys;
  • the end module 606 is used to end the transaction; for example, the end module 606 can be used to implement step 310 in the embodiment shown in Figure 3.
  • In the embodiment of the present application, after creating the user's transaction, the key-value storage system can obtain the multiple keys that need to be read in the transaction and add the foreground lock of the transaction to the multiple keys. If the values of the multiple keys are stored in the cache, the key-value storage system reads the values from the cache and provides them to the user. If the values of the multiple keys are not stored in the cache, the key-value storage system determines the storage instances where the keys are located from the multiple storage instances, reads the values from the user data tables of those storage instances, provides the values to the user, and writes the values to the cache. The key-value storage system can then release the foreground lock on the multiple keys and end the user's transaction.
  • Because the key-value storage system contains multiple storage instances, it can make full use of the CPU resources and memory resources of the server where it is located. When the user needs to read the values of multiple keys, the key-value storage system can read the values one by one or concurrently from the storage instances where the keys are located and return them to the user, which effectively exploits the concurrency of the key-value storage system.
  • In one possible implementation, the device further includes: a second acquisition module, used to acquire the new values of the multiple keys that need to be written in the transaction, where the new values of the multiple keys are obtained by the user modifying the values of the multiple keys;
  • a first writing module, used to, if the multiple keys are stored in the same storage instance, add the background lock of the transaction to the multiple keys, write the new values of the multiple keys into the user data table of the storage instance and into the cache, and release the background lock of the transaction on the multiple keys;
  • a notification module, used to notify the user that the new values of the multiple keys have been written successfully.
  • In one possible implementation, each storage instance also includes a transaction status table, and the device further includes: a second acquisition module, used to acquire the new values of the multiple keys that need to be written in the transaction, where the new values are obtained by the user modifying the values of the multiple keys; a second writing module, used to, if the multiple keys are stored in several storage instances, select a target key among the multiple keys and write the new values of the multiple keys into the transaction status table of the storage instance where the target key is located and into the cache; a notification module, used to notify the user that the new values of the multiple keys have been written successfully; and a third writing module, used to add the background lock of the transaction to the multiple keys, concurrently write the new values of the multiple keys to the user data tables of the several storage instances, release the background lock of the transaction on the multiple keys, and delete the new values of the multiple keys from the transaction status table of the storage instance where the target key is located.
  • the second reading module is used to determine the storage instance numbers corresponding to the multiple keys, and determine the storage instance where the multiple keys are located based on the storage instance numbers corresponding to the multiple keys.
  • the key-value storage system is set in a file system.
  • among the multiple keys, each key contains metadata of a file or directory, and the metadata contains the index number of the file or directory. The second reading module is used to divide the index numbers contained in the multiple keys by the number of storage instances and use the remainders as the storage instance numbers corresponding to the multiple keys.
  • the operation of adding or releasing foreground locks on multiple keys is implemented in the cache, and the operation of adding or releasing background locks on multiple keys is implemented in the cache.
  • the cache further stores at least one of the following: a timestamp associated with a plurality of keys, a timestamp associated with a transaction, and a state of the transaction.
  • FIG. 7 is another schematic structural diagram of a data management device based on a key-value storage system provided by an embodiment of the present application.
  • an embodiment of a data management device based on a key-value storage system may include one or more central processors 701, memory 702, input and output interfaces 703, wired or wireless network interfaces 704, and power supplies 705.
  • Memory 702 may be ephemeral storage or persistent storage. Furthermore, the central processing unit 701 may be configured to communicate with the memory 702 and execute a series of instruction operations in the memory 702 on the data management device based on the key-value storage system.
  • the central processing unit 701 can execute the method steps in the embodiment shown in FIG. 3, which will not be described again here.
  • the specific functional module division in the central processor 701 may be similar to the division of each module in FIG. 6 , and will not be described again here.
  • Embodiments of the present application also relate to a computer storage medium.
  • the computer-readable storage medium stores a program for performing signal processing. When the program is run on a computer, it causes the computer to perform the steps in the embodiment shown in FIG. 3 .
  • Embodiments of the present application also relate to a computer program product that stores instructions that, when executed by a computer, cause the computer to perform the steps in the embodiment shown in FIG. 3 .
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes a number of instructions that enable a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk and other media that can store program code.

Abstract

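This application discloses a data management method based on a key-value storage system and a metadata management method based on a file system, which can make full use of the CPU resources and memory resources of the server where the key-value storage system is located and can effectively exploit the concurrency of the key-value storage system. The key-value storage system of this application includes a cache and multiple storage instances, and the user data table of each storage instance stores key-value pairs. The method includes: the key-value storage system can obtain the multiple keys that need to be read in a user's transaction and read the values of the multiple keys from the storage instances where the keys are located. The system can also obtain the new values of the multiple keys that need to be written in the transaction. If these keys are located in different storage instances, the system can first complete the foreground operation, so as to notify the user in advance that the new values of the multiple keys have been written successfully, and then, in the background operation, concurrently write the new values of the multiple keys to the several storage instances where the keys are located, thereby completing the user's transaction.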

Description

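A data management method based on a key-value storage system and related devices
This application claims priority to Chinese patent application No. 202210877102.6, filed with the China Patent Office on July 25, 2022 and entitled "File system metadata management system", and to Chinese patent application No. 202211336267.9, filed with the China Patent Office on October 28, 2022 and entitled "A data management method based on a key-value storage system and related devices", both of which are incorporated herein by reference in their entirety.
Technical Field
The embodiments of this application relate to the field of computer technology, and in particular to a data management method based on a key-value storage system and related devices.
Background
Key-value (KV) storage systems are widely used in computers thanks to advantages such as high performance, strong scalability and simple, easy-to-use interfaces. A key-value storage system can manage user data in the form of key-value pairs, for example the data of a user's files, the metadata of a user's files, and so on.
At present, when a user needs to add, read, write or delete key-value pairs, the user can call the key-value storage system to perform these operations. Specifically, when called by the user, the key-value storage system can create a transaction for the user, and the transaction can contain the user's operations on multiple key-value pairs. Since the system contains a database instance that stores key-value pairs, the system can complete the operations on these key-value pairs in that instance, thereby completing the transaction.
However, the system provides only a single instance, which cannot make full use of the central processing unit (CPU) resources and memory resources of the server where the system is located, and which limits the concurrency of the system.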
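Summary
The embodiments of this application provide a data management method based on a key-value storage system and related devices, which can make full use of the CPU resources and memory resources of the server where the key-value storage system is located and can effectively exploit the concurrency of the key-value storage system.
A first aspect of the embodiments of this application provides a data management method based on a key-value storage system. The key-value storage system includes a cache and multiple storage instances, and the user data table of each storage instance stores key-value pairs. The method includes:
A user can call the key-value storage system so that the key-value storage system creates a dedicated transaction for the user and assigns a unique timestamp to the user's transaction. After the key-value storage system has created the transaction, the user can call the key-value storage system again so that it obtains the multiple keys that the user needs to read in the transaction.
After determining these keys, the key-value storage system can first add, in the cache, the foreground lock of the user's transaction to these keys. After adding the foreground lock, the key-value storage system can check whether the cache stores the values of these keys. There are two cases:
If the cache stores the values of these keys, the key-value storage system can successfully read the values from the cache and return them to the user.
If the cache does not store the values of these keys, the key-value storage system cannot read them from the cache, so it first determines, among the multiple storage instances, the one or several storage instances where these keys are located. Next, the key-value storage system reads the values of these keys from the user data tables of those storage instances. Then, the key-value storage system can return the values to the user and write them to the cache.
After returning the values of these keys to the user, the key-value storage system can release the foreground lock of the user's transaction on these keys. At this point, the user has successfully read the values of these keys, so the user can call the key-value storage system to end the user's transaction.
As can be seen from the above method: after creating the user's transaction, the key-value storage system can obtain the multiple keys that need to be read in the transaction and add the foreground lock of the transaction to the multiple keys. If the values of the multiple keys are stored in the cache, the key-value storage system reads them from the cache and provides them to the user. If the values are not stored in the cache, it determines, from the multiple storage instances, the storage instances where the keys are located, reads the values from the user data tables of those storage instances, provides the values to the user, and writes the values to the cache. The key-value storage system can then release the foreground lock on the multiple keys and end the user's transaction. It follows that, because the key-value storage system contains multiple storage instances, it can make full use of the CPU resources and memory resources of the server where it is located; and when the user needs to read the values of multiple keys, the key-value storage system can read them one by one or concurrently from the storage instances where the keys are located and return them to the user, which effectively exploits the concurrency of the key-value storage system.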
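In one possible implementation, before the foreground lock on the multiple keys is released, the method further includes: obtaining the new values of the multiple keys that need to be written in the transaction, the new values being obtained by the user modifying the values of the multiple keys; and, if the multiple keys are stored in the same storage instance, adding the background lock of the transaction to the multiple keys, writing the new values of the multiple keys to the user data table of the storage instance and to the cache, and releasing the background lock of the transaction on the multiple keys. After the foreground lock on the multiple keys is released, the method further includes: notifying the user that the new values of the multiple keys have been written successfully. In this implementation, after obtaining the values of these keys, the user can modify them to obtain new values. The user can then call the key-value storage system so that it obtains the new values that the user needs to write in the transaction. If the keys are stored in the same storage instance, the key-value storage system can add the background lock of the user's transaction to the keys. Next, it can write the new values to the user data table of that storage instance and to the cache. Then, it can release the background lock of the user's transaction on the keys. Subsequently, it can release the foreground lock of the user's transaction on the keys and notify the user that the new values have been written successfully. At this point, the user has successfully read the values of these keys and written their new values, so the user can call the key-value storage system to end the user's transaction.
In one possible implementation, each storage instance further contains a transaction status table. Before the foreground lock on the multiple keys is released, the method further includes: obtaining the new values of the multiple keys that need to be written in the transaction, the new values being obtained by the user modifying the values of the multiple keys; and, if the multiple keys are stored in several storage instances, selecting a target key among the multiple keys and writing the new values of the multiple keys to the transaction status table of the storage instance where the target key is located and to the cache. After the foreground lock on the multiple keys is released, the method further includes: notifying the user that the new values of the multiple keys have been written successfully; and adding the background lock of the transaction to the multiple keys, concurrently writing the new values to the user data tables of the several storage instances, releasing the background lock of the transaction on the multiple keys, and deleting the new values of the multiple keys from the transaction status table of the storage instance where the target key is located. In this implementation, after obtaining the values of these keys, the user can modify them to obtain new values. The user can then call the key-value storage system so that it obtains the new values that the user needs to write in the transaction. If the keys are stored in several storage instances, the key-value storage system first selects a target key among them. Next, it writes the new values to the transaction status table of the storage instance where the target key is located and to the cache. Then, it can release the foreground lock on the keys and notify the user that the new values have been written successfully; at this point the foreground operation ends. Subsequently, it can add the background lock of the transaction to the keys, concurrently write the new values to the user data tables of the several storage instances, and release the background lock of the transaction on the keys. Finally, it deletes the new values from the transaction status table of the storage instance where the target key is located; at this point the background operation ends. The user has now successfully read the values of these keys and written their new values, so the user can call the key-value storage system to end the user's transaction.
In one possible implementation, determining the storage instances where the multiple keys are located from the multiple storage instances includes: determining the storage instance numbers corresponding to the multiple keys and, based on those numbers, determining the storage instances where the multiple keys are located. In this implementation, the key-value storage system sets a unique storage instance number for each of the multiple storage instances and maintains a mathematical relationship between storage instance numbers and keys. Based on this relationship, the key-value storage system can determine the storage instance numbers corresponding to the multiple keys and, based on those numbers, determine the storage instances where the multiple keys are located.
In one possible implementation, the key-value storage system is arranged in a file system; among the multiple keys, each key contains metadata of a file or directory, and the metadata contains the index number of the file or directory. Determining the storage instance numbers corresponding to the multiple keys includes: dividing the index numbers contained in the multiple keys by the number of storage instances and using the remainders as the storage instance numbers corresponding to the multiple keys. In this implementation, when the key-value storage system needs to determine the storage instances where the multiple keys are located, then for any one of these keys it can divide the index number of the directory or file contained in the key by the total number of storage instances and use the remainder as the storage instance number corresponding to the key. After the storage instance number corresponding to the key has been determined, the storage instance where the key is located can be found based on that number.
In one possible implementation, the operations of adding or releasing the foreground lock on the multiple keys are performed in the cache, and the operations of adding or releasing the background lock on the multiple keys are performed in the cache.
In one possible implementation, the cache further stores at least one of the following: timestamps associated with the multiple keys, a timestamp associated with the transaction, and the state of the transaction.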
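A second aspect of the embodiments of this application provides a data management device based on a key-value storage system. The key-value storage system includes a cache and multiple storage instances, and the user data table of each storage instance stores key-value pairs. The device includes: a creation module, used to create a user's transaction; a first acquisition module, used to acquire the multiple keys that need to be read in the transaction and add the foreground lock of the transaction to the multiple keys; a first reading module, used to read the values of the multiple keys from the cache if the values are stored in the cache, and provide the values to the user; a second reading module, used to, if the values of the multiple keys are not stored in the cache, determine the storage instances where the multiple keys are located from the multiple storage instances, read the values of the multiple keys from the user data tables of those storage instances, provide the values to the user, and write the values to the cache; a processing module, used to release the foreground lock on the multiple keys; and an end module, used to end the transaction.
In one possible implementation, the device further includes: a second acquisition module, used to acquire the new values of the multiple keys that need to be written in the transaction, the new values being obtained by the user modifying the values of the multiple keys; a first writing module, used to, if the multiple keys are stored in the same storage instance, add the background lock of the transaction to the multiple keys, write the new values of the multiple keys to the user data table of the storage instance and to the cache, and release the background lock of the transaction on the multiple keys; and a notification module, used to notify the user that the new values of the multiple keys have been written successfully.
In one possible implementation, each storage instance further contains a transaction status table, and the device further includes: a second acquisition module, used to acquire the new values of the multiple keys that need to be written in the transaction, the new values being obtained by the user modifying the values of the multiple keys; a second writing module, used to, if the multiple keys are stored in several storage instances, select a target key among the multiple keys and write the new values of the multiple keys to the transaction status table of the storage instance where the target key is located and to the cache; a notification module, used to notify the user that the new values of the multiple keys have been written successfully; and a third writing module, used to add the background lock of the transaction to the multiple keys, concurrently write the new values of the multiple keys to the user data tables of the several storage instances, release the background lock of the transaction on the multiple keys, and delete the new values of the multiple keys from the transaction status table of the storage instance where the target key is located.
In one possible implementation, the second reading module is used to determine the storage instance numbers corresponding to the multiple keys and, based on those numbers, determine the storage instances where the multiple keys are located.
In one possible implementation, the key-value storage system is arranged in a file system; among the multiple keys, each key contains metadata of a file or directory, and the metadata contains the index number of the file or directory. The second reading module is used to divide the index numbers contained in the multiple keys by the number of storage instances and use the remainders as the storage instance numbers corresponding to the multiple keys.
In one possible implementation, the operations of adding or releasing the foreground lock on the multiple keys are performed in the cache, and the operations of adding or releasing the background lock on the multiple keys are performed in the cache.
In one possible implementation, the cache further stores at least one of the following: timestamps associated with the multiple keys, a timestamp associated with the transaction, and the state of the transaction.
A third aspect of the embodiments of this application provides a data management device based on a key-value storage system. The device includes a memory and a processor; the memory stores code, and the processor is configured to execute the code. When the code is executed, the device performs the method described in the first aspect or in any possible implementation of the first aspect.
A fourth aspect of the embodiments of this application provides a computer storage medium. The computer storage medium stores one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method described in the first aspect or in any possible implementation of the first aspect.
A fifth aspect of the embodiments of this application provides a computer program product. The computer program product stores instructions which, when executed by a computer, cause the computer to implement the method described in the first aspect or in any possible implementation of the first aspect.
In the embodiments of this application, after creating the user's transaction, the key-value storage system can obtain the multiple keys that need to be read in the transaction and add the foreground lock of the transaction to the multiple keys. If the values of the multiple keys are stored in the cache, the key-value storage system reads them from the cache and provides them to the user. If the values are not stored in the cache, it determines, from the multiple storage instances, the storage instances where the keys are located, reads the values from the user data tables of those storage instances, provides the values to the user, and writes the values to the cache. The key-value storage system can then release the foreground lock on the multiple keys and end the user's transaction. It follows that, because the key-value storage system contains multiple storage instances, it can make full use of the CPU resources and memory resources of the server where it is located; and when the user needs to read the values of multiple keys, the key-value storage system can read them one by one or concurrently from the storage instances where the keys are located and return them to the user, which effectively exploits the concurrency of the key-value storage system.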
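Brief Description of the Drawings
Figure 1 is a schematic structural diagram of the key-value storage system provided by the embodiments of this application;
Figure 2 is a schematic structural diagram of the file system provided by the embodiments of this application;
Figure 3 is a schematic flowchart of the data management method based on the key-value storage system provided by the embodiments of this application;
Figure 4 is a schematic diagram of an application example of the data management method based on the key-value storage system provided by the embodiments of this application;
Figure 5 is a schematic diagram of another application example of the data management method based on the key-value storage system provided by the embodiments of this application;
Figure 6 is a schematic structural diagram of the data management device based on the key-value storage system provided by the embodiments of this application;
Figure 7 is another schematic structural diagram of the data management device based on the key-value storage system provided by the embodiments of this application.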
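Detailed Description
The embodiments of this application provide a data management method based on a key-value storage system and related devices, which can make full use of the CPU resources and memory resources of the server where the key-value storage system is located and can effectively exploit the concurrency of the key-value storage system.
The terms "first", "second" and the like in the specification, claims and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product or device that comprises a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product or device.
Key-value storage systems are widely used in computers thanks to advantages such as high performance, strong scalability and simple, easy-to-use interfaces. A key-value storage system can manage user data in the form of key-value pairs, for example the data of a user's files, the metadata of a user's files, and so on.
At present, when a user needs to add, read, write or delete key-value pairs, the user can call the key-value storage system to perform these operations. Specifically, when called by the user, the key-value storage system can create a transaction for the user, and the transaction can contain the user's operations on multiple key-value pairs. Since the system contains a database instance that stores key-value pairs, the system can complete the operations on these key-value pairs (for example, one or more of adding, reading, writing and deleting) in that instance, thereby completing the transaction.
However, the system provides only a single instance and manages key-value pairs, that is, user data, on the basis of that single instance. A single instance can manage only one disk and cannot make full use of the CPU resources and memory resources of the server where the system is located, which limits the overall performance of data management. Moreover, access to the key-value pairs in the instance can exploit only the performance of a single instance, so the system cannot be made concurrent to the greatest extent.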
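To solve the above problems, the embodiments of this application provide a new key-value storage system, which may also be called a transactional key-value storage system that supports spanning multiple storage instances within a single machine (miniature elasticDB, MEDB). Figure 1 is a schematic structural diagram of the key-value storage system provided by the embodiments of this application. As shown in Figure 1, the system includes: a transaction interface layer, a transaction processing layer, a timestamp allocation module, a data partition module, a data cache module (that is, the aforementioned cache) and multiple storage instances. Each of these storage instances contains one transaction status table and multiple user data tables. The concepts involved in the system are introduced below:
The transaction interface layer provides transaction interfaces to the outside of the system, including a create-transaction interface (tnx_begin()), an end-transaction interface (tnx_end()), a write-transaction interface (tnx_put()), a read-transaction interface (tnx_get_for_update()), a commit-transaction interface (tnx_commit()), a rollback-transaction interface (tnx_rollback()), and so on. These interfaces can be called by the user so that the system completes the user's transaction through them. For example, when the user calls the create-transaction interface, the system creates a transaction for the user. When the user calls the read-transaction interface, the system obtains the key or keys whose values need to be read in the user's transaction (that is, the user adds an operation that reads the key or keys to the transaction). When the user calls the write-transaction interface, the system obtains the new values of the key or keys that need to be written in the user's transaction (that is, the user adds an operation that writes the new values to the transaction). When the user calls the commit-transaction interface, the system writes the new values of the key or keys to the storage instances. When the user calls the end-transaction interface, the system ends the user's transaction, and the transaction is complete.
The transaction processing layer is responsible for processing the logic of the user's transaction.
The timestamp allocation module is responsible for allocating global timestamps. For example, the module can assign a dedicated timestamp to the user's transaction, and the transaction's timestamp (that is, the aforementioned timestamp associated with the transaction) serves as the unique identifier of the user's transaction.
The data partition module is responsible for maintaining the mapping between keys and storage instance numbers. The mapping can be understood as a mathematical relationship, on the basis of which the module determines the storage instance number corresponding to each key and hence the storage instance where each key is located.
The data cache module is responsible for maintaining specific information related to the user's transaction. Specifically, the data cache module is divided into two parts: a key cache and a transaction cache (dtx cache). The key cache stores key entries (key_entry). A key entry is indexed by the key, and its content includes the foreground lock of the transaction (rw_lock), the timestamps associated with the key (committable_ts and commit_ts, that is, the timestamp at which the latest value of the key became committable and the timestamp at which the latest value of the key was committed) and the value of the key (committable_value). The transaction cache stores transaction entries (dtx_entry). A transaction entry is indexed by the transaction's timestamp (ts), and its content includes the key-value pairs involved in the transaction, the state of the transaction and the lock information of the key-value pairs.
A storage instance can persistently store key-value pairs and supports atomic writes of multiple key-value pairs within the instance. A storage instance contains a transaction status table, which stores the key-value pairs of transactions: the key is the transaction's timestamp, and the value contains all key-value pairs involved in the transaction and the state of the transaction. A storage instance also contains user data tables, which store the user's key-value pairs: the key is the user's user_key, and the value is the user's user_value together with the timestamp of the transaction that wrote the value. Both user_key and user_value can be constructed from the user's data.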
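The entries and tables described above can be pictured as plain data structures. The field names rw_lock, committable_ts, commit_ts, committable_value, key_entry and dtx_entry come from the text; the Python types, the class names, the state string "active" and the use of a single user data table per instance are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set, Tuple

@dataclass
class KeyEntry:
    """key_entry in the key cache, indexed by key."""
    rw_lock: Optional[int] = None         # ts of the transaction holding the foreground lock
    committable_ts: Optional[int] = None  # ts at which the latest value became committable
    commit_ts: Optional[int] = None       # ts at which the latest value was committed
    committable_value: Any = None         # cached value of the key

@dataclass
class DtxEntry:
    """dtx_entry in the transaction cache, indexed by the transaction timestamp."""
    kv_pairs: Dict[str, Any] = field(default_factory=dict)
    state: str = "active"                 # hypothetical state name
    locked_keys: Set[str] = field(default_factory=set)

@dataclass
class StorageInstance:
    # transaction status table: ts -> (key-value pairs of the transaction, state)
    tx_status: Dict[int, Tuple[Dict[str, Any], str]] = field(default_factory=dict)
    # user data table: user_key -> (user_value, ts of the writing transaction)
    user_data: Dict[str, Tuple[Any, int]] = field(default_factory=dict)

key_cache: Dict[str, KeyEntry] = {"key1": KeyEntry(rw_lock=1)}
dtx_cache: Dict[int, DtxEntry] = {1: DtxEntry(locked_keys={"key1"})}
```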
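It is worth noting that, in the above description, the user can call the key-value storage system provided by the embodiments of this application directly. Further, the key-value storage system can be applied in a file system. As shown in Figure 2 (Figure 2 is a schematic structural diagram of the file system provided by the embodiments of this application), the file system includes a metadata management module and the key-value storage system. In this case, the user no longer needs to call the key-value storage system directly. The user can send operations on files or directories directly to the file system (for example, adding a directory, adding a file to a directory, deleting a file from a directory, modifying a file in the directory, obtaining basic information about multiple files in the directory, obtaining the attributes or content of a file in the directory, and so on). The metadata management module in the file system can parse the user's operation (which may also be called the user's request) into the user's transaction and call the transaction interfaces of the key-value storage system on the basis of that transaction, thereby completing the user's transaction. In this case, the user calls the key-value storage system indirectly.
When the key-value storage system is applied in a file system, the user data tables in a storage instance can include a dentry table, an inode table and so on. For a key-value pair stored in the dentry table, the key is the index number of the parent directory (parent folder) (pinode) together with the name of a file or subdirectory under the parent directory (name), and the value is the index number of that file or subdirectory (inode). For a key-value pair stored in the inode table, the key is the index number of a file or subdirectory (inode), and the value is the attribute information of that file or subdirectory.
It should be noted that the parent directories, subdirectories and files mentioned above are all directories and files created by users, so the index numbers, names and attribute information of the parent directories, subdirectories and files can all be understood as metadata of the directories and files, which is data belonging to the user.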
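How the dentry table and the inode table work together can be sketched with two dictionaries. The key and value layout follows the description above; the helper names `create` and `lookup`, the attribute dictionaries and the inode numbers used in the usage lines are hypothetical.

```python
dentry = {}                          # (parent inode, name) -> inode
inode_table = {1: {"type": "dir"}}   # inode -> attribute information; 1 is assumed to be the root

def create(parent_inode, name, inode, attrs):
    """Register a file or subdirectory under a parent directory."""
    dentry[(parent_inode, name)] = inode
    inode_table[inode] = attrs

def lookup(path, root_inode=1):
    """Resolve an absolute path by walking the dentry table one component at a time."""
    current = root_inode
    for name in filter(None, path.split("/")):
        current = dentry[(current, name)]
    return current, inode_table[current]

create(1, "b", 5, {"type": "dir"})
create(5, "c.txt", 12, {"type": "file", "size": 0})
```

Resolving /b/c.txt first reads the dentry key (1, "b") and then (5, "c.txt"), so each path component costs one dentry lookup plus a final inode lookup.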
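To further explain the workflow of the key-value storage system provided by the embodiments of this application, the workflow is described below with reference to Figure 3. Figure 3 is a schematic flowchart of the data management method based on the key-value storage system provided by the embodiments of this application. As shown in Figure 3, the method includes:
301. Create the user's transaction.
In this embodiment, a user can call the create-transaction interface of the key-value storage system, so that the key-value storage system creates a dedicated transaction for the user and assigns a unique timestamp to the user's transaction.
For example, take a read-modify-write operation on key1, key2 and key3: the user first needs to read key1, key2 and key3 and then write <key1,value1>, <key2,value2> and <key3,value3>. The transaction includes: Step 1: the user calls tnx_begin(), and the key-value storage system creates a transaction for the user and assigns the timestamp ts1 to the user's transaction.
302. Obtain the multiple keys that need to be read in the transaction, and add the foreground lock of the transaction to the multiple keys.
303. Check whether the values of the multiple keys are stored in the cache.
304. If the values of the multiple keys are stored in the cache, read the values of the multiple keys from the cache and provide them to the user.
305. If the values of the multiple keys are not stored in the cache, determine the storage instances where the multiple keys are located from the multiple storage instances, read the values of the multiple keys from the user data tables of those storage instances, provide the values to the user, and write the values to the cache.
After the key-value storage system has created the user's transaction, the user can also call the read-transaction interface of the key-value storage system, so that the key-value storage system obtains the multiple keys that the user needs to read in the transaction.
After determining these keys, the key-value storage system can first add the foreground lock of the user's transaction to these keys in their cached entries. If the key-value storage system later needs to process other users' transactions that also need to read these keys, it cannot satisfy those reads, and the other users' transactions are suspended (their timestamps are usually greater than the timestamp of this user's transaction) until the foreground lock of this user's transaction on these keys is released. This avoids conflicts between different transactions.
After adding the foreground lock to these keys, the key-value storage system can try to read the values of these keys from their cached entries. There are two cases:
If the cached entries of these keys store their values, the key-value storage system can successfully read the values from the cache and return them to the user.
If the cached entries of these keys do not store their values, the key-value storage system cannot read the values from the cache, so it first determines, among the multiple storage instances, the one or several storage instances where these keys are located. Next, the key-value storage system reads the values of these keys from the user data tables of those storage instances. Then, the key-value storage system can return the values to the user and write them into the cached entries of these keys.
Specifically, the key-value storage system can determine the storage instances where the multiple keys are located in the following manner: the key-value storage system sets a unique storage instance number for each of the multiple storage instances and maintains a mathematical relationship between storage instance numbers and keys. Based on this relationship, the key-value storage system can determine the storage instance numbers corresponding to the multiple keys and, based on those numbers, determine the storage instances where the multiple keys are located.
It should be noted that the user may call the read-transaction interface only once, so that the key-value storage system determines all the keys that need to be read in the user's transaction at one time; the key-value storage system can then read the values of these keys concurrently from the cache or the storage instances and return them to the user at the same time. The user may also call the read-transaction interface multiple times, where each call causes the key-value storage system to determine one of the keys that need to be read in the user's transaction; in this case, the key-value storage system can read the values of these keys sequentially (that is, one by one in order) from the cache or the storage instances and return them to the user sequentially (that is, one by one in order).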
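Continuing the example above, Step 2: the user calls tnx_get_for_update(key1) so that the key-value storage system reads the value of key1 and returns it to the user. The process includes:
(2.1) Obtain the entry of key1, key1_entry, from the key cache.
(2.2) In key1_entry, add the foreground lock rw_lock to key1.
(2.3) Generate the entry of the user's transaction, dtx_entry. dtx_entry is indexed by the transaction's timestamp ts1. Record the lock information of key1 in dtx_entry and add dtx_entry to the cache.
(2.4) Read committable_value from key1_entry. If committable_value is non-empty, return it (that is, the value of key1) to the user. If committable_value is empty, perform (2.5)-(2.8).
(2.5) Use the data partitioning module to compute the storage instance number corresponding to key1, shard1.
(2.6) From the user data table of storage instance shard1, read the value of key1 and the timestamp ts at which that value was written.
(2.7) Parse the value+ts read for key1, set committable_value in key1_entry to the value of key1, and set committable_ts to ts.
(2.8) Return the read result, that is, the value of key1.
Step 3: the user calls tnx_get_for_update(key2) so that the system reads the value of key2 and returns it to the user. For this process, see (2.1)-(2.8) above; it is not repeated here.
Step 4: the user calls tnx_get_for_update(key3) so that the system reads the value of key3 and returns it to the user. For this process, see (2.1)-(2.8) above; it is not repeated here. In this way, the user successfully reads the values of key1, key2, and key3.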
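Steps (2.1)-(2.8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: integer keys, a modulo partition function (shard_of), dictionary-based tables, and raising an error instead of deferring a conflicting transaction are all choices made for the sketch.

```python
from typing import Dict, Optional, Tuple

N_SHARDS = 4  # assumed number of storage instances

# shard number -> user data table: key -> (value, ts of the writing transaction)
shards: Dict[int, Dict[int, Tuple[str, int]]] = {i: {} for i in range(N_SHARDS)}
key_cache: Dict[int, dict] = {}   # key -> key_entry
dtx_cache: Dict[int, dict] = {}   # transaction ts -> dtx_entry


def shard_of(key: int) -> int:
    """Hypothetical partition function; the description only says a mathematical relation exists."""
    return key % N_SHARDS


def get_for_update(ts: int, key: int) -> Optional[str]:
    # (2.1) Fetch the key entry, creating an empty one on first use.
    entry = key_cache.setdefault(key, {
        "rw_lock": None, "committable_value": None,
        "committable_ts": 0, "commit_ts": 0})
    # (2.2) Add the foreground lock; a real system would defer the caller instead of raising.
    if entry["rw_lock"] not in (None, ts):
        raise RuntimeError("key is locked by another transaction")
    entry["rw_lock"] = ts
    # (2.3) Record the lock information in the transaction entry.
    dtx = dtx_cache.setdefault(ts, {"state": "active", "kvs": {}, "locked": set()})
    dtx["locked"].add(key)
    # (2.4) Cache hit: return the cached value.
    if entry["committable_value"] is not None:
        return entry["committable_value"]
    # (2.5)-(2.6) Cache miss: locate the storage instance and read value and ts.
    row = shards[shard_of(key)].get(key)
    if row is None:
        return None
    value, written_ts = row
    # (2.7) Fill the cache entry.
    entry["committable_value"] = value
    entry["committable_ts"] = written_ts
    # (2.8) Return the value.
    return value
```

A second call for the same key within the same transaction is served from the key cache without touching the storage instance.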
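306. Obtain the new values of the multiple keys that the transaction needs to write; the new values are obtained by the user modifying the values of the multiple keys.
307. Check whether the multiple keys are stored in the same storage instance.
308. If the multiple keys are stored in the same storage instance, add the transaction's background lock to the multiple keys, write the new values of the multiple keys to the user data table of that storage instance and to the cache, release the transaction's background lock from the multiple keys, release the foreground lock from the multiple keys, and notify the user that the new values of the multiple keys have been written successfully.
309. If the multiple keys are stored in several storage instances, select a target key among the multiple keys, write the new values of the multiple keys to the transaction status table of the storage instance where the target key resides and to the cache, release the foreground lock from the multiple keys, notify the user that the new values of the multiple keys have been written successfully, add the transaction's background lock to the multiple keys, write the new values of the multiple keys concurrently to the user data tables of the several storage instances, release the transaction's background lock from the multiple keys, and delete the new values of the multiple keys from the transaction status table of the storage instance where the target key resides.
310. End the user's transaction.
After obtaining the values of these keys, the user may modify them to obtain new values. The user may then call the write interface of the key-value storage system so that the system obtains the new values of these keys that the user needs to write in the transaction.
After determining the new values of these keys, the key-value storage system checks whether the keys are stored in the same storage instance. There are two cases:
(1) If these keys are stored in the same storage instance, the key-value storage system first adds the background lock of the user's transaction to the keys in their entries in the cache. It then writes the new values of the keys to the user data table of that storage instance and to the entries of the keys in the cache. Next, it releases (deletes) the background lock of the user's transaction from the keys in their entries in the cache. It then releases the foreground lock of the user's transaction from the keys in their entries in the cache, and notifies the user that the new values of the keys have been written successfully.
Note that the background lock of the user's transaction works as follows: if the key-value storage system later has to process other users' transactions that also need to write new values of these keys (usually different from the new values this user's transaction writes), it cannot satisfy those writes. The other transactions are deferred until the background lock of this user's transaction is released from the keys. This also avoids conflicts between different transactions.
(2) If these keys are stored in several (that is, different) storage instances, the key-value storage system first selects a target key (also called the primary key) among the keys. It then writes the new values of the keys to the transaction status table of the storage instance where the target key resides and to the entries of the keys in the cache. Next, it releases the foreground lock from the keys in their entries in the cache and notifies the user that the new values of the keys have been written successfully. This ends the foreground operation. The system then adds the transaction's background lock to the keys in their entries in the cache, writes the new values of the keys concurrently to the user data tables of the several storage instances, and releases the transaction's background lock from the keys in their entries in the cache. Finally, it deletes the new values of the keys from the transaction status table of the storage instance where the target key resides. This ends the background operation.
Note that because these keys are stored in several storage instances, writing their new values concurrently takes a relatively long time. The key-value storage system therefore first stores the new values temporarily, during the foreground operation, in the transaction status table of the storage instance where the primary key resides, and notifies the user early that the write has succeeded, which reduces the latency perceived by the user. After the notification, the system performs the background operation. If the server hosting the storage instances loses power for any reason, so that the new values are not successfully written to the several storage instances, then after power is restored the system can fetch the new values again from the transaction status table of the storage instance where the primary key resides and write them concurrently to the several storage instances, thereby completing the user's transaction.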
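The recovery behaviour described above can be sketched as a replay of whatever is left in the transaction status tables after a restart. This is a hedged Python illustration, not the patented procedure: the recover function, the modulo placement of integer keys, the sequential (rather than concurrent) replay, and the rule of comparing against the timestamp stored next to each value in the user data table are assumptions introduced here.

```python
def recover(shards, n_shards):
    """Replay transactions left in the transaction status tables after a restart.

    shards: shard number -> {"txn_status": {ts: {key: value}},
                             "user_data": {key: (value, ts)}}
    Returns the timestamps of the replayed transactions.
    """
    replayed = []
    for shard in shards.values():
        for ts in sorted(shard["txn_status"]):
            for key, value in shard["txn_status"][ts].items():
                table = shards[key % n_shards]["user_data"]   # assumed placement rule
                current = table.get(key)
                # Assumption: skip a key whose stored value was written by a newer transaction.
                if current is None or current[1] < ts:
                    table[key] = (value, ts)
            replayed.append(ts)
        # The records are no longer needed once their values are in the user data tables.
        shard["txn_status"].clear()
    return replayed
```

Because the new values were persisted in the primary key's storage instance before the user was notified, the replay can complete the transaction even though the cache contents were lost.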
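At this point, the user has successfully read the values of these keys and written their new values, so the user can call the end-transaction interface of the key-value storage system so that the system ends the user's transaction.
Continuing the example above, Step 5: the user modifies the value of key1 to value1, the value of key2 to value2, and the value of key3 to value3.
Step 6: the user calls tnx_put(key1,value1); the system writes key1 and value1 to the temporary cache inside the user's transaction, which amounts to the system determining that the new value value1 of key1 needs to be written.
Step 7: the user calls tnx_put(key2,value2); the system determines that the new value value2 of key2 needs to be written.
Step 8: the user calls tnx_put(key3,value3); the system determines that the new value value3 of key3 needs to be written.
Step 9: the user calls tnx_commit(); the system determines that the user's transaction needs to be committed. The commit process is as follows:
(9.1) Based on the timestamp ts1 of the user's transaction, obtain dtx_entry from the dtx cache.
(9.2) Obtain the lock information of key1, key2, and key3 from dtx_entry. Because the foreground lock was already added during the earlier reads, there is no need to add it to key1, key2, and key3 again.
(9.3) Randomly select one of key1, key2, and key3 as primary_key.
(9.4) Use the data partitioning module to compute the storage instance number corresponding to primary_key, primary_shard.
(9.5) Write <key1,value1>, <key2,value2>, and <key3,value3> into dtx_entry, and in dtx_entry set the state of the user's transaction to "active".
(9.6) If key1, key2, and key3 belong to the same storage instance, say storage instance shard1, perform the following steps:
(9.6.1) In key1_entry, key2_entry, and key3_entry respectively, add the background lock commit_lock to key1, key2, and key3.
(9.6.2) Write <key1,value1>, <key2,value2>, and <key3,value3> to the user data table of storage instance shard1.
(9.6.3) Set committable_value in key1_entry to value1 and committable_ts to ts1. Likewise, set committable_value in key2_entry to value2 and committable_ts to ts1, and set committable_value in key3_entry to value3 and committable_ts to ts1.
(9.6.4) In key1_entry, key2_entry, and key3_entry respectively, release (delete) the background lock commit_lock of key1, key2, and key3.
(9.6.5) Delete dtx_entry from the dtx cache.
(9.6.6) In key1_entry, key2_entry, and key3_entry respectively, release the foreground lock rw_lock of key1, key2, and key3.
(9.6.7) Return the execution result to the user, that is, notify the user that <key1,value1>, <key2,value2>, and <key3,value3> have been written successfully.
(9.7) If key1, key2, and key3 reside in different storage instances, perform the following steps:
(9.7.1) Write <key1,value1>, <key2,value2>, and <key3,value3> from dtx_entry to the transaction status table of storage instance primary_shard.
(9.7.2) In dtx_entry, change the state of the user's transaction to "committable".
(9.7.3) Set committable_value in key1_entry to value1 and committable_ts to ts1. Likewise, set committable_value in key2_entry to value2 and committable_ts to ts1, and set committable_value in key3_entry to value3 and committable_ts to ts1.
(9.7.4) In key1_entry, key2_entry, and key3_entry respectively, release the foreground lock rw_lock of key1, key2, and key3.
(9.7.5) Return the execution result and trigger the background commit flow.
(9.8) Execute the background commit flow:
(9.8.1) Obtain dtx_entry from the dtx cache.
(9.8.2) Group key1, key2, and key3 by storage instance, where key1 and key2 reside in storage instance shard1 and key3 resides in storage instance shard2.
(9.8.3) For key1 and key2 in storage instance shard1 and key3 in storage instance shard2, perform the following steps concurrently:
(9.8.3.1) In key1_entry, key2_entry, and key3_entry respectively, add the background lock commit_lock to key1, key2, and key3.
(9.8.3.2) If commit_ts in key1_entry is greater than ts1 (meaning that another transaction has modified the value of key1), skip key1; if it is less than ts1, write <key1,value1> to the user data table of storage instance shard1. Likewise, if commit_ts in key2_entry is greater than ts1, skip key2; if it is less, write <key2,value2> to the user data table of storage instance shard1. Likewise, if commit_ts in key3_entry is greater than ts1, skip key3; if it is less, write <key3,value3> to the user data table of storage instance shard2.
(9.8.3.3) In key1_entry, key2_entry, and key3_entry respectively, set commit_ts to ts1.
(9.8.3.4) In key1_entry, key2_entry, and key3_entry respectively, release the background lock commit_lock of key1, key2, and key3.
(9.8.4) After the steps have completed for both storage instance shard1 and storage instance shard2, delete the record in the transaction status table of storage instance primary_shard.
(9.8.5) Delete dtx_entry from the dtx cache.
Step 10: the user calls tnx_end(); the system ends the user's transaction.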
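The commit flow of step 9 can be sketched in Python as below. The sketch follows the numbering of the steps above but rests on several assumptions that are not from the description: integer keys placed by a modulo function (shard_of), dictionaries standing in for tables and cache entries, a thread pool for the concurrent per-instance writes, the primary shard being remembered in dtx_entry, and commit_ts being advanced only for keys that are actually written (the description sets it in a separate step, 9.8.3.3).

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_SHARDS = 4  # assumed number of storage instances
shards = {i: {"txn_status": {}, "user_data": {}} for i in range(N_SHARDS)}
key_cache = {}   # key -> key_entry
dtx_cache = {}   # transaction ts -> dtx_entry


def shard_of(key):
    """Hypothetical partition function (key modulo instance count)."""
    return key % N_SHARDS


def entry_of(key):
    return key_cache.setdefault(key, {
        "rw_lock": None, "commit_lock": None,
        "committable_value": None, "committable_ts": 0, "commit_ts": 0})


def publish_to_cache(ts, kvs):
    for key, value in kvs.items():
        entry = entry_of(key)
        entry["committable_value"], entry["committable_ts"] = value, ts


def commit(ts):
    dtx = dtx_cache[ts]                                    # (9.1)
    kvs = dtx["kvs"]
    primary_shard = shard_of(random.choice(list(kvs)))     # (9.3)-(9.4)
    dtx["state"], dtx["primary_shard"] = "active", primary_shard   # (9.5)
    if len({shard_of(k) for k in kvs}) == 1:               # (9.6) single instance
        for key in kvs:
            entry_of(key)["commit_lock"] = ts              # (9.6.1)
        for key, value in kvs.items():
            shards[primary_shard]["user_data"][key] = (value, ts)  # (9.6.2)
        publish_to_cache(ts, kvs)                          # (9.6.3)
        for key in kvs:
            entry_of(key)["commit_lock"] = None            # (9.6.4)
        del dtx_cache[ts]                                  # (9.6.5)
        for key in kvs:
            entry_of(key)["rw_lock"] = None                # (9.6.6)
        return "committed"                                 # (9.6.7)
    shards[primary_shard]["txn_status"][ts] = dict(kvs)    # (9.7.1)
    dtx["state"] = "committable"                           # (9.7.2)
    publish_to_cache(ts, kvs)                              # (9.7.3)
    for key in kvs:
        entry_of(key)["rw_lock"] = None                    # (9.7.4)
    return "committable"                                   # (9.7.5) caller triggers the background flow


def background_commit(ts):
    dtx = dtx_cache[ts]                                    # (9.8.1)
    by_shard = {}
    for key, value in dtx["kvs"].items():                  # (9.8.2)
        by_shard.setdefault(shard_of(key), {})[key] = value

    def flush(item):
        shard_id, kvs = item
        for key, value in kvs.items():
            entry = entry_of(key)
            entry["commit_lock"] = ts                      # (9.8.3.1)
            if entry["commit_ts"] < ts:                    # (9.8.3.2) skip if a newer commit exists
                shards[shard_id]["user_data"][key] = (value, ts)
                entry["commit_ts"] = ts
            entry["commit_lock"] = None                    # (9.8.3.4)

    with ThreadPoolExecutor() as pool:                     # (9.8.3) one task per storage instance
        list(pool.map(flush, by_shard.items()))
    del shards[dtx["primary_shard"]]["txn_status"][ts]     # (9.8.4)
    del dtx_cache[ts]                                      # (9.8.5)
```

In this sketch the caller invokes background_commit explicitly after commit returns "committable"; in the described system the background flow is triggered by the system itself after the result has been returned to the user.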
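In the embodiments of this application, after creating the user's transaction, the key-value storage system obtains the multiple keys that the transaction needs to read and adds the transaction's foreground lock to them. If the cache stores the values of the keys, the system reads them from the cache and provides them to the user. If the cache does not store the values, the system determines, among the multiple storage instances, the storage instances where the keys reside, reads the values from the user data tables of those storage instances, provides them to the user, and writes them to the cache. The system then releases the foreground lock from the keys and ends the user's transaction. It follows that, because the key-value storage system contains multiple storage instances, it can make full use of the CPU and memory resources of the server on which it runs. Moreover, when the user needs to read the values of multiple keys, the system can read them either one by one or concurrently from the storage instances where the keys reside and return them to the user, which effectively exploits the concurrency of the key-value storage system.
Furthermore, when the user needs to write new values of these keys to the key-value storage system, the system can write the new values concurrently to the storage instances regardless of whether the keys reside in the same storage instance or in different ones. This further exploits the system's concurrency, guarantees transactional semantics for key-value operations across storage instances, minimizes transaction overhead, and improves the system's throughput.
To aid understanding of the workflow of the key-value storage system provided by the embodiments of this application, two concrete application examples are described below. Figure 4 is a schematic diagram of one application example of the data management method based on the key-value storage system provided by an embodiment of this application, and Figure 5 is a schematic diagram of another application example. As shown in Figure 4, this application example is the transaction flow for a single storage instance, and includes:
1. Through the transaction interface layer, the key-value storage system determines K1, K2, and K3 to be written in the user's transaction.
2. Lock K1, K2, and K3 in the cache.
3. Generate dtx_entry in the cache and set the state of the user's transaction to active.
4. Detect that K1, K2, and K3 reside in storage instance 1.
5. Write the KV data (that is, the new values of K1, K2, and K3 to be written) to the cf_data table in storage instance 1.
6. Delete dtx_entry.
7. Update committable_ts and committable_value.
8. Unlock K1, K2, and K3 in the cache.
9. Return the execution result.
As shown in Figure 5, this application example is the transaction flow for multiple storage instances, and includes:
1. Through the transaction interface layer, the key-value storage system determines K1, K2, and K3 to be written in the user's transaction.
2. Lock K1, K2, and K3 in the cache.
3. Generate dtx_entry in the cache and set the state of the user's transaction to active.
4. Detect that K1, K2, and K3 reside in different storage instances, and select a leader among K1, K2, and K3; assume the leader resides in storage instance 1.
5. Write dtx_entry (containing the KV data, that is, the new values of K1, K2, and K3 to be written) to the cf_dtx table in storage instance 1.
6. In dtx_entry, set the state of the user's transaction to committable.
7. Update committable_ts and committable_value.
8. Unlock K1, K2, and K3 in the cache.
9. Return the execution result.
10. The system triggers the commit operation in the background, takes K1, K2, and K3 from dtx_entry, and groups them by storage instance.
11. Write the KV data to the cf_data tables of the corresponding storage instances.
12. Delete dtx_entry from the cf_dtx table in storage instance 1.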
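Furthermore, if the key-value storage system is used in a file system, the user data tables in a storage instance are the dentry table and the inode table. The keys of the key-value pairs stored in these two tables contain either the index number of a directory (a parent directory or a subdirectory) or the index number of a file. These index numbers are assigned by the file system when the file or directory is created. Specifically, the file system determines the index number of a file or directory with the following formula: inode = shard_ID + ID * N, where inode is the index number of the file or directory, shard_ID is the storage instance number corresponding to the file or directory (that is, the number of the storage instance where the file or directory resides), ID is the rank of the file or directory within the storage instance where it resides, and N is the number of storage instances.
Based on this formula, when the key-value storage system needs to determine the storage instances where multiple keys reside, then for any one of these keys it takes the remainder of dividing the directory or file index number contained in the key by the total number of storage instances as the storage instance number corresponding to that key. Once the storage instance number of the key is determined, the storage instance where the key resides can be found from that number.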
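The formula and its inverse can be checked with a short Python sketch. The function names make_inode and shard_of_inode are hypothetical; the arithmetic is the formula stated above, and it relies on shard_ID being smaller than N so that the remainder recovers shard_ID.

```python
def make_inode(shard_id: int, local_id: int, n_shards: int) -> int:
    """inode = shard_ID + ID * N, with 0 <= shard_ID < N."""
    if not 0 <= shard_id < n_shards:
        raise ValueError("shard_id must be in [0, n_shards)")
    return shard_id + local_id * n_shards


def shard_of_inode(inode: int, n_shards: int) -> int:
    """The storage instance number is the remainder of the index number divided by N."""
    return inode % n_shards
```

Since inode = shard_ID + ID * N and 0 <= shard_ID < N, the remainder inode mod N equals shard_ID, which is why a key's storage instance can be found without any lookup table. For instance, with N = 4, shard_ID = 1 and ID = 5 the index number is 1 + 5 * 4 = 21, and 21 mod 4 = 1.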
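The process of assigning index numbers is described below with three examples. In these examples, assume that there are 4 storage instances, namely storage instance shard0, storage instance shard1, storage instance shard2, and storage instance shard3:
Example 1: suppose the user needs to create the file /a.txt, and the inode of the parent directory / is 1. The input of the Create operation is 1+a.txt. Because 1 is on storage instance shard1, the current counter value is obtained from the inode generator on storage instance shard1; assume it is 5, and use it as the rank of a.txt within storage instance shard1. The inode of the file a.txt is then 1+5*4=21. The keys that this Create operation needs to update are therefore 1+a.txt in the dentry table, 1 in the inode table, and 21 in the inode table. All 3 keys belong to shard1, so the transaction is a single-storage-instance transaction.
Example 2: suppose the user needs to create the directory /b/, and the inode of the parent directory / is 1. The input of the Create operation is 1+b/. Because a directory is being created, the storage instance with the smallest counter is selected among the 4 storage instances; assume it is storage instance shard0, whose counter value is 2. The inode of directory b/ is then 0+2*4=8. The keys that this Create operation needs to update are therefore 1+b/ in the dentry table, 1 in the inode table, and 8 in the inode table. Two of these keys belong to shard1 and one belongs to shard0, so the transaction is a multi-storage-instance transaction.
Example 3: suppose the user needs to create the files /a.txt, /b/c.txt, /d/e.c, and /f/g.h, where the inode of / is 1, the inode of /b/ is 5, the inode of /d/ is 7, and the inode of /f/ is 11. When transactions are generated by aggregation, Create operations that belong to the same storage instance are aggregated into one transaction. The two operations /a.txt and /b/c.txt are therefore aggregated into one transaction tx1 (parent inodes 1 and 5 both map to shard1), and the two operations /d/e.c and /f/g.h are aggregated into another transaction tx2 (parent inodes 7 and 11 both map to shard3). Both tx1 and tx2 are single-storage-instance transactions.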
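The three examples can be reproduced with a small Python sketch. The helper names, the list of per-instance counters, and the choice to increment a counter after each allocation are assumptions made for illustration; the description only states that the current counter value of the chosen storage instance is used as the rank.

```python
from collections import defaultdict

N = 4  # number of storage instances in the examples


def allocate_file_inode(parent_inode, counters):
    """A file is placed on its parent directory's storage instance (Example 1)."""
    shard = parent_inode % N
    local_id = counters[shard]
    counters[shard] += 1            # assumed: the counter advances after each allocation
    return shard + local_id * N


def allocate_dir_inode(counters):
    """A directory goes to the storage instance with the smallest counter (Example 2)."""
    shard = min(range(N), key=lambda s: counters[s])
    local_id = counters[shard]
    counters[shard] += 1
    return shard + local_id * N


def group_creates(ops):
    """Aggregate file-create operations by the parent's storage instance (Example 3).

    ops: list of (parent_inode, name). Returns shard number -> list of operations;
    each group can be issued as one single-storage-instance transaction.
    """
    groups = defaultdict(list)
    for parent_inode, name in ops:
        groups[parent_inode % N].append((parent_inode, name))
    return dict(groups)
```

With counters [2, 5, 9, 9], creating a file under the root (inode 1) yields 1 + 5 * 4 = 21 and creating a directory yields 0 + 2 * 4 = 8, matching Examples 1 and 2. Grouping the four creates of Example 3 gives one group for shard1 and one for shard3.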
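The above is a detailed description of the data management method based on a key-value storage system provided by the embodiments of this application. The data management apparatus based on a key-value storage system provided by the embodiments of this application is described below. Figure 6 is a schematic structural diagram of the data management apparatus based on a key-value storage system provided by an embodiment of this application. The key-value storage system contains a cache and multiple storage instances, and the user data table of each storage instance stores key-value pairs. As shown in Figure 6, the apparatus includes:
A creation module 601, configured to create the user's transaction. For example, the creation module 601 may be used to implement step 301 in the embodiment shown in Figure 3.
A first obtaining module 602, configured to obtain the multiple keys that the transaction needs to read and to add the transaction's foreground lock to the multiple keys. For example, the first obtaining module 602 may be used to implement step 302 in the embodiment shown in Figure 3.
A first reading module 603, configured to, if the cache stores the values of the multiple keys, read the values of the multiple keys from the cache and provide them to the user. For example, the first reading module 603 may be used to implement step 304 in the embodiment shown in Figure 3.
A second reading module 604, configured to, if the cache does not store the values of the multiple keys, determine, among the multiple storage instances, the storage instances where the multiple keys reside, read the values of the multiple keys from the user data tables of those storage instances, provide the values to the user, and write the values to the cache. For example, the second reading module 604 may be used to implement step 305 in the embodiment shown in Figure 3.
A processing module 605, configured to release the foreground lock from the multiple keys.
An ending module 606, configured to end the transaction. For example, the ending module 606 may be used to implement step 310 in the embodiment shown in Figure 3.
In the embodiments of this application, after creating the user's transaction, the key-value storage system obtains the multiple keys that the transaction needs to read and adds the transaction's foreground lock to them. If the cache stores the values of the keys, the system reads them from the cache and provides them to the user. If the cache does not store the values, the system determines, among the multiple storage instances, the storage instances where the keys reside, reads the values from the user data tables of those storage instances, provides them to the user, and writes them to the cache. The system then releases the foreground lock from the keys and ends the user's transaction. It follows that, because the key-value storage system contains multiple storage instances, it can make full use of the CPU and memory resources of the server on which it runs. Moreover, when the user needs to read the values of multiple keys, the system can read them either one by one or concurrently from the storage instances where the keys reside and return them to the user, which effectively exploits the concurrency of the key-value storage system.
In a possible implementation, the apparatus further includes: a second obtaining module, configured to obtain the new values of the multiple keys that the transaction needs to write, where the new values are obtained by the user modifying the values of the multiple keys; a first writing module, configured to, if the multiple keys are stored in the same storage instance, add the transaction's background lock to the multiple keys, write the new values of the multiple keys to the user data table of that storage instance and to the cache, and release the transaction's background lock from the multiple keys; and a notification module, configured to notify the user that the new values of the multiple keys have been written successfully.
In a possible implementation, each storage instance further contains a transaction status table, and the apparatus further includes: a second obtaining module, configured to obtain the new values of the multiple keys that the transaction needs to write, where the new values are obtained by the user modifying the values of the multiple keys; a second writing module, configured to, if the multiple keys are stored in several storage instances, select a target key among the multiple keys and write the new values of the multiple keys to the transaction status table of the storage instance where the target key resides and to the cache; a notification module, configured to notify the user that the new values of the multiple keys have been written successfully; and a third writing module, configured to add the transaction's background lock to the multiple keys, write the new values of the multiple keys concurrently to the user data tables of the several storage instances, release the transaction's background lock from the multiple keys, and delete the new values of the multiple keys from the transaction status table of the storage instance where the target key resides.
In a possible implementation, the second reading module is configured to determine the storage instance numbers corresponding to the multiple keys and, based on those numbers, determine the storage instances where the multiple keys reside.
In a possible implementation, the key-value storage system is deployed in a file system; among the multiple keys, each key contains metadata of a file or directory, and the metadata contains the index number of the file or directory. The second reading module is configured to take the remainder of dividing the index number contained in each of the multiple keys by the number of storage instances as the storage instance number corresponding to that key.
In a possible implementation, the operations of adding the foreground lock to or releasing it from the multiple keys are performed in the cache, and the operations of adding the background lock to or releasing it from the multiple keys are performed in the cache.
In a possible implementation, the cache further stores at least one of the following: timestamps associated with the multiple keys, the timestamp associated with the transaction, and the state of the transaction.
Note that the information exchange between the modules/units of the above apparatus, their execution processes, and so on are based on the same concept as the method embodiments of this application and bring the same technical effects. For details, see the description in the method embodiments above; it is not repeated here.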
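Figure 7 is another schematic structural diagram of the data management apparatus based on a key-value storage system provided by an embodiment of this application. As shown in Figure 7, one embodiment of the data management apparatus based on a key-value storage system may include one or more central processing units 701, a memory 702, an input/output interface 703, a wired or wireless network interface 704, and a power supply 705.
The memory 702 may be transient storage or persistent storage. Furthermore, the central processing unit 701 may be configured to communicate with the memory 702 and to execute, on the data management apparatus based on a key-value storage system, a series of instruction operations stored in the memory 702.
In this embodiment, the central processing unit 701 may perform the method steps in the embodiment shown in Figure 3 above; details are not repeated here.
In this embodiment, the division of specific functional modules in the central processing unit 701 may be similar to the division of the modules in Figure 6 above; details are not repeated here.
The embodiments of this application also relate to a computer storage medium. The computer-readable storage medium stores a program for signal processing which, when run on a computer, causes the computer to perform the steps in the embodiment shown in Figure 3.
The embodiments of this application also relate to a computer program product. The computer program product stores instructions which, when executed by a computer, cause the computer to perform the steps in the embodiment shown in Figure 3.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.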

Claims (17)

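  1. A data management method based on a key-value storage system, characterized in that the key-value storage system contains a cache and multiple storage instances, the user data table of each storage instance stores key-value pairs, and the method comprises:
    creating a transaction of a user;
    obtaining multiple keys that the transaction needs to read, and adding a foreground lock of the transaction to the multiple keys;
    if the cache stores the values of the multiple keys, reading the values of the multiple keys from the cache, and providing the values of the multiple keys to the user;
    if the cache does not store the values of the multiple keys, determining, among the multiple storage instances, the storage instances where the multiple keys reside, reading the values of the multiple keys from the user data tables of the storage instances where the multiple keys reside, providing the values of the multiple keys to the user, and writing the values of the multiple keys to the cache;
    releasing the foreground lock from the multiple keys;
    ending the transaction.
  2. The method according to claim 1, characterized in that, before the releasing the foreground lock from the multiple keys, the method further comprises:
    obtaining new values of the multiple keys that the transaction needs to write, the new values of the multiple keys being obtained by the user modifying the values of the multiple keys;
    if the multiple keys are stored in the same storage instance, adding a background lock of the transaction to the multiple keys, writing the new values of the multiple keys to the user data table of the storage instance and to the cache, and releasing the background lock of the transaction from the multiple keys;
    and after the releasing the foreground lock from the multiple keys, the method further comprises:
    notifying the user that the new values of the multiple keys have been written successfully.
  3. The method according to claim 1, characterized in that each storage instance further contains a transaction status table, and before the releasing the foreground lock from the multiple keys, the method further comprises:
    obtaining new values of the multiple keys that the transaction needs to write, the new values of the multiple keys being obtained by the user modifying the values of the multiple keys;
    if the multiple keys are stored in several storage instances, selecting a target key among the multiple keys, and writing the new values of the multiple keys to the transaction status table of the storage instance where the target key resides and to the cache;
    and after the releasing the foreground lock from the multiple keys, the method further comprises:
    notifying the user that the new values of the multiple keys have been written successfully;
    adding a background lock of the transaction to the multiple keys, writing the new values of the multiple keys concurrently to the user data tables of the several storage instances, releasing the background lock of the transaction from the multiple keys, and deleting the new values of the multiple keys from the transaction status table of the storage instance where the target key resides.
  4. The method according to any one of claims 1 to 3, characterized in that the determining, among the multiple storage instances, the storage instances where the multiple keys reside comprises:
    determining the storage instance numbers corresponding to the multiple keys, and determining, based on the storage instance numbers corresponding to the multiple keys, the storage instances where the multiple keys reside.
  5. The method according to claim 4, characterized in that the key-value storage system is deployed in a file system, each of the multiple keys contains metadata of a file or directory, the metadata contains the index number of the file or directory, and the determining the storage instance numbers corresponding to the multiple keys comprises:
    taking the remainder of dividing the index numbers contained in the multiple keys by the number of the multiple storage instances as the storage instance numbers corresponding to the multiple keys.
  6. The method according to claim 2 or 3, characterized in that the operations of adding the foreground lock to or releasing the foreground lock from the multiple keys are performed in the cache, and the operations of adding the background lock to or releasing the background lock from the multiple keys are performed in the cache.
  7. The method according to any one of claims 1 to 6, characterized in that the cache further stores at least one of the following: timestamps associated with the multiple keys, a timestamp associated with the transaction, and the state of the transaction.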
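  8. A data management apparatus based on a key-value storage system, characterized in that the key-value storage system contains a cache and multiple storage instances, the user data table of each storage instance stores key-value pairs, and the apparatus comprises:
    a creation module, configured to create a transaction of a user;
    a first obtaining module, configured to obtain multiple keys that the transaction needs to read, and to add a foreground lock of the transaction to the multiple keys;
    a first reading module, configured to, if the cache stores the values of the multiple keys, read the values of the multiple keys from the cache and provide the values of the multiple keys to the user;
    a second reading module, configured to, if the cache does not store the values of the multiple keys, determine, among the multiple storage instances, the storage instances where the multiple keys reside, read the values of the multiple keys from the user data tables of the storage instances where the multiple keys reside, provide the values of the multiple keys to the user, and write the values of the multiple keys to the cache;
    a processing module, configured to release the foreground lock from the multiple keys;
    an ending module, configured to end the transaction.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a second obtaining module, configured to obtain new values of the multiple keys that the transaction needs to write, the new values of the multiple keys being obtained by the user modifying the values of the multiple keys;
    a first writing module, configured to, if the multiple keys are stored in the same storage instance, add a background lock of the transaction to the multiple keys, write the new values of the multiple keys to the user data table of the storage instance and to the cache, and release the background lock of the transaction from the multiple keys;
    a notification module, configured to notify the user that the new values of the multiple keys have been written successfully.
  10. The apparatus according to claim 8, characterized in that each storage instance further contains a transaction status table, and the apparatus further comprises:
    a second obtaining module, configured to obtain new values of the multiple keys that the transaction needs to write, the new values of the multiple keys being obtained by the user modifying the values of the multiple keys;
    a second writing module, configured to, if the multiple keys are stored in several storage instances, select a target key among the multiple keys and write the new values of the multiple keys to the transaction status table of the storage instance where the target key resides and to the cache;
    a notification module, configured to notify the user that the new values of the multiple keys have been written successfully;
    a third writing module, configured to add a background lock of the transaction to the multiple keys, write the new values of the multiple keys concurrently to the user data tables of the several storage instances, release the background lock of the transaction from the multiple keys, and delete the new values of the multiple keys from the transaction status table of the storage instance where the target key resides.
  11. The apparatus according to any one of claims 8 to 10, characterized in that the second reading module is configured to determine the storage instance numbers corresponding to the multiple keys and to determine, based on the storage instance numbers corresponding to the multiple keys, the storage instances where the multiple keys reside.
  12. The apparatus according to claim 11, characterized in that the key-value storage system is deployed in a file system, each of the multiple keys contains metadata of a file or directory, the metadata contains the index number of the file or directory, and the second reading module is configured to take the remainder of dividing the index numbers contained in the multiple keys by the number of the multiple storage instances as the storage instance numbers corresponding to the multiple keys.
  13. The apparatus according to claim 9 or 10, characterized in that the operations of adding the foreground lock to or releasing the foreground lock from the multiple keys are performed in the cache, and the operations of adding the background lock to or releasing the background lock from the multiple keys are performed in the cache.
  14. The apparatus according to any one of claims 8 to 13, characterized in that the cache further stores at least one of the following: timestamps associated with the multiple keys, a timestamp associated with the transaction, and the state of the transaction.
  15. A data management apparatus based on a key-value storage system, characterized in that the apparatus comprises a memory and a processor; the memory stores code, the processor is configured to execute the code, and when the code is executed, the apparatus performs the method according to any one of claims 1 to 7.
  16. A computer storage medium, characterized in that the computer storage medium stores one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method according to any one of claims 1 to 7.
  17. A computer program product, characterized in that the computer program product stores instructions which, when executed by a computer, cause the computer to implement the method according to any one of claims 1 to 7.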
PCT/CN2023/109096 2022-07-25 2023-07-25 一种基于键值存储系统的数据管理方法及其相关设备 WO2024022329A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210877102.6 2022-07-25
CN202210877102 2022-07-25
CN202211336267.9 2022-10-28
CN202211336267.9A CN117493388A (zh) 2022-07-25 2022-10-28 一种基于键值存储系统的数据管理方法及其相关设备

Publications (1)

Publication Number Publication Date
WO2024022329A1 true WO2024022329A1 (zh) 2024-02-01

Family

ID=89678779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109096 WO2024022329A1 (zh) 2022-07-25 2023-07-25 一种基于键值存储系统的数据管理方法及其相关设备

Country Status (2)

Country Link
CN (1) CN117493388A (zh)
WO (1) WO2024022329A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347852A (zh) * 2019-06-06 2019-10-18 华中科技大学 嵌入横向扩展键值存储系统的文件系统及文件管理方法
CN113568908A (zh) * 2021-07-16 2021-10-29 华中科技大学 一种键值请求并行调度方法及系统
CN113704261A (zh) * 2021-08-26 2021-11-26 平凯星辰(北京)科技有限公司 基于云存储的键值存储系统
WO2022063059A1 (zh) * 2020-09-23 2022-03-31 华为云计算技术有限公司 键值存储系统的数据管理方法及其装置
CN114780564A (zh) * 2022-04-21 2022-07-22 京东科技控股股份有限公司 数据处理方法、数据处理装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN117493388A (zh) 2024-02-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845539

Country of ref document: EP

Kind code of ref document: A1