WO2023165691A1 - Method of updating key/value pair in object storage system and object storage system - Google Patents

Info

Publication number
WO2023165691A1
WO2023165691A1 PCT/EP2022/055291 EP2022055291W WO2023165691A1 WO 2023165691 A1 WO2023165691 A1 WO 2023165691A1 EP 2022055291 W EP2022055291 W EP 2022055291W WO 2023165691 A1 WO2023165691 A1 WO 2023165691A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
value
range
computer
storage system
Prior art date
Application number
PCT/EP2022/055291
Other languages
French (fr)
Inventor
Aviv Kuvent
Idan Zach
Assaf Natanzon
Elizabeth FIRMAN
Ovad Somech
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2022/055291 priority Critical patent/WO2023165691A1/en
Publication of WO2023165691A1 publication Critical patent/WO2023165691A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures

Definitions

  • the present disclosure relates generally to the field of data management and more specifically, to a computer-implemented method of updating key/value pair in an object storage system and the object storage system.
  • a file-system is a computer data storage architecture that manages data as a collection of files and directories.
  • the directories allow a user to group the files into separate collections.
  • an object storage system that manages data as objects
  • the conventional object storage system allows the retention of massive amounts of unstructured data and is used for various purposes, such as storing different types of data, for example, photos, videos, or files.
  • the object storage systems are designed and built for archiving large objects. Such object storage systems provide high-bandwidth access to large objects. However, these systems face technical challenges while dealing with many small-sized objects.
  • each object’s data requires first accessing a metadata server for mapping and other settings and then, accessing physical storage of each object’s data.
  • the latency introduced by one-time-per-object metadata access is almost negligible in comparison to the time required to completely load a full large object.
  • the metadata server access can basically double the latency for data access and therefore, becomes a technical challenge for the overall object storage system(s).
  • the object storage systems face a technical problem to provide thousands of concurrent object operations in a manner that is strictly consistent, performance-optimized, and uses physical storage efficiently.
  • the technical problem is further compounded by taxing the object storage systems with serving metadata for more and more copies of objects as the objects are replicated. Therefore, the object storage systems designed and optimized for large objects are not able to satisfy the requirements for small objects.
  • a conventional object storage system is proposed that removes the dependency on external metadata databases, which allows it to work with a large number of small objects somewhat faster.
  • the conventional object storage system stores metadata and data directly on disk to provide a partially improved performance and scalability.
  • Another conventional object storage system is proposed, which deals with small objects in a way that it stores the data inside a metadata object instead of storing the data in a data object, therefore, the read operations do not require to read the data from two objects (i.e., the metadata object and data object), but only from a single object (i.e., the metadata object).
  • a yet another object storage system is proposed that supports a key/value application programming interface (API).
  • the key/value API allows the object storage system to store records, each comprised of a key and a value, in a simple and an efficient way.
  • the object storage system key/value API creates an object per key and hence, becomes highly inefficient for small values, which result in creation of a number of small objects.
  • the proposed solutions only partially improve the performance with small objects and do not efficiently solve the problem of dealing with a huge number of tiny (very small) objects, because in the case of a huge number of tiny objects, the metadata used for maintaining them is still large and inefficient in terms of disk space and search performance. Thus, there exists a technical problem of how to efficiently store and update a huge number of tiny (very small) objects in an object storage system with reduced latency as well.
  • the present disclosure provides a computer-implemented method of updating key/value pair in an object storage system and the object storage system.
  • the present disclosure provides a solution to the existing problem of how to efficiently store and update a huge number of tiny (very small) objects in an object storage system with reduced latency.
  • An objective of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in the prior art and provides an improved computer-implemented method of updating key/value pair in an object storage system, and an improved object storage system.
  • the present disclosure provides a computer-implemented method of updating a key/value pair in an object storage system, in which each of the one or more objects contains a range of key/value pairs.
  • the computer-implemented method includes receiving a key number and a corresponding value and identifying a subset of objects for which a starting key in the range is less than or equal to the received key number.
  • the computer-implemented method further includes identifying an object in the subset of objects for which an end key in the range is equal to or greater than the received key number and updating the corresponding value to the received key number in the identified object.
  • the computer-implemented method provides an efficient way of managing, updating, and storing tiny (i.e., very small) objects in the object storage system.
  • the computer-implemented method efficiently updates the objects using the key/value pairs.
  • the computer-implemented method provides an efficient and accurate identification of the subset of objects for which the starting key in the range is less than or equal to the received key number, further resulting in an accurate and reliable identification of the object for which the end key in the range is equal to or greater than the received key number. By virtue of identifying the subset of the objects, the object having the received key number is identified more accurately and efficiently.
  • each object has a name corresponding to a starting key in the range.
  • the naming of the object by its corresponding starting key enables to locate the object in the object storage system more efficiently and with reduced latency.
  • the computer-implemented method further comprises deducing the end key in the range of an object based on the name of the subsequent object.
  • deducing the end key in the range of the object based on the name of the subsequent object enables to identify and locate the object from the subset of the objects in the object storage system more efficiently and accurately.
  • updating the value includes reading the object and rewriting the object with the updated value.
  • reading the object includes reading metadata relating to the object to determine a plurality of sub-ranges within the object, identifying a sub-range containing the received key number, and reading the identified sub-range.
  • reading metadata relating to the object and identifying the sub-range of keys containing the received key number eliminates the requirement of reading an entire object, and hence provides an efficient way of reading the object.
  • updating the value includes rebuilding the object based on locally stored data.
  • rebuilding the object based on locally stored data provides an efficient way of updating the object in the object storage system with reduced latency.
  • updating the value comprises adding a new key/value pair if the received key number is not found within the object.
  • the addition of new key/value pair if the received key number is not found enables to maintain an ideal size and the range of the object in the object storage system more efficiently and accurately.
  • updating the value comprises generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold.
  • generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold enables to maintain an ideal size of the object in the object storage system.
  • updating the value comprises dividing the object into two or more objects if the number of key/value pairs in the object is greater than a predefined threshold.
  • the division of the object into two or more objects if the number of key/value pairs in the object is greater than the predefined threshold, enables an equal size of each object.
  • the computer-implemented method further comprises merging the object with one or more adjacent objects if the number of key/value pairs in the object is below a predefined threshold.
  • the merging of two or more objects into one object if the number of key/value pairs in the object is less than the predefined threshold, enables an equal size of each object.
  • the present disclosure provides a computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method.
  • the processor achieves all the advantages and technical effects of the method after execution of the method.
  • the present disclosure provides an object storage system comprising one or more processors configured to perform the method.
  • the object storage system achieves all the advantages and technical effects of the computer-implemented method of the present disclosure.
  • FIG. 1 is a flow chart of a computer-implemented method of updating a key/value pair in an object storage system, in accordance with an embodiment of the present disclosure
  • FIG. 2 is an illustration of an object storage system that includes generation of a new object, in accordance with an embodiment of the present disclosure
  • FIG. 3 is an illustration of an object storage system describing the retrieval of a key value from an object, in accordance with an embodiment of the present disclosure.
  • FIG. 4 is a block diagram that illustrates various exemplary components of an object storage system, in accordance with an embodiment of the present disclosure.
  • an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent.
  • a non-underlined number relates to an item identified by a line linking the non-underlined number to the item.
  • the non-underlined number is used to identify a general item at which the arrow is pointing.
  • FIG. 1 is a flow chart of a computer-implemented method of updating a key/value pair in an object storage system, in accordance with an embodiment of the present disclosure.
  • with reference to FIG. 1, there is shown a flow chart of a computer-implemented method 100 of updating a key/value pair in an object storage system.
  • the computer-implemented method 100 includes steps 102 to 108.
  • the computer-implemented method 100 may also be referred to as a key division function in which various keys are divided across objects in the object storage system.
  • the computer-implemented method 100 is used to determine which object in the object storage system a given key (or a key number) belongs to.
  • the object storage system may include three objects and each object may have the range of key/value pairs in a form of integer numbers. These three objects may be named as an object “0”, an object “100”, and an object “200”.
  • each of the three objects contains the range of key/value pairs, such as the object “0” contains the range from 0 to 99 keys, the object “100” contains the range from 100 to 199 keys, and the object “200” contains the range from 200 to INF keys, where INF (infinite) represents a maximum theoretical value of the keys.
  • each object contains the range of key/value pairs therefore, the object storage system eliminates the requirement of generating an object per key, hence results in an efficient generation of objects.
  • the computer-implemented method 100 is executed by the object storage system, described in detail, for example, in FIG. 4.
  • the computer-implemented method 100 comprises, receiving a key number and a corresponding value.
  • the received key number corresponds to the key number that is to be updated with the corresponding value.
  • the received key number may be “150”.
  • the received key number i.e., “150” is updated with the corresponding value in the object in which the received key number is located.
  • the computer-implemented method 100 further comprises, identifying a subset of objects for which a starting key in the range is less than or equal to the received key number.
  • each of the three objects is searched and the subset of objects is identified for which the starting key in the range of keys is less than or equal to the received key number.
  • the starting key refers to a key that lies at a first location in the range of keys. For example, in case of the object “0”, the starting key is “0” in the range of keys from 0 to 99. Similarly, in case of the object “100” and the object “200”, the starting keys are 100 and 200, respectively.
  • the object “0” and the object “100” are identified as the subset of objects because in case of the object “0” and the object “100”, the starting key in the range is less than the received key number (i.e., “150”).
  • the identification of the subset of objects for which the starting key in the range is less than or equal to the received key number enables to locate the received key number in the object storage system more efficiently and with reduced latency.
  • each object has a name corresponding to the starting key in the range.
  • the objects in the object storage system cannot have overlapping ranges of key/value pairs, and therefore no two objects can have the same name.
  • the object “0” contains the range from 0 to 99 keys
  • the object “100” includes the range of keys from 100 to 199
  • the object “200” includes the range of keys from 200 to INF, and therefore, no overlapping of keys is there.
  • the starting keys are 0, 100, and 200, respectively and the names of the objects are the object “0”, the object “100”, and the object “200” respectively.
  • each of the object “0”, the object “100”, and the object “200” includes their respective starting keys “0”, “100”, and “200”, in their names. Therefore, for a given key number, an object having the given key number may be located based only on the object name in the object storage system.
  • each of the one or more objects may be represented by the name of the object that includes an object ID that is represented using strings.
  • the name of the object includes an object ID of 8 bytes that is divided into four parts named as bucket ID, prefix, most significant bit (MSB), and least significant bit (LSB).
  • the name of the object can be represented as “my-bucket/my-folder/0140-0AFF-090B-0007”.
  • the object ID is represented in the form of strings such as the object ID “1” is represented as “0000-0000-0000-0001”.
  • the object ID “2” is represented as “0000-0000-0000-0002”
  • the object ID “255” is represented as “0000-0000-0000-00FF”
  • the object ID “16385” is represented as “0000-0000-0000-4001”
  • the object ID “97000000” is represented as “0000-0000-05C8-1A40”.
  • the object ID (whose size is 8 bytes) is represented in the form of strings as described in Table 1.
  • the naming of the object by its corresponding starting key enables to locate the object in the object storage system more efficiently and with reduced latency.
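The string form of the object ID can be sketched as follows. This is a minimal illustration inferred only from the examples listed above (an 8-byte ID written as sixteen uppercase hexadecimal digits in four hyphen-separated groups); the function names `format_object_id` and `parse_object_id` are hypothetical, and Table 1 of the publication may define details not captured here.

```python
def format_object_id(object_id: int) -> str:
    """Render an 8-byte object ID as four hyphen-separated hex groups.

    Assumption: zero-padded, uppercase hexadecimal, as in the examples above.
    """
    if not 0 <= object_id < 2**64:
        raise ValueError("object ID must fit in 8 bytes")
    hex16 = f"{object_id:016X}"  # 16 hex digits, zero-padded
    return "-".join(hex16[i:i + 4] for i in range(0, 16, 4))


def parse_object_id(name: str) -> int:
    """Inverse of format_object_id: recover the integer from the string form."""
    return int(name.replace("-", ""), 16)
```

Because every name has the same width and case, sorting the names as strings gives the same order as sorting the numeric IDs, which is what allows an object to be located by name alone.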
  • the computer-implemented method 100 further comprises, identifying an object in the subset of objects for which an end key in the range is equal to or greater than the received key number. After identification of the subset of objects, the object in the subset of objects is identified for which the end key in the range is equal to or greater than the received key number (i.e., “150”).
  • the end key refers to a key that lies at the end in the range of keys.
  • the end key can be deduced from the starting key of the subsequent object. For example, for the object “0”, the end key is “99” which is deduced from the starting key of the object “100”. Similarly, for the object “100”, the end key is “199”, deduced from the starting key of the object “200”.
  • In order to identify the object in the subset of objects (i.e., the object “0” and the object “100”) for the received key number (i.e., “150”), the end key must be equal to or greater than the received key number (i.e., “150”). Therefore, out of the subset of objects (i.e., the object “0” and the object “100”), the object “100” is identified for the received key number “150”, because the end key (i.e., 199) for the object “100” is greater than the received key number (i.e., “150”). Alternatively stated, the received key number (i.e., “150”) lies in the object “100”.
  • the computer-implemented method 100 further comprises, deducing the end key in the range of an object based on the name of the subsequent object. Since each object has the starting key in its name, the end key of a preceding object can be deduced from the name of the subsequent object. For example, the end key (i.e., “99”) of the object “0” is deduced from the name of the object “100” that includes the starting key “100” in its name. The end key is deduced from the starting key of the subsequent object when the objects are ordered by their names.
  • deducing the end key in the range of the object based on the name of the subsequent object enables to identify and locate the object from the subset of the objects in the object storage system more efficiently and accurately.
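The lookup described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes the object names are available as a sorted list of integer starting keys, and the function name `find_object` is hypothetical. Because the ranges do not overlap, the object whose end key is equal to or greater than the received key is the last object in the subset whose starting key is less than or equal to it, so a binary search is used here in place of a linear scan.

```python
from bisect import bisect_right


def find_object(start_keys, key):
    """Return (start_key, end_key) of the object whose range contains key.

    start_keys: sorted list of object names read as integer starting keys.
    end_key is None for the last object, whose range is open-ended ("INF").
    """
    # start_keys[:i] is the subset of objects whose starting key <= key.
    i = bisect_right(start_keys, key)
    if i == 0:
        raise KeyError(key)  # key lies below the first object's range
    start = start_keys[i - 1]
    # The end key is deduced from the name of the subsequent object.
    end = start_keys[i] - 1 if i < len(start_keys) else None
    return start, end
```

With the three objects from the example, `find_object([0, 100, 200], 150)` returns `(100, 199)`, and a key of 250 resolves to the open-ended object `(200, None)`.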
  • the computer-implemented method 100 comprises, updating the corresponding value to the received key number in the identified object. After identification of the object (i.e., the object “100”) in which the received key number (i.e., “150”) is located, the value corresponding to the received key number (i.e., “150”) is updated in the identified object (i.e., the object “100”). Similarly, if another key number (e.g., “250”) is received and is required to be updated, then the object containing the other key number (i.e., “250”) is identified. Therefore, the object “200” is identified having the range of keys from 200 to INF and the other key number (i.e., “250”) is located and updated with the corresponding value. In this way, the computer-implemented method 100 enables the object storage system to update the object more efficiently.
  • updating the value includes reading the object and rewriting the object with the updated value. Updating the corresponding value for the received key number (i.e., “150”) includes reading, more particularly, a partial reading of, the object “100”. An exemplary scenario of the partial reading of an object is described in detail, for example, in FIG. 3.
  • the computer-implemented method 100 further comprises, reading the object that includes reading metadata relating to the object to determine a plurality of sub-ranges within the object, identifying a sub-range containing the received key number, and reading the identified sub-range.
  • the metadata of the object is read while reading the object, because the metadata of the object can be accessed more efficiently than the object data. Each object includes the range of keys, and the range of keys is divided into non-overlapping sub-ranges of keys. The keys and values of the sub-ranges of keys appear sequentially in the object, and the metadata of the object maintains a mapping of the plurality of sub-ranges of keys in the object, through which the sub-range of keys that contains the received key number is identified in the object.
  • the metadata of the object is used to locate a relevant sub-range of keys containing the received key number.
  • the identified sub-range of keys is read from the object data and the relevant keys and values are located in the read sub-range based on the received key number.
  • reading metadata relating to the object and identifying the sub-range of keys containing the received key number eliminates the requirement of reading an entire object, hence results in an efficient way of reading the object.
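The partial read can be sketched as follows, under stated assumptions: the metadata is modelled as a sorted list of the first key of each sub-range, and `read_sub_range` is a hypothetical callable standing in for a ranged read of the object data. The sketch records which sub-ranges are fetched, to show that only one of them is read.

```python
from bisect import bisect_right


def read_value(sub_range_starts, read_sub_range, key):
    """Return the value for key, fetching only the sub-range that holds it.

    sub_range_starts: sorted first keys of each sub-range (the "metadata").
    read_sub_range:   hypothetical callable, index -> dict of key/value pairs.
    """
    i = bisect_right(sub_range_starts, key) - 1
    if i < 0:
        raise KeyError(key)
    pairs = read_sub_range(i)  # the rest of the object is never read
    return pairs[key]


# Object "100": keys 100..199 stored as ten sub-ranges of ten keys each.
sub_range_starts = list(range(100, 200, 10))
sub_ranges = [{k: f"value-{k}" for k in range(s, s + 10)} for s in sub_range_starts]
reads = []  # indices of the sub-ranges that were actually fetched


def read_sub_range(index):
    reads.append(index)
    return sub_ranges[index]


result = read_value(sub_range_starts, read_sub_range, 150)
```

Here `result` is `"value-150"` and `reads` is `[5]`: only the sub-range covering keys 150 to 159 was fetched.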
  • updating the value includes rebuilding the object based on locally stored data.
  • the rebuilding of the object corresponds to the rewriting of the values with the updated value.
  • the values of the keys in the object are retrieved through the data that is stored locally and is used to re-write the object.
  • the key/value pairs of the object are stored in the database locally without backing up metadata to the object storage system in a cloud storage without limiting the scope of the present disclosure.
  • the object storage system can maintain a local copy of the metadata to serve local requests efficiently. The local storage can be accessed to retrieve the values for certain unchanged keys and re-write the object.
  • a cache of the most frequently accessed objects can be stored in the local database, as a partial optimization.
  • an object storage system with an object “1” contains the range from 1 to 100 keys.
  • the values for the keys 1 to 9 and 11 to 100 are retrieved from the local database.
  • the retrieved key/value pairs are combined with the key “10” and its corresponding new value.
  • the object “1” is updated with its updated key/value pair.
  • rebuilding the object based on locally stored data provides an efficient way of updating the object in the object storage system with reduced latency.
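The rebuild step can be sketched as follows. This is an illustration only: the local database is modelled as a plain dictionary, and `rebuild_object` is a hypothetical name. The unchanged pairs are taken from the local copy, so the object itself does not have to be read back before it is rewritten.

```python
def rebuild_object(local_store, start_key, end_key, key, value):
    """Build the full contents of one object from locally stored pairs.

    local_store: dict standing in for the local database of key/value pairs.
    start_key/end_key: inclusive key range covered by the object.
    """
    if not start_key <= key <= end_key:
        raise KeyError(key)
    # Unchanged pairs come from the local copy, not from the object store.
    rebuilt = {k: v for k, v in local_store.items() if start_key <= k <= end_key}
    # Overwrite the existing value, or add the pair if the key was absent.
    rebuilt[key] = value
    return dict(sorted(rebuilt.items()))


local = {k: f"old-{k}" for k in range(1, 101)}
new_object = rebuild_object(local, 1, 100, 10, "new-10")
```

The same assignment also covers the case where the received key is not found in the object, since the pair is then simply added.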
  • updating the value includes adding a new key/value pair if the received key number is not found within the object.
  • the key with the key number “10” is identified. If the key with the key number “10” is not found in the object “1” due to the deletion of the key/value pair, then a new key with the key number “10” with its corresponding updated value is created within the object “1”.
  • the addition of new key/value pair if the received key number is not found enables to maintain the predefined threshold size and the range of the object in the object storage system more efficiently and accurately.
  • updating the value comprises generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold.
  • the predefined threshold may be defined as the maximum ideal size of an object in terms of a total number of key/value pairs lying within the object. For example, the object “0” contains the range from 0 to 99, the object “100” contains the range from 100 to 199, and the object “200” contains the range from 200-INF and the received key number is “300”.
  • the ideal size of each object is 100 keys.
  • the identified object that contains the received key number is the object “200” and the received key number “300” is located at the end of the range (i.e., 200 to INF) of the key/value pairs in the object “200”.
  • the addition of a new key/value pair within the object “200” exceeds the ideal size of the object “200”. Therefore, a new object is created.
  • An exemplary scenario of generating a new object including the new key/value pair is described in detail, for example, in FIG. 2.
  • generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold enables to maintain an ideal size of the object in the object storage system more efficiently.
  • updating the value comprises dividing the object into two or more objects if the number of key/value pairs in the object is greater than a predefined threshold.
  • Each object has a maximum ideal size in terms of the number of key/value pairs, after which no further key/value pair can be added to the object. If the size of the object becomes greater than the predefined threshold value due to the addition of a key/value pair, then the object is divided into two or more objects. Alternatively stated, if the number of key/value pairs in the object is greater than the predefined threshold value, the object that contains the received key is divided into two or more objects.
  • the division (or split) operation can be executed either online or in the background during a garbage collection process.
  • the division of the object into two or more objects if the number of key/value pairs in the object is greater than the predefined threshold, enables an equal size of each object.
  • the computer-implemented method 100 further comprises, merging the object with one or more adjacent objects if the number of key/value pairs in the object is below the predefined threshold.
  • Each object has an ideal size in terms of the number of key/value pairs, below which key/value pairs from adjacent objects can be merged into the object. If the size of the object becomes less than the predefined threshold value due to the deletion of a key/value pair, then two or more objects are merged into one object. Alternatively stated, if the number of key/value pairs in the object is less than the predefined threshold value, the object that contains the received key is merged with one or more adjacent objects.
  • the merge operation can be executed either online or in the background during a garbage collection process. Beneficially, the merging of two or more objects into one object, if the number of key/value pairs in the object is less than the predefined threshold, enables an equal size of each object.
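The division and merge operations can be sketched as follows. The names `split_object` and `merge_objects`, and the choice of near-equal chunks, are assumptions made for illustration; the source states only that an oversized object is divided into two or more objects and an undersized one is merged with adjacent objects. Each resulting object is named after its starting key, and the first part keeps the original name so that the start of the covered range does not move.

```python
def split_object(name, pairs, max_pairs):
    """Divide one object into near-equal parts of at most max_pairs each.

    Returns a dict mapping object name (starting key) -> key/value pairs.
    """
    keys = sorted(pairs)
    if len(keys) <= max_pairs:
        return {name: dict(pairs)}
    n_parts = -(-len(keys) // max_pairs)  # ceiling division
    size = -(-len(keys) // n_parts)       # near-equal part size
    result = {}
    for p in range(n_parts):
        chunk = keys[p * size:(p + 1) * size]
        if not chunk:
            continue
        # The first part keeps the original name; later parts are named
        # after their first key.
        part_name = name if p == 0 else chunk[0]
        result[part_name] = {k: pairs[k] for k in chunk}
    return result


def merge_objects(name_a, pairs_a, name_b, pairs_b):
    """Merge two adjacent objects; the result keeps the lower starting key."""
    return min(name_a, name_b), {**pairs_a, **pairs_b}
```

For example, splitting an object "100" that holds 150 pairs with a threshold of 100 yields two objects of 75 pairs each, named 100 and 175.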
  • the computer-implemented method 100 provides an efficient way of managing, updating, and storing tiny (i.e., very small) objects in the object storage system.
  • the computer-implemented method 100 efficiently updates the objects using the key/value pairs.
  • the computer-implemented method 100 provides an efficient and accurate identification of the subset of objects for which the starting key in the range is less than or equal to the received key number, further resulting into an accurate and reliable identification of the object for which the end key in the range is equal to or greater than the received key number. By virtue of identifying the subset of the objects, the object having the received key number is identified more accurately and efficiently.
  • the computer-implemented method 100 ensures the ideal size and the ideal range of the objects in the object storage system by merging and division of the objects. Additionally, the computer-implemented method 100 supports efficient management of the objects with reduced latency by storing, updating, and retrieving the key/value pairs.
  • steps 102 to 108 are only illustrative, and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
  • a computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the computer-implemented method 100.
  • the instructions are implemented on the computer-readable media which include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, a Secure Digital (SD) card, Solid-State Drive (SSD), a computer readable storage medium, and/or CPU cache memory.
  • the instructions are generated by a computer program, which is implemented in view of the computer-implemented method 100, and for use in implementing the computer-implemented method 100 on one or more processors.
  • FIG. 2 is an illustration of an object storage system that includes generation of a new object, in accordance with an embodiment of the present disclosure.
  • FIG. 2 is described in conjunction with elements from the FIG. 1.
  • an object storage system 202 that includes a first object 204, a second object 206, a third object 208, and a fourth object 210.
  • the object storage system 202 corresponds to the system that includes one or more objects and each object includes a range of keys.
  • the object storage system 202 is configured to perform the computer-implemented method 100 (of FIG. 1) of updating key/value pairs.
  • Each of the first object 204, the second object 206, the third object 208, and the fourth object 210 corresponds to the object that includes multiple key/value pairs.
  • the multiple key/value pairs are updated, stored, and retrieved by implementation of the computer-implemented method 100.
  • the object storage system 202 includes three objects, such as the first object 204 (may also be represented as object “0”), the second object 206 (may also be represented as object “100”), and the third object 208 (may also be represented as object “200”). Moreover, each of the three objects contains the range of key/value pairs, such as the first object 204 (i.e., the object “0”) contains the range of keys from 0 to 99, the second object 206 (i.e., the object “100”) contains the range of keys from 100 to 199, and the third object 208 (i.e., the object “200”) contains the range of keys from 200 to INF.
  • each object out of the three objects i.e., the first object 204, the second object 206, and the third object 208 in the object storage system 202 is searched, and the subset of objects is identified for which the starting key in the range of keys is less than or equal to the received key number (i.e., the key “300”).
  • the first object 204 with the starting key “0”, the second object 206 with the starting key “100”, and the third object 208 with the starting key “200” are identified as the subset of objects because their starting keys (0, 100, and 200) are all less than the received key number (i.e., “300”). Thereafter, the object (i.e., the third object 208) for which the end key is equal to or greater than the received key number (i.e., “300”) is identified.
  • the received key number (i.e., “300”) is at the end of the range (i.e., the range from 200 to INF) of key/value pairs in the identified object (i.e., the third object 208).
  • adding the new key/value pair (i.e., of the received key number “300”) to the third object 208 would exceed the predefined threshold (i.e., the maximum ideal size of each object). Therefore, a new object (i.e., the fourth object 210) including the new key/value pair (i.e., “300”) is generated in the object storage system 202.
  • the object storage system 202 contains the first object 204 that represents the keys from 0 to 99, the second object 206 that represents the keys from 100 to 199, the third object 208 that represents the keys from 200 to 299, and the fourth object 210 that represents the keys from 300 to INF (i.e., the maximum theoretical value of the key).
  • the updating and generation of the new object in the object storage system 202 if the object exceeds the predefined threshold enables the object storage system 202 to maintain the ideal size and the ideal range of the objects in the object storage system 202.
  • FIG. 3 is an illustration of an object storage system describing the retrieval of a key value from an object, in accordance with an embodiment of the present disclosure.
  • FIG. 3 is described in conjunction with elements from the FIGs. 1 and 2.
  • an object storage system 302 that includes a first object 304.
  • a number of key/value pairs along with a plurality of sub-ranges.
  • the first object 304 may correspond to the Object “100” of FIG. 1.
  • the object storage system 302 corresponds to the object storage system 202 (of FIG. 2).
  • the object storage system 302 includes the first object 304 (i.e., the object “100”) that stores multiple keys and their corresponding values in the range from 100 to 199 keys.
  • the object storage system 302, by using the computer-implemented method 100, divides the range of the first object 304 (i.e., the object “100”) into the plurality of sub-ranges, that is, ten small sub-ranges. For example, the range of keys from 100 to 199 is divided into 100 to 109, 110 to 119, 120 to 129, 130 to 139, and so on until 190 to 199 in the first object 304.
  • each sub-range of the plurality of subranges is mapped to its start offset (whose size is in bytes) and stored in the object metadata. For example, a start offset of the subrange 100 to 109 is “0”, a start offset of the subrange 110 to 119 is “20”, and a start offset of the subrange 120 to 129 is “40” and so on.
  • the object i.e., the first object 304 is identified in which the received key (i.e., “115”) is stored.
  • the metadata of the identified object i.e., the first object 304 is read and the sub-range (i.e., the range from 110 to 119 keys) is determined. Thereafter, the sub-range through its corresponding start offset (i.e., 20, whose size is 40 bytes) is read.
  • the start offset of the identified object i.e., the first object 304) contains all the keys and values of the corresponding sub-range (i.e., from 110 to 119 keys). Finally, the value of the key “115” is retrieved.
  • updating the key/value pair in the object storage system 302 eliminates the requirement of reading the entire object. Further, retrieving the value through the metadata reduces latency and makes retrieval more efficient.
  • FIG. 4 is a block diagram that illustrates various exemplary components of an object storage system, in accordance with an embodiment of the present disclosure.
  • FIG. 4 is described in conjunction with elements from the FIGs. 1, 2, and 3.
  • a block diagram 400 that represents the object storage system 202 that includes one or more processors 402, a memory 404, and a network interface 406.
  • the one or more processors 402 includes suitable logic, circuitry, interfaces, and/or code that is configured to perform the computer-implemented method 100 in the object storage system 202.
  • Examples of implementation of one or more processors 402 may include but are not limited to a central data processing device, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set (RISC) processor, a very long instruction word (VLIW) processor, a state machine, and other processors or control circuitry.
  • the object storage system 202 (or the object storage system 302) provides updating of key/value pairs in which each of the one or more objects contains the range of key/value pairs.
  • the object storage system 202 includes receiving the key number and the corresponding value. Further, the object storage system 202 includes identification of the subsets of the objects for which the starting key in the range is less than or equal to the received key number and the end key in the range is equal to or greater than the received key number to locate the object having the received key number. Further, the object storage system 202 includes updating the corresponding value of the received key number in the identified object.
  • the memory 404 may include suitable logic, circuitry, interfaces, or code that is configured to store the instructions executable by the one or more processors 402.
  • Examples of memory 404 may include, but are not limited to, an Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), persistent memory, remote direct memory access (RDMA), or CPU cache memory.
  • the network interface 406 is communicatively coupled to each of the memory 404 and the one or more processors 402 of the object storage system 202.
  • Examples of the network interface 406 may include, but are not limited to, a computer port, a network socket, a network interface controller (NIC), and any other network interface device.
  • the object storage system 202 (or the object storage system 302) includes one or more processors 402 configured to perform the computer-implemented method 100.
  • the object storage system 202 (or the object storage system 302) that includes one or more processors 402 achieves all the advantages and technical effects of the computer-implemented method 100 of the present disclosure.

Abstract

A computer-implemented method of updating a key/value pair in an object storage system, in which each of the one or more objects contains a range of key/value pairs. The computer-implemented method includes receiving a key number and a corresponding value and identifying a subset of objects for which a starting key in the range is less than or equal to the received key number. The computer-implemented method includes identifying an object in the subset of objects for which an end key in the range is equal to or greater than the received key number and updating the corresponding value to the received key number in the identified object. The computer-implemented method of updating the key/value pair in an object storage system enables the object storage system to efficiently manage and update tiny (very small) objects with reduced latency.

Description

METHOD OF UPDATING KEY/VALUE PAIR IN OBJECT STORAGE SYSTEM AND OBJECT STORAGE SYSTEM
TECHNICAL FIELD
The present disclosure relates generally to the field of data management and more specifically, to a computer-implemented method of updating key/value pair in an object storage system and the object storage system.
BACKGROUND
Typically, a file-system is a computer data storage architecture that manages data as a collection of files and directories. The directories allow a user to group the files into separate collections. An object storage system is another computer data storage architecture, which manages data as objects, in contrast to other storage architectures, such as the typical file-system, which manages the data as a file hierarchy. The conventional object storage system allows the retention of massive amounts of unstructured data and is used for various purposes, such as storing different types of data, for example, photos, videos, or files. Initially, object storage systems were designed and built for archiving large objects. Such object storage systems provide high-bandwidth access to large objects. However, these systems face technical challenges while dealing with many small-sized objects. In particular, access to each object’s data requires first accessing a metadata server for mapping and other settings and then accessing the physical storage of each object’s data. With large objects, the latency introduced by one-time-per-object metadata access is almost negligible in comparison to the time required to completely load a full large object. With many small objects, the metadata server access can roughly double the latency for data access and therefore becomes a technical challenge for the overall object storage system(s). Thus, the object storage systems face a technical problem in providing thousands of concurrent object operations in a manner that is strictly consistent, performance-optimized, and uses physical storage efficiently. The technical problem is further compounded by taxing the object storage systems with serving metadata for more and more copies of objects as the objects are replicated. Therefore, the object storage systems designed and optimized for large objects are not able to satisfy the requirements for small objects.
Currently, certain methods have been proposed to improve the performance of object storage systems, which are required to deal with many concurrent queries and operations across an enormous number of objects. For example, a conventional object storage system is proposed that removes the dependency on external metadata databases, which allows a large number of small objects to be handled somewhat faster. The conventional object storage system stores metadata and data directly on disk to provide partially improved performance and scalability. Another conventional object storage system is proposed, which deals with small objects by storing the data inside a metadata object instead of storing the data in a data object; therefore, read operations do not need to read the data from two objects (i.e., the metadata object and the data object), but only from a single object (i.e., the metadata object). Yet another object storage system is proposed that supports a key/value application programming interface (API). The key/value API allows the object storage system to store records, each comprised of a key and a value, in a simple and efficient way. The object storage system key/value API creates an object per key and hence becomes highly inefficient for small values, which results in the creation of a large number of small objects. The proposed solutions partially improve the performance with small objects and do not efficiently solve the problem of dealing with a huge number of tiny (very small) objects, because in that case the metadata used for maintaining the huge number of tiny objects is still large and inefficient in terms of disk space and search performance. Thus, there exists a technical problem of how to efficiently store and update a huge number of tiny (very small) objects in an object storage system with reduced latency.
Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional methods of improving the performance of object storage systems dealing with huge number of tiny (very small) objects.
SUMMARY
The present disclosure provides a computer-implemented method of updating key/value pair in an object storage system and the object storage system. The present disclosure provides a solution to the existing problem of how to efficiently store and update a huge number of tiny (very small) objects in an object storage system with reduced latency. An objective of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in the prior art and provides an improved computer-implemented method of updating key/value pair in an object storage system, and an improved object storage system.
One or more objectives of the present disclosure are achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.
In one aspect, the present disclosure provides a computer-implemented method of updating a key/value pair in an object storage system, in which each of the one or more objects contains a range of key/value pairs. The computer-implemented method includes receiving a key number and a corresponding value and identifying a subset of objects for which a starting key in the range is less than or equal to the received key number. The computer-implemented method further includes identifying an object in the subset of objects for which an end key in the range is equal to or greater than the received key number and updating the corresponding value to the received key number in the identified object.
The computer-implemented method provides an efficient way of managing, updating, and storing tiny (i.e., very small) objects in the object storage system. The computer-implemented method efficiently updates the objects using the key/value pairs. Further, the computer-implemented method provides an efficient and accurate identification of the subset of objects for which the starting key in the range is less than or equal to the received key number, resulting in an accurate and reliable identification of the object for which the end key in the range is equal to or greater than the received key number. By virtue of identifying the subset of the objects, the object having the received key number is identified more accurately and efficiently.
In an implementation form, each object has a name corresponding to a starting key in the range. Beneficially, the naming of the object by its corresponding starting key enables the object to be located in the object storage system more efficiently and with reduced latency.
In a further implementation form, the computer-implemented method further comprises deducing the end key in the range of an object based on the name of the subsequent object.
Beneficially, deducing the end key in the range of the object based on the name of the subsequent object enables the object to be identified and located from the subset of the objects in the object storage system more efficiently and accurately.
In a further implementation form, updating the value includes reading the object and rewriting the object with the updated value.
In a further implementation form, reading the object includes reading metadata relating to the object to determine a plurality of sub-ranges within the object, identifying a sub-range containing the received key number, and reading the identified sub-range.
Beneficially, reading metadata relating to the object and identifying the sub-range of keys containing the received key number eliminates the requirement of reading an entire object, which provides an efficient way of reading the object.
In a further implementation form, updating the value includes rebuilding the object based on locally stored data.
Beneficially, rebuilding the object based on locally stored data provides an efficient way of updating the object in the object storage system with reduced latency.
In a further implementation form, updating the value comprises adding a new key/value pair if the received key number is not found within the object.
Beneficially, the addition of a new key/value pair if the received key number is not found enables an ideal size and range of the object to be maintained in the object storage system more efficiently and accurately.
In a further implementation form, if the received key is at the end of the range of key/value pairs in the identified object, updating the value comprises generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold.
Beneficially, generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold enables an ideal size of the object to be maintained in the object storage system.
In a further implementation form, if the received key is within the range of key/value pairs in the identified object, updating the value comprises dividing the object into two or more objects if the number of key/value pairs in the object is greater than a predefined threshold.
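The generation of a new object at the end of a range can be sketched as follows. This is a minimal, in-memory illustration only: the dictionary `store`, the function name `put_in_identified_object`, and the reading of "at the end of the range" as "greater than every key currently stored in the identified object" are assumptions made for the example and are not taken from the disclosure.

```python
def put_in_identified_object(store, start_key, key, value, threshold):
    """Write key/value into the object named `start_key`, or into a new object.

    `store` maps each object's name (its starting key) to a dict of key/value
    pairs. Returns the name of the object that received the pair.
    """
    pairs = store[start_key]
    # Assumption: "at the end of the range" means beyond every key stored so far.
    at_end_of_range = bool(pairs) and key > max(pairs)
    if at_end_of_range and len(pairs) > threshold:
        # The new object is named after its starting key, so the end key of the
        # previous object becomes key - 1 without rewriting that object.
        store[key] = {key: value}
        return key
    pairs[key] = value
    return start_key


# Three objects "0", "100", and "200", each holding 100 pairs (hypothetical data).
store = {
    0: {k: "v" for k in range(0, 100)},
    100: {k: "v" for k in range(100, 200)},
    200: {k: "v" for k in range(200, 300)},
}
target = put_in_identified_object(store, 200, 300, "new", threshold=99)
```

In this sketch the object “200” already holds more pairs than the threshold, so the key “300” lands in a new object “300” and the object “200” is left untouched.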
Beneficially, the division of the object into two or more objects, if the number of key/value pairs in the object is greater than the predefined threshold, enables an equal size of each object.
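A possible division of an oversized object is sketched below. The split point (the median key), the choice of exactly two resulting objects, and the names `store` and `split_object` are assumptions made for illustration; the disclosure only states that the object is divided into two or more objects.

```python
def split_object(store, start_key, threshold):
    """Split the object named `start_key` in two if it exceeds `threshold` pairs.

    `store` maps each object's name (its starting key) to a dict of key/value
    pairs. Returns the names of the resulting object(s).
    """
    pairs = store[start_key]
    if len(pairs) <= threshold:
        return [start_key]
    keys = sorted(pairs)
    middle = len(keys) // 2
    upper_start = keys[middle]
    # The lower half keeps the original name; the upper half is named after
    # its own starting key, so the ranges stay non-overlapping.
    store[start_key] = {k: pairs[k] for k in keys[:middle]}
    store[upper_start] = {k: pairs[k] for k in keys[middle:]}
    return [start_key, upper_start]


store = {100: {k: f"v{k}" for k in range(100, 200)}}
names = split_object(store, 100, threshold=50)
```

Splitting at the median is one way to obtain objects of roughly equal size, which matches the benefit stated above.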
In a further implementation form, the computer-implemented method further comprises merging the object with one or more adjacent objects if the number of key/value pairs in the object is below a predefined threshold.
Beneficially, the merging of two or more objects into one object, if the number of key/value pairs in the object is less than the predefined threshold, enables an equal size of each object.
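Merging can be sketched in the same in-memory model. The example merges an undersized object only with the object that follows it, so that the merged object keeps its original name; this policy and the names `store` and `merge_with_next` are assumptions for illustration.

```python
def merge_with_next(store, start_key, threshold):
    """Merge the object named `start_key` with the following object if it is small.

    `store` maps each object's name (its starting key) to a dict of key/value
    pairs. Returns True if a merge took place.
    """
    pairs = store[start_key]
    later_names = sorted(name for name in store if name > start_key)
    if len(pairs) >= threshold or not later_names:
        return False
    # Absorb the adjacent object; the merged object keeps the lower starting key.
    pairs.update(store.pop(later_names[0]))
    return True


store = {0: {1: "a"}, 100: {100: "b", 101: "c"}, 200: {250: "d"}}
merged = merge_with_next(store, 0, threshold=2)
```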
In another aspect, the present disclosure provides a computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method.
The processor achieves all the advantages and technical effects of the method after execution of the method.
In a yet another aspect, the present disclosure provides an object storage system comprising one or more processors configured to perform the method.
The object storage system achieves all the advantages and technical effects of the computer-implemented method of the present disclosure.
It has to be noted that all devices, elements, circuitry, units, and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity that performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
Additional aspects, advantages, features, and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
FIG. 1 is a flow chart of a computer-implemented method of updating a key/value pair in an object storage system, in accordance with an embodiment of the present disclosure;
FIG. 2 is an illustration of an object storage system that includes generation of a new object, in accordance with an embodiment of the present disclosure;
FIG. 3 is an illustration of an object storage system describing the retrieval of a key value from an object, in accordance with an embodiment of the present disclosure; and
FIG. 4 is a block diagram that illustrates various exemplary components of an object storage system, in accordance with an embodiment of the present disclosure.
In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
DETAILED DESCRIPTION OF EMBODIMENTS
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
FIG. 1 is a flow chart of a computer-implemented method of updating a key/value pair in an object storage system, in accordance with an embodiment of the present disclosure. With reference to FIG. 1, there is shown a flow chart of a computer-implemented method 100 of updating a key/value pair in an object storage system. The computer-implemented method 100 includes steps 102 to 108.
There is provided the computer-implemented method 100 of updating a key/value pair in an object storage system in which each of one or more objects contains a range of key/value pairs. The computer-implemented method 100 may also be referred to as a key division function in which various keys are divided across objects in the object storage system. The computer-implemented method 100 is used to determine which object in the object storage system a given key (or key number) belongs to. For instance, in an implementation, the object storage system may include three objects and each object may have the range of key/value pairs in the form of integer numbers. These three objects may be named as an object “0”, an object “100”, and an object “200”. Moreover, each of the three objects contains a range of key/value pairs: the object “0” contains the keys from 0 to 99, the object “100” contains the keys from 100 to 199, and the object “200” contains the keys from 200 to INF, where INF (infinite) represents a maximum theoretical value of the keys. Beneficially, because each object contains a range of key/value pairs, the object storage system eliminates the requirement of generating an object per key, which results in an efficient generation of objects. The computer-implemented method 100 is executed by the object storage system, described in detail, for example, in FIG. 4.
At step 102, the computer-implemented method 100 comprises receiving a key number and a corresponding value. The received key number corresponds to the key number that is to be updated with the corresponding value. For example, the received key number may be “150”. In the aforementioned scenario, the received key number (i.e., “150”) is updated with the corresponding value in the object in which the received key number is located.
At step 104, the computer-implemented method 100 further comprises identifying a subset of objects for which a starting key in the range is less than or equal to the received key number. In the aforementioned implementation scenario of the object storage system, each of the three objects is searched and the subset of objects is identified for which the starting key in the range of keys is less than or equal to the received key number. The starting key refers to a key that lies at a first location in the range of keys. For example, in case of the object “0”, the starting key is “0” in the range of keys from 0 to 99. Similarly, in case of the object “100” and the object “200”, the starting keys are 100 and 200, respectively. Furthermore, for the received key number (i.e., “150”), out of the three objects, the object “0” and the object “100” are identified as the subset of objects because in case of the object “0” and the object “100”, the starting key in the range is less than the received key number (i.e., “150”). Beneficially, the identification of the subset of objects for which the starting key in the range is less than or equal to the received key number enables the received key number to be located in the object storage system more efficiently and with reduced latency.
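Step 104 can be illustrated with a short sketch. It assumes integer keys and represents each object only by its starting key; the function name `candidate_objects` is hypothetical and chosen for the example.

```python
def candidate_objects(start_keys, key):
    """Return the starting keys of all objects whose range starts at or before `key`."""
    return [start for start in sorted(start_keys) if start <= key]


# Objects "0", "100", and "200" from the example above.
object_names = [0, 100, 200]
subset_for_150 = candidate_objects(object_names, 150)  # [0, 100]
subset_for_300 = candidate_objects(object_names, 300)  # [0, 100, 200]
```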
In accordance with an embodiment, each object has a name corresponding to the starting key in the range. The objects in the object storage system cannot have overlapping ranges of key/value pairs, and consequently no two objects can have the same name. For example, the object “0” contains the range of keys from 0 to 99, the object “100” includes the range of keys from 100 to 199, and the object “200” includes the range of keys from 200 to INF, and therefore the keys do not overlap. Similarly, in the case of the object “0”, the object “100”, and the object “200”, the starting keys are 0, 100, and 200, respectively, and the names of the objects are the object “0”, the object “100”, and the object “200”, respectively. Therefore, no two objects can have the same name. Also, each of the object “0”, the object “100”, and the object “200” includes its respective starting key “0”, “100”, or “200” in its name. Therefore, for a given key number, an object having the given key number may be located based only on the object name in the object storage system.
Moreover, in an example, in the object storage system, each of the one or more objects may be represented by the name of the object that includes an object ID that is represented using strings. The name of the object includes an object ID of 8 bytes that is divided into four parts named as bucket ID, prefix, most significant bit (MSB), and least significant bit (LSB). Furthermore, the name of the object can be represented as “my-bucket/my-folder/0140-0AFF-090B-0007”. In another example, the object ID is represented in the form of strings such as the object ID “1” is represented as “0000-0000-0000-0001”. Similarly, the object ID “2” is represented as “0000-0000-0000-0002”, the object ID “255” is represented as “0000-0000-0000-00FF”, the object ID “16385” is represented as “0000-0000-0000-4001”, and the object ID “97000000” is represented as “0000-0000-05C8-1A40”. The object ID (whose size is 8 bytes) is represented in the form of strings as described in Table 1.
[Table 1: string representation of 8-byte object IDs (table image not reproduced)]
Beneficially, the naming of the object by its corresponding starting key enables the object to be located in the object storage system more efficiently and with reduced latency.
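The string form of the object ID in the examples above is consistent with a fixed-width, 16-digit uppercase hexadecimal encoding grouped into four blocks of four digits. The following sketch reproduces those examples under that assumption; the function names `format_object_id` and `parse_object_id` are hypothetical, and the bucket and folder parts of the full object name are not modelled.

```python
def format_object_id(object_id: int) -> str:
    """Encode an 8-byte object ID as four dash-separated groups of hex digits."""
    if not 0 <= object_id < 2**64:
        raise ValueError("object ID must fit in 8 bytes")
    hex_digits = f"{object_id:016X}"  # zero-padded to 16 uppercase hex digits
    return "-".join(hex_digits[i:i + 4] for i in range(0, 16, 4))


def parse_object_id(text: str) -> int:
    """Decode the string form back into the integer object ID."""
    return int(text.replace("-", ""), 16)


example = format_object_id(97000000)  # "0000-0000-05C8-1A40"
```

Because every encoded ID has the same width, sorting the strings lexicographically yields the same order as sorting the IDs numerically, which is convenient when objects are listed by name.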
At step 106, the computer-implemented method 100 further comprises identifying an object in the subset of objects for which an end key in the range is equal to or greater than the received key number. After identification of the subset of objects, the object in the subset of objects is identified for which the end key in the range is equal to or greater than the received key number (i.e., “150”). The end key refers to a key that lies at the end of the range of keys. The end key can be deduced from the starting key of the subsequent object. For example, for the object “0”, the end key is “99”, which is deduced from the starting key of the object “100”. Similarly, for the object “100”, the end key is “199”, deduced from the starting key of the object “200”. In order to identify the object in the subset of objects (i.e., the object “0” and the object “100”) for the received key number (i.e., “150”), the end key must be equal to or greater than the received key number (i.e., “150”). Therefore, out of the subset of objects (i.e., the object “0” and the object “100”), the object “100” is identified for the received key number “150”, because the end key (i.e., 199) for the object “100” is greater than the received key number (i.e., “150”). Alternatively stated, the received key number (i.e., “150”) lies in the object “100”.
In accordance with an embodiment, the computer-implemented method 100 further comprises deducing the end key in the range of an object based on the name of the subsequent object. Since each object has the starting key in its name, the end key of a preceding object can be deduced from the name of the subsequent object. For example, the end key (i.e., “99”) of the object “0” is deduced from the name of the object “100”, which includes the starting key “100” in its name. The end key is deduced from the starting key of the subsequent object when the objects are ordered by their names.
Beneficially, deducing the end key in the range of the object based on the name of the subsequent object enables the object to be identified and located from the subset of the objects in the object storage system more efficiently and accurately.
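Steps 104 and 106 together with the end-key deduction can be sketched as a single lookup over the sorted object names. The sketch below uses a binary search (Python's `bisect_right`) rather than an explicit two-stage filter; because the ranges do not overlap, the last object whose starting key is less than or equal to the received key is the only one whose end key can be equal to or greater than it. Integer keys, the name `locate_object`, and the use of `float("inf")` for INF are assumptions for illustration.

```python
from bisect import bisect_right

INF = float("inf")  # stands in for the maximum theoretical key value


def locate_object(sorted_start_keys, key):
    """Return (start_key, end_key) of the object whose range contains `key`."""
    index = bisect_right(sorted_start_keys, key) - 1
    if index < 0:
        raise KeyError(f"no object starts at or before key {key}")
    start_key = sorted_start_keys[index]
    # The end key is deduced from the name (starting key) of the subsequent
    # object; the last object extends to INF.
    if index + 1 < len(sorted_start_keys):
        end_key = sorted_start_keys[index + 1] - 1
    else:
        end_key = INF
    return start_key, end_key


located = locate_object([0, 100, 200], 150)  # (100, 199)
```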
At step 108, the computer-implemented method 100 comprises updating the corresponding value to the received key number in the identified object. After identification of the object (i.e., the object “100”) in which the received key number (i.e., “150”) is located, the value corresponding to the received key number (i.e., “150”) is updated in the identified object (i.e., the object “100”). Similarly, if another key number (e.g., “250”) is received and is to be updated, then the object containing the other key number (i.e., “250”) is identified. Therefore, the object “200”, having the range of keys from 200 to INF, is identified, and the other key number (i.e., “250”) is located and updated with the corresponding value. In this way, the computer-implemented method 100 enables the object storage system to update the object more efficiently.
In accordance with an embodiment, updating the value includes reading the object and rewriting the object with the updated value. Updating the corresponding value for the received key number (i.e., “150”) includes reading, more particularly, a partial reading of the object “100”. An exemplary scenario of the partial reading of an object is described in detail, for example, in FIG. 3.
In accordance with an embodiment, the computer-implemented method 100 further comprises reading the object, which includes reading metadata relating to the object to determine a plurality of sub-ranges within the object, identifying a sub-range containing the received key number, and reading the identified sub-range. The metadata of the object is read first because it can be accessed more efficiently than the object data. Each object includes a range of keys, and the range of keys is divided into non-overlapping sub-ranges of keys. The keys and values of the sub-ranges appear sequentially in the object, and the metadata of the object maintains a mapping of the plurality of sub-ranges of keys in the object, through which the sub-range of keys that contains the received key number is identified. Alternatively stated, the metadata of the object is used to locate the relevant sub-range of keys containing the received key number. Once the relevant sub-range is located, that sub-range is read from the object data and the relevant keys and values are located in the read sub-range based on the received key number. Beneficially, reading metadata relating to the object and identifying the sub-range of keys containing the received key number eliminates the requirement of reading the entire object and hence results in an efficient way of reading the object.
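The metadata lookup can be sketched as follows, using the sub-ranges and start offsets given for the object “100” (100 to 109 at offset 0, 110 to 119 at offset 20, 120 to 129 at offset 40, and so on). The metadata layout as a sorted list of tuples, the name `sub_range_byte_span`, the total object size of 200 bytes, and the rule that a sub-range ends where the next one starts are all assumptions made for illustration.

```python
from bisect import bisect_right


def sub_range_byte_span(sub_ranges, object_size, key):
    """Return (start_offset, end_offset) of the sub-range that contains `key`.

    `sub_ranges` is a list of (first_key, start_offset) tuples sorted by first_key.
    """
    first_keys = [first_key for first_key, _ in sub_ranges]
    index = bisect_right(first_keys, key) - 1
    if index < 0:
        raise KeyError(f"key {key} precedes the first sub-range")
    start_offset = sub_ranges[index][1]
    # Assumption: a sub-range ends where the next one starts (or at the object's end).
    if index + 1 < len(sub_ranges):
        end_offset = sub_ranges[index + 1][1]
    else:
        end_offset = object_size
    return start_offset, end_offset


# Ten sub-ranges of ten keys each for the object "100", 20 bytes apart.
metadata = [(100 + 10 * i, 20 * i) for i in range(10)]
span = sub_range_byte_span(metadata, object_size=200, key=115)  # (20, 40)
```

Only the bytes in the returned span need to be fetched from the object data, for example with a ranged read, instead of the whole object.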
In accordance with an embodiment, updating the value includes rebuilding the object based on locally stored data. The rebuilding of the object corresponds to rewriting the object with the updated value. The values of the keys in the object are retrieved from data that is stored locally and are used to re-write the object. In an example, and without limiting the scope of the present disclosure, the key/value pairs of the object are stored locally in a database, without backing up the metadata to the object storage system in cloud storage. In an implementation, the object storage system can maintain a local copy of the metadata to serve local requests efficiently. The local storage can be accessed to retrieve the values for the unchanged keys and to re-write the object. Furthermore, in order to avoid duplicating the entire object storage locally, only a cache of the most frequently accessed objects can be stored in the local database, as a partial optimization. For example, consider an object storage system in which an object “1” contains the range of keys from 1 to 100. To update the key with the key number “10” with a new value, the values for the keys 1 to 9 and 11 to 100 are retrieved from the local database. Thereafter, the retrieved key/value pairs are combined with the key “10” and its corresponding new value. Finally, the object “1” is re-written with the updated key/value pair. Beneficially, rebuilding the object based on locally stored data provides an efficient way of updating the object in the object storage system with reduced latency.
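The rebuild described above may be sketched as follows, purely as an illustration. The local database is modelled as a plain dictionary, and the function name `rebuild_object` and the choice to keep the local copy in sync are assumptions that do not come from the disclosure.

```python
def rebuild_object(local_db, start, end, key, new_value):
    """Rebuild the contents of the object covering [start, end].

    Unchanged key/value pairs come from the local database; only the
    received key is given its new value. No read of the remote object
    is needed.
    """
    rebuilt = {k: v for k, v in local_db.items() if start <= k <= end}
    rebuilt[key] = new_value
    # Assumption: the local copy is kept in sync with the rewritten object.
    local_db[key] = new_value
    return dict(sorted(rebuilt.items()))


# Object "1" holds keys 1 to 100; key 10 receives a new value.
local_db = {k: f"v{k}" for k in range(1, 101)}
rebuilt = rebuild_object(local_db, 1, 100, 10, "new")
```

The returned dictionary would then be written back to the object storage system as the new content of the object “1”.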
In accordance with an embodiment, updating the value includes adding a new key/value pair if the received key number is not found within the object. For example, consider the object storage system in which the object “1” contains the range of keys from 1 to 100. To update the key with the key number “10” with the new value, the key with the key number “10” is looked up in the object “1”. If the key with the key number “10” is not found in the object “1”, for example due to an earlier deletion of the key/value pair, then a new key with the key number “10” and its corresponding updated value is created within the object “1”. Beneficially, adding the new key/value pair if the received key number is not found enables the predefined threshold size and the range of the object in the object storage system to be maintained more efficiently and accurately.
In accordance with an embodiment, if the received key is at the end of the range of key/value pairs in the identified object, updating the value comprises generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold. The predefined threshold may be defined as the maximum ideal size of an object in terms of the total number of key/value pairs lying within the object. For example, the object “0” contains the range from 0 to 99, the object “100” contains the range from 100 to 199, the object “200” contains the range from 200 to INF, and the received key number is “300”. The ideal size of each object is 100 keys. The identified object that contains the received key number (i.e., the key “300”) is the object “200”, and the received key number “300” is located at the end of the range (i.e., 200 to INF) of the key/value pairs in the object “200”. However, the addition of a new key/value pair within the object “200” would exceed the ideal size of the object “200”. Therefore, a new object is created. An exemplary scenario of generating a new object including the new key/value pair is described in detail, for example, in FIG. 2.
Beneficially, generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold enables an ideal size of the object to be maintained in the object storage system.
In accordance with an embodiment, if the received key is within the range of key/value pairs in the identified object, updating the value comprises dividing the object into two or more objects if the number of key/value pairs in the object is greater than a predefined threshold. Each object has a maximum ideal size in terms of the number of key/value pairs, beyond which no further key/value pair can be added to the object. If the size of the object becomes greater than the predefined threshold value due to the addition of the key/value pair, then the object is divided into two or more objects. Alternatively stated, if the number of key/value pairs in the object is greater than the predefined threshold value, then the object that contains the received key is divided into two or more objects. The division (or split) operation can be executed either online or in the background during a garbage collection process. Beneficially, the division of the object into two or more objects, if the number of key/value pairs in the object is greater than the predefined threshold, enables the objects to be of equal size.
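One possible form of the division operation is sketched below for illustration only. The disclosure does not specify how the pairs are distributed among the resulting objects; the sketch assumes an even split, with each new object named by its first key and the first piece retaining the original name. The function name `split_object` is hypothetical.

```python
def split_object(name, pairs, threshold):
    """Divide an oversized object into two or more objects.

    Returns a dict mapping object name (starting key) to its key/value pairs.
    """
    keys = sorted(pairs)
    if len(keys) <= threshold:
        return {name: dict(pairs)}
    # Fewest pieces that respect the threshold, filled as evenly as possible.
    pieces = -(-len(keys) // threshold)   # ceiling division
    size = -(-len(keys) // pieces)
    result = {}
    for i in range(0, len(keys), size):
        chunk = keys[i:i + size]
        # The first piece keeps the original name; later pieces are named
        # by their own starting key (an assumption of this sketch).
        new_name = name if i == 0 else chunk[0]
        result[new_name] = {k: pairs[k] for k in chunk}
    return result


# An object named 0 holding five pairs, with a threshold of four pairs.
parts = split_object(0, {k: f"v{k}" for k in range(5)}, 4)
```

Here the five pairs are divided into an object named 0 with three pairs and an object named 3 with two pairs.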
In accordance with an embodiment, the computer-implemented method 100 further comprises merging the object with one or more adjacent objects if the number of key/value pairs in the object is below the predefined threshold. Each object has an ideal size in terms of the number of key/value pairs, below which further key/value pairs can be merged into the object. If the size of the object becomes less than the predefined threshold value due to the deletion of a key/value pair, then two or more objects are merged into one object. Alternatively stated, if the number of key/value pairs in the object is less than the predefined threshold value, then the object that contains the received key is merged with one or more adjacent objects. The merge operation can be executed either online or in the background during a garbage collection process. Beneficially, the merging of two or more objects into one object, if the number of key/value pairs in the object is less than the predefined threshold, enables the objects to be of equal size.
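The merge operation may be illustrated, under stated assumptions, by the sketch below. It merges an undersized object with the following adjacent object (or with the preceding one if it is the last object) and keeps the lower starting key as the name of the merged object. The function name `merge_if_small` and the parameter `min_pairs` are hypothetical, and the choice of neighbour is an assumption rather than a requirement of the disclosure.

```python
def merge_if_small(objects, name, min_pairs):
    """Merge the object `name` with an adjacent object if it is undersized.

    `objects` maps object name (starting key) to a dict of key/value pairs.
    A new mapping is returned; the input mapping is left unchanged.
    """
    names = sorted(objects)
    if len(objects[name]) >= min_pairs or len(names) == 1:
        return dict(objects)
    idx = names.index(name)
    # Prefer the following neighbour; the last object uses the preceding one.
    if idx + 1 < len(names):
        left, right = names[idx], names[idx + 1]
    else:
        left, right = names[idx - 1], names[idx]
    merged = dict(objects)
    merged[left] = {**objects[left], **objects[right]}
    del merged[right]
    return merged


objects = {0: {1: "a"}, 100: {100: "b", 150: "c"}, 200: {200: "d"}}
after_first = merge_if_small(objects, 0, 2)
after_last = merge_if_small(objects, 200, 2)
```

Because the merged object keeps the lower starting key as its name, the range of the merged object still ends where the next remaining object begins.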
Thus, the computer-implemented method 100 provides an efficient way of managing, updating, and storing tiny (i.e., very small) objects in the object storage system. The computer-implemented method 100 efficiently updates the objects using the key/value pairs. Further, the computer-implemented method 100 provides an efficient and accurate identification of the subset of objects for which the starting key in the range is less than or equal to the received key number, which in turn results in an accurate and reliable identification of the object for which the end key in the range is equal to or greater than the received key number. By virtue of identifying the subset of the objects, the object having the received key number is identified more accurately and efficiently. Moreover, the computer-implemented method 100 ensures the ideal size and the ideal range of the objects in the object storage system by merging and dividing the objects. Additionally, the computer-implemented method 100 supports efficient management of the objects with reduced latency by storing, updating, and retrieving the key/value pairs.
The steps 102 to 108 are only illustrative, and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
There is further provided a computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the computer-implemented method 100. In an example, the instructions are implemented on the computer-readable medium, which includes, but is not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, a Secure Digital (SD) card, Solid-State Drive (SSD), a computer-readable storage medium, and/or CPU cache memory. In an example, the instructions are generated by a computer program, which is implemented in view of the computer-implemented method 100, and for use in implementing the computer-implemented method 100 on one or more processors.
FIG. 2 is an illustration of an object storage system that includes generation of a new object, in accordance with an embodiment of the present disclosure. FIG. 2 is described in conjunction with elements from FIG. 1. With reference to FIG. 2, there is shown an object storage system 202 that includes a first object 204, a second object 206, a third object 208, and a fourth object 210.
The object storage system 202 corresponds to the system that includes one or more objects, and each object includes a range of keys. The object storage system 202 is configured to perform the computer-implemented method 100 (of FIG. 1) of updating key/value pairs.
Each of the first object 204, the second object 206, the third object 208, and the fourth object 210 corresponds to the object that includes multiple key/value pairs. The multiple key/value pairs are updated, stored, and retrieved by implementation of the computer-implemented method 100.
Initially, the object storage system 202 includes three objects, namely the first object 204 (may also be represented as object “0”), the second object 206 (may also be represented as object “100”), and the third object 208 (may also be represented as object “200”). Moreover, each of the three objects contains a range of key/value pairs: the first object 204 (i.e., the object “0”) contains the range of keys from 0 to 99, the second object 206 (i.e., the object “100”) contains the range of keys from 100 to 199, and the third object 208 (i.e., the object “200”) contains the range of keys from 200 to INF. A received key number (i.e., “300”) is required to be updated with a corresponding value in the object storage system 202. Therefore, each of the three objects (i.e., the first object 204, the second object 206, and the third object 208) in the object storage system 202 is searched, and the subset of objects is identified for which the starting key in the range of keys is less than or equal to the received key number (i.e., the key “300”). The first object 204 with the starting key “0”, the second object 206 with the starting key “100”, and the third object 208 with the starting key “200” are identified as the subset of objects because their starting keys (i.e., 0, 100, and 200) are each less than the received key number (i.e., “300”). Thereafter, the object (i.e., the third object 208) for which the end key is equal to or greater than the received key number (i.e., “300”) is identified. The received key number (i.e., “300”) is at the end of the range (i.e., the range from 200 to INF) of key/value pairs in the identified object (i.e., the third object 208).
However, if the new key/value pair (i.e., of the received key number “300”) is added to the identified object (i.e., the third object 208), then the total number of keys in the identified object (i.e., the third object 208) will become 101, which is more than the predefined threshold (i.e., the maximum ideal size of each object). Therefore, a new object (i.e., the fourth object 210) including the new key/value pair (i.e., “300”) is generated in the object storage system 202. Finally, after the addition of the key/value pair (i.e., “300”), the object storage system 202 contains the first object 204 that represents the keys from 0 to 99, the second object 206 that represents the keys from 100 to 199, the third object 208 that represents the keys from 200 to 299, and the fourth object 210 that represents the keys from 300 to INF (i.e., infinity, the maximum theoretical value of the key).
Beneficially, updating and generating the new object in the object storage system 202 if the object exceeds the predefined threshold enables the object storage system 202 to maintain the ideal size and the ideal range of the objects.
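The FIG. 2 scenario may be illustrated by the following sketch, which is offered only as an example under stated assumptions. Objects are modelled as dictionaries keyed by their starting key, the last object is treated as covering an open-ended range up to INF, and the function name `append_key` is hypothetical.

```python
def append_key(objects, key, value, threshold):
    """Add or update a key that falls in the last (open-ended) object.

    If adding a new pair would push the last object past the threshold,
    a new object named by the received key is generated instead.
    Returns the name of the object that now holds the key.
    """
    last = max(objects)
    if key < last:
        raise ValueError("key does not fall in the last object's range")
    if key not in objects[last] and len(objects[last]) >= threshold:
        # The new object starts at the received key, so the previous
        # object's range now implicitly ends at key - 1.
        objects[key] = {key: value}
        return key
    objects[last][key] = value
    return last


# Objects "0", "100" and "200", each already holding 100 key/value pairs.
objects = {start: {k: "x" for k in range(start, start + 100)}
           for start in (0, 100, 200)}
created = append_key(objects, 300, "new", 100)
reused = append_key(objects, 301, "next", 100)
```

After the first call the store contains objects named 0, 100, 200, and 300; the second call places the key 301 in the newly generated object, since that object is still below the threshold.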
FIG. 3 is an illustration of an object storage system describing the retrieval of the value of a key from an object, in accordance with an embodiment of the present disclosure. FIG. 3 is described in conjunction with elements from FIGs. 1 and 2. With reference to FIG. 3, there is shown an object storage system 302 that includes a first object 304. There is further shown a number of key/value pairs along with a plurality of sub-ranges.
The first object 304 may correspond to the Object “100” of FIG. 1. The object storage system 302 corresponds to the object storage system 202 (of FIG. 2).
The object storage system 302 includes the first object 304 (i.e., the object “100”) that stores multiple keys and their corresponding values in the range of keys from 100 to 199. To enable retrieval of a required key and its corresponding value, the object storage system 302, by using the computer-implemented method 100, divides the range of the first object 304 (i.e., the object “100”) into the plurality of sub-ranges, in this example ten small sub-ranges. For example, the range of keys from 100 to 199 is divided into 100 to 109, 110 to 119, 120 to 129, 130 to 139, and so on until 190 to 199 in the first object 304. Thereafter, each sub-range of the plurality of sub-ranges is mapped to its start offset (expressed in bytes), and the mapping is stored in the object metadata. For example, the start offset of the sub-range 100 to 109 is “0”, the start offset of the sub-range 110 to 119 is “20”, the start offset of the sub-range 120 to 129 is “40”, and so on. To retrieve the value of a received key (e.g., “115”), the object (i.e., the first object 304) in which the received key (i.e., “115”) is stored is identified using the computer-implemented method 100. Thereafter, the metadata of the identified object (i.e., the first object 304) is read and the sub-range containing the received key (i.e., the sub-range of keys from 110 to 119) is determined. Thereafter, the sub-range is read from its corresponding start offset (i.e., 20) up to the start offset of the next sub-range (i.e., 40). The region of the identified object (i.e., the first object 304) beginning at that start offset contains all the keys and values of the corresponding sub-range (i.e., the keys from 110 to 119). Finally, the value of the key “115” is retrieved.
Beneficially, updating the key/value pair in the object storage system 302 in this manner eliminates the requirement of reading the entire object. Further, retrieving the value through the metadata reduces the latency and enables the value to be retrieved more efficiently.
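The partial read of FIG. 3 may be sketched as follows, as an illustration only. To reproduce the start offsets 0, 20, and 40 used in the example, the sketch assumes a fixed two-byte record (one byte for the key and one byte for the value); this layout, the dummy values, and the function name `read_value` are hypothetical and are not taken from the disclosure. An in-memory stream stands in for a ranged read against the object storage system.

```python
import io
import struct
from bisect import bisect_right

RECORD = struct.Struct("BB")  # hypothetical layout: 1-byte key, 1-byte value

# Object "100": keys 100 to 199, each with a dummy value (key % 7).
blob = b"".join(RECORD.pack(k, k % 7) for k in range(100, 200))
# Object metadata: sub-range starting key -> start offset in bytes.
metadata = {start: (start - 100) * RECORD.size for start in range(100, 200, 10)}


def read_value(stream, metadata, object_size, key):
    """Read only the sub-range that contains `key` and return its value."""
    starts = sorted(metadata)
    idx = bisect_right(starts, key) - 1
    if idx < 0:
        raise KeyError(key)
    begin = metadata[starts[idx]]
    # The sub-range ends where the next one starts (or at the end of the object).
    end = metadata[starts[idx + 1]] if idx + 1 < len(starts) else object_size
    stream.seek(begin)
    chunk = stream.read(end - begin)
    for k, v in RECORD.iter_unpack(chunk):
        if k == key:
            return v
    raise KeyError(key)


value = read_value(io.BytesIO(blob), metadata, len(blob), 115)
```

For the key 115, only the 20 bytes between offsets 20 and 40 are read, rather than the full 200-byte object.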
FIG. 4 is a block diagram that illustrates various exemplary components of an object storage system, in accordance with an embodiment of the present disclosure. FIG. 4 is described in conjunction with elements from FIGs. 1, 2, and 3. With reference to FIG. 4, there is shown a block diagram 400 that represents the object storage system 202 that includes one or more processors 402, a memory 404, and a network interface 406.
The one or more processors 402 include suitable logic, circuitry, interfaces, and/or code that is configured to perform the computer-implemented method 100 in the object storage system 202. Examples of implementation of the one or more processors 402 may include, but are not limited to, a central data processing device, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a very long instruction word (VLIW) processor, a state machine, and other processors or control circuitry.
The object storage system 202 (or the object storage system 302) provides updating of key/value pairs in which each of the one or more objects contains the range of key/value pairs. The object storage system 202 is configured to receive the key number and the corresponding value. Further, the object storage system 202 is configured to identify the subset of the objects for which the starting key in the range is less than or equal to the received key number, and the object in the subset for which the end key in the range is equal to or greater than the received key number, in order to locate the object having the received key number. Further, the object storage system 202 is configured to update the corresponding value of the received key number in the identified object.
The memory 404 may include suitable logic, circuitry, interfaces, or code that is configured to store the instructions executable by the one or more processors 402. Examples of the memory 404 may include, but are not limited to, an Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), persistent memory, remote direct memory access (RDMA), or CPU cache memory.
The network interface 406 is communicatively coupled to each of the memory 404 and the one or more processors 402 of the object storage system 202. Examples of the network interface 406 may include, but are not limited to, a computer port, a network socket, a network interface controller (NIC), and any other network interface device.
In an implementation, the object storage system 202 (or the object storage system 302) includes one or more processors 402 configured to perform the computer-implemented method 100.
Beneficially, the object storage system 202 (or the object storage system 302) that includes one or more processors 402 achieves all the advantages and technical effects of the computer-implemented method 100 of the present disclosure.
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.

Claims

1. A computer-implemented method (100) of updating a key/value pair in an object storage system (202, 302), in which each of the one or more objects contains a range of key/value pairs, the method comprising: receiving a key number and a corresponding value; identifying a subset of objects for which a starting key in the range is less than or equal to the received key number; identifying an object in the subset of objects for which an end key in the range is equal to or greater than the received key number; and updating the corresponding value to the received key number in the identified object.
2. The computer-implemented method (100) of claim 1, wherein each object has a name corresponding to the starting key in the range.
3. The computer-implemented method (100) of claim 2, further comprising deducing the end key in the range of an object based on the name of the subsequent object.
4. The computer-implemented method (100) of any preceding claim, wherein updating the value includes reading the object and re-writing the object with the updated value.
5. The computer-implemented method (100) of claim 4, wherein reading the object includes: reading metadata relating to the object to determine a plurality of sub-ranges within the object, identifying a sub-range containing the received key number, and reading the identified sub-range.
6. The computer-implemented method (100) of any one of claims 1 to 3, wherein updating the value includes rebuilding the object based on locally stored data.
7. The computer-implemented method (100) of any preceding claim, wherein updating the value comprises adding a new key/value pair if the received key number is not found within the object.
8. The computer-implemented method (100) of claim 7, wherein, if the received key is at the end of the range of key/value pairs in the identified object, updating the value comprises generating a new object including the new key/value pair if the number of key/value pairs in the object is greater than a predefined threshold.
9. The computer-implemented method (100) of any preceding claim, wherein, if the received key is within the range of key/values pairs in the identified object, updating the value comprises dividing the object into two or more objects if the number of key/value pairs in the object is greater than a predefined threshold.
10. The computer-implemented method (100) of any preceding claim, further comprising merging the object with one or more adjacent objects if the number of key/value pairs in the object is below a predefined threshold.
11. A computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method of any preceding claim.
12. An object storage system (202, 302) comprising one or more processors (402) configured to perform the method of any one of claims 1 to 10.
PCT/EP2022/055291 2022-03-02 2022-03-02 Method of updating key/value pair in object storage system and object storage system WO2023165691A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/055291 WO2023165691A1 (en) 2022-03-02 2022-03-02 Method of updating key/value pair in object storage system and object storage system

Publications (1)

Publication Number Publication Date
WO2023165691A1 true WO2023165691A1 (en) 2023-09-07

Family

ID=80953446

Country Status (1)

Country Link
WO (1) WO2023165691A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5926807A (en) * 1997-05-08 1999-07-20 Microsoft Corporation Method and system for effectively representing query results in a limited amount of memory
US20220043585A1 (en) * 2020-08-05 2022-02-10 Dropbox, Inc. System and methods for implementing a key-value data store

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ELLIOT K KOLODNER ET AL: "A Cloud Environment for Data-intensive Storage Services", CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2011 IEEE THIRD INTERNATIONAL CONFERENCE ON, IEEE, 29 November 2011 (2011-11-29), pages 357 - 366, XP032098841, ISBN: 978-1-4673-0090-2, DOI: 10.1109/CLOUDCOM.2011.55 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22713339

Country of ref document: EP

Kind code of ref document: A1