WO2020237409A1 - Technologies for memory-efficient key-value lookup - Google Patents

Technologies for memory-efficient key-value lookup

Info

Publication number
WO2020237409A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
entry
value
compact
full
Prior art date
Application number
PCT/CN2019/088262
Other languages
French (fr)
Inventor
Haitao Ji
Sanjeev N. Trika
Fei A. LI
Xiangbin WU
Xinxin Zhang
Zhiyuan Zhang
Qianying Zhu
Original Assignee
Intel Corporation
Priority date
Filing date
Publication date
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to PCT/CN2019/088262 priority Critical patent/WO2020237409A1/en
Publication of WO2020237409A1 publication Critical patent/WO2020237409A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9014Indexing; Data structures therefor; Storage structures hash tables

Definitions

  • a Key-Value (KV) storage device or system maps arbitrary-length key-strings to arbitrary-length value-strings.
  • the key is typically mapped to a value-location (e.g., a disk address and length pair) using a search structure such as a Log-structured-merge (LSM) tree, a B-tree, or a hash table.
  • The search structure is generically called an address translation table (ATL).
  • the stored keys depending on the application, can be tens to hundreds of bytes, and significantly impact the ATL size, in part because the number of ATL entries is typically very large. The storing of variable sized keys in memory results in a very large memory-intensive (and therefore costly) ATL data structure.
  • the values are kept on disk or in storage, and their location information is a fixed size in the ATL entries. Values therefore do not significantly impact the size of the ATL.
  • FIG. 1 depicts an exemplary database configuration and its address translation table (ATL) .
  • the database holds 1,073,741,824 key-value (KV) pairs with an average key size of 32 bytes and average value size of 800 bytes.
  • the ATL size is 45 GB. Accordingly, the ATL memory consumption is the primary expense of this configuration.
  • compression of the key in each ATL entry can be used to attempt to reduce a size of an ATL entry.
  • the effect of compressing the key of an ATL entry is not guaranteed, and not every type of key can be compressed efficiently.
  • the ATL size for the exemplary database of FIG. 1 reduces to 22.5 GB, which is still quite large.
  • compression and decompression take additional central processing unit (CPU) /other processing resources that can impact overall system performance negatively.
  • Another known technique involves keys in the ATL being replaced by a key’s digest. This is a way to reduce the ATL entry size when the digest is strong enough, e.g., a 32-byte SHA256 digest may be used to replace the variable-sized keys.
  • a 32 byte digest will not reduce the size of the ATL for the example of FIG. 1 at all, because the average key-size is already 32 bytes.
  • Various embodiments dynamically select when to keep the full key in the hash-table as opposed to only a slice or compact version of the full key.
  • a long variable-sized key string can be replaced by a short and potentially collidable key slice or compact version of the full key. Collisions can be resolved in a manner that does not affect the ATL size and performance of the main KV.
  • Various embodiments perform a first hash and calculate a strong digest, such as SHA256 and use a fixed-size slice of that digest (e.g., a portion of the digest or a second hash) and save that slice in the ATL entries. These slices are smaller than the full 32B digests and hence can save a significant amount of memory used to store an ATL entry.
  • Various embodiments handle collisions by packing multiple records into the ATL-slots, multiple entries within a record, along with a “next” pointer per record to chain the records. If both the hash-value as well as the slice-value collide, then those cases are handled by keeping the full key information in a “full” entry structure, which can be CPU cache line aligned. This dynamic selection of simple (slice-only) entries as opposed to full-key entries allows reduction of memory requirements.
  • Various embodiments can reduce the ATL size by more than 4x for the exemplary KV database of FIG. 1. For databases with larger keys, the benefits are even greater as the ATL-entry size is kept constant. However, when average key-sizes are smaller, the space saving benefits may be lower. Various embodiments require no additional disk or storage accesses and do not incur codec performance overheads.
  • KV stores include databases, website keeping track of transactions whereby a key is a transaction number and a value is a record, packet lookup, web hosting server keeping track of a file being accessed whereby a key is a website universal resource locator (URL) and the value is a count, or multimedia image or video storage and retrieval.
  • Various embodiments could be used by a personal computer, smart phone, tablet computer, server, rack, blade, or other computing devices, or integrated with a memory or storage device for KV storage and lookup.
  • Various embodiments may be provided as part of storage performance development kit (SPDK) .
  • FIG. 2 illustrates a scheme that can be used to represent a KV store.
  • hash values of keys may collide and refer to the same table slot but the key-slice values of the keys are different values.
  • two distinct keys K1 and K2 may have the same hash values h (K1) and h (K2) , but the key-slices S (K1) and S (K2) are different.
  • a hash-table structure can be used for ATL 202. However, other look-up tables or associative data structures can be used.
  • a hash of a received key can be used to identify a table slot in ATL 202.
  • a slot can include one or more records (e.g., 7 or other number of simple or simplified records) .
  • a record can include a valid indicator (e.g., 7 bits), a pad (e.g., 25 bits), one or more simplified entries (e.g., 8 bytes/entry), and a pointer to the next record (e.g., 32 bits, so that the 8-byte header plus seven 8-byte entries fill a 64-byte record).
  • a record can be considered valid if it stores one or more simplified entries.
  • a simplified entry can include a 2-byte key-slice and a 6-byte key-value storage location (or pointer) in disk or in storage.
  • the key-slice value can be determined as a first two bytes of a cryptographic hash of the key (e.g., SHA256, RIPEMD-160, or stronger hash algorithms such as SHA384, or weaker hash algorithms) , the first two bytes of the unmodified key, or using other techniques. Accordingly, a 32 byte average length key can be reduced to 2 bytes /key resulting in substantial savings of memory use to store an ATL slot.
  • For storage of a value associated with a key (a put request, e.g., Put(K, V)), the system first computes a hash of the key (h(Key)), calculates or determines a compact representation of the key or key-slice (S(Key)), and writes an entry for the simplified key into a simplified entry of a record in the table-slot associated with h(Key). If no open entry is available in the table-slot associated with h(Key) (e.g., all entries in the slot are already valid), then a free entry in another record in the same table-slot is used, with a new record allocated from a pool of free records if necessary.
  • a compact representation of the key or key-slice and a 4 byte logical block address (LBA) and a 2 byte offset to a location in storage of a key-value pair can be written to the simplified entry.
  • a received key is hashed (h (Key)) and a compact representation of the key or key-slice is calculated (S (Key) ) .
  • the table slot associated with the hashed key is retrieved. All simplified entries in all records can be scanned to find the single simplified key, if any, that matches the value of S(Key).
  • the simplified key can be searched for in the entries in the table-slot associated with h (Key) using a linear search in the entry and any pointed-to-entries.
  • the record in the retrieved table slot associated with the matching simplified key (S (Key) ) is retrieved.
  • a simplified entry can include a compact representation of a key (e.g., key slice) , a 4 byte logical block address (LBA) , and a 2 byte offset to a location in storage of a key-value pair.
  • the key from the key-value pair can be used to validate (or not) that the corresponding value is to be provided as a response to the provided key by comparing the provided key against the key in the key-value pair. If there is a match between the provided key and the key in the key-value pair, the value of the key-value pair is provided to a requester.
  • a hash of a received key (e.g., a cryptographic hash followed by a modulo operation to identify a table slot number) can identify Table Slot 2.
  • a compact representation of the key can be calculated. Records in Table Slot 2 are examined to determine if the compact representation of the key is located in a simplified entry of a record. For the simplified entry with the corresponding compact representation of the key, the location of the key-value in storage is retrieved. The key-value can be retrieved from the storage location. The key is compared against the key of the key-value. If there is a match, the value is provided to the requester.
  • a record can be a size which matches a CPU cache line size, for optimized lookups. For example, if a CPU cache line is 64 bytes, the record can be 64 bytes, although other sizes can be used. For a 64 byte record within table slot with multiple 8 byte simplified entries, a single memory read yields 7 simplified entries. A prefetch of a table slot can retrieve multiple entries while processing entries in a table slot. In this example, a table slot can store up to 5 records, although other numbers of records can be stored.
  • the database usage (number of KVs and their sizes) is the same as that for the exemplary database in FIG. 1.
  • memory consumption for the hash-table is 2^25 slots * 5 records/slot * 64 B/record, which is approximately 10 GB (see the sizing check below).
  • the hash-table is sized much smaller, to 2^25 slots, but each slot holds 5 records with 7 simplified entries each, so that more than 1e9 simplified entries can be stored.
  • Each hash table slot in this configuration can hold 35 KV entries (5 records of 7 entries each) . In practice, it can hold more due in part to the record’s linked-list.
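The arithmetic behind these figures can be checked directly; the constants below simply restate the example configuration above and are not requirements of the scheme.

```python
SLOTS = 1 << 25          # hash-table slots in the example configuration
RECORDS_PER_SLOT = 5     # records kept resident per slot (more can be chained)
RECORD_BYTES = 64        # one record per CPU cache line
ENTRIES_PER_RECORD = 7   # 8-byte simplified entries per record

table_bytes = SLOTS * RECORDS_PER_SLOT * RECORD_BYTES
resident_entries = SLOTS * RECORDS_PER_SLOT * ENTRIES_PER_RECORD

print(f"hash-table size: {table_bytes / 2**30:.1f} GiB")                 # ~10.0 GiB
print(f"resident simplified entries: {resident_entries / 1e9:.2f}e9")    # ~1.17e9 > 1e9 KV pairs
```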
  • a size of a key slice can be a different size than 2 bytes.
  • a longer key slice can result in fewer collisions but incurs more memory use.
  • a number of records per table slot can be changed as can a number of entries in a record.
  • a table slot stores one or more records and a record can store either or both of simplified format entries and enhanced format entries.
  • a record can include “type” bits (e.g., 7 bits) to identify whether an entry in the record is simplified entry format or enhanced entry format. Accordingly, dynamic selection can be made of when full-keys are stored or when key-slices are stored.
  • the enhanced entry format includes a key slice (e.g., 2 bytes) , pointer to a full entry structure (e.g., 4 bytes) , and a pad to ensure the enhanced entry format is 8 bytes in size (e.g., 2 bytes) , although other enhanced entry format sizes can be used.
  • the full-entry structure can include (a) the full variable-length key, (b) KV location in storage, (c) a pointer to the next full-entry structure, and (d) a pad so that the full-entry is 64 bytes or otherwise cache line width aligned.
  • the full-entry structures can be aligned to 64 bytes to match the CPU cache line size, for efficient field lookups and entry pre-fetching.
  • full-entries are stored in volatile memory for faster retrieval. Multiple full-entries can be linked together using the pointer to the next full-entry structure.
  • the full variable-length key can be represented or replaced by SHA256 (full-key), so that the full key structure is a fixed-sized equivalent. However, if the key's length is smaller than the SHA256 digest or some threshold length, the key itself can be saved.
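As a rough illustration of the structures described above, the sketch below models a simplified entry, an enhanced entry, and a chainable full-entry structure as plain Python objects. The field names are illustrative; the byte sizes noted in the comments come from the text (8-byte simplified and enhanced entries, 64-byte full-entry structures) and are not enforced by this in-memory form.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SimplifiedEntry:        # 8 bytes when packed: 2-byte key-slice + 6-byte KV location
    key_slice: bytes          # e.g., first two bytes of SHA256(key)
    kv_location: int          # 4-byte LBA plus 2-byte offset into storage

@dataclass
class FullEntry:              # padded to 64 bytes (one cache line) when packed
    full_key: bytes           # the full key, or SHA256(key) so the structure stays fixed-size
    kv_location: int          # where the key-value pair lives in storage
    next: Optional["FullEntry"] = None   # chains full entries that share a slot and key-slice

@dataclass
class EnhancedEntry:          # 8 bytes when packed: 2-byte slice + 4-byte pointer + 2-byte pad
    key_slice: bytes
    full_entries: Optional[FullEntry] = None   # head of the full-entry chain
```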
  • the following provides an example of a KV storage operation (e.g., put operation) whereby a simplified entry is replaced with an enhanced entry because of collisions between the hash of multiple keys and collision of simplified keys.
  • a received key is hashed to determine a table slot associated with the key.
  • table slot 2 is identified. Records in table slot 2 are scanned to determine if a record for the key slice is present. If an entry with the same key slice is present in a record, a collision has occurred for both the hashed key and the simplified key (e.g., compact representation of a key) . If a simplified entry is stored in a record of table slot 2, an enhanced entry is formed and replaces the simplified entry.
  • a full entry structure is associated with the stored KV value and another full entry structure is associated with the previously stored KV value with the same compact representation of the key, where both full entry structures are referenced by the enhanced entry. If an enhanced entry format is already present for the simplified key, then another full entry structure is created for the key-value storage and associated with the enhanced entry structure using, for example, the pointer to the next full entry structure in a full entry structure. However, if no entry corresponding to the key slice is found in any record of the table slot, a simplified entry is formed for the put request for the key-value.
  • intermediate sized key-slices can be used to further save memory in cases of double collisions.
  • An intermediate key-size could be a longer slice of the hashed key.
  • the “Type bits” of FIG. 3 can be expanded to have additional state information.
  • the Type bits can be expanded to 14 bits (i.e., to 2 bits per entry) and can specify (e.g., up to two levels of) intermediate-sized entry formats.
  • a type 0 may specify a simplified entry format
  • types 1 and 2 may specify intermediate entry formats, intermediate1 and intermediate2.
  • Intermediate1 could indicate that two entries are combined and together hold up to 10 bytes of key-slice and 6 bytes of KV location information.
  • Intermediate2 could indicate that three entries are combined allowing for 18 bytes of key-slice and 6 bytes of KV location information.
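Assuming the combined entries keep the 8-byte entry slots described earlier, the byte budgets of the two intermediate formats can be checked as follows; the exact field ordering is an assumption here, only the sizes are taken from the text.

```python
import struct

# intermediate1: two 8-byte entry slots -> up to 10 bytes of key-slice + 6 bytes of KV location
INTERMEDIATE1_FMT = "<10s4sH"   # key-slice, LBA, offset; field order is an assumption
# intermediate2: three 8-byte entry slots -> up to 18 bytes of key-slice + 6 bytes of KV location
INTERMEDIATE2_FMT = "<18s4sH"

assert struct.calcsize(INTERMEDIATE1_FMT) == 16   # occupies exactly two entry slots
assert struct.calcsize(INTERMEDIATE2_FMT) == 24   # occupies exactly three entry slots
```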
  • FIG. 4A is an alternate view of the information in FIG. 3 to illustrate the chaining of both the table-slots and the enhanced entries, when needed.
  • FIG. 4A depicts that a record can include chained simplified or enhanced entries to handle both hash (first-level) collisions and key-slice (second level) collisions. Even though pointers are shown in the first level, a record of 7 entries is followed by a single pointer to another record of 7 entries, if needed.
  • a key slice is not found (e.g., empty hash slot or no key-slice corresponding to this key) and the result of “not found” is returned to a requester.
  • Another scenario is that the key slice is found in a simplified entry. The entire key is not available in memory to verify that the KV pair in storage at the indicated location is the correct one for the user-specified key, so a storage read of the stored KV pair is performed and the stored key is compared against the received key. If they match, the value is returned; otherwise, "not found" is returned.
  • An enhanced entry includes full-length keys, which are compared against user-specified keys. If there is a match between full-length key and a user-specified key, then the corresponding value is read from storage and provided to a user. Otherwise, “not found” is returned to a user /requester.
  • FIG. 4B illustrates an example of linking of records.
  • a slot in a table can be a fixed region of memory and can include zero or more records.
  • a record can include zero or more entries.
  • a record list can be created whereby a head record allocated to a slot links to record1, record1 links to record2, and so forth. Records other than the slot /header record can be dynamically allocated and flexibly stored in a variety of memory regions or locations.
  • An example of a “Put” (store) operation for a key that is already present in the ATL is as follows (a sketch is given below): if the key is already in the ATL, the KV location is updated to reflect the updated KV storage location. For a simplified entry with a key that matches the received key, the storage location of the KV is updated to reflect the new storage location of the KV.
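The pseudocode referred to above is not reproduced in this text. A minimal sketch of just the described update path is given below, using a plain in-memory model (a dict mapping slot indices to lists of entries) instead of the packed 64-byte records; `slot_and_slice`, the slot count, and the entry field names are illustrative stand-ins.

```python
import hashlib

NUM_SLOTS = 1 << 25   # illustrative

def slot_and_slice(key: bytes):
    d = hashlib.sha256(key).digest()
    return int.from_bytes(d[-8:], "little") % NUM_SLOTS, d[:2]

# table: slot index -> list of simplified entries of the form {"slice": bytes, "loc": int}
def put_existing(table: dict, key: bytes, new_location: int) -> bool:
    """Update the KV storage location when the key is already present as a simplified entry."""
    slot, s = slot_and_slice(key)
    for entry in table.get(slot, []):
        if entry["slice"] == s:            # at most one simplified entry per key-slice in a slot
            entry["loc"] = new_location    # point the entry at the new KV storage location
            return True
    return False                           # key not present as a simplified entry

table = {}
print(put_existing(table, b"some-key", 42))   # False: nothing to update yet
```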
  • FIGs. 5A-5C depict an example process that can be used to store a key-value pair associated with a key.
  • a request is received to store a key-value pair in a database.
  • the key-value is stored in the database at a location in storage.
  • a compact representation of the key is determined.
  • the compact representation can be a value determined using a hash calculation of the key.
  • the compact representation may be non-unique whereby different keys can share the same compact representation.
  • a table slot associated with a compact representation of the key is identified.
  • a second compact representation of the key is determined.
  • the second compact representation of the key can be the first several bytes of the key (e.g., 2 bytes) or another hash of the value from a hashed key.
  • the second compact representation may be non-unique whereby different keys can share the same second compact representation.
  • the process allocates a simple entry for the key-value pair in a record and stores the second compact representation of the key in the allocated simple entry.
  • a simple entry can include a second compact representation of the key and a location of a key-value in storage (e.g., logical block address and offset) . Note that if there is no free entry available in a record, then another record is allocated and linked to a slot or head record (or another record) and a free simple entry in a record is populated with the second compact representation of the key and a KV storage location.
  • An enhanced entry can include a reference (e.g., pointer) to a beginning of a list of one or more full entry structures.
  • a full entry structure can include the full key and a location of the KV in storage. Other examples of a full entry structure are described herein.
  • a full entry structure is created that stores the full key and storage location of the key-value.
  • a previously created full entry structure is linked to the full entry structure created in 530. Linking of the previously created full entry structure to the full entry structure created in 530 can involve setting a pointer to a memory location of a start of the full entry structure created in 530.
  • the previously created full entry structure can be referenced by the enhanced entry such that in a subsequent put or get request, the full entry structure can be retrieved via reading of the enhanced entry.
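Pulling the steps of FIGs. 5A-5C together, the sketch below walks the three store paths described above: no matching key-slice (write a simple entry), a key-slice collision against an existing simple entry (upgrade it to an enhanced entry referencing two full entries), and an existing enhanced entry (chain one more full entry). The in-memory dict model, the toy storage allocator, and `slot_and_slice` are assumptions for illustration; update-in-place of an already-present key (described earlier) is omitted for brevity.

```python
import hashlib

NUM_SLOTS = 1 << 25   # illustrative

def slot_and_slice(key: bytes):
    d = hashlib.sha256(key).digest()
    return int.from_bytes(d[-8:], "little") % NUM_SLOTS, d[:2]

# `storage` stands in for the disk: location -> (key, value).
# `table` is the ATL: slot index -> list of entries; a simple entry is
# {"slice": s, "loc": loc}, an enhanced entry is {"slice": s, "full": [{"key": k, "loc": loc}, ...]}.
def put(table: dict, storage: dict, key: bytes, value: bytes) -> None:
    location = len(storage)             # toy allocator for the KV pair's storage location
    storage[location] = (key, value)    # the key-value pair itself goes to storage
    slot, s = slot_and_slice(key)
    entries = table.setdefault(slot, [])
    for entry in entries:
        if entry["slice"] != s:
            continue
        if "full" in entry:                                       # enhanced entry already exists:
            entry["full"].append({"key": key, "loc": location})   # chain one more full entry
        else:                                                     # hash AND key-slice collide:
            prior_key = storage[entry["loc"]][0]                  # recover the earlier full key
            full = [{"key": prior_key, "loc": entry["loc"]},
                    {"key": key, "loc": location}]
            entry.clear()
            entry.update({"slice": s, "full": full})              # replace simple with enhanced
        return
    entries.append({"slice": s, "loc": location})                 # no collision: simple entry

table, storage = {}, {}
put(table, storage, b"key-A", b"value-A")
put(table, storage, b"key-B", b"value-B")
```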
  • FIG. 6 depicts an example process that can be used to retrieve a value associated with a key.
  • a compact representation of a received key is determined.
  • a compact representation of the key can be a hash of the key. In some cases, a compact representation of multiple received keys can be the same.
  • a second compact representation of the received key is determined.
  • a second compact representation of the key can be a hash of the hashed key or a first several (e.g., 2) bytes of the hashed key or key. In some cases, a second compact representation of multiple received keys can be the same.
  • a determination is made as to whether there is an entry associated with the received key.
  • the compact representation can be used to identify a table slot and the second compact representation can be used to identify an entry within a record of the table slot. If there is an entry associated with the received key, then 608 follows. If there is no entry associated with the received key, 614 follows. At 608, a determination is made as to whether an entry associated with a received key is simple or enhanced. If the entry is enhanced, then 610 follows. At 610, a key from the full entry is read and compared against the received key. In some examples, a key equivalent (e.g., SHA256 (key)) is read and compared against the received key equivalent. If there is a match, then the value is provided to a requester of the value.
  • If the entry is simple, then 612 follows.
  • the key-value is read and the value is provided to a requester of the value if the key of the key-value matches the received key.
  • a null value is returned in response to no entry being present that is associated with the received key.
  • the null value can be returned in response to no entry being associated with the received key or the received key not matching the key of the key-value pair.
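A companion sketch of the retrieval path of FIG. 6, using the same assumed in-memory model as the store sketch above: a simple entry requires a storage read to validate the full key, an enhanced entry is validated against the full keys held in memory, and a missing entry or failed validation yields the "not found" (null) result.

```python
import hashlib

NUM_SLOTS = 1 << 25   # illustrative; must match the value used when storing

def slot_and_slice(key: bytes):
    d = hashlib.sha256(key).digest()
    return int.from_bytes(d[-8:], "little") % NUM_SLOTS, d[:2]

def get(table: dict, storage: dict, key: bytes):
    """Return the value for `key`, or None for "not found"."""
    slot, s = slot_and_slice(key)
    for entry in table.get(slot, []):
        if entry["slice"] != s:
            continue
        if "full" in entry:                             # enhanced entry: full keys are in memory
            for full in entry["full"]:
                if full["key"] == key:
                    return storage[full["loc"]][1]
            return None                                 # slice matched but no full key did
        stored_key, value = storage[entry["loc"]]       # simple entry: read the KV from storage
        return value if stored_key == key else None     # validate against the stored full key
    return None                                         # no entry for this slot and key-slice

print(get({}, {}, b"missing"))   # None
```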
  • the average size of an entry is determined by the simplified entry size (e.g., 8 bytes) and is fixed, because an arbitrary-length key is converted into a 2-byte key-slice.
  • In the collision analysis, m is the number of different bin-ids that a KV pair can have (the bin-ids of two KV pairs must be the same to be considered a collision), and n is the average number of KV entries placed into those bins (e.g., into a hash-table slot).
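The collision analysis to which m and n belong is not reproduced in this text. As a point of reference only (a standard birthday-problem approximation, not a formula quoted from the source): if n entries are spread uniformly over m equally likely bin-ids, the probability that at least two entries share a bin-id is approximately 1 - exp(-n(n-1)/(2m)), and the expected number of colliding pairs is roughly n(n-1)/(2m).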
  • the reductions in DRAM usage are not specific to these examples and apply broadly.
  • the savings are larger if key sizes are larger but can be smaller if the key sizes are smaller.
  • the savings can be primarily due to replacement of full-length keys by a dynamic selection of smaller key-slices (when there is no double-collision) and full-length keys (when there is double collision) .
  • 659k double-collisions and 707k full-entries were measured. In this experiment, memory utilization was improved by >4x compared to a baseline.
  • FIG. 7 depicts a system.
  • the system can use embodiments described herein to represent keys using compact representations and to store full keys in memory if there is a collision of one or more compact representations.
  • System 700 includes processor 710, which provides processing, operation management, and execution of instructions for system 700.
  • Processor 710 can include any type of microprocessor, central processing unit (CPU) , graphics processing unit (GPU) , processing core, or other processing hardware to provide processing for system 700, or a combination of processors.
  • Processor 710 controls the overall operation of system 700, and can be or include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs) , programmable controllers, application specific integrated circuits (ASICs) , programmable logic devices (PLDs) , or the like, or a combination of such devices.
  • system 700 includes interface 712 coupled to processor 710, which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 720, graphics interface components 740, or accelerators 742.
  • Interface 712 represents an interface circuit, which can be a standalone component or integrated onto a processor die.
  • graphics interface 740 interfaces to graphics components for providing a visual display to a user of system 700.
  • graphics interface 740 can drive a high definition (HD) display that provides an output to a user.
  • High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p) , retina displays, 4K (ultra-high definition or UHD) , or others.
  • the display can include a touchscreen display.
  • graphics interface 740 generates a display based on data stored in memory 730 or based on operations executed by processor 710 or both.
  • an accelerator among accelerators 742 can be a fixed-function offload engine that can be accessed or used by processor 710.
  • an accelerator among accelerators 742 can provide compression (DC) capability, cryptography services such as public key encryption (PKE) , cipher, hash/authentication capabilities, decryption, or other capabilities or services.
  • an accelerator among accelerators 742 provides field select controller capabilities as described herein.
  • accelerators 742 can be integrated into a CPU socket (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU) .
  • accelerators 742 can include a single or multi-core processor, graphics processing unit, logical execution unit, single or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs).
  • Accelerators 742 can provide multiple neural networks, processor cores, or graphics processing units that can be made available for use by artificial intelligence (AI) or machine learning (ML) models.
  • the AI model can use or include any or a combination of a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C) , combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model.
  • Multiple neural networks, processor cores, or graphics processing units can be made available for use by AI or ML models.
  • Memory subsystem 720 represents the main memory of system 700 and provides storage for code to be executed by processor 710, or data values to be used in executing a routine.
  • Memory subsystem 720 can include one or more memory devices 730 such as read-only memory (ROM) , flash memory, volatile memory, or a combination of such devices.
  • Memory 730 stores and hosts, among other things, operating system (OS) 732 to provide a software platform for execution of instructions in system 700. Additionally, applications 734 can execute on the software platform of OS 732 from memory 730.
  • Applications 734 represent programs that have their own operational logic to perform execution of one or more functions.
  • Processes 736 represent agents or routines that provide auxiliary functions to OS 732 or one or more applications 734 or a combination.
  • OS 732, applications 734, and processes 736 provide software logic to provide functions for system 700.
  • memory subsystem 720 includes memory controller 722, which is a memory controller to generate and issue commands to memory 730. It will be understood that memory controller 722 could be a physical part of processor 710 or a physical part of interface 712. For example, memory controller 722 can be an integrated memory controller, integrated onto a circuit with processor 710.
  • a volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory is Dynamic Random Access Memory (DRAM), or some variant such as Synchronous DRAM (SDRAM).
  • a memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electronic Device Engineering Council) on June 27, 2007), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD325, originally published by JEDEC in October 2013), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications.
  • the JEDEC standards are available at www.jedec.org.
  • system 700 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others.
  • Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components.
  • Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination.
  • Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB) , or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus.
  • system 700 includes interface 714, which can be coupled to interface 712.
  • interface 714 represents an interface circuit, which can include standalone components and integrated circuitry.
  • Network interface 750 provides system 700 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks.
  • Network interface 750 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial bus) , or other wired or wireless standards-based or proprietary interfaces.
  • Network interface 750 can transmit data to a remote device, which can include sending data stored in memory.
  • Network interface 750 can receive data from a remote device, which can include storing received data into memory.
  • Various embodiments can be used in connection with network interface 750, processor 710, and memory subsystem 720.
  • system 700 includes one or more input/output (I/O) interface (s) 760.
  • I/O interface 760 can include one or more interface components through which a user interacts with system 700 (e.g., audio, alphanumeric, tactile/touch, or other interfacing) .
  • Peripheral interface 770 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 700. A dependent connection is one where system 700 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.
  • system 700 includes storage subsystem 780 to store data in a nonvolatile manner.
  • storage subsystem 780 includes storage device (s) 784, which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination.
  • Storage 784 holds code or instructions and data 786 in a persistent state (i.e., the value is retained despite interruption of power to system 700) .
  • Storage 784 can be generically considered to be a "memory, " although memory 730 is typically the executing or operating memory to provide instructions to processor 710.
  • storage 784 is nonvolatile
  • memory 730 can include volatile memory (i.e., the value or state of the data is indeterminate if power is interrupted to system 700) .
  • storage subsystem 780 includes controller 782 to interface with storage 784.
  • controller 782 is a physical part of interface 714 or processor 710 or can include circuits or logic in both processor 710 and interface 714.
  • a non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device.
  • the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell ( “SLC” ) , Multi-Level Cell ( “MLC” ) , Quad-Level Cell ( “QLC” ) , Tri-Level Cell ( “TLC” ) , or some other NAND) .
  • a NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magneto resistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above.
  • a power source (not depicted) provides power to the components of system 700. More specifically, power source typically interfaces to one or multiple power supplies in system 700 to provide power to the components of system 700.
  • the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet.
  • the AC power can be a renewable energy (e.g., solar power) power source.
  • power source includes a DC power source, such as an external AC to DC converter.
  • power source or power supply includes wireless charging hardware to charge via proximity to a charging field.
  • power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.
  • system 700 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components.
  • High speed interconnects can be used such as PCIe, Ethernet, or optical interconnects (or a combination thereof) .
  • Embodiments herein may be implemented in various types of computing and networking equipment, such as switches, routers, racks, and blade servers such as those employed in a data center and/or server farm environment.
  • the servers used in data centers and server farms comprise arrayed server configurations such as rack-based servers or blade servers. These servers are interconnected in communication via various network provisions, such as partitioning sets of servers into Local Area Networks (LANs) with appropriate switching and routing facilities between the LANs to form a private Intranet.
  • cloud hosting facilities may typically employ large data centers with a multitude of servers.
  • a blade comprises a separate computing platform that is configured to perform server-type functions, that is, a “server on a card. ” Accordingly, each blade includes components common to conventional servers, including a main printed circuit board (main board) providing internal wiring (i.e., buses) for coupling appropriate integrated circuits (ICs) and other components mounted to the board.
  • hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth) , integrated circuits, ASICs, PLDs, DSPs, FPGAs, memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
  • software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “module, ” “logic, ” “circuit, ” or “circuitry. ”
  • a computer-readable medium may include a non-transitory storage medium to store logic.
  • the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth.
  • the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.
  • a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples.
  • the instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like.
  • the instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function.
  • the instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
  • IP cores may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
  • The terms “coupled” and “connected,” along with their derivatives, may be used herein. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
  • The terms “first,” “second,” and the like herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another.
  • the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.
  • The term “asserted,” used herein with reference to a signal, denotes a state of the signal in which the signal is active, which can be achieved by applying any logic level, either logic 0 or logic 1, to the signal.
  • The terms “follow” or “after” can refer to immediately following or following after some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, additional steps may be added or removed depending on the particular applications. Any combination of changes can be used, and one of ordinary skill in the art with the benefit of this disclosure would understand the many variations, modifications, and alternative embodiments thereof.
  • Disjunctive language such as the phrase “at least one of X, Y, or Z, ” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z) . Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present.
  • An embodiment of the devices, systems, and methods disclosed herein are provided below.
  • An embodiment of the devices, systems, and methods may include any one or more, and any combination of, the examples described below.
  • Example 1 includes a computer-readable medium, comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: receive a request to store a first key-value pair; form an entry associated with the first key-value pair, the entry comprising a compact representation of the key of the first key-value pair and an identifier of a storage location of the key-value pair, wherein the compact representation of the key is non-unique; and store the entry in a record of a table slot into memory.
  • Example 2 includes any example, wherein the compact representation of the key comprises multiple bytes of a hashed version of the key.
  • Example 3 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: receive a request to store a second key-value pair and, based on a determination that a compact representation of the key of the second key-value pair matches the compact representation of the key of the first key-value pair: form a full entry that includes a full version of the key of the first key-value pair and an identifier of a storage location of the first key-value pair, form a second full entry that includes a full version of the key of the second key-value pair and an identifier of a storage location of the second key-value pair, form a second entry that includes the compact representation of the key and a reference to the full entry, and replace the entry with the second entry.
  • Example 4 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: receive a request to retrieve a requested key-value pair; determine a compact representation of the key of the requested key-value pair; identify the second entry based on the compact representation of the key of the requested key-value pair; and provide a value from the full entry or the second full entry based on the key of the requested key-value pair matching a key in the full entry or the second full entry.
  • Example 5 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: perform a hash on the key of the first key-value pair; determine an index of the table slot associated with the hashed key; generate a first compact representation of the hash of the key of the first key-value pair; inspect the determined table slot to identify any entry associated with the first compact representation; identify any entry with a compact representation that matches the first compact representation; retrieve a key-value associated with the identified entry; and provide the value from the retrieved key-value based on the key matching a key in the retrieved key-value.
  • Example 6 includes any example, wherein: the table slot comprises one or more records, the one or more records include one or more entries, and a record size matches a cache line width and one or more entries are pre-fetched in a single record read.
  • Example 7 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: store the first key-value pair into storage.
  • Example 8 includes a method comprising: receiving a request to store a key-value pair; storing the key-value pair in a storage device; determining a hash value of the key of the key-value; generating a compact key from the key for inclusion in a first entry, wherein the compact key is shorter than the key and the compact key is non-unique and is capable of association with multiple different keys; associating a table slot with the hash value, wherein the table slot comprises one or more records, the one or more records comprise one or more entries, and a first entry comprises the compact key and a pointer to the key-value pair in the storage; and storing the first entry in memory.
  • Example 9 includes any example and comprising: receiving a second key; determining a hash of the second key; identifying a table slot associated with the hash of the second key; determining a second compact key for the second key; retrieving a second entry associated with the second compact key; and retrieving a key-value from storage based on a pointer in the retrieved second entry.
  • Example 10 includes any example and comprising: receiving a request to store a second key-value pair; determining a table slot based on a hash of the second key of the second key-value pair; determining a second compact key of the second key; identifying the first entry as associated with the second compact key; forming an enhanced entry that includes the second compact key of the second key and identifies a list of one or more full entries; and replacing the first entry with the enhanced entry.
  • Example 11 includes any example, wherein the enhanced entry comprises the second compact key and a pointer to a full entry.
  • Example 12 includes any example and comprising: storing the one or more full entries in memory, wherein the one or more full entries comprise a key, pointer to location of the key-value in storage, and pointer to a next full entry.
  • Example 13 includes any example and comprising: receiving a request to store a second key-value pair; determining a table slot based on a hash of a second key of the second key-value pair; determining a second compact key based on the second key; determining that the second compact key is associated with an enhanced entry in an existing record; forming a full entry that refers to the second key-value pair; and updating a pointer associated with the enhanced entry to identify the formed full entry.
  • Example 14 includes any example and comprising: receiving a request to provide a value associated with a second key; determining a table slot based on a hash of the second key; determining a compact second key based on the second key; identifying the compact second key is associated with a simple or enhanced entry in an existing record; for an associated simple entry: retrieve a key-value from storage associated with the second compact key or for an associated enhanced entry: retrieve a full entry associated with the compact second key, retrieve a pointer to the value associated with the second key, and retrieve the value associated with the second key from storage.
  • Example 15 includes any example, wherein the table slot comprises multiple records, a record comprises multiple entries, and an entry is simple or enhanced.
  • Example 16 includes any example, wherein a record is cache line aligned.
  • Example 17 includes any example and comprising prefetching one or more entries in a record in addition to retrieving an entry associated with the compact key.
  • Example 18 includes a system comprising: at least one processor; at least one memory; and at least one storage device, wherein the at least one processor is to: receive a request to store key-value entries; store the key-value in a storage device; determine a hash value of the key of the key-value; generate a compact key from the key for inclusion in an entry, wherein the compact key is shorter than the key and the compact key is non-unique and is capable of association with multiple different keys; associate a table slot stored in a memory with the hash value, wherein the table slot comprises one or more records, a record comprises one or more entries, and an entry comprises the compact key and a pointer to the key-value in a storage device; and store the entry in the table slot in the memory.
  • Example 19 includes any example, wherein the compact key comprises multiple bytes of the hash value of the key.
  • Example 20 includes any example, wherein the at least one processor is to: receive a request to store a second key-value pair; determine a table slot based on a hash of the second key; determine a second compact key of the second key; identify a simple entry associated with the second compact key; form an enhanced entry that includes the second compact key and identifies a list of one or more full entries; and replace the simple entry with the enhanced entry.
  • Example 21 includes any example, wherein the at least one processor is to: prefetch one or more entries in a record in addition to retrieval of an entry associated with the compact key.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In connection with a key-value store, an address translation table (ATL) can be formed that includes table slots with multiple cache-aligned records. A record can include multiple entries. An entry can include a compact representation of a key, instead of the full key value, and a reference to a storage location of a key-value. When another entry has the same compact representation of a key, the entry can be replaced with an enhanced entry that includes the compact representation of the key but includes a reference to multiple full entries that include the full key. In a subsequent key-value read involving the same compact representation of the key, the full keys of the full entries can be checked against the key provided with the key-value to retrieve the proper value. In a subsequent key-value write involving the same compact representation of the key, another full entry can be formed and referenced.

Description

TECHNOLOGIES FOR MEMORY-EFFICIENT KEY-VALUE LOOKUP
TECHNICAL FIELD
Various examples described herein relate to key-value lookup techniques.
BACKGROUND
A Key-Value (KV) storage device or system maps arbitrary-length key-strings to arbitrary-length value-strings. The key is typically mapped to a value-location (e.g., a disk address and length pair) using a search structure such as a Log-structured-merge (LSM) tree, a B-tree, or a hash table. The search structure is generically called an address translation table (ATL) . The stored keys, depending on the application, can be tens to hundreds of bytes, and significantly impact the ATL size, in part because the number of ATL entries is typically very large. The storing of variable sized keys in memory results in a very large memory-intensive (and therefore costly) ATL data structure. On the other hand, even though the value-sizes are typically even larger (e.g., kilobytes) , the values are kept on disk or in storage, and their location information is a fixed size in the ATL entries. Values therefore do not significantly impact the size of the ATL.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts an exemplary database configuration and its address translation table.
FIG. 2 illustrates a scheme that can be used to represent a key-value store.
FIG. 3 shows an example of an extended address translation table layout to handle key slice collisions.
FIG. 4A illustrates the chaining of both the table-slots and the mapping-entries.
FIG. 4B illustrates an example of linking of records.
FIGs. 5A-5C depict an example process that can be used to store a key-value pair associated with a key.
FIG. 6 depicts an example process that can be used to retrieve a value associated with a key.
FIG. 7 depicts a system.
DETAILED DESCRIPTION
FIG. 1 depicts an exemplary database configuration and its address translation table (ATL). In this example, the database holds 1,073,741,824 key-value (KV) pairs with an average key size of 32 bytes and an average value size of 800 bytes. For a logical disk size of 2 TB and assuming 5% database overprovisioning, the logical database size is 0.95 * 2 TB = 1.9 TB. Assuming a 10% hash-table oversizing, 5% database overprovisioning, and 46 bytes per hash-table entry as shown, the ATL size is 45 GB. Accordingly, the ATL memory consumption is the primary expense of this configuration.
The following techniques are used in known products or described in documents. Compression of the key in each ATL entry can be used to attempt to reduce the size of an ATL entry. However, the effect of compressing the key of an ATL entry is not guaranteed, and not every type of key can be compressed efficiently. Moreover, even when the average key compression ratio is 2x, the ATL size for the exemplary database of FIG. 1 reduces only to 22.5 GB, which is still quite large. In addition, compression and decompression take additional central processing unit (CPU) or other processing resources, which can impact overall system performance negatively.
Another known technique involves keys in the ATL being replaced by a key’s digest. This is a way to reduce the ATL entry size when the digest is strong enough, e.g., a 32-byte SHA256 digest may be used to replace the variable-sized keys. However, a 32 byte digest will not reduce the size of the ATL for the example of FIG. 1 at all, because the average key-size is already 32 bytes.
Another known technique, which places the ATL on disk and keeps a cached version of the ATL in memory, can reduce memory consumption of the ATL by arbitrary amounts. However, KV lookups are frequently random, resulting in very low cache-hit rates, and therefore these systems invariably end up requiring extra disk lookup(s) for every KV lookup. This typically doubles the number of I/Os to the disk (or solid state drive or other storage device) and can provide unacceptable performance.
Various embodiments dynamically select when to keep the full key in the hash-table as opposed to only a slice or compact version of the full key. A long variable-sized key string can be replaced by a short and potentially collidable key slice or compact version of the full key. Collisions can be resolved in a manner that does not affect the ATL size and performance of the main KV. Various embodiments perform a first hash and calculate a strong digest, such as SHA256 and use a fixed-size slice of that digest (e.g., a portion of the digest or a second hash) and save that slice in the ATL entries. These slices are smaller than the full 32B digests and hence can save a significant amount of memory used to store an ATL entry.
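A minimal sketch of the digest-and-slice idea just described, assuming SHA256 as the strong digest; the slot count, the use of different digest bytes for the slot index and the slice, and the 2-byte slice length are illustrative choices rather than requirements of the scheme.

```python
import hashlib

NUM_SLOTS = 1 << 25   # illustrative table size; not a requirement of the scheme
SLICE_LEN = 2         # 2-byte key-slice, matching the simplified-entry example

def table_slot(key: bytes) -> int:
    """h(Key): map the key to a table-slot index (part of a digest, modulo the slot count)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[-8:], "little") % NUM_SLOTS

def key_slice(key: bytes) -> bytes:
    """S(Key): a short, potentially collidable slice of a strong digest of the key."""
    return hashlib.sha256(key).digest()[:SLICE_LEN]

key = b"user:alice:txn:1001"
print(table_slot(key), key_slice(key).hex())
```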
Various embodiments handle collisions by packing multiple records into the ATL-slots and multiple entries within a record, along with a “next” pointer per record to chain the records. If both the hash-value and the slice-value collide, those cases are handled by keeping the full key information in a “full” entry structure, which can be CPU cache line aligned. This dynamic selection between simple (slice-only) entries and full-key entries reduces memory requirements.
Various embodiments can reduce the ATL size by more than 4x for the exemplary KV database of FIG. 1. For databases with larger keys, the benefits are even greater as the ATL-entry size is kept constant. However, when average key-sizes are smaller, the space saving benefits may be lower. Various embodiments require no additional disk or storage accesses and do not incur codec performance overheads.
Examples of KV store uses include databases; a website keeping track of transactions, where a key is a transaction number and a value is a record; packet lookup; a web hosting server keeping track of files being accessed, where a key is a website universal resource locator (URL) and the value is a count; or multimedia image or video storage and retrieval.
Various embodiments could be used by a personal computer, smart phone, tablet computer, server, rack, blade, or other computing devices, or integrated with a memory or storage device for KV storage and lookup. Various embodiments may be provided as part of a storage performance development kit (SPDK).
FIG. 2 illustrates a scheme that can be used to represent a KV store. In this example, hash values of keys may collide and refer to the same table slot but the key-slice values of the keys are different values. In other words, two distinct keys K1 and K2 may have the same hash values h (K1) and h (K2) , but the key-slices S (K1) and S (K2) are different.
A hash-table structure can be used for ATL 202, although other look-up tables or associative data structures can be used. A hash of a received key can be used to identify a table slot in ATL 202. A slot can include one or more records (e.g., 7 or another number of simple or simplified records). A record can include a valid indicator (e.g., 7 bits), a pad (e.g., 25 bits), one or more simplified entries (e.g., 8 bytes per entry), and a pointer to a next entry (e.g., 7 bits). A record can be considered valid if it stores one or more simplified entries. A simplified entry can include a 2-byte key-slice and a 6-byte key-value storage location (or pointer) on disk or in storage. The key-slice value can be determined as the first two bytes of a cryptographic hash of the key (e.g., SHA256, RIPEMD-160, stronger hash algorithms such as SHA384, or weaker hash algorithms), as the first two bytes of the unmodified key, or using other techniques. Accordingly, a key with an average length of 32 bytes can be reduced to 2 bytes per key, resulting in substantial savings in the memory used to store an ATL slot.
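The following Python sketch shows one possible packed layout for a simplified entry, assuming the example sizes above (2-byte key-slice, 4-byte LBA, 2-byte offset, 7 entries per 64-byte record); the names and the exact field order are illustrative assumptions rather than a prescribed format.

import struct

ENTRY_FMT = "<HIH"                        # 2-byte key-slice, 4-byte LBA, 2-byte offset
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)   # 8 bytes per simplified entry
ENTRIES_PER_RECORD = 7                    # 7 entries x 8 bytes = 56 bytes of entries
RECORD_SIZE = 64                          # 56 bytes of entries + 8-byte header = one cache line

def pack_simplified_entry(key_slice: bytes, lba: int, offset: int) -> bytes:
    """Pack one simplified entry: a non-unique key-slice plus the KV location."""
    (slice_value,) = struct.unpack("<H", key_slice)   # interpret the 2-byte slice as an integer
    return struct.pack(ENTRY_FMT, slice_value, lba, offset)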
For storage of a value associated with a key, e.g., a put request (Put (K, V)), the system first computes a hash of the key (e.g., h (Key)), calculates or determines a compact representation of the key or key-slice (e.g., S (Key)), and writes the simplified key into a simplified entry of a record in the table-slot associated with h (Key). If no open entry is available in the table-slot associated with h (Key) (e.g., all entries in the slot are already valid), then a new record is allocated from a pool of free records and linked into the same table-slot. A compact representation of the key or key-slice, a 4-byte logical block address (LBA), and a 2-byte offset to the location in storage of the key-value pair can be written to the simplified entry.
For a retrieve operation (e.g., a Get (K) request), a received key is hashed (h (Key)) and a compact representation of the key or key-slice is calculated (S (Key)). The table slot associated with the hashed key is retrieved. All simplified entries in all records of that slot can be scanned for a simplified key that matches the value of S (Key), using a linear search in the records and any pointed-to records. The record in the retrieved table slot associated with the matching simplified key (S (Key)) is retrieved. A simplified entry can include a compact representation of a key (e.g., a key slice), a 4-byte logical block address (LBA), and a 2-byte offset to a location in storage of a key-value pair. The key from the stored key-value pair can be used to validate whether the corresponding value should be provided as a response, by comparing the provided key against the key in the key-value pair. If there is a match between the provided key and the key in the key-value pair, the value of the key-value pair is provided to the requester.
In an example of a retrieve operation, a hash of a received key (e.g., encryption then a modulo operation to identify a table slot number) can identify a Table Slot 2. A compact representation of the key can be calculated. Records in Table Slot 2 are examined to determine if the compact representation of the key is located in a simplified entry of a record. For the simplified entry with the corresponding compact representation of the key, the location of the key-value in storage is retrieved. The key-value can be retrieved from the storage location. The key is compared against the key of the key-value. If there is a match, the value is provided to the requester.
In some examples, a record can have a size that matches a CPU cache line size, for optimized lookups. For example, if a CPU cache line is 64 bytes, the record can be 64 bytes, although other sizes can be used. For a 64-byte record within a table slot holding multiple 8-byte simplified entries, a single memory read yields 7 simplified entries. A prefetch of a table slot can retrieve multiple entries while entries in the table slot are being processed. In this example, a table slot can store up to 5 records, although other numbers of records can be stored.
In this example, the database usage (number of KVs and their sizes) is the same as that for the exemplary database in FIG. 1. However, memory consumption for the hash-table is 2^25 slots * 5 records/slot * 64 B/record, which is approximately 10 GB. The hash-table is sized much smaller, to 2^25 slots, but each slot can hold 5 records with 7 simplified entries each, so that more than 1e9 simplified entries can be stored. Each hash table slot in this configuration, as shown, can hold 35 KV entries (5 records of 7 entries each). In practice, it can hold more due in part to the record’s linked-list.
A size of a key slice can be a different size than 2 bytes. A longer key slice can result in fewer collisions but incurs more memory use. A number of records per table slot can be changed as can a number of entries in a record.
FIG. 3 shows an example of a system that can be used to handle collisions. If there are two keys K1 and K2 for which there is a match of the hash values of keys K1 and K2 and a match of simplified keys generated from keys K1 and K2 (e.g., h (K1) =h (K2) and S (K1) =S (K2)) , then an enhanced entry format can be used instead of a simplified entry format. In this example, a table slot stores one or more records and a record can store either or both of simplified format entries and enhanced format entries. A record can include “type” bits (e.g., 7 bits) to identify whether an entry in the record is simplified entry format or enhanced entry format. Accordingly, dynamic selection can be made of when full-keys are stored or when key-slices are stored.
In an example, the enhanced entry format includes a key slice (e.g., 2 bytes), a pointer to a full entry structure (e.g., 4 bytes), and a pad to ensure the enhanced entry format is 8 bytes in size (e.g., 2 bytes), although other enhanced entry format sizes can be used. In an example, the full-entry structure can include (a) the full variable-length key, (b) the KV location in storage, (c) a pointer to the next full-entry structure, and (d) a pad so that the full-entry is 64 bytes or otherwise cache line width aligned. The full-entry structures can be aligned to 64 bytes to match the CPU cache line size, for efficient field lookups and entry pre-fetching. In some examples, full-entries are stored in volatile memory for faster retrieval. Multiple full-entries can be linked together using the pointer to the next full-entry structure. In some examples, because the full key is a variable-length string, the full variable-length key can be represented or replaced by SHA256 (full-key), so that the full key structure is a fixed-sized equivalent. However, if the key’s length is smaller than SHA256 (full-key) or some threshold length, the key itself can be saved.
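A minimal sketch of the enhanced-entry and full-entry structures follows, using Python dataclasses in place of the packed, cache-line-aligned structures an actual implementation would use; the field names are illustrative assumptions.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FullEntry:
    full_key: bytes                       # full key, or SHA256(key) for long keys
    kv_location: Tuple[int, int]          # (LBA, offset) of the KV pair in storage
    next: Optional["FullEntry"] = None    # chain of colliding full entries

@dataclass
class EnhancedEntry:
    key_slice: bytes                      # same non-unique slice as a simplified entry
    head: Optional[FullEntry] = None      # first full entry in the collision chain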
The following provides an example of a KV storage operation (e.g., a put operation) whereby a simplified entry is replaced with an enhanced entry because of collisions of both the hashes of multiple keys and their simplified keys. A received key is hashed to determine a table slot associated with the key. In this example, table slot 2 is identified. Records in table slot 2 are scanned to determine if an entry for the key slice is present. If an entry with the same key slice is present in a record, a collision has occurred for both the hashed key and the simplified key (e.g., the compact representation of a key). If a simplified entry is stored in a record of table slot 2, an enhanced entry is formed and replaces the simplified entry. A full entry structure is associated with the stored KV value and another full entry structure is associated with the previously stored KV value with the same compact representation of the key, where both full entry structures are referenced by the enhanced entry. If an enhanced entry format is already present for the simplified key, then another full entry structure is created for the key-value storage and associated with the enhanced entry structure using, for example, the pointer to the next full entry structure in a full entry structure. However, if no entry corresponding to the key slice is found in any record of the table slot, a simplified entry is formed for the put request for the key-value.
In some embodiments, intermediate-sized key-slices can be used to further save memory in cases of double collisions. An intermediate key-size could be a longer slice of the hashed key. The “Type bits” of FIG. 3 can be expanded to carry additional state information. In one instantiation, the Type bits can be expanded to 14 bits (i.e., to 2 bits per entry) and can specify (e.g., up to two levels of) intermediate-sized entry formats. For example, a type = 0 may specify a simplified entry format, a type = 3 may specify a full entry structure, and types 1 and 2 may specify intermediate entry formats, intermediate1 and intermediate2. Intermediate1 could indicate that two entries are combined and together hold up to 10 bytes of key-slice and 6 bytes of KV location information. Intermediate2 could indicate that three entries are combined, allowing for 18 bytes of key-slice and 6 bytes of KV location information.
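One illustrative encoding of the expanded type bits, following the example mapping given above; the constant names are assumptions for the sketch.

# Illustrative per-entry type values (2 bits per entry in the expanded Type bits)
ENTRY_TYPE_SIMPLE        = 0  # 2-byte key-slice + 6-byte KV location
ENTRY_TYPE_INTERMEDIATE1 = 1  # two combined entries: up to 10-byte key-slice + 6-byte location
ENTRY_TYPE_INTERMEDIATE2 = 2  # three combined entries: up to 18-byte key-slice + 6-byte location
ENTRY_TYPE_FULL          = 3  # enhanced entry pointing to full-entry structures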
FIG. 4A is an alternate view of the information in FIG. 3 to illustrate the chaining of both the table-slots and the enhanced entries, when needed. FIG. 4A depicts that a record can include chained simplified or enhanced entries to handle both hash (first-level) collisions and key-slice (second level) collisions. Even though pointers are shown in the first level, a record of 7 entries is followed by a single pointer to another record of 7 entries, if needed.
When looking up a KV object using a key slice, there are several possible scenarios. In one scenario, the key slice is not found (e.g., an empty hash slot, or no key-slice corresponding to this key), and a result of “not found” is returned to the requester. In another scenario, the key slice is part of a simplified entry; because the entire key is not available in memory to verify that the KV pair in storage at the indicated location is the correct one for the user-specified key, a storage read of the stored KV pair is performed and the stored key is compared against the received key. If they match, the value is returned; otherwise “not found” is returned.
In another scenario, the key slice is part of an enhanced entry. An enhanced entry includes full-length keys, which are compared against the user-specified key. If there is a match between a full-length key and the user-specified key, then the corresponding value is read from storage and provided to the user. Otherwise, “not found” is returned to the user/requester.
FIG. 4B illustrates an example of linking of records. In this example, a slot in a table can be a fixed region of memory and can include zero or more records. A record can include zero or more entries. A record list can be created whereby a head record allocated to a slot links to record1, record1 links to record2, and so forth. Records other than the slot/header record can be dynamically allocated and flexibly stored in a variety of memory regions or locations.
An example of pseudocode for a “Put” operation for a key that is not present in an ATL is as follows.
Put (K, V) //this version assumes that K is not in ATL already.
h←Hash (K) //index between 0 and #of ATL slots
S←KeySlice (K) //two-byte (or other size) value
Loc←WriteKVtoDisk (K, V) //write key value (KV) to storage, and get location info (Loc)
Execute for each Record that is in ATL [h] , or is included in the list at ATL [h] //Record can be a 64-byte structure or otherwise cache line aligned
[The loop body is rendered as images in the original publication and is not reproduced here; it locates a free simplified entry E in the Record and stores the key-slice S and the location Loc into E.]
In addition, set corresponding valid bit for E to be true in the associated Record, and corresponding type bit to ‘Simple’ .
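As an illustration, the following Python sketch shows one possible realization of this insert path, assuming Python lists stand in for slots and records, that the slot index and the key-slice are both derived from a single SHA256 digest, and that write_kv_to_disk is a placeholder for the actual storage write.

import hashlib

def put_new_key(atl, num_slots, key, value, write_kv_to_disk, entries_per_record=7):
    """Sketch of Put(K, V) for a key that is not yet present in the ATL."""
    digest = hashlib.sha256(key).digest()
    h = int.from_bytes(digest, "big") % num_slots   # table-slot index
    s = digest[:2]                                  # two-byte key-slice
    loc = write_kv_to_disk(key, value)              # returns the KV storage location
    for record in atl[h]:                           # scan the slot's record list
        if len(record) < entries_per_record:        # free simplified entry available
            record.append((s, loc))
            return
    atl[h].append([(s, loc)])                       # allocate and link a new record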
An example of pseudocode for a “Put” (store) operation for a key that is present in an ATL is as follows. If the key is already in the ATL, the KV location is updated to reflect an updated KV storage location. For a simplified entry with a key that matches the received key, the storage location of the KV is updated to reflect a storage location of the KV.
Put (K, V)
h←Hash (K) //index between 0 and #of ATL slots
S←KeySlice (K) //For example, a key slice is a two-byte value
Loc←WriteKVtoDisk (K, V) //write key value (KV) to storage, and get location info
For each Record that is in ATL [h] , or is included in the list at ATL [h] //Record is a 64-byte structure
[The loop body is rendered as images in the original publication and is not reproduced here; it locates the entry whose key-slice matches S and updates that entry's KV location to Loc.]
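A hedged sketch of this update path follows, under the same assumed list-based layout as the previous sketch; an actual implementation would also verify the full key (or fall back to an enhanced entry) before overwriting a colliding entry, which this sketch omits.

import hashlib

def put_update(atl, num_slots, key, value, write_kv_to_disk):
    """Sketch of Put(K, V) when the key may already be present in the ATL."""
    digest = hashlib.sha256(key).digest()
    h = int.from_bytes(digest, "big") % num_slots
    s = digest[:2]
    loc = write_kv_to_disk(key, value)              # new storage location of the KV pair
    for record in atl[h]:
        for i, (entry_slice, _old_loc) in enumerate(record):
            if entry_slice == s:                    # matching simplified entry found
                record[i] = (s, loc)                # refresh the stored KV location
                return True
    return False                                    # not found; caller uses the insert path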
The following provides an example pseudocode for a “Get” (retrieve) operation.
Get (K)
h←Hash (K) //index between 0 and #of ATL slots
S←KeySlice (K) //For example, a two-byte value
For each Record that is in ATL [h] , or is included in the list at ATL [h] //Record can be a 64-byte structure or otherwise cache line width aligned
[The loop body is rendered as an image in the original publication and is not reproduced here; it scans the entries for a key-slice matching S, reads the KV pair at the recorded location, verifies the stored key against K, and returns the value or “not found” .]
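A hedged Python sketch of the Get path, under the same assumed layout as the Put sketches; read_kv_from_disk is a placeholder for the storage read, and the final key comparison resolves key-slice collisions as described above.

import hashlib

def get(atl, num_slots, key, read_kv_from_disk):
    """Sketch of Get(K): return the value for the key, or None if not found."""
    digest = hashlib.sha256(key).digest()
    h = int.from_bytes(digest, "big") % num_slots
    s = digest[:2]
    for record in atl[h]:                           # scan records in the table slot
        for entry_slice, loc in record:
            if entry_slice == s:
                stored_key, value = read_kv_from_disk(loc)
                if stored_key == key:               # resolve key-slice collisions
                    return value
    return None                                     # "not found"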
Additional examples for Put () and the Get () operations are provided with respect to FIGs. 5A-5C and 6 respectively. FIGs. 5A-5C depict an example process that can be used to store a key-value pair associated with a key. At 502, a request is received to store a key-value pair in a database. At 504, the key-value is stored in the database at a location in storage. At 506,  a compact representation of the key is determined. The compact representation can be a value determined using a hash calculation of the key. The compact representation may be non-unique whereby different keys can share the same compact representation. At 508, a table slot associated with a compact representation of the key is identified.
At 510, a second compact representation of the key is determined. The second compact representation of the key can be the first several bytes of the key (e.g., 2 bytes) or another hash of the value from a hashed key. The second compact representation may be non-unique whereby different keys can share the same second compact representation.
At 512, a determination is made as to whether an entry associated with the key is present in any record of the table slot based on the second compact representation of the key. If the entry is present, then the process continues to 514, where a determination is made as to whether the entry is a simple or enhanced entry. If no entry is present associated with the second compact representation of the key, the process continues to 520.
At 520, the process allocates a simple entry for the key-value pair in a record and stores the second compact representation of the key in the allocated simple entry. A simple entry can include a second compact representation of the key and a location of a key-value in storage (e.g., logical block address and offset) . Note that if there is no free entry available in a record, then another record is allocated and linked to a slot or head record (or another record) and a free simple entry in a record is populated with the second compact representation of the key and a KV storage location.
Referring next to FIG. 5B. If at 514, the entry is determined to be a simple entry, the process continues to 516, where an enhanced entry is created and replaces the simple entry in the record. An enhanced entry can include a reference (e.g., pointer) to a beginning of a list of one  or more full entry structures. A full entry structure can include the full key and a location of the KV in storage. Other examples of a full entry structure are described herein.
If at 514, the entry is determined to be an enhanced entry, the process continues to 530 (FIG. 5C) . Referring to FIG. 5C, at 530, a full entry structure is created that stores the full key and storage location of the key-value. At 532, a previously created full entry structure is linked to the full entry structure created in 530. Linking of the previously created full entry structure to the full entry structure created in 530 can involve setting a pointer to a memory location of a start of the full entry structure created in 530. The previously created full entry structure can be referenced by the enhanced entry such that in a subsequent put or get request, the full entry structure can be retrieved via reading of the enhanced entry.
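A minimal sketch of this promotion step, using plain dictionaries in place of packed structures and assuming the colliding simplified entry sits at record[index]; the field and parameter names are illustrative assumptions.

def promote_to_enhanced(record, index, existing_key, existing_loc, new_key, new_loc, key_slice):
    """Replace a colliding simplified entry with an enhanced entry (sketch)."""
    existing_full = {"key": existing_key, "loc": existing_loc, "next": None}
    new_full = {"key": new_key, "loc": new_loc, "next": existing_full}   # chain the two full entries
    record[index] = {"type": "enhanced", "key_slice": key_slice, "head": new_full}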
FIG. 6 depicts an example process that can be used to retrieve a value associated with a key. At 602, a compact representation of a received key is determined. A compact representation of the key can be a hash of the key. In some cases, a compact representation of multiple received keys can be the same. At 604, a second compact representation of the received key is determined. A second compact representation of the key can be a hash of the hashed key or a first several (e.g., 2) bytes of the hashed key or key. In some cases, a second compact representation of multiple received keys can be the same. At 606, a determination is made as to whether there is an entry associated with the received key. The compact representation can be used to identify a table slot and the second compact representation can be used to identify an entry within a record of the table slot. If there is an entry associated with the received key, then 608 follows. If there is no entry associated with the received key, 614 follows. At 608, a determination is made as to whether an entry associated with a received key is simple or enhanced. If the entry is enhanced, then 610 follows. At 610, a key from the full entry is read  and compared against the received key. In some examples, a key equivalent (e.g., SHA256 (key)) is read and compared against the received key equivalent. If there is a match, then the value is provided to a requester of the value.
If the entry is simple, then 612 follows. At 612, the key-value is read and the value is provided to a requester of the value if the key of the key-value matches the received key.
At 614, a null value is returned in response to no entry being present that is associated with the received key. In some cases, the null value can be returned in response to no entry being associated with the received key or the received key not matching the key of the key-value pair.
The following describes potential savings in memory usage. First, note that if the percentage of enhanced entries in the ATL is very small, the average entry size is determined by the simplified entry size (e.g., 8 bytes) and is fixed, because an arbitrary-length key is converted into a 2-byte key-slice. The expected number of enhanced entries, and the space requirement, for the exemplary database of FIG. 1 when optimized by the layout of FIG. 3 can be estimated as follows: number of random KV entries = 10^9 and number of hash-slots = 2^25. If the keys have a perfect distribution across hash-table slots and there are no key-slice collisions, no additional entries are required, the five 64-byte records in the slots hold all the simplified entries, and the table thus consumes 2^25 slots * 5 records/slot * 64 B/record = 10 GB.
However, when there are both hash-collisions and key-slice collisions, higher-space enhanced entries are required. The expected number of such dual-collisions per slot can be determined as follows and then multiplied by the number of slots:
Expected dual-collisions per slot ≈ n (n-1) / (2m) ,
where m is the number of possible bin-ids that an entry can have, and n is the number of  entries placed in those bins.
With the presented layout, for any given hash-table slot:
m = number of different bin-ids that a KV pair can have (the bin-ids for two KV pairs must be the same to be considered a collision)
= 2^ (number of bits per key-slice) = 2^16 in some examples
n = average number of KV entries placed in a hash-table slot
= total number of KV entries in the DB divided by number of hashslots
= 10^9 / 2^25 ~= 30 in some examples.
Applying the formula above, the expected dual-collisions per slot are ~0.00655, and therefore the total number of dual-collisions for the entire hash-table is 2^25 * 0.00655 ~= 220k. Whenever a dual-collision happens, in the worst case 1 enhanced-entry and 2 full-entries are required per dual-collision. The enhanced entry size is already included, since the enhanced entry replaces the simple-entry (in fact, an enhanced entry replaces >= 2 simple entries) and is not additive. Each enhanced entry points to, on average, 2 full-entries of 64 B each, or 128 B * 220k ~= 26 MB. The total expected structure size for various embodiments is therefore 10 GB + 26 MB = 10.026 GB, compared to the baseline 45 GB, thereby providing a ~4.5x improvement in memory use.
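The sizing arithmetic above can be checked with a short calculation; the n(n-1)/(2m) term is the expected number of colliding pairs when n items are placed uniformly into m bins, and it reproduces the ~0.00655 per-slot figure and the ~220k total quoted above (the exact printed values are illustrative).

num_kv    = 10**9                 # KV pairs in the example database
num_slots = 2**25                 # hash-table slots
m = 2**16                         # distinct 2-byte key-slice values per slot
n = num_kv / num_slots            # ~29.8 entries per slot on average

collisions_per_slot = n * (n - 1) / (2 * m)               # ~0.00655
total_dual_collisions = collisions_per_slot * num_slots   # ~220,000
table_size_gib = num_slots * 5 * 64 / 2**30               # 5 records/slot * 64 B/record = 10 GiB
print(collisions_per_slot, round(total_dual_collisions), table_size_gib)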
The improvements in DRAM usage are not specific to these examples and apply broadly. The savings are larger if key sizes are larger but can be smaller if the key sizes are smaller. The savings are primarily due to replacement of full-length keys by a dynamic selection of smaller key-slices (when there is no double-collision) and full-length keys (when there is a double-collision). For an experimental database of 1.07 billion KV entries, 659k double-collisions and 707k full-entries were measured. In this experiment, memory utilization was improved by >4x compared to a baseline.
FIG. 7 depicts a system. The system can use embodiments described herein to represent keys using compact representations and to store full keys in memory if there is a collision of one or more compact representations. System 700 includes processor 710, which provides processing, operation management, and execution of instructions for system 700. Processor 710 can include any type of microprocessor, central processing unit (CPU), graphics processing unit (GPU), processing core, or other processing hardware to provide processing for system 700, or a combination of processors. Processor 710 controls the overall operation of system 700, and can be or include one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
In one example, system 700 includes interface 712 coupled to processor 710, which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 720 or graphics interface components 740, or accelerators 742. Interface 712 represents an interface circuit, which can be a standalone component or integrated onto a processor die. Where present, graphics interface 740 interfaces to graphics components for providing a visual display to a user of system 700. In one example, graphics interface 740 can drive a high definition (HD) display that provides an output to a user. High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra-high definition or UHD), or others. In one example, the display can include a touchscreen display. In one example, graphics interface 740 generates a display based on data stored in memory 730 or based on operations executed by processor 710 or both.
Accelerators 742 can be a fixed function offload engine that can be accessed or used by a processor 710. For example, an accelerator among accelerators 742 can provide compression (DC) capability, cryptography services such as public key encryption (PKE), cipher, hash/authentication capabilities, decryption, or other capabilities or services. In some embodiments, in addition or alternatively, an accelerator among accelerators 742 provides field select controller capabilities as described herein. In some cases, accelerators 742 can be integrated into a CPU socket (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU). For example, accelerators 742 can include a single or multi-core processor, graphics processing unit, logical execution unit, single or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs). Accelerators 742 can provide multiple neural networks, processor cores, or graphics processing units that can be made available for use by artificial intelligence (AI) or machine learning (ML) models. For example, the AI model can use or include any or a combination of a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C), combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model. Multiple neural networks, processor cores, or graphics processing units can be made available for use by AI or ML models.
Memory subsystem 720 represents the main memory of system 700 and provides  storage for code to be executed by processor 710, or data values to be used in executing a routine. Memory subsystem 720 can include one or more memory devices 730 such as read-only memory (ROM) , flash memory, volatile memory, or a combination of such devices. Memory 730 stores and hosts, among other things, operating system (OS) 732 to provide a software platform for execution of instructions in system 700. Additionally, applications 734 can execute on the software platform of OS 732 from memory 730. Applications 734 represent programs that have their own operational logic to perform execution of one or more functions. Processes 736 represent agents or routines that provide auxiliary functions to OS 732 or one or more applications 734 or a combination. OS 732, applications 734, and processes 736 provide software logic to provide functions for system 700. In one example, memory subsystem 720 includes memory controller 722, which is a memory controller to generate and issue commands to memory 730. It will be understood that memory controller 722 could be a physical part of processor 710 or a physical part of interface 712. For example, memory controller 722 can be an integrated memory controller, integrated onto a circuit with processor 710.
A volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM (Dynamic Random Access Memory), or some variant such as Synchronous DRAM (SDRAM). A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electronic Device Engineering Council) on June 27, 2007), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD325, originally published by JEDEC in October 2013), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. The JEDEC standards are available at www.jedec.org.
While not specifically illustrated, it will be understood that system 700 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others. Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components. Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination. Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB) , or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus.
In one example, system 700 includes interface 714, which can be coupled to interface 712. In one example, interface 714 represents an interface circuit, which can include standalone components and integrated circuitry. In one example, multiple user interface components or peripheral components, or both, couple to interface 714. Network interface 750 provides system 700 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks. Network interface 750 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial  bus) , or other wired or wireless standards-based or proprietary interfaces. Network interface 750 can transmit data to a remote device, which can include sending data stored in memory. Network interface 750 can receive data from a remote device, which can include storing received data into memory. Various embodiments can be used in connection with network interface 750, processor 710, and memory subsystem 720.
In one example, system 700 includes one or more input/output (I/O) interface (s) 760. I/O interface 760 can include one or more interface components through which a user interacts with system 700 (e.g., audio, alphanumeric, tactile/touch, or other interfacing) . Peripheral interface 770 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 700. A dependent connection is one where system 700 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.
In one example, system 700 includes storage subsystem 780 to store data in a nonvolatile manner. In one example, in certain system implementations, at least certain components of storage 780 can overlap with components of memory subsystem 720. Storage subsystem 780 includes storage device (s) 784, which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination. Storage 784 holds code or instructions and data 786 in a persistent state (i.e., the value is retained despite interruption of power to system 700) . Storage 784 can be generically considered to be a "memory, " although memory 730 is typically the executing or operating memory to provide instructions to processor 710. Whereas storage 784 is nonvolatile, memory 730 can include volatile memory (i.e., the value or state of the data is indeterminate if power is interrupted to system 700) . In one example, storage subsystem 780  includes controller 782 to interface with storage 784. In one example controller 782 is a physical part of interface 714 or processor 710 or can include circuits or logic in both processor 710 and interface 714.
A non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device. In one embodiment, the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell ( “SLC” ) , Multi-Level Cell ( “MLC” ) , Quad-Level Cell ( “QLC” ) , Tri-Level Cell ( “TLC” ) , or some other NAND) . A NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory) , such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS) , NVM devices that use chalcogenide phase change material (for example, chalcogenide glass) , resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM) , nanowire memory, ferroelectric random access memory (FeRAM, FRAM) , magneto resistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT) -MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above, or other memory.
A power source (not depicted) provides power to the components of system 700. More specifically, the power source typically interfaces to one or multiple power supplies in system 700 to provide power to the components of system 700. In one example, the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet. Such AC power can be from a renewable energy (e.g., solar power) power source. In one example, the power source includes a DC power source, such as an external AC to DC converter. In one example, the power source or power supply includes wireless charging hardware to charge via proximity to a charging field. In one example, the power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.
In an example, system 700 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components. High speed interconnects can be used such as PCIe, Ethernet, or optical interconnects (or a combination thereof) .
Embodiments herein may be implemented in various types of computing and networking equipment, such as switches, routers, racks, and blade servers such as those employed in a data center and/or server farm environment. The servers used in data centers and server farms comprise arrayed server configurations such as rack-based servers or blade servers. These servers are interconnected in communication via various network provisions, such as partitioning sets of servers into Local Area Networks (LANs) with appropriate switching and routing facilities between the LANs to form a private Intranet. For example, cloud hosting facilities may typically employ large data centers with a multitude of servers. A blade comprises a separate computing platform that is configured to perform server-type functions, that is, a “server on a card. ” Accordingly, each blade includes components common to conventional servers, including a main printed circuit board (main board) providing internal wiring (i.e., buses) for coupling appropriate integrated circuits (ICs) and other components mounted to the board.
Various examples may be implemented using hardware elements, software elements, or a combination of both. In some examples, hardware elements may include devices,  components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth) , integrated circuits, ASICs, PLDs, DSPs, FPGAs, memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. In some examples, software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “module, ” “logic, ” “circuit, ” or “circuitry. ”
Some examples may be implemented using or as an article of manufacture or at least one computer-readable medium. A computer-readable medium may include a non-transitory storage medium to store logic. In some examples, the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. In some examples, the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines,  subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.
According to some examples, a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like. The instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function. The instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
One or more aspects of at least one example may be implemented by representative instructions stored on at least one machine-readable medium which represents various logic within the processor, which when read by a machine, computing device or system causes the machine, computing device or system to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
The appearances of the phrase “one example” or “an example” are not necessarily all referring to the same example or embodiment. Any aspect described herein can be combined with any other aspect or similar aspect described herein, regardless of whether the aspects are described with respect to the same figure or element. Division, omission or inclusion of block  functions depicted in the accompanying figures does not infer that the hardware components, circuits, software and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.
Some examples may be described using the expression "coupled" and "connected" along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term "coupled, ” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
The terms “first, ” “second, ” and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. The term “asserted” used herein with reference to a signal denote a state of the signal, in which the signal is active, and which can be achieved by applying any logic level either logic 0 or logic 1 to the signal. The terms “follow” or “after” can refer to immediately following or following after some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, additional steps may be added or removed depending on the particular applications. Any combination of changes can be used and one of ordinary skill in the art with the benefit of this disclosure would understand the many variations, modifications, and alternative embodiments thereof.
Disjunctive language such as the phrase “at least one of X, Y, or Z, ” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y,  and/or Z) . Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z, ” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z. ” ’
Illustrative examples of the devices, systems, and methods disclosed herein are provided below. An embodiment of the devices, systems, and methods may include any one or more, and any combination of, the examples described below.
Example 1 includes a computer-readable medium, comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: receive a request to store a first key-value pair; form an entry associated with the first key-value pair, the entry comprising a compact representation of the key of the first key-value pair and an identifier of a storage location of the key-value pair, wherein the compact representation of the key is non-unique; and store the entry in a record of a table slot into memory.
Example 2 includes any example, wherein the compact representation of the key comprises multiple bytes of a hashed version of the key.
Example 3 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: receive a request to store a second key-value pair and, based on a determination that a compact representation of the key of the second key-value pair matches the compact representation of the key of the first key-value pair: form a full entry that includes a full version of the key of the key-value pair and an identifier of a storage location of the first key-value pair, form a second full entry that includes a full version of the key of the second key-value pair and an identifier of a storage location of the second key-value pair, form a second entry that includes the compact representation of the key and a reference to the full entry, and replace the entry with the second entry.
Example 4 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: receive a request to retrieve a requested key-value pair; determine a compact representation of the key of the requested key-value pair; identify the second entry based on the compact representation of the key of the requested key-value pair; and provide a value from the full entry or the second full entry based on the key of the requested key-value pair matching a key in the full entry or the second full entry.
Example 5 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: perform a hash on the key of the first key-value pair; determine an index of the table slot associated with the hashed key; generate a first compact representation of the hash of the key of the first key-value pair; inspect the determined table slot to identify any entry associated with the first compact representation; identify any entry with a compact representation that matches the first compact representation; retrieve a key-value associated with the identified entry; and provide the value from the retrieved key-value based on the key matching a key in the retrieved key-value.
Example 6 includes any example, wherein: the table slot comprises one or more records, the one or more records include one or more entries, and a record size matches a cache line width and one or more entries are pre-fetched in a single record read.
Example 7 includes any example and comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: store the first key-value pair into storage.
Example 8 includes a method comprising: receiving a request to store a key-value pair; storing the key-value pair in a storage device; determining a hash value of the key of the key-value; generating a compact key from the key for inclusion in a first entry, wherein the compact key is shorter than the key and the compact key is non-unique and is capable of association with multiple different keys; associating a table slot with the hash value, wherein the table slot comprises one or more records, the one or more records comprise one or more entries, and a first entry comprises the compact key and a pointer to the key-value pair in the storage; and storing the first entry in memory.
Example 9 includes any example and comprising: receiving a second key; determining a hash of the second key; identifying a table slot associated with the hash of the second key; determining a second compact key for the second key; retrieving a second entry associated with the second compact key; and retrieving a key-value from storage based on a pointer in the retrieved second entry.
Example 10 includes any example and comprising: receiving a request to store a second key-value pair; determining a table slot based on a hash of the second key of the second key-value pair; determining a second compact key of the second key; identifying the first entry as associated with the second compact key; forming an enhanced entry that includes the second compact key of the second key and identifies a list of one or more full entries; and replacing the first entry with the enhanced entry.
Example 11 includes any example, wherein the enhanced entry comprises the second compact key and a pointer to a full entry.
Example 12 includes any example and comprising: storing the one or more full entries in memory, wherein the one or more full entries comprise a key, pointer to location of the  key-value in storage, and pointer to a next full entry.
Example 13 includes any example and comprising: receiving a request to store a second key-value pair; determining a table slot based on a hash of a second key of the second key-value pair; determining a second compact key based on the second key; determining that the second compact key is associated with an enhanced entry in an existing record; forming a full entry that refers to the second key-value pair; and updating a pointer associated with the enhanced entry to identify the formed full entry.
Example 14 includes any example and comprising: receiving a request to provide a value associated with a second key; determining a table slot based on a hash of the second key; determining a compact second key based on the second key; identifying the compact second key is associated with a simple or enhanced entry in an existing record; for an associated simple entry: retrieve a key-value from storage associated with the second compact key or for an associated enhanced entry: retrieve a full entry associated with the compact second key, retrieve a pointer to the value associated with the second key, and retrieve the value associated with the second key from storage.
Example 15 includes any example, wherein the table slot comprises multiple records, a record comprises multiple entries, and an entry is simple or enhanced.
Example 16 includes any example, wherein a record is cache line aligned.
Example 17 includes any example and comprising prefetching one or more entries in a record in addition to retrieving an entry associated with the compact key.
Example 18 includes a system comprising: at least one processor; at least one memory; and at least one storage device, wherein the at least one processor is to: receive a request to store key-value entries; store the key-value in a storage device; determine a hash value of the key of the key-value; generate a compact key from the key for inclusion in an entry, wherein the compact key is shorter than the key and the compact key is non-unique and is capable of association with multiple different keys; associate a table slot stored in a memory with the hash value, wherein the table slot comprises one or more records, a record comprises one or more entries, and an entry comprises the compact key and a pointer to the key-value in a storage device; and store the entry in the table slot in the memory.
Example 19 includes any example, wherein the compact key comprises multiple bytes of the hash value of the key.
Example 20 includes any example, wherein the at least one processor is to: receive a request to store a second key-value pair; determine a table slot based on a hash of the second key; determine a second compact key of the second key; identify a simple entry associated with the second compact key; form an enhanced entry that includes the second compact key and identifies a list of one or more full entries; and replace the simple entry with the enhanced entry.
Example 21 includes any example, wherein the at least one processor is to: prefetch one or more entries in a record in addition to retrieval of an entry associated with the compact key.

Claims (21)

  1. A computer-readable medium, comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to:
    receive a request to store a first key-value pair;
    form an entry associated with the first key-value pair, the entry comprising a compact representation of the key of the first key-value pair and an identifier of a storage location of the key-value pair, wherein the compact representation of the key is non-unique; and
    store the entry in a record of a table slot into memory.
  2. The computer-readable medium of claim 1, wherein the compact representation of the key comprises multiple bytes of a hashed version of the key.
  3. The computer-readable medium of claim 1, comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to:
    receive a request to store a second key-value pair and
    based on a determination that a compact representation of the key of the second key-value pair matches the compact representation of the key of the first key-value pair:
    form a full entry that includes a full version of the key of the key-value pair and an identifier of a storage location of the first key-value pair,
    form a second full entry that includes a full version of the key of the second key-value pair and an identifier of a storage location of the second key-value pair,
    form a second entry that includes the compact representation of the key and a reference to the full entry, and
    replace the entry with the second entry.
  4. The computer-readable medium of claim 3, comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to:
    receive a request to retrieve a requested key-value pair;
    determine a compact representation of the key of the requested key-value pair;
    identify the second entry based on the compact representation of the key of the requested key-value pair; and
    provide a value from the full entry or the second full entry based on the key of the requested key-value pair matching a key in the full entry or the second full entry.
  5. The computer-readable medium of claim 1, comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to:
    perform a hash on the key of the first key-value pair;
    determine an index of the table slot associated with the hashed key;
    generate a first compact representation of the hash of the key of the first key-value pair;
    inspect the determined table slot to identify any entry associated with the first compact representation;
    identify any entry with a compact representation that matches the first compact representation;
    retrieve a key-value associated with the identified entry; and
    provide the value from the retrieved key-value based on the key matching a key in the retrieved key-value.
  6. The computer-readable medium of claim 5, wherein:
    the table slot comprises one or more records,
    the one or more records include one or more entries, and
    a record size matches a cache line width and one or more entries are pre-fetched in a single record read.
  7. The computer-readable medium of claim 1, comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to:
    store the first key-value pair into storage.
  8. A method comprising:
    receiving a request to store a key-value pair;
    storing the key-value pair in a storage device;
    determining a hash value of the key of the key-value;
    generating a compact key from the key for inclusion in a first entry, wherein the compact key is shorter than the key and the compact key is non-unique and is capable of association with multiple different keys;
    associating a table slot with the hash value, wherein the table slot comprises one or more records, the one or more records comprise one or more entries, and a first entry comprises the compact key and a pointer to the key-value pair in the storage; and
    storing the first entry in memory.
  9. The method of claim 8, comprising:
    receiving a second key;
    determining a hash of the second key;
    identifying a table slot associated with the hash of the second key;
    determining a second compact key for the second key;
    retrieving a second entry associated with the second compact key; and
    retrieving a key-value from storage based on a pointer in the retrieved second entry.
  10. The method of claim 8, comprising:
    receiving a request to store a second key-value pair;
    determining a table slot based on a hash of the second key of the second key-value pair;
    determining a second compact key of the second key;
    identifying the first entry as associated with the second compact key;
    forming an enhanced entry that includes the second compact key of the second key and identifies a list of one or more full entries; and
    replacing the first entry with the enhanced entry.
  11. The method of claim 10, wherein the enhanced entry comprises the second compact key and a pointer to a full entry.
  12. The method of claim 10, comprising:
    storing the one or more full entries in memory, wherein the one or more full entries comprise a key, a pointer to a location of the key-value in storage, and a pointer to a next full entry.
  13. The method of claim 8, comprising:
    receiving a request to store a second key-value pair;
    determining a table slot based on a hash of a second key of the second key-value pair;
    determining a second compact key based on the second key;
    determining that the second compact key is associated with an enhanced entry in an existing record;
    forming a full entry that refers to the second key-value pair; and
    updating a pointer associated with the enhanced entry to identify the formed full entry.
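
For the case in claim 13, where the compact key already resolves to an enhanced entry, the new key-value pair only needs a new full entry linked into that entry's existing list. The sketch below assumes the same kind of FullEntry node as the earlier collision example and uses head insertion as one possible choice.

    #include <cstdint>
    #include <string>

    struct ValueLocation { uint64_t offset = 0; uint32_t length = 0; };

    struct FullEntry {
        std::string key;
        ValueLocation loc;
        FullEntry* next = nullptr;
    };

    struct EnhancedEntry {
        uint16_t compact_key = 0;
        FullEntry* chain = nullptr;   // head of the list of full entries
    };

    // Claim 13 flow: form a full entry for the new key-value pair and update the
    // enhanced entry's pointer so the new node becomes the head of its chain.
    void append_full_entry(EnhancedEntry& e, const std::string& key, ValueLocation loc) {
        e.chain = new FullEntry{key, loc, e.chain};
    }
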
  14. The method of claim 8, comprising:
    receiving a request to provide a value associated with a second key;
    determining a table slot based on a hash of the second key;
    determining a compact second key based on the second key;
    identifying that the compact second key is associated with a simple or enhanced entry in an existing record;
    for an associated simple entry:
    retrieve a key-value from storage associated with the compact second key, or
    for an associated enhanced entry:
    retrieve a full entry associated with the compact second key,
    retrieve a pointer to the value associated with the second key, and
    retrieve the value associated with the second key from storage.
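
The two branches of claim 14 can be viewed as a small dispatcher over the entry type: a simple entry leads directly to the stored key-value pair, whose full key is then verified, while an enhanced entry is resolved by finding the matching full entry and following its value pointer. The Entry layout and the stubbed read_pair/read_value helpers below are hypothetical stand-ins for real storage I/O.

    #include <cstdint>
    #include <optional>
    #include <string>

    struct ValueLocation { uint64_t offset = 0; uint32_t length = 0; };
    struct FullEntry { std::string key; ValueLocation loc; FullEntry* next = nullptr; };

    struct Entry {
        bool enhanced = false;
        ValueLocation loc;            // used by a simple entry
        FullEntry* chain = nullptr;   // used by an enhanced entry
    };

    // Hypothetical storage reads; a real system would issue device I/O here.
    struct StoredPair { std::string key, value; };
    StoredPair  read_pair(ValueLocation)  { return {"", ""}; }
    std::string read_value(ValueLocation) { return ""; }

    std::optional<std::string> resolve(const Entry& e, const std::string& key) {
        if (!e.enhanced) {
            // Simple entry: read the key-value pair and verify its full key.
            StoredPair p = read_pair(e.loc);
            if (p.key == key) return p.value;
            return std::nullopt;
        }
        // Enhanced entry: find the full entry whose key matches, then read the
        // value at the location that entry points to.
        for (FullEntry* f = e.chain; f != nullptr; f = f->next)
            if (f->key == key) return read_value(f->loc);
        return std::nullopt;
    }
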
  15. The method of claim 8, wherein the table slot comprises multiple records, a record comprises multiple entries, and an entry is simple or enhanced.
  16. The method of claim 8, wherein a record is cache line aligned.
  17. The method of claim 8, comprising prefetching one or more entries in a record in addition to retrieving an entry associated with the compact key.
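
Claim 17's prefetching can be hinted at with a compiler intrinsic: once the record holding a candidate entry is located, prefetching the whole cache-line-sized record brings the remaining entries in alongside it. __builtin_prefetch is specific to GCC/Clang, and the record layout repeats the earlier 64-byte assumption.

    #include <cstdint>

    struct PackedEntry { uint16_t compact_key; uint8_t flags; uint8_t loc[5]; };
    struct alignas(64) Record { PackedEntry entries[8]; };

    // Touching one entry pulls in the whole 64-byte record; an explicit
    // prefetch hint can start that transfer before the entries are examined,
    // so the sibling entries are already cached when they are compared.
    inline void prefetch_record(const Record* r) {
    #if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(r, /*rw=*/0, /*locality=*/3);   // read, keep in cache
    #else
        (void)r;   // portable no-op fallback
    #endif
    }
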
  18. A system comprising:
    at least one processor;
    at least one memory; and
    at least one storage device, wherein the at least one processor is to:
    receive a request to store a key-value pair;
    store the key-value pair in a storage device;
    determine a hash value of the key of the key-value pair;
    generate a compact key from the key for inclusion in an entry, wherein the compact key is shorter than the key and the compact key is non-unique and is capable of association with multiple different keys;
    associate a table slot stored in a memory with the hash value, wherein the table slot comprises one or more records, a record comprises one or more entries, and an entry comprises the compact key and a pointer to the key-value pair in a storage device; and
    store the entry in the table slot in the memory.
  19. The system of claim 18, wherein the compact key comprises multiple bytes of the hash value of the key.
  20. The system of claim 18, wherein the at least one processor is to:
    receive a request to store a second key-value pair;
    determine a table slot based on a hash of a second key of the second key-value pair;
    determine a second compact key of the second key;
    identify a simple entry associated with the second compact key;
    form an enhanced entry that includes the second compact key and identifies a list of one or more full entries; and
    replace the simple entry with the enhanced entry.
  21. The system of claim 18, wherein the at least one processor is to:
    prefetch one or more entries in a record in addition to retrieval of an entry associated with the compact key.
PCT/CN2019/088262 2019-05-24 2019-05-24 Technologies for memory-efficient key-value lookup WO2020237409A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/088262 WO2020237409A1 (en) 2019-05-24 2019-05-24 Technologies for memory-efficient key-value lookup

Publications (1)

Publication Number Publication Date
WO2020237409A1 true WO2020237409A1 (en) 2020-12-03

Family

ID=73553349

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088262 WO2020237409A1 (en) 2019-05-24 2019-05-24 Technologies for memory-efficient key-value lookup

Country Status (1)

Country Link
WO (1) WO2020237409A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833541A (en) * 2010-04-26 2010-09-15 华为技术有限公司 Hash data processing method and device
CN104468648A (en) * 2013-09-13 2015-03-25 腾讯科技(深圳)有限公司 Data processing system and method
CN106354774A (en) * 2016-08-22 2017-01-25 东北大学 Real-time industrial process big data compression and storage system and method

Similar Documents

Publication Publication Date Title
CN107239230B (en) Optimized hop-house multiple hash table for efficient memory inline deduplication applications
US10706101B2 (en) Bucketized hash tables with remap entries
US10678768B2 (en) Logical band-based key-value storage structure
US11132300B2 (en) Memory hierarchy using page-based compression
US9021189B2 (en) System and method for performing efficient processing of data stored in a storage node
US7447870B2 (en) Device for identifying data characteristics for flash memory
US10359954B2 (en) Method and system for implementing byte-alterable write cache
US8694737B2 (en) Persistent memory for processor main memory
US10572378B2 (en) Dynamic memory expansion by data compression
TWI698745B (en) Cache memory, method for operating the same and non-transitory computer-readable medium thereof
US20180107598A1 (en) Cluster-Based Migration in a Multi-Level Memory Hierarchy
TWI744289B (en) A central processing unit (cpu)-based system and method for providing memory bandwidth compression using multiple last-level cache (llc) lines
US10296250B2 (en) Method and apparatus for improving performance of sequential logging in a storage device
US10503647B2 (en) Cache allocation based on quality-of-service information
US9477605B2 (en) Memory hierarchy using row-based compression
US20170344490A1 (en) Using Multiple Memory Elements in an Input-Output Memory Management Unit for Performing Virtual Address to Physical Address Translations
CN105117351A (en) Method and apparatus for writing data into cache
EP4287031A2 (en) Flexible dictionary sharing for compressed caches
CN107688436A (en) Memory module and its method of control
US10599579B2 (en) Dynamic cache partitioning in a persistent memory module
US9323774B2 (en) Compressed pointers for cell structures
KR20170085951A (en) Versioning storage devices and methods
US10936500B1 (en) Conditional cache persistence in database systems
US10216445B2 (en) Key-value deduplication
US20180004668A1 (en) Searchable hot content cache

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19931354

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19931354

Country of ref document: EP

Kind code of ref document: A1