WO2023160115A1 - Key-value pair retrieval method, device, and storage medium - Google Patents

Key-value pair retrieval method, device, and storage medium

Info

Publication number
WO2023160115A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
target
bit
byte
difference
Application number
PCT/CN2022/137906
Inventor
徐云
王鹏程
陈飞
闫龙
韩磊
Original Assignee
华为技术有限公司
中国科学技术大学
Application filed by 华为技术有限公司 and 中国科学技术大学
Publication of WO2023160115A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Definitions

  • The present application relates to the communications field, and in particular to a key-value pair retrieval method, device, and storage medium.
  • In a key-value storage system, data is organized as key-value pairs.
  • The value in a key-value pair can be data of any type, structure, and size, and the key corresponding to the value uniquely identifies the value.
  • Data retrieval performance depends mainly on efficient indexing techniques.
  • Current indexing techniques, such as vEB indexing, prefix tree indexing, and full-text indexing, all share the problem that the index occupies too large a proportion of the storage space.
  • Embodiments of the present application provide a key-value pair retrieval method, apparatus, device, storage medium, and program product, which can reduce the proportion of storage space occupied by indexes in a key-value storage system. The technical solutions are as follows:
  • In a first aspect, a key-value pair retrieval method is provided. The method includes: obtaining a processing request, where the processing request includes a target key and a subscript index value of a target value corresponding to the target key, the target key includes a first key field and a second key field, and the first key field precedes the second key field; obtaining, based on the first key field, the bit value of the first bit corresponding to the first key field from a top-level bitmap (bitmap top, BT); when the bit value of the first bit is a first value, determining, based on the second key field, the target BB corresponding to the target key among the multiple bottom-level bitmaps (bitmap bottom, BB) corresponding to the first bit, where each BB is used to store multiple values; and performing a value operation in the target BB based on the subscript index value.
  • In the embodiments of the present application, the key is divided into multiple key fields. The corresponding first bit is determined from the BT based on the leading first key field, and the target BB is then determined, based on the trailing second key field, among the multiple BBs corresponding to the first bit. In this way, keys with the same first key field are located to the same group of BBs through the BT. That is, the embodiments of the present application integrate the indexes of the common parts of keys, thereby reducing the proportion of space occupied by the index.
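  • The two-level lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the split point `FIRST_FIELD_BYTES`, the class names, the use of a Python integer as the BT bit array, and the rule that maps a first key field to a bit position are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

FIRST_FIELD_BYTES = 2  # assumed split point between the first and second key field


@dataclass
class BottomBitmap:
    """A BB: holds several values, addressed by a subscript index value."""
    values: dict = field(default_factory=dict)


@dataclass
class TopBitmap:
    """A BT: one bit per first key field, plus the BBs reachable from each bit."""
    bits: int = 0
    tables: dict = field(default_factory=dict)  # bit position -> {second field -> BB}

    def bit_for(self, first_field: bytes) -> int:
        # Assumed mapping: read the first key field as a big-endian integer.
        return int.from_bytes(first_field, "big")

    def insert(self, key: bytes, index: int, value: bytes) -> None:
        first, second = key[:FIRST_FIELD_BYTES], key[FIRST_FIELD_BYTES:]
        pos = self.bit_for(first)
        self.bits |= 1 << pos  # mark this first key field as present
        bb = self.tables.setdefault(pos, {}).setdefault(second, BottomBitmap())
        bb.values[index] = value

    def lookup(self, key: bytes, index: int):
        first, second = key[:FIRST_FIELD_BYTES], key[FIRST_FIELD_BYTES:]
        pos = self.bit_for(first)
        if not (self.bits >> pos) & 1:  # bit not set: no key with this first field
            return None
        bb = self.tables[pos].get(second)
        return None if bb is None else bb.values.get(index)
```

  • In this sketch, keys that share a first key field set the same BT bit and share one table of BBs, which is the index-sharing effect the paragraph above describes.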
  • The process of determining the target BB corresponding to the target key among the multiple BBs corresponding to the first bit may be: obtaining a first hash table corresponding to the first bit; and determining the target BB among the multiple BBs based on the second key field and the first hash table.
  • Each entry of the first hash table stores indication information of one BB. On this basis, the corresponding entry in the first hash table is determined according to the second key field, the indication information of the target BB is obtained from the determined entry, and the target BB is then determined among the multiple BBs according to the obtained indication information.
  • The second key field includes a first subfield and a second subfield. In this case, the process of determining the target BB among the multiple BBs based on the second key field and the first hash table is: determining, based on the first subfield and the first hash table, the target middle-layer bitmap (BM) corresponding to the target key, where the target BM stores indication information of a second hash table; obtaining the second hash table based on the indication information of the second hash table, where the second hash table stores the indication information of the multiple BBs; and determining the target BB based on the second subfield and the second hash table.
  • Each entry in the first hash table corresponding to the BT stores indication information of one BM.
  • Each BM stores indication information of the middle-level hash table corresponding to that BM, and each entry of the middle-level hash table stores indication information of one BB.
  • The middle-level hash table corresponding to the target BM, that is, the second hash table, is determined based on the hash table indication information in the target BM.
  • The corresponding entry is determined in the second hash table based on the second subfield, and the target BB is determined based on the BB indication information in the determined entry.
  • That is, the second key field can be further divided, and the corresponding BM can be retrieved based on the leading first subfield and the first hash table, so as to obtain the second hash table indicated in that BM.
  • The BBs corresponding to keys that share this first subfield can then be retrieved through one hash table (that is, the second hash table), which reduces the data size of each hash table and improves the efficiency of hash table operations.
  • In addition, the indexes of more common parts of the keys can be integrated, which further reduces the proportion of space occupied by the index.
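  • The BT, BM, and BB chain described above can be sketched as follows. This is a minimal illustration under stated assumptions: plain dictionaries stand in for the first and second hash tables, a dictionary with a hypothetical `"second_table"` field stands in for a BM, and the subfield length `first_sub_len` is an arbitrary choice for the example.

```python
from typing import Optional


def split_second_field(second_field: bytes, first_sub_len: int = 2):
    """Split the second key field into (first subfield, second subfield)."""
    return second_field[:first_sub_len], second_field[first_sub_len:]


def locate_bb(first_table: dict, second_field: bytes) -> Optional[dict]:
    """Walk first hash table -> BM -> second hash table -> BB."""
    first_sub, second_sub = split_second_field(second_field)
    bm = first_table.get(first_sub)        # first hash table: first subfield -> BM
    if bm is None:
        return None
    second_table = bm["second_table"]      # the BM refers to its own hash table
    return second_table.get(second_sub)    # second hash table: second subfield -> BB
```

  • Because every key with the same first subfield goes through the same BM, each second hash table only has to hold the BBs of that group, which is the size reduction the text refers to.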
  • The target BB stores a reference value and a value compression table, and the value compression table is used to store difference information between other values and the reference value.
  • That is, multiple values can be stored in a BB by storing the reference value together with the difference information between the other values and the reference value. This reduces the storage of redundant fields across the values, thereby reducing the storage space occupied and improving the value compression rate.
  • When a value compression table is used to store multiple values, in one possible case the value compression table includes multiple single-difference entries. One single-difference entry stores one item of difference information, and the difference information includes a mutation position, a mutation byte, and a corresponding bit vector.
  • The mutation position indicates the position of the byte at which the other value differs from the reference value, and the mutation byte is the byte of the other value that differs from the reference value.
  • The bit vector includes multiple bits, each bit corresponds to one value, and the bit value of the bit corresponding to the other value is a second value.
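  • The single-difference entries described above can be sketched as follows. This is a minimal illustration under stated assumptions: all values have the same length as the reference value, the second value is taken to be 1, and the table is modeled as a dictionary keyed by the hypothetical pair (mutation position, mutation byte) whose value is the bit vector.

```python
def diff_against_reference(reference: bytes, value: bytes):
    """Return (mutation position, mutation byte) pairs; assumes equal lengths."""
    return [(i, value[i]) for i in range(len(reference)) if value[i] != reference[i]]


def build_table(reference: bytes, values: dict) -> dict:
    """Map (position, byte) -> bit vector; bit n is 1 when value n has that mutation."""
    table = {}
    for n, value in values.items():
        for entry in diff_against_reference(reference, value):
            table[entry] = table.get(entry, 0) | (1 << n)
    return table
```

  • Values that share the same mutation at the same position share one entry and differ only in which bit of the bit vector is set, so the repeated bytes of the values are stored once in the reference value.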
  • The value compression table further includes at least one aggregated difference entry. One aggregated difference entry is obtained by aggregating multiple single-difference entries that include the same first mutation position and different mutation bytes. The aggregated difference entry includes an aggregation identifier, the first mutation position, and a byte storage field, and the byte storage field stores, in the order of the values corresponding to the bits in the bit vector, the byte at the first mutation position in each value. That is, in this application, multiple single-difference entries that include the same mutation position and different mutation bytes can be aggregated into one aggregated difference entry, thereby reducing the storage overhead of the entries.
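  • The aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the entry is modeled as a dictionary with hypothetical field names, the number of slots per BB is passed in as `slots`, and slots whose value does not differ at this position are filled with the reference byte.

```python
def aggregate(reference: bytes, position: int, singles: dict, slots: int) -> dict:
    """Aggregate single-difference entries that share one mutation position.

    singles maps mutation byte -> bit vector for the given position.
    """
    # Slots without a mutation at this position keep the reference byte.
    storage = bytearray([reference[position]] * slots)
    for byte, vector in singles.items():
        for n in range(slots):
            if (vector >> n) & 1:
                storage[n] = byte
    return {"aggregated": True, "position": position, "bytes": bytes(storage)}
```

  • One byte per slot replaces one (position, byte, bit vector) triple per distinct mutation byte, which is where the saving comes from once many distinct bytes occur at the same position.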
  • The processing request may be a data insertion request, and the data insertion request further includes the target value. On this basis, the process of performing the value operation in the target BB based on the subscript index value may be: comparing each byte of the target value with the corresponding byte of the reference value to obtain the mutation position and mutation byte of the target value; and inserting, based on the subscript index value and the mutation position and mutation byte of the target value, the difference information between the target value and the reference value into the value compression table.
  • When the value compression table includes a first single-difference entry, the bit value of the nth bit in the bit vector of the first single-difference entry is updated to the second value to insert the target value, where n is determined based on the subscript index value, and the first single-difference entry is a single-difference entry that includes the mutation position and mutation byte of the target value.
  • When the value compression table does not include the mutation byte of the target value and the number of second single-difference entries included in the value compression table is less than a first threshold, a bit vector of the target value is generated based on the subscript index value, where the bit value of the nth bit in the bit vector of the target value is the second value, n is determined based on the subscript index value, and a second single-difference entry is a single-difference entry that includes the mutation position of the target value. A single-difference entry is then generated based on the mutation position, mutation byte, and bit vector of the target value, and the generated single-difference entry is inserted into the value compression table to insert the target value.
  • When the value compression table does not include the mutation byte of the target value and the number of second single-difference entries included in the value compression table is not less than the first threshold, the second single-difference entries, the mutation byte of the target value, and the subscript index value are aggregated to obtain a target aggregated difference entry. A second single-difference entry is a single-difference entry that includes the mutation position of the target value, the first threshold is greater than 1, and the storage space occupied by the target aggregated difference entry is not greater than the storage space occupied by the multiple second single-difference entries.
  • That is, the target aggregated difference entry can be obtained by aggregating the target value with the multiple single-difference entries that include the same mutation position as the target value but different mutation bytes. This reduces the space occupied by the entries and thus the storage overhead of the value compression table.
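  • The three insertion cases above can be sketched as follows. This is a minimal illustration under stated assumptions: `SLOTS` and `THRESHOLD` are arbitrary example constants, the table layout (`"single"` and `"aggregated"` dictionaries) is hypothetical, and the branch that writes into an already aggregated position is an assumption added to keep the example complete.

```python
SLOTS = 8       # assumed number of values per BB (kept small for the example)
THRESHOLD = 3   # assumed first threshold (must be greater than 1)


def insert(table: dict, reference: bytes, n: int, value: bytes) -> None:
    """Insert value into slot n of a table {"single": {...}, "aggregated": {...}}."""
    single, aggregated = table["single"], table["aggregated"]
    for pos, (ref_byte, byte) in enumerate(zip(reference, value)):
        if byte == ref_byte:
            continue                                  # no mutation at this position
        if pos in aggregated:                         # assumed: position already aggregated
            aggregated[pos][n] = byte
        elif (pos, byte) in single:                   # entry exists: set bit n
            single[(pos, byte)] |= 1 << n
        else:
            same_pos = [k for k in single if k[0] == pos]
            if len(same_pos) < THRESHOLD:             # few entries: add a single entry
                single[(pos, byte)] = 1 << n
            else:                                     # many entries: aggregate them
                storage = bytearray([ref_byte] * SLOTS)
                for key in same_pos:
                    vector = single.pop(key)
                    for i in range(SLOTS):
                        if (vector >> i) & 1:
                            storage[i] = key[1]
                storage[n] = byte
                aggregated[pos] = storage
```

  • With these constants, the first three distinct mutation bytes at a position are kept as single-difference entries, and a fourth distinct byte at that position triggers aggregation.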
  • The processing request may be a data query request.
  • In this case, the process of performing the value operation in the target BB based on the subscript index value may be:
  • when the value compression table includes a third single-difference entry, obtaining the target value based on the mutation position and mutation byte in the third single-difference entry, the nth byte in the byte storage field of each aggregated difference entry in the value compression table, and the reference value. The third single-difference entry is a single-difference entry in which the bit value of the nth bit of the bit vector is the second value, and n is determined based on the subscript index value.
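  • The query path above can be sketched as follows. This is a minimal illustration under stated assumptions: the table layout is hypothetical (`"single"` maps (mutation position, mutation byte) to a bit vector, `"aggregated"` maps a mutation position to a byte storage field), and the second value is taken to be 1.

```python
def query(table: dict, reference: bytes, n: int) -> bytes:
    """Rebuild the value stored in slot n from the reference value and the table."""
    value = bytearray(reference)
    # Apply every single-difference entry whose nth bit is set.
    for (pos, byte), vector in table["single"].items():
        if (vector >> n) & 1:
            value[pos] = byte
    # Apply the nth byte of every aggregated difference entry.
    for pos, storage in table["aggregated"].items():
        value[pos] = storage[n]
    return bytes(value)
```

  • Positions that no entry touches keep the byte of the reference value, so a value identical to the reference needs no entries at all.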
  • The processing request may be a data deletion request.
  • In this case, the process of performing the value operation in the target BB based on the subscript index value may be:
  • when the value compression table includes a third single-difference entry, updating the bit value of the nth bit in the bit vector of the third single-difference entry to a third value, where the third single-difference entry is a single-difference entry in which the bit value of the nth bit of the bit vector is the second value, n is determined based on the subscript index value, and the third value is different from the second value;
  • when the value compression table includes a first aggregated difference entry, updating, based on the second mutation position included in the first aggregated difference entry, the nth byte in the byte storage field of the first aggregated difference entry to the byte at the second mutation position in the reference value, where the nth byte in the byte storage field of the first aggregated difference entry is the mutation byte of the target value at the second mutation position.
  • When the number of kinds of mutation bytes in the byte storage field of the updated first aggregated difference entry is less than the first threshold, the first aggregated difference entry is split into multiple single-difference entries. That is, if the number of kinds of mutation bytes is less than the first threshold, splitting the updated aggregated difference entry yields fewer single-difference entries than the first threshold, and these entries occupy less storage space than the aggregated difference entry. In this case, the computing device may restore the updated aggregated difference entry to multiple single-difference entries.
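  • The deletion path and the split back into single-difference entries can be sketched as follows. This is a minimal illustration under stated assumptions: the table layout is hypothetical, the third value is taken to be 0, `THRESHOLD` is an arbitrary example constant, and single-difference entries whose bit vector becomes empty are simply left in place.

```python
THRESHOLD = 3  # assumed first threshold


def delete(table: dict, reference: bytes, n: int) -> None:
    """Delete the value in slot n of a table {"single": {...}, "aggregated": {...}}."""
    single, aggregated = table["single"], table["aggregated"]
    for key in list(single):
        if (single[key] >> n) & 1:
            single[key] &= ~(1 << n)               # reset bit n to the third value (0)
    for pos in list(aggregated):
        storage = aggregated[pos]
        storage[n] = reference[pos]                # restore the reference byte in slot n
        kinds = {b for b in storage if b != reference[pos]}
        if len(kinds) < THRESHOLD:                 # few distinct mutation bytes: split
            del aggregated[pos]
            for i, b in enumerate(storage):
                if b != reference[pos]:
                    single[(pos, b)] = single.get((pos, b), 0) | (1 << i)
```

  • The split mirrors the aggregation rule used on insertion: once fewer distinct mutation bytes remain than the threshold, a handful of single-difference entries is the smaller representation.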
  • The BT corresponds to a BT read-write lock.
  • The BT read-write lock is used to apply a read lock or write lock to the BT when the BT is accessed based on the processing request.
  • Through the BT read-write lock, concurrent access to the BT can be controlled.
  • Before the value operation is performed in the target BB, the method further includes: acquiring, based on the read-write lock label of the target BB, the read-write lock corresponding to the target BB from a lock pool.
  • The read-write lock corresponding to the target BB is used to apply a read lock or write lock to the target BB while the value operation is performed in the target BB based on the subscript index value.
  • The lock pool includes multiple read-write locks, and at least one of the multiple read-write locks corresponds to at least two BBs. That is, concurrent access to a BB can be controlled through the BB read-write lock.
  • In addition, multiple BBs can share the same read-write lock, which reduces the space consumed by read-write locks.
  • The read-write lock corresponding to the target BB may also be released.
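  • The shared lock pool described above can be sketched as follows. This is a minimal illustration under stated assumptions: Python's standard library has no read-write lock, so plain `threading.Lock` objects stand in for the read-write locks; the read-write lock label is assumed to be an integer, and reducing it modulo the pool size is a hypothetical way to map labels to locks.

```python
import threading
from contextlib import contextmanager


class LockPool:
    """A fixed pool of locks shared by many BBs (mutexes stand in for RW locks)."""

    def __init__(self, size: int):
        self._locks = [threading.Lock() for _ in range(size)]

    def lock_for(self, label: int):
        # Assumed mapping: several labels collapse onto the same pooled lock.
        return self._locks[label % len(self._locks)]

    @contextmanager
    def guard(self, label: int):
        """Hold the pooled lock for a BB while a value operation runs."""
        lock = self.lock_for(label)
        lock.acquire()
        try:
            yield lock
        finally:
            lock.release()                 # release after the value operation
```

  • The pool size bounds the memory spent on locks regardless of how many BBs exist, at the cost of occasional contention between BBs that map to the same lock.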
  • In a second aspect, a key-value pair retrieval device is provided, and the key-value pair retrieval device has the function of implementing the key-value pair retrieval behavior in the first aspect above.
  • The key-value pair retrieval device includes at least one module, and the at least one module is used to implement the key-value pair retrieval method provided in the first aspect above.
  • The key-value pair retrieval device includes a first obtaining module, a second obtaining module, a determining module, and a processing module.
  • The first obtaining module is configured to obtain a processing request, where the processing request includes a target key and a subscript index value of a target value corresponding to the target key, the target key includes a first key field and a second key field, and the first key field precedes the second key field. The second obtaining module is configured to obtain, based on the first key field, the bit value of the first bit corresponding to the first key field from the top-level bitmap BT. The determining module is configured to, when the bit value of the first bit is the first value, determine, based on the second key field, the target BB corresponding to the target key among the multiple bottom-level bitmaps BB corresponding to the first bit, where each BB is used to store multiple values. The processing module is configured to perform a value operation in the target BB based on the subscript index value.
  • The determining module is mainly configured to: obtain the first hash table corresponding to the first bit; and determine the target BB among the multiple BBs based on the second key field and the first hash table.
  • The second key field includes a first subfield and a second subfield.
  • The determining module is mainly configured to: determine, based on the first subfield and the first hash table, the target middle-layer bitmap BM corresponding to the target key, where the target BM stores indication information of a second hash table; obtain the second hash table based on the indication information of the second hash table, where the second hash table stores the indication information of the multiple BBs; and determine the target BB based on the second subfield and the second hash table.
  • The target BB stores a reference value and a value compression table, and the value compression table is used to store difference information between other values and the reference value.
  • The value compression table includes multiple single-difference entries. One single-difference entry stores one item of difference information, and the difference information includes a mutation position, a mutation byte, and a corresponding bit vector.
  • The mutation position indicates the position of the byte at which the other value differs from the reference value.
  • The mutation byte is the byte of the other value that differs from the reference value.
  • The bit vector includes multiple bits, each bit corresponds to one value, and the bit value of the bit corresponding to the other value is a second value.
  • The value compression table further includes at least one aggregated difference entry. One aggregated difference entry is obtained by aggregating multiple single-difference entries that include the same first mutation position and different mutation bytes. The aggregated difference entry includes an aggregation identifier, the first mutation position, and a byte storage field, and the byte storage field sequentially stores the byte at the first mutation position in each value.
  • The processing request is a data insertion request, and the data insertion request further includes the target value.
  • The processing module is mainly configured to: compare each byte of the target value with the corresponding byte of the reference value to obtain the mutation position and mutation byte of the target value; and insert, based on the subscript index value and the mutation position and mutation byte of the target value, the difference information between the target value and the reference value into the value compression table.
  • The processing module is mainly configured to: when the value compression table includes a first single-difference entry, update the bit value of the nth bit in the bit vector of the first single-difference entry to the second value, where n is determined based on the subscript index value, and the first single-difference entry is a single-difference entry that includes the mutation position and mutation byte of the target value.
  • The processing module is mainly configured to: when the value compression table does not include the mutation byte of the target value and the number of second single-difference entries included in the value compression table is less than the first threshold, generate a bit vector of the target value based on the subscript index value, where the bit value of the nth bit in the bit vector of the target value is the second value, n is determined based on the subscript index value, and a second single-difference entry is a single-difference entry that includes the mutation position of the target value; and generate a single-difference entry based on the mutation position, mutation byte, and bit vector of the target value, and insert the generated single-difference entry into the value compression table.
  • The processing module is mainly configured to: when the value compression table does not include the mutation byte of the target value and the number of second single-difference entries included in the value compression table is not less than the first threshold, aggregate the second single-difference entries, the mutation byte of the target value, and the subscript index value to obtain a target aggregated difference entry, where a second single-difference entry is a single-difference entry that includes the mutation position of the target value, the first threshold is greater than 1, and the storage space occupied by the target aggregated difference entry is not greater than the storage space occupied by the multiple second single-difference entries.
  • The processing request is a data query request.
  • The processing module is mainly configured to: when the value compression table includes a third single-difference entry, obtain the target value based on the mutation position and mutation byte in the third single-difference entry, the nth byte in the byte storage field of each aggregated difference entry in the value compression table, and the reference value, where the third single-difference entry is a single-difference entry in which the bit value of the nth bit of the bit vector is the second value, and n is determined based on the subscript index value.
  • The processing request is a data deletion request.
  • The processing module is mainly configured to: when the value compression table includes a third single-difference entry, update the bit value of the nth bit in the bit vector of the third single-difference entry to a third value, where the third single-difference entry is a single-difference entry in which the bit value of the nth bit of the bit vector is the second value, n is determined based on the subscript index value, and the third value is different from the second value; and when the value compression table includes a first aggregated difference entry, update, based on the second mutation position included in the first aggregated difference entry, the nth byte in the byte storage field of the first aggregated difference entry to the byte at the second mutation position in the reference value, where the nth byte in the byte storage field of the first aggregated difference entry is the mutation byte of the target value at the second mutation position.
  • The processing module is further configured to: when the number of kinds of mutation bytes in the byte storage field of the updated first aggregated difference entry is less than the first threshold, split the first aggregated difference entry into multiple single-difference entries.
  • The BT corresponds to a BT read-write lock.
  • The BT read-write lock is used to apply a read lock or write lock to the BT when the BT is accessed based on the processing request.
  • The device is further configured to: acquire, based on the read-write lock label of the target BB, the read-write lock corresponding to the target BB from a lock pool, where the read-write lock corresponding to the target BB is used to apply a read lock or write lock to the target BB while the value operation is performed in the target BB based on the subscript index value, the lock pool includes multiple read-write locks, and at least one of the multiple read-write locks corresponds to at least two BBs.
  • The device is further configured to: release the read-write lock corresponding to the target BB.
  • In a third aspect, a key-value pair retrieval device is provided. The key-value pair retrieval device includes a processor and a memory. The memory is used to store a program that supports the key-value pair retrieval device in executing the key-value pair retrieval method provided in the first aspect above, and to store the data involved in implementing that method.
  • The processor is configured to execute the program stored in the memory.
  • In a fourth aspect, a computer-readable storage medium is provided. Instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer executes the key-value pair retrieval method described in the first aspect above.
  • In a fifth aspect, a computer program product containing instructions is provided. When the computer program product is run on a computer, the computer executes the key-value pair retrieval method described in the first aspect above.
  • In the embodiments of the present application, the key is divided into multiple key fields. The corresponding first bit is determined from the BT based on the leading first key field, and the target BB is then determined, based on the trailing second key field, among the multiple BBs corresponding to the first bit. In this way, keys with the same first key field are located to the same group of BBs through the BT. That is, the embodiments of the present application integrate the indexes of the common parts of keys, thereby reducing the proportion of space occupied by the index.
  • Fig. 1 is a schematic structural diagram of a key-value pair retrieval device provided by an embodiment of the present application;
  • Fig. 2 is a flow chart of a key-value pair retrieval method provided by an embodiment of the present application;
  • Fig. 3 is a schematic diagram of a value compression table provided by an embodiment of the present application;
  • Fig. 4 is a schematic diagram of another value compression table provided by an embodiment of the present application;
  • Fig. 5 is a schematic diagram of yet another value compression table provided by an embodiment of the present application;
  • Fig. 6 is a schematic diagram of read-write locks of the BT, BM, and BB provided by an embodiment of the present application;
  • Fig. 7 is a schematic structural diagram of a key-value pair retrieval device provided by an embodiment of the present application.
  • Fig. 1 is a schematic structural diagram of a key-value pair retrieval device provided by an embodiment of the present application.
  • The key-value pair retrieval method provided in the following embodiments can be executed by the key-value pair retrieval device.
  • The key-value pair retrieval device may include one or more processors 101, a communication bus 102, a memory 103, and one or more communication interfaces 104.
  • The processor 101 may be a general-purpose central processing unit (CPU), a network processor (NP), a microprocessor, or one or more integrated circuits for implementing the solutions of the present application, such as an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The communication bus 102 is used to transfer information between the aforementioned components.
  • The communication bus 102 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
  • The memory 103 may be a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), an optical disc (including a compact disc read-only memory (CD-ROM), compact disc, laser disc, digital versatile disc, Blu-ray disc, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • The memory 103 may exist independently and be connected to the processor 101 through the communication bus 102. Alternatively, the memory 103 may be integrated with the processor 101.
  • The communication interface 104 uses any transceiver-like device to communicate with other devices or a communication network.
  • The communication interface 104 includes a wired communication interface, and may also include a wireless communication interface.
  • The wired communication interface may be, for example, an Ethernet interface.
  • The Ethernet interface may be an optical interface, an electrical interface, or a combination thereof.
  • The wireless communication interface may be a wireless local area network (WLAN) interface, a cellular network communication interface, or a combination thereof.
  • The key-value pair retrieval device may include multiple processors, such as the processor 101 and the processor 105 shown in Fig. 1. Each of these processors may be a single-core processor or a multi-core processor.
  • A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data such as computer program instructions.
  • The key-value pair retrieval device may further include an output device 106 and an input device 107.
  • The output device 106 communicates with the processor 101 and may display information in a variety of ways.
  • The output device 106 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • The input device 107 communicates with the processor 101 and can receive user input in various ways.
  • The input device 107 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.
  • The memory 103 is used to store the program code 108 for implementing the solutions of the present application, and the processor 101 can execute the program code 108 stored in the memory 103.
  • The program code may include one or more software modules, and the key-value pair retrieval device may implement the key-value pair retrieval method provided in the embodiment of Fig. 2 below through the processor 101 and the program code 108 in the memory 103.
  • A key-value storage system may be deployed on the key-value pair retrieval device. A key-value storage system is a database that stores data as key-value pairs.
  • The index of the key-value storage system can be stored in the memory, and the processor can store, fetch, and delete data in the key-value storage system, based on the index and other related data stored in the memory, through the key-value pair retrieval method described below.
  • The key-value pair retrieval device may be a server, a terminal device, a cloud device, or the like, which is not limited in the embodiments of the present application.
  • Fig. 2 is a flow chart of a key-value pair retrieval method provided by an embodiment of the present application. The method can be applied to the aforementioned key-value pair retrieval device, which is simply referred to as the retrieval device in the following embodiments. Referring to Fig. 2, the method comprises the following steps:
  • Step 201 Obtain a processing request, the processing request includes a target key and a subscript index value of a target value corresponding to the target key, the target key includes a first key field and a second key field, and the first key field is located before the second key field.
  • the retrieval device may receive a processing request from outside the device, or generate a processing request according to a user operation.
  • the processing request may be a data insertion request, a data query request or a data deletion request.
  • The data query request or data deletion request includes the target key and the subscript index value of the target value corresponding to the target key. The data query request is used to request a query of the target value corresponding to the target key, and the data deletion request is used to request deletion of the target value corresponding to the target key.
  • the data insertion request includes a target key, a target value, and a subscript index value of the target value.
  • the target key and the target value are a key-value pair to be inserted, and the subscript index value of the target value is used to indicate the corresponding bit of the target value in the bit vector in the BB.
  • the retrieval device includes multiple BBs, and each BB is used to store multiple values, for example, each BB can store 64 values.
  • a bit vector is stored in each BB, and the bit vector includes a plurality of bits, and each bit corresponds to a value stored in the BB.
  • For example, when a BB can store 64 values, the bit vector in the BB includes 64 bits, and each bit corresponds to one value in the BB.
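  • The correspondence between subscript index values and bits can be sketched as follows. This is a minimal illustration, assuming a BB that stores 64 values and a subscript index that starts at 0; the helper names and the choice to number bits from the least significant end of a Python integer are assumptions made for the sketch, not part of the embodiment.

```python
BB_CAPACITY = 64  # number of values one BB can store in the example above


def set_bit(bit_vector: int, index: int) -> int:
    """Return bit_vector with the bit for subscript index `index` set to 1."""
    if not 0 <= index < BB_CAPACITY:
        raise IndexError("subscript index out of range")
    return bit_vector | (1 << index)


def bit_is_set(bit_vector: int, index: int) -> bool:
    """Report whether the bit for subscript index `index` is 1."""
    return (bit_vector >> index) & 1 == 1


# Mark the value with subscript index 4 (the fifth bit) as present.
vec = set_bit(0, 4)
```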
  • the retrieval device may perform field division on the target key included in the processing request, so as to obtain the first key field and the second key field.
  • the first key field is located before the second key field.
  • the retrieval device may use the first three bytes of the target key as the first key field, and use the remaining bytes as the second key field.
  • Step 202 Based on the first key field, obtain the bit value of the first bit corresponding to the first key field from the BT.
  • the retrieval device can retrieve the corresponding first bit in the BT through the first key field.
  • The BT includes multiple bits, and each bit corresponds to one key field and to multiple BBs, where the prefix of the keys corresponding to the values in the multiple BBs of one bit is the key field corresponding to that bit.
  • the bit value of each bit can be the first value or other values.
  • When the bit value of a certain bit is the first value, it indicates that the key-value storage system includes a key whose prefix is the key field corresponding to the bit; when the bit value of a certain bit is another value, it indicates that the key-value storage system does not include a key whose prefix is the key field corresponding to the bit.
  • For example, the first value is 1 and the other value is 0, or the first value is 0 and the other value is 1.
  • the retrieval device may determine the first bit corresponding to the first key field from the multiple bits included in the BT based on the first key field, and obtain a bit value on the first bit. Afterwards, the retrieval device may determine whether the bit value on the first bit is the first value, and then determine whether to continue subsequent retrieval based on the second key field based on the processing request.
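  • Steps 201 and 202 can be sketched together as follows. The three-byte first key field comes from the example above; how the first key field selects a bit in the BT is not specified here, so the sketch assumes that the field, read as a big-endian integer, is the bit index, and that the first value is 1. The class and function names are hypothetical.

```python
FIRST_FIELD_LEN = 3  # the example above uses the first three bytes of the key


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Divide a key into the first key field and the second key field."""
    return key[:FIRST_FIELD_LEN], key[FIRST_FIELD_LEN:]


class BT:
    """Bit array with one bit per possible first key field (assumed layout)."""

    def __init__(self) -> None:
        self.bits = bytearray(2 ** (8 * FIRST_FIELD_LEN) // 8)

    @staticmethod
    def _bit_index(first_field: bytes) -> int:
        # Assumption: the field, read as a big-endian integer, is the bit index.
        return int.from_bytes(first_field, "big")

    def get(self, first_field: bytes) -> int:
        i = self._bit_index(first_field)
        return (self.bits[i // 8] >> (i % 8)) & 1

    def set(self, first_field: bytes) -> None:
        i = self._bit_index(first_field)
        self.bits[i // 8] |= 1 << (i % 8)


first, second = split_key(b"user:42")
bt = BT()
before = bt.get(first)   # no key with this prefix has been stored yet
bt.set(first)
after = bt.get(first)    # retrieval may now continue with the second key field
```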
  • the BT may also correspond to a BT read-write lock, and the BT read-write lock is used to indicate that the BT is locked for reading and writing when accessing the BT based on a certain processing request.
  • the BT read-write lock can be realized through a semaphore.
  • When a thread accesses the BT based on a processing request, the thread can acquire the semaphore, so that other threads cannot access the BT while this thread is accessing it.
  • After the access to the BT is completed, the thread can release the semaphore so that other threads that want to access the BT can obtain it. It can be seen that concurrent access to the BT can be controlled through the BT read-write lock.
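  • A semaphore-guarded BT might look like the sketch below. It assumes a binary semaphore from Python's standard `threading` module and a hypothetical `LockedBT` class; note that a single binary semaphore makes every access exclusive and does not by itself distinguish readers from writers.

```python
import threading


class LockedBT:
    """BT whose bits may only be touched while holding a semaphore."""

    def __init__(self, num_bits: int) -> None:
        self._bits = bytearray((num_bits + 7) // 8)
        self._sem = threading.Semaphore(1)  # binary semaphore guarding the BT

    def get(self, i: int) -> int:
        with self._sem:  # acquired before the access, released afterwards
            return (self._bits[i // 8] >> (i % 8)) & 1

    def set(self, i: int) -> None:
        with self._sem:
            self._bits[i // 8] |= 1 << (i % 8)


# Several threads set different bits; each access is serialized.
locked_bt = LockedBT(64)
threads = [threading.Thread(target=locked_bt.set, args=(i,)) for i in range(64)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```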
  • Step 203 In the case that the bit value on the first bit is the first value, based on the second key field, determine the target BB corresponding to the target key among the multiple BBs corresponding to the first bit, where each BB is used to store multiple values.
  • When the retrieval device determines that the bit value on the first bit is the first value, it means that the key-value storage system includes a key whose prefix is the first key field. In this case, the retrieval device can continue, based on the second key field, to determine the target BB corresponding to the target key among the multiple BBs corresponding to the first bit.
  • each bit in the BT corresponds to multiple BBs.
  • each bit in the BT may correspond to a hash table, and the hash table may be used to retrieve multiple BBs. In this way, each bit corresponds to multiple BBs through the corresponding hash table.
  • the retrieval device first obtains the first hash table corresponding to the first bit, and then determines the target BB among the multiple BBs based on the second key field and the first hash table.
  • Each entry of the hash table corresponding to each bit of the BT stores indication information of one BB, so that each bit corresponds, through its own hash table, to the multiple BBs indicated by the indication information stored in that hash table.
  • The retrieval device can perform a hash operation on the second key field based on the hash function corresponding to the first hash table to obtain the corresponding hash value, and then acquire the indication information of the BB in the entry indicated by the hash value in the first hash table.
  • The indication information of the BB acquired in this way is the indication information of the target BB, and the retrieval device may determine the target BB based on it.
  • The BB indication information stored in the hash table corresponding to each bit in the BT can be a pointer to the BB, or other information that can uniquely determine the BB, which is not limited in this embodiment of the present application.
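  • The single-level lookup described above can be sketched as follows. The hash function is not specified in this section, so `zlib.crc32` is used purely as a stand-in; the `HashTable` and `BB` classes are hypothetical, object references play the role of BB indication information, and collision handling is omitted.

```python
import zlib


class BB:
    """Placeholder for a block that stores multiple values."""


class HashTable:
    """Fixed-size table whose entries hold BB indication information."""

    def __init__(self, size: int = 16) -> None:
        self.entries = [None] * size

    def slot(self, field: bytes) -> int:
        # Assumed hash function; the source does not name one.
        return zlib.crc32(field) % len(self.entries)

    def store(self, field: bytes, target) -> None:
        self.entries[self.slot(field)] = target

    def lookup(self, field: bytes):
        return self.entries[self.slot(field)]


first_hash_table = HashTable()
target_bb = BB()
first_hash_table.store(b"r:42", target_bb)   # second key field -> BB
found = first_hash_table.lookup(b"r:42")
```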
  • the retrieval device may further divide the second key field to obtain the first subfield and the second subfield.
  • The retrieval device further includes multiple BMs. Each BM is used to store indication information of a middle-level hash table, and each entry in the middle-level hash table is used to store indication information of a BB. Correspondingly, each bit of the BT corresponds to a top-level hash table, and each entry in the top-level hash table stores indication information of a BM. In this way, each bit in the BT corresponds, through its top-level hash table and the middle-level hash tables of the BMs indicated by the indication information stored in that top-level hash table, to the multiple BBs indicated by the indication information in those middle-level hash tables.
  • In this case, the retrieval device can determine the target BM corresponding to the target key based on the first subfield and the first hash table, where indication information of a second hash table is stored in the target BM; obtain the second hash table based on the indication information of the second hash table, where indication information of multiple BBs is stored in the second hash table; and determine the target BB based on the second subfield and the second hash table.
  • The retrieval device first performs a hash operation on the first subfield based on the hash function corresponding to the first hash table to obtain a first hash value, and then obtains the indication information of the BM in the entry indicated by the first hash value in the first hash table. The indication information of the BM obtained at this time is the indication information of the target BM.
  • the retrieval device can determine the target BM based on the indication information of the target BM, and then obtain the indication information of the second hash table from the target BM.
  • the retrieval device obtains the second hash table based on the indication information of the second hash table, and performs a hash operation on the second subfield based on a hash function corresponding to the second hash table to obtain a second hash value.
  • the retrieval device acquires the indication information of the BB in the entry indicated by the second hash value in the second hash table, and at this time, the acquired indication information of the BB is the indication information of the target BB.
  • the retrieval device can determine the target BB based on the indication information of the target BB.
  • The indication information of the BM stored in the above-mentioned top-level hash table and the indication information of the middle-level hash table stored in the BM can be pointers to the corresponding BM and middle-level hash table, or other information that can uniquely determine the BM and the middle-level hash table, which is not limited in this embodiment of the present application.
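  • The two-level lookup through a BM can be sketched as follows. Python dictionaries stand in for the top-level and middle-level hash tables, and object references stand in for the indication information; the class names, the function name, and the way the second key field is divided into subfields are all assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BB:
    values: dict = field(default_factory=dict)


@dataclass
class BM:
    # Indication information of the middle-level hash table (a direct reference here).
    middle_table: dict = field(default_factory=dict)


def find_target_bb(top_table: dict, first_sub: bytes, second_sub: bytes):
    """Follow first subfield -> BM -> middle-level table -> second subfield -> BB."""
    bm = top_table.get(first_sub)
    if bm is None:
        return None
    return bm.middle_table.get(second_sub)


bb = BB()
bm = BM(middle_table={b"42": bb})
top_table = {b"r:": bm}                      # top-level table of one BT bit
hit = find_target_bb(top_table, b"r:", b"42")
miss = find_target_bb(top_table, b"x:", b"42")
```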
  • Each BM in the retrieval device can also have a corresponding BM read-write lock, so that when the retrieval device accesses a certain BM based on a certain processing request, the BM can be locked to prevent the retrieval device from accessing the BM based on other processing requests.
  • the implementation manner of the read-write lock of the BM can refer to the implementation manner of the BT read-write lock described above, which will not be repeated in this embodiment of the present application.
  • The bit value on the first bit may also not be the first value, that is, the key-value storage system has not yet stored a key prefixed with the first key field.
  • In this case, the retrieval device may first create a corresponding first hash table for the first bit and set the bit value on the first bit to the first value. Afterwards, the retrieval device performs a hash operation on the first subfield based on the hash function corresponding to the first hash table to obtain a first hash value, stores the indication information of the target BM in the entry indicated by the first hash value in the first hash table, and creates the target BM based on that indication information.
  • Next, the retrieval device stores the indication information of the second hash table in the target BM and creates the second hash table based on that indication information. It then performs a hash operation on the second subfield to obtain a second hash value, and stores, in the entry indicated by the second hash value in the second hash table, the indication information of a BB that currently stores no value. That BB is the target BB.
  • Alternatively, the retrieval device can refer to the above method and directly store, in the entry indicated by the first hash value in the first hash table, the indication information of a BB that currently stores no value. That BB is the target BB.
  • the retrieval device ends the operation.
  • Step 204 Perform value operations in the target BB based on the subscript index value.
  • the retrieval device may perform the value operation corresponding to the processing request in the target BB based on the subscript index value in the processing request.
  • BB is used to store multiple values.
  • BBs can store values in two different ways.
  • the retrieval device implements value operations in target BBs based on subscript index values in different ways.
  • the BB stores multiple values by storing a reference value and a value compression table, where the reference value is any value stored in the BB.
  • the value compression table is used to store the difference information between other values and the reference value. In this way, multiple values can be recovered based on the difference information and the reference value stored in the value compression table.
  • the retrieval device may perform the value operation corresponding to the processing request in the value compression table based on the subscript index value, the reference value, and the value compression table.
  • The value compression table may include multiple single-difference entries, and each single-difference entry is used to store one piece of difference information. The difference information includes a mutation position, a mutation byte, and a corresponding bit vector. The mutation position indicates the position of the byte at which other values differ from the reference value, and the mutation byte is the byte of those other values at that position. The bit vector includes multiple bits, each bit corresponds to one value, and the bits corresponding to those other values have the second value as their bit value.
  • For example, the second value may be 1, and the initial bit value of each of the plurality of bits may be 0.
  • The value corresponding to the first bit of the bit vector may be the reference value.
  • the lengths of the values stored in one BB are the same.
  • the number of bits included in the bit vectors in each single difference entry in the value compression table in a BB is the same, which is equal to the number of values that the BB can store.
  • For example, when the target BB can store 64 values, the bit vector in each single difference entry in the value compression table of the target BB includes 64 bits, and each bit is used to indicate one value. For any value, the bit corresponding to this value can be indicated by the subscript index value of this value. For example, when the subscript index value of a value is 0, the bit corresponding to this value is the first bit in the bit vector; when the subscript index value of a value is 4, it corresponds to the fifth bit, and so on.
  • When the bit value of a bit in the bit vector of a single difference entry is the second value, the byte at the mutation position of that entry in the value corresponding to the bit differs from the byte at the mutation position in the reference value; that is, the byte at the mutation position in the value corresponding to the bit is the mutation byte included in the single difference entry.
  • A single-difference entry in the value compression table may include three fields, namely a first field, a second field, and a third field, where the first field is used to store the mutation position, the second field is used to store the mutation byte, and the third field is used to store the bit vector.
  • FIG. 3 is an example of single difference entries in a value compression table given in an embodiment of the present application. Assume that the 64 values stored in the target BB are all 8 bytes long. Compared with the reference value, the mutated byte of another value can then be any of the first to eighth bytes, so the mutation position in a single difference entry can be represented by an 8-bit value, that is, the length of the first field is 8 bits.
  • The 8-bit value can range from 0 to 7, where 0 indicates that the mutation position is the first byte, 7 indicates that the mutation position is the eighth byte, and so on.
  • The second field of the single difference entry is used to store the mutation byte, and the length of one mutation byte is also 8 bits.
  • The third field of the single difference entry is used to store the bit vector. Since the target BB can store 64 values, the bit vector includes 64 bits, that is, the length of the bit vector is 64 bits. Thus, a single difference entry occupies 10 bytes.
  • As shown in Figure 3, the mutation position in single-difference entry A is 0, and the mutation byte is 0x02. When the bit value of a bit in the bit vector of single-difference entry A is the second value, for example 1, the first byte of the value corresponding to that bit differs from the first byte of the reference value and is 0x02.
  • The mutation position in single-difference entry B is 3, and the mutation byte is 0x03. When the bit value of a bit in the bit vector of single-difference entry B is 1, the fourth byte of the value corresponding to that bit differs from the fourth byte of the reference value and is 0x03.
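  • The 10-byte entry layout and the recovery of a value from the reference value can be sketched as follows, using Python's standard `struct` module. Entries A and B carry the mutation positions and mutation bytes of the Figure 3 example; the reference value, the bits that are set, the function names, and the mapping of subscript index i to bit `1 << i` are assumptions made for illustration.

```python
import struct

# Mutation position (8 bits), mutation byte (8 bits), bit vector (64 bits).
ENTRY_FORMAT = ">BBQ"


def pack_entry(position: int, mutation_byte: int, bit_vector: int) -> bytes:
    return struct.pack(ENTRY_FORMAT, position, mutation_byte, bit_vector)


def recover_value(reference: bytes, entries: list, index: int) -> bytes:
    """Rebuild the value with subscript index `index` from the reference value."""
    value = bytearray(reference)
    for raw in entries:
        position, mutation_byte, bit_vector = struct.unpack(ENTRY_FORMAT, raw)
        if (bit_vector >> index) & 1:       # this value mutates at `position`
            value[position] = mutation_byte
    return bytes(value)


reference = bytes([0x01] * 8)                        # hypothetical reference value
entry_a = pack_entry(0, 0x02, 1 << 4)                # entry A: position 0, byte 0x02
entry_b = pack_entry(3, 0x03, (1 << 4) | (1 << 9))   # entry B: position 3, byte 0x03
value_4 = recover_value(reference, [entry_a, entry_b], 4)
```

  • In this sketch the value with subscript index 4 differs from the reference value in its first and fourth bytes, so it is covered by both entries, while a value whose bit is set in no entry is recovered as the reference value itself.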
  • In addition to single difference entries, the value compression table in this embodiment of the application may include at least one aggregated difference entry.
  • Here, "at least one" means one or more, and "at least n" means n or more.
  • An aggregated difference entry is obtained by aggregating multiple single difference entries that include the same first mutation position and different mutation bytes. The aggregated difference entry includes an aggregation identifier, the first mutation position, and a byte storage field, and the byte storage field stores the byte at the first mutation position of each value in sequence, in the order of the values corresponding to the bits in the bit vector.
  • If the value corresponding to a certain bit has not been stored in the target BB, the byte of the value corresponding to that bit at the first mutation position is the byte at the first mutation position of the reference value.
  • The mutation bytes of different values at the same mutation position may differ. For example, suppose there are 8 values whose first bytes all differ from the first byte of the reference value and also differ from one another. The mutations of these 8 values then need to be stored in 8 separate single-difference entries. When there are many single-difference entries with the same mutation position but different mutation bytes, the storage overhead increases. For this reason, in the embodiment of the present application, multiple single-difference entries that include the same mutation position and different mutation bytes can be aggregated into one aggregated difference entry, thereby reducing the storage overhead of the entries.
  • The retrieval device may determine whether the number of these single-difference entries is less than a first threshold, where the first threshold may be determined according to the byte length required for an aggregated difference entry and the byte length required for a single difference entry.
  • an aggregation difference entry may include aggregation identifier, mutation position and byte storage fields.
  • The aggregation identifier can be implemented as a 1-bit flag, and the length occupied by the mutation position can be determined according to the byte length of the values stored in the target BB.
  • For example, when the values stored in the target BB are 8-byte values, the mutation position may occupy 7 bits.
  • For a bit whose value has not been stored, the position corresponding to that bit stores the byte of the reference value at the mutation position, so the byte length occupied by the byte storage field is the length of the value stored in the target BB. For example, when the values stored in the target BB are 8-byte values, the byte storage field occupies 8 bytes.
  • the bytes occupied by the mutation position in a single difference entry can also be determined according to the byte length of the value stored in the target BB.
  • The retrieval device may calculate the ratio between the byte length required for an aggregated difference entry and the byte length required for a single difference entry, and determine the first threshold based on the ratio.
  • For example, if the retrieval device determines that the ratio between the two is 6.5, it may use an integer greater than 6.5 as the first threshold; for example, the first threshold is 7, 8, or another value.
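  • One way to derive such a threshold is sketched below. The function name is hypothetical, and taking the smallest integer strictly greater than the ratio is only one possible choice, since the text allows any larger integer. The 10-byte single difference entry comes from the Figure 3 example, while the 65-byte length passed in for the aggregated difference entry is an assumed input chosen only so that the ratio equals the 6.5 of the example.

```python
def first_threshold(aggregated_len: int, single_len: int) -> int:
    """Smallest integer strictly greater than aggregated_len / single_len."""
    # Integer arithmetic avoids floating-point rounding: floor(ratio) + 1.
    return aggregated_len // single_len + 1


# Assumed lengths in bytes: 65 / 10 = 6.5, so the threshold is 7.
threshold = first_threshold(65, 10)
```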
  • the aggregate difference entry may be implemented by occupying fields included in multiple single difference entries.
  • the aggregation difference entry may store the aggregation identifier and the mutation position through the occupied first field in the first single difference entry.
  • the most significant bit of the first field in the first single difference entry can be used as a flag bit of the aggregation identifier, and when the flag bit takes a value of 1, it is used to indicate that the entry is an aggregate difference entry.
  • the remaining bits of this first field can be used to indicate the location of the mutation.
  • The second field in the first single difference entry may be empty, or it may be used to store the types and quantities of the mutation bytes included in the subsequent byte storage field. Since the number of bytes stored in the byte storage field may be much larger than the byte length occupied by the bit vector in a single difference entry, the byte storage field of the aggregated difference entry can be implemented using the third field of the first single difference entry together with the fields of the other occupied single difference entries.
  • For example, assume that the target BB can store 64 values, that each value is 8 bytes long, and that in a single difference entry the first field occupies 8 bits, the second field occupies 8 bits, and the third field occupies 64 bits.
  • The highest bit in the first field of the first single difference entry occupied by the aggregated difference entry is 1, which identifies the entry as an aggregated difference entry. The remaining bits of the first field indicate that the first mutation position is 0, that is, the subsequent byte storage field stores the first byte of each value or of the reference value.
  • The second field of the first occupied single difference entry is empty.
  • The first bit of the bit vector corresponds to the reference value, and the first byte of the reference value is 0x01. The first bytes of the values corresponding to the 2nd to 4th bits are 0x02, 0x03, and 0x04 respectively. The values corresponding to the 5th to 7th bits have not been stored in the target BB, so the first byte of the reference value is stored at these three positions. The first byte of the value corresponding to the 8th bit is 0x05.
  • The first byte of the value corresponding to the ninth bit of the bit vector is stored in the first field of the next occupied single difference entry. For example, if the first byte of the value corresponding to the ninth bit is 0x06, then 0x06 is stored in the first field of the next single-difference entry, the first byte of the value corresponding to the 10th bit is stored in the second field of that entry, and so on, until the first byte of the value corresponding to each bit of the bit vector has been stored.
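  • The layout in this example can be sketched as follows. The sketch assumes 10-byte single difference entries, an empty second field encoded as 0x00, and zero padding of the last occupied entry; the function name and the bytes chosen for the 10th to 64th bits are hypothetical.

```python
ENTRY_LEN = 10    # one single difference entry: 1 + 1 + 8 bytes
AGG_FLAG = 0x80   # most significant bit of the first field marks aggregation


def build_aggregate(position: int, bytes_per_bit: list[int]) -> list[bytes]:
    """Lay one aggregated difference entry over consecutive 10-byte entries."""
    head = bytes([AGG_FLAG | position, 0x00])      # first field, empty second field
    payload = head + bytes(bytes_per_bit)          # one byte per bit of the bit vector
    payload += bytes(-len(payload) % ENTRY_LEN)    # zero padding (an assumption)
    return [payload[i:i + ENTRY_LEN] for i in range(0, len(payload), ENTRY_LEN)]


# First bytes for bits 1..9 as in the example; the rest default to the reference byte.
column = [0x01, 0x02, 0x03, 0x04, 0x01, 0x01, 0x01, 0x05, 0x06] + [0x01] * 55
occupied = build_aggregate(0, column)
```

  • With 64 values, the flag, the position, and the 64 stored bytes fill 66 bytes, so this sketch occupies seven 10-byte entries.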
  • Alternatively, an aggregated difference entry group can be obtained by aggregation; that is, in addition to single-difference entries, the value compression table in the embodiment of the present application can include at least one aggregated difference entry group.
  • an aggregated difference entry group is obtained by aggregating multiple single difference entries that include the same mutation position and different mutation bytes, an aggregated difference entry group includes multiple aggregated difference entries, and each aggregated difference The entry includes an aggregation identifier, a mutation position, and a byte storage field.
  • the mutation positions in the multiple aggregation difference entries are all the first mutation positions, and the multiple byte storage fields in the multiple aggregation difference entries are in accordance with The order of the values corresponding to each bit in the bit vector stores the byte at the first mutation position in each value in sequence. Wherein, if the value corresponding to a certain bit has not been stored in the target BB, the byte of the value corresponding to the bit at the first mutation position is the byte at the first mutation position of the reference value.
  • The retrieval device may determine whether the number of these single-difference entries is less than a first threshold, where the first threshold may be the number of aggregated difference entries that need to be occupied when, among all values that the target BB can store other than the reference value, the mutation bytes at the same position are all different, or a value greater than that number.
  • For example, the target BB can store 64 values. When the mutation bytes at the same position in the 63 values other than the reference value are all different, the mutation position corresponds to 63 mutation bytes. To keep the entry structure unchanged, these 63 bytes can be stored in the original bit vector fields of single difference entries. Since each bit vector field is 64 bits (8 bytes) long, 8 such fields are needed, that is, the aggregated difference entry group obtained through aggregation includes 8 aggregated difference entries.
  • Accordingly, the first threshold may be set to 8 or a value greater than 8.
  • When the number of single-difference entries that include the same mutation position but different mutation bytes is not less than the first threshold, the number of aggregated difference entries in the group obtained by aggregating them will not be greater than the number of those single difference entries, and the storage space occupied will be smaller than that occupied by the single difference entries, which helps reduce the storage overhead of the entries.
  • To keep the structure of the value compression table unchanged, each aggregated difference entry also includes a first field, a second field, and a third field.
  • The difference is that the highest bit of the first field can be used as a flag bit whose value indicates whether the entry is an aggregated difference entry. For example, a flag value of 1 indicates that the entry is an aggregated difference entry, and a flag value of 0 indicates that the entry is a single difference entry.
  • the remaining bits of this first field can be used to indicate the location of the mutation.
  • The second field in each aggregated difference entry can be empty, and the third field can be used to store the byte of each value at the mutation position, in the order of the values corresponding to the bits in the bit vector; that is, the third field is the byte storage field.
  • For example, in the first aggregated difference entry of the group, the first mutation position stored in the first field is 0, which indicates that the first byte of the values differs from that of the reference value; the second field is empty; and the third field, that is, the byte storage field, stores 8 bytes. These 8 bytes are the first bytes of the values corresponding to the first 8 bits of the bit vector in a single difference entry. It should be noted that if the value corresponding to one of the first 8 bits of the bit vector has not been stored in the target BB, the first byte of the reference value can be stored at the position corresponding to that bit in the byte storage field.
  • For example, the first bit of the bit vector corresponds to the reference value, and the first byte of the reference value is 0x01; the first bytes of the values corresponding to the second to fourth bits of the bit vector are 0x02, 0x03, and 0x04 respectively; the values corresponding to the 5th to 7th bits have not been stored in the target BB; and the first byte of the value corresponding to the 8th bit is 0x05. The bytes stored in the byte storage field of the first aggregated difference entry are then 0x0102030401010105.
  • The first field in the second aggregated difference entry of the group still stores the first mutation position, the second field is still empty, and the third field is still a byte storage field, which stores the first bytes of the values corresponding to the next 8 bits of the bit vector, and so on.
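  • The aggregated difference entry group of this example can be sketched as follows, using Python's standard `struct` module. The function name is hypothetical, the empty second field is encoded as 0x00 by assumption, and the bytes for the 9th to 64th bits are filled with the reference byte only for illustration.

```python
import struct

AGG_FLAG = 0x80  # highest bit of the first field marks an aggregated entry


def build_group(position: int, bytes_per_bit: list) -> list:
    """Build one 10-byte aggregated difference entry per 8 bits of the bit vector."""
    group = []
    for start in range(0, len(bytes_per_bit), 8):
        chunk = bytes(bytes_per_bit[start:start + 8])
        # First field: flag + mutation position; second field: empty; third: 8 bytes.
        group.append(struct.pack(">BB8s", AGG_FLAG | position, 0x00, chunk))
    return group


# First bytes for bits 1..8 as in the example; unstored values use the reference byte.
first_bytes = [0x01, 0x02, 0x03, 0x04, 0x01, 0x01, 0x01, 0x05] + [0x01] * 56
group = build_group(0, first_bytes)
```

  • For a BB that can store 64 values, this sketch yields 8 entries of 10 bytes each, which matches the group size given above.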
  • Without aggregation, a first-threshold number of single-difference entries may already be needed for one mutation position. If the mutation byte of a subsequently inserted value at this mutation position differs from those in all of these single-difference entries, a further single-difference entry has to be added, so the number of single-difference entries that include this mutation position but different mutation bytes exceeds the first threshold.
  • After aggregation, by contrast, the number of aggregated difference entries in the resulting group is at most equal to the first threshold. Even if the mutation byte of a subsequently inserted value at this mutation position differs from those in the original single difference entries, the insertion can be handled by changing the byte at the position corresponding to that value in the byte storage fields of the aggregated difference entry group.
  • Since a single difference entry and an aggregated difference entry occupy the same byte length, reducing the number of entries by aggregation also reduces the storage space occupied by the entries, which helps reduce the storage overhead of the value compression table.
  • The aggregated difference entries in an aggregated difference entry group may also record the types and quantities of the mutation bytes stored in the byte storage fields of the group.
  • The type and quantity of the mutation bytes can be stored in the second field.
  • For example, if this quantity is denoted m, m can be stored in the second field of the first aggregated difference entry in the group; of course, m can also be stored in the second field of any or every aggregated difference entry in the group.
  • the processing request can be any one of data insertion request, data query request and data deletion request.
  • the retrieval device can perform different value operations.
  • Case 1: When the processing request is a data insertion request, the retrieval device compares each byte of the target value included in the data insertion request with the corresponding byte of the reference value in the target BB to obtain the mutation position and mutation byte of the target value. Then, based on the subscript index value and the mutation position and mutation byte of the target value, it inserts the difference information between the target value and the reference value into the value compression table.
  • The retrieval device can compare the target value in the data insertion request with the reference value in the target BB byte by byte: the first byte of the target value with the first byte of the reference value, the second byte with the second byte, and so on. This yields the bytes of the target value that differ from the corresponding bytes of the reference value.
  • These differing bytes are the mutation bytes of the target value.
  • The retrieval device can record each mutation byte of the target value together with its position in the target value, which is taken as the mutation position.
  • The target value may contain one or more mutation bytes and, correspondingly, one or more mutation positions.
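  • The byte-by-byte comparison can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `diff_against_reference`, the 0-based mutation positions, and the equal-length check are assumptions made for the example.

```python
def diff_against_reference(target: bytes, reference: bytes) -> list:
    """Return (mutation position, mutation byte) pairs of `target`.

    Assumption: positions are 0-based and both values have the same length.
    """
    if len(target) != len(reference):
        raise ValueError("this sketch assumes equal-length values")
    return [
        (pos, t)
        for pos, (t, r) in enumerate(zip(target, reference))
        if t != r  # a byte that differs from the reference is a mutation byte
    ]


reference = bytes.fromhex("0000a000")
target = bytes.fromhex("0001a0ff")
mutations = diff_against_reference(target, reference)
```

  • In this example the target value has two mutation positions, so two pieces of difference information would be inserted into the value compression table.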
  • For a mutation byte a at mutation position A of the target value, the retrieval device may search the value compression table of the target BB for a first single difference entry containing position A and byte a. If such an entry is found, some of the values already stored in the target BB have the same mutation relative to the reference value.
  • In that case, the retrieval device can use the subscript index value to update the n-th bit of the bit vector in the first single difference entry to the second value. When the minimum subscript index value is 0, n equals the subscript index value plus 1; when the minimum subscript index value is 1, n equals the subscript index value.
  • The subscript index value of the target value indicates the bit corresponding to the target value in the bit vector. Based on it, the retrieval device determines that the n-th bit of the bit vector in the first single difference entry corresponds to the target value, and updates that bit to the second value to indicate that the byte at position A of the target value is byte a, which differs from the byte at position A of the reference value.
  • If no first single difference entry is found, the retrieval device may directly generate a bit vector for the target value, in which the bit corresponding to the target value is the second value and the other bits are 0. The retrieval device may then generate a single difference entry from position A, byte a, and this bit vector, and insert it into the value compression table.
  • When inserting the generated single difference entry into the value compression table, the retrieval device can search the value compression table of the target BB for a single difference entry whose mutation position is position A. If one exists, the generated single difference entry is inserted after it.
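  • A minimal sketch of this insertion into the single difference entries is given below. The names `SingleDiffEntry` and `insert_single_diff` are hypothetical, the second value is assumed to be 1, subscript index values are assumed to start at 0 and to map to bit `index` counted from the least significant bit, and appending to the end of the table when no entry shares position A is an assumption, since the text only describes the case where one exists.

```python
from dataclasses import dataclass


@dataclass
class SingleDiffEntry:
    position: int    # mutation position
    byte: int        # mutation byte
    bit_vector: int  # bit i set: the value with subscript index i has this mutation


def insert_single_diff(table: list, position: int, byte: int, index: int) -> None:
    """Record that the value with subscript `index` has `byte` at `position`."""
    mask = 1 << index  # assumed second value: 1
    last_same_position = -1
    for i, entry in enumerate(table):
        if entry.position == position:
            if entry.byte == byte:
                entry.bit_vector |= mask  # first single difference entry found
                return
            last_same_position = i
    new_entry = SingleDiffEntry(position, byte, mask)
    if last_same_position >= 0:
        # Insert after the last entry that shares the mutation position.
        table.insert(last_same_position + 1, new_entry)
    else:
        table.append(new_entry)  # assumption: no ordering constraint applies


table = []
insert_single_diff(table, 3, 0xFF, 0)
insert_single_diff(table, 1, 0x01, 0)
insert_single_diff(table, 3, 0xFF, 5)  # same mutation: only a bit is set
insert_single_diff(table, 3, 0xEE, 2)  # same position, new byte: new entry
```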
  • The retrieval device can also decide, based on the number of second single difference entries included in the value compression table, whether to generate a single difference entry directly from position A and byte a, or to aggregate the multiple second single difference entries with position A and byte a into an aggregated difference entry group. A second single difference entry is a single difference entry that includes the mutation position of the target value.
  • The retrieval device may compare the number of second single difference entries with the first threshold. If the number of second single difference entries included in the value compression table is less than the first threshold, the retrieval device generates a bit vector for the target value based on the subscript index value, in which the n-th bit is the second value. When the minimum subscript index value is 0, n equals the subscript index value plus 1; when the minimum subscript index value is 1, n equals the subscript index value. A single difference entry is then generated from the mutation position, the mutation byte, and the bit vector of the target value, and inserted into the value compression table.
  • If the number of second single difference entries is less than the first threshold, the value compression table contains few single difference entries that share mutation position A but have different mutation bytes. Even if these entries were aggregated, the number of aggregated difference entries in the resulting group might not be smaller than the number of second single difference entries, so aggregation would bring little or no storage benefit.
  • In this case, the retrieval device can directly generate the bit vector of the target value based on the subscript index value, generate a single difference entry from position A, byte a, and the bit vector, and insert the generated entry after the last of the second single difference entries.
  • If the number of second single difference entries is not less than the first threshold, the retrieval device can aggregate the multiple second single difference entries, the mutation byte of the target value, and the subscript index value into a target aggregated difference entry group whose number of aggregated difference entries is not greater than the first threshold. This improves entry utilization, reduces the number of entries, and thus reduces the storage overhead of the value compression table.
  • The retrieval device can first generate the target aggregated difference entry group by the method for aggregating multiple single difference entries described above. Then, based on the subscript index value, it determines the n-th byte among the bytes stored in the multiple byte storage fields of the group and changes it to the mutation byte of the target value at position A, that is, byte a.
  • The retrieval device may also insert byte a directly into an aggregated difference entry or aggregated difference entry group based on the subscript index value.
  • The retrieval device may determine the n-th byte among the bytes stored in the byte storage field of the aggregated difference entry or aggregated difference entry group, and change it to the mutation byte of the target value at position A, that is, byte a.
  • The retrieval device can then recount the number of types of mutation bytes, that is, the distinct bytes in the byte storage field that differ from the byte at position A of the reference value. If the number of types is unchanged, the operation ends. If it has increased, that is, byte a did not previously appear in the byte storage field, the retrieval device increments the recorded number of types of mutation bytes by 1 to complete the update.
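  • The threshold decision, the aggregation, and the recount can be sketched as follows. This is an illustrative model under stated assumptions: single difference entries are written as `(position, byte, bit_vector)` tuples, the byte storage fields of a group are modeled as one `bytearray` with one slot per value, slots without a mutation hold the reference byte, and `FIRST_THRESHOLD`, `aggregate`, and `count_mutation_types` are hypothetical names.

```python
FIRST_THRESHOLD = 3  # hypothetical value; the text only requires it to exceed 1


def aggregate(singles: list, position: int, ref_byte: int, capacity: int) -> bytearray:
    """Build the byte storage of an aggregated group for one mutation position."""
    stored = bytearray([ref_byte] * capacity)  # unmutated slots keep the reference byte
    for pos, byte, bit_vector in singles:
        if pos != position:
            continue
        for i in range(capacity):
            if (bit_vector >> i) & 1:
                stored[i] = byte  # value i carries this mutation byte
    return stored


def count_mutation_types(stored: bytearray, ref_byte: int) -> int:
    """Number of distinct stored bytes that differ from the reference byte."""
    return len({b for b in stored if b != ref_byte})


singles = [(2, 0xAA, 0b0001), (2, 0xBB, 0b0100), (2, 0xCC, 0b1000)]
second_entries = [s for s in singles if s[0] == 2]
should_aggregate = len(second_entries) >= FIRST_THRESHOLD

stored = aggregate(second_entries, position=2, ref_byte=0x10, capacity=4)
types_before = count_mutation_types(stored, 0x10)
stored[1] = 0xDD  # insert byte a = 0xDD for the value with subscript index 1
types_after = count_mutation_types(stored, 0x10)
```

  • Here the number of types grows from 3 to 4 because byte 0xDD was not previously present, so the recorded number of types would be incremented by 1.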
  • The above mainly describes how, when the processing request is a data insertion request, the retrieval device inserts the target value of the data insertion request into the value compression table.
  • The following describes how the retrieval device handles the target value corresponding to the target key when the processing request is a data query request or a data deletion request.
  • Case 2: When the processing request is a data query request and the value compression table in the target BB includes a third single difference entry, the target value is obtained based on the mutation position and mutation byte in the third single difference entry, the n-th byte in the multiple byte storage fields of each aggregated difference entry group in the value compression table, and the reference value. The third single difference entry is a single difference entry in whose bit vector the n-th bit is the second value.
  • The retrieval device can first obtain the reference value and then traverse the value compression table starting from its first entry. For each entry, the retrieval device determines from the highest bit of the entry's first field whether the entry is a single difference entry or an aggregated difference entry. If the entry is a single difference entry, the retrieval device checks, based on the subscript index value, whether the n-th bit of its bit vector is the second value. If so, the byte of the target value at the mutation position of that entry is the mutation byte in that entry.
  • In this case, the retrieval device modifies the byte of the reference value at that mutation position to the mutation byte in the single difference entry.
  • If the n-th bit of the bit vector of the single difference entry is not the second value, the target value has not changed relative to the reference value at the mutation position of that entry; that is, the byte of the target value at that position is the same as the byte of the reference value. The retrieval device therefore keeps the byte of the reference value at that position unchanged and moves on to the next entry.
  • If the entry is an aggregated difference entry, the retrieval device may obtain the other aggregated difference entries that include the same mutation position, so as to determine the aggregated difference entry group to which the entry belongs. It then obtains the n-th byte stored in the multiple byte storage fields of the group according to the subscript index value and, based on the mutation position of the group, replaces the byte of the reference value at that position with the obtained byte.
  • If the retrieval device determines from the highest bit of the entry's first field that the entry is an aggregated difference entry, it can likewise obtain the n-th byte stored in the byte storage field of the aggregated difference entry and, based on the mutation position of that entry, replace the byte of the reference value at that position with the obtained byte.
  • The entries of the value compression table are traversed one by one in this way, and the value finally obtained by modifying the reference value is the target value.
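  • The query traversal can be sketched as follows. The classes `SingleDiff` and `AggregatedDiff` and the function `reconstruct` are hypothetical names; the sketch distinguishes the two entry kinds by Python type instead of by the highest bit of the first field, models the byte storage fields of a group as one `bytes` object, and assumes 0-based subscript index values with the second value equal to 1.

```python
from dataclasses import dataclass


@dataclass
class SingleDiff:
    position: int    # mutation position
    byte: int        # mutation byte
    bit_vector: int  # bit i set: value i has this mutation


@dataclass
class AggregatedDiff:
    position: int    # mutation position shared by the group
    stored: bytes    # byte of every value at `position`, in subscript order


def reconstruct(reference: bytes, table: list, index: int) -> bytes:
    """Rebuild the value with subscript `index` from the reference value."""
    value = bytearray(reference)  # work on a copy of the reference value
    for entry in table:
        if isinstance(entry, SingleDiff):
            if (entry.bit_vector >> index) & 1:
                value[entry.position] = entry.byte
            # otherwise the byte at this position stays as in the reference
        else:
            value[entry.position] = entry.stored[index]
    return bytes(value)


reference = bytes.fromhex("0000a000")
table = [
    SingleDiff(position=1, byte=0x01, bit_vector=0b10),
    AggregatedDiff(position=3, stored=bytes([0x00, 0xFF, 0x20, 0x00])),
]
value_0 = reconstruct(reference, table, 0)
value_1 = reconstruct(reference, table, 1)
value_2 = reconstruct(reference, table, 2)
```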
  • Case 3: When the processing request is a data deletion request and the value compression table includes a third single difference entry, the n-th bit in the bit vector of the third single difference entry is updated to the third value. The third single difference entry is a single difference entry in whose bit vector the n-th bit is the second value, and the third value is different from the second value. When the value compression table includes a first aggregated difference entry group, then based on the second mutation position included in the group, the n-th byte in the multiple byte storage fields of the group is updated to the byte of the reference value at the second mutation position, where that n-th byte is the mutation byte of the target value at the second mutation position.
  • The retrieval device may traverse the entries in the value compression table in the same way as when processing a data query request. If an entry is a single difference entry and the n-th bit of its bit vector is the second value, the byte of the target value at the mutation position of that entry is the mutation byte in that entry; that is, the target value differs from the reference value at that position. The retrieval device can then change the n-th bit of the bit vector in the single difference entry to the third value, thereby deleting the difference between the target value and the reference value at that mutation position. When the second value is 1, the third value is 0; when the second value is 0, the third value is 1.
  • the retrieval device can delete the single difference entry.
  • If the entry is an aggregated difference entry, the retrieval device can, by the method described above and based on the subscript index value, obtain the n-th byte in the byte storage field of the aggregated difference entry or of the first aggregated difference entry group to which it belongs, and obtain the byte of the reference value at the second mutation position included in the aggregated difference entry. If the two obtained bytes differ, the n-th byte in the byte storage field is the mutation byte of the target value at the second mutation position. In this case, the retrieval device changes the n-th byte in the byte storage field to the byte of the reference value at the second mutation position, thereby deleting the difference between the target value and the reference value at the second mutation position.
  • The retrieval device can also recount, by the method described above, the number of types of mutation bytes in the byte storage field, so as to decide whether to update the recorded number of types of mutation bytes.
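  • The deletion steps can be sketched as follows. The data layout is an assumption made for the example: single difference entries are mutable lists `[position, byte, bit_vector]`, each aggregated group is a `bytearray` keyed by its mutation position, the second value is 1, and the third value is 0. The function name `delete_value` is hypothetical.

```python
def delete_value(reference: bytes, singles: list, aggregated: dict, index: int) -> dict:
    """Delete all differences recorded for the value with subscript `index`.

    Returns the recounted number of mutation byte types per aggregated position.
    """
    mask = 1 << index
    for entry in singles:  # entry = [position, byte, bit_vector]
        if entry[2] & mask:
            entry[2] &= ~mask  # set the n-th bit to the third value (0)
    type_counts = {}
    for position, stored in aggregated.items():
        ref_byte = reference[position]
        if stored[index] != ref_byte:
            stored[index] = ref_byte  # restore the reference byte in slot n
        type_counts[position] = len({b for b in stored if b != ref_byte})
    return type_counts


reference = bytes.fromhex("0000a000")
singles = [[1, 0x01, 0b110]]
aggregated = {3: bytearray([0x00, 0xFF, 0x20, 0x00])}
counts = delete_value(reference, singles, aggregated, index=1)
```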
  • After the retrieval device has deleted, by the method described above, the difference at the second mutation position between the target value and the reference value recorded in the aggregated difference entry or in the first aggregated difference entry group to which it belongs, if the number of types of mutation bytes in the byte storage field of the updated aggregated difference entry or first aggregated difference entry group is less than the first threshold, the aggregated difference entry or the first aggregated difference entry group is split into multiple single difference entries.
  • The aggregated difference entry or aggregated difference entry group may record the number of types of mutation bytes included in the byte storage field.
  • The retrieval device can read this number. If the number of types of mutation bytes is less than the first threshold, splitting the updated aggregated difference entry or first aggregated difference entry group yields fewer single difference entries than the first threshold, and these occupy less storage space than the aggregated difference entry or first aggregated difference entry group.
  • In this case, the retrieval device may restore the updated aggregated difference entry or first aggregated difference entry group into multiple single difference entries.
  • The retrieval device can start from the first aggregated difference entry in the updated first aggregated difference entry group. It uses the mutation position of the aggregated difference entry as the mutation position of a single difference entry, and the first mutation byte in the byte storage field of the aggregated difference entry as the mutation byte of that single difference entry. Then, according to the position of this mutation byte among the bytes in the multiple byte storage fields of the group, it determines the bit of the bit vector that corresponds to the value to which the mutation byte belongs, and generates the bit vector of the single difference entry, in which that bit is the second value and the remaining bits are the third value.
  • After obtaining the first single difference entry by splitting, the retrieval device continues with the second mutation byte in the byte storage field of the first aggregated difference entry. If the second mutation byte is the same as the mutation byte in the single difference entry already obtained, the bit corresponding to the value of the second mutation byte is determined from the position of the second mutation byte among the bytes in the multiple byte storage fields of the group, and that bit in the bit vector of the single difference entry obtained by splitting is updated to the second value.
  • If the second mutation byte differs from the mutation byte in the single difference entry obtained by splitting, a new single difference entry is generated from the second mutation byte by the method described above.
  • By analogy, the retrieval device traverses the mutation bytes in the multiple byte storage fields and splits them in turn into multiple single difference entries, whose number is less than the first threshold.
  • The retrieval device can also, by the above method, split an aggregated difference entry in turn into multiple single difference entries based on the bytes stored in its byte storage field.
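  • The splitting procedure can be sketched as follows. As before, the byte storage fields of a group are modeled as one `bytes` object with one slot per value, the second value is assumed to be 1 and the third value 0, and `split_aggregated` and `maybe_split` are hypothetical names.

```python
def split_aggregated(position: int, stored: bytes, ref_byte: int) -> list:
    """Split an aggregated group into (position, byte, bit_vector) single entries."""
    bit_vectors = {}  # mutation byte -> bit vector, in first-seen order
    for i, b in enumerate(stored):
        if b != ref_byte:  # slot i holds a mutation byte
            bit_vectors[b] = bit_vectors.get(b, 0) | (1 << i)
    return [(position, b, bv) for b, bv in bit_vectors.items()]


def maybe_split(position: int, stored: bytes, ref_byte: int, first_threshold: int):
    """Split only when the number of mutation byte types is below the threshold."""
    types = len({b for b in stored if b != ref_byte})
    if types < first_threshold:
        return split_aggregated(position, stored, ref_byte)
    return None  # keep the aggregated form


stored = bytes([0xFF, 0x00, 0x20, 0xFF])
split = maybe_split(3, stored, ref_byte=0x00, first_threshold=3)
kept = maybe_split(3, stored, ref_byte=0x00, first_threshold=2)
```

  • In the example, values 0 and 3 share mutation byte 0xFF, so they end up in the same single difference entry with two bits set.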
  • Alternatively, the aggregated difference entry may not record the number of types of mutation bytes.
  • In that case, the retrieval device may obtain the byte of the reference value at the second mutation position and then count the number of types of bytes, stored in the byte storage field of the updated aggregated difference entry or first aggregated difference entry group, that differ from the obtained byte. This number is the number of types of mutation bytes in the updated aggregated difference entry or first aggregated difference entry group. If it is less than the first threshold, the retrieval device may split the updated aggregated difference entry or first aggregated difference entry group by the above method to obtain multiple single difference entries.
  • the target BB may become empty, that is, no value is stored in the target BB.
  • When the target BB is empty, the retrieval device may also delete the indication information of the target BB stored in the entry indicated by the second hash value in the second hash table determined in step 203 above. Further, if all entries of the second hash table are then empty, the retrieval device can delete the second hash table and the indication information of the second hash table stored in the target BM, and then delete the indication information of the target BM stored in the entry indicated by the first hash value in the first hash table. Further, if all entries of the first hash table are empty, the retrieval device deletes the first hash table and sets the bit value of the first bit to a value other than the first value.
  • Alternatively, the retrieval device may delete the indication information of the target BB stored in the entry indicated by the hash value of the second key field in the hash table corresponding to the first bit determined in step 203. If all entries of the hash table corresponding to the first bit are then empty, the retrieval device deletes the hash table and sets the bit value of the first bit to a value other than the first value.
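  • The cascading cleanup for the variant with BMs can be sketched as follows. The model is deliberately simplified and is an assumption of this example: the BT is a set of bits that currently have the first value, each such bit maps to a first hash table (a `dict`), each BM is represented only by its second hash table (also a `dict`), and `remove_empty_bb` is a hypothetical name.

```python
def remove_empty_bb(bt_bits: set, first_tables: dict, bit: int, h1: int, h2: int) -> None:
    """Remove an empty BB and every index level that becomes empty as a result."""
    first_table = first_tables[bit]  # first hash table of the first bit
    second_table = first_table[h1]   # target BM, modeled as its second hash table
    del second_table[h2]             # drop the indication of the target BB
    if second_table:
        return                       # other BBs still hang off this BM
    del first_table[h1]              # drop the second hash table and the BM
    if first_table:
        return
    del first_tables[bit]            # drop the first hash table
    bt_bits.discard(bit)             # the bit no longer holds the first value


bt_bits = {7}
first_tables = {7: {0x12: {0x34: "BB-a"}, 0x13: {0x01: "BB-b"}}}

remove_empty_bb(bt_bits, first_tables, 7, 0x12, 0x34)
after_first = (set(bt_bits), {k: dict(v) for k, v in first_tables[7].items()})

remove_empty_bb(bt_bits, first_tables, 7, 0x13, 0x01)
```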
  • The above mainly describes how, when the target BB stores multiple values by means of a reference value and a value compression table, the retrieval device performs the value operation corresponding to the processing request in the target BB based on the value compression table.
  • Alternatively, the target BB may directly store a bit vector and multiple values. The bit vector includes multiple bits, each of which indicates one value. When a bit is the second value, the value corresponding to that bit is stored in the target BB; when a bit is the third value, the target BB does not store a value corresponding to that bit.
  • For a data insertion request, the retrieval device first determines the bit corresponding to the target value in the bit vector based on the subscript index value in the data insertion request. If that bit is the second value, the target BB already stores a value corresponding to the target key, and the retrieval device replaces the value corresponding to that bit in the target BB with the target value. If that bit is the third value, no value corresponding to the target key is stored in the target BB; the retrieval device stores the target value as the value corresponding to that bit and then changes the bit to the second value.
  • For a data query request, the retrieval device first determines the corresponding bit in the bit vector based on the subscript index value in the data query request. If that bit is the second value, the value corresponding to the bit is obtained, and the obtained value is the target value. If that bit is the third value, a query result is generated indicating that the value corresponding to the target key was not found.
  • For a data deletion request, the retrieval device first determines the corresponding bit in the bit vector based on the subscript index value in the data deletion request. If that bit is the second value, the value corresponding to the bit is deleted. If that bit is the third value, a deletion failure message is generated.
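  • The three operations on a BB that stores values directly can be sketched as follows. The class name `DirectBB` is hypothetical, the second value is assumed to be 1 and the third value 0, and clearing the bit on deletion is an assumption, since the text only states that the value is deleted.

```python
class DirectBB:
    """Basic block that stores values directly, guarded by a bit vector."""

    def __init__(self, capacity: int):
        self.bit_vector = 0             # bit i == 1: slot i holds a value
        self.slots = [None] * capacity  # one slot per subscript index value

    def insert(self, index: int, value: bytes) -> None:
        self.slots[index] = value       # replaces an existing value, if any
        self.bit_vector |= 1 << index

    def query(self, index: int):
        if (self.bit_vector >> index) & 1:
            return self.slots[index]
        return None                     # stands for the "not found" query result

    def delete(self, index: int) -> bool:
        if not (self.bit_vector >> index) & 1:
            return False                # stands for the deletion failure message
        self.slots[index] = None
        self.bit_vector &= ~(1 << index)  # assumption: the bit is cleared too
        return True


bb = DirectBB(capacity=64)
bb.insert(3, b"\x00\x20")
bb.insert(3, b"\x00\x40")               # same key again: the value is replaced
found = bb.query(3)
missing = bb.query(4)
first_delete = bb.delete(3)
second_delete = bb.delete(3)
```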
  • the retrieval device further includes a lock pool.
  • The lock pool may include multiple read-write locks, at least one of which corresponds to at least two BBs. Here, "at least one" means one or more, and "at least two" means two or more.
  • A given read-write lock in the lock pool may correspond to two or more BBs, or to a single BB. That is, the lock pool contains read-write locks that are shared by multiple BBs.
  • The read-write locks in the lock pool can be distinguished by read-write lock labels, so the retrieval device can store the correspondence between each BB and its read-write lock label. Before performing a value operation in the target BB, the retrieval device can obtain the read-write lock corresponding to the target BB from the lock pool based on the target BB's read-write lock label and hold it on the target BB for the duration of the value operation. After the value operation in the target BB is completed, the retrieval device releases the read-write lock so that the other BBs corresponding to it can use it.
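  • A lock pool shared by several BBs can be sketched as follows. This is only an illustration: Python's standard library has no read-write lock, so a plain `threading.Lock` stands in for each read-write lock, and deriving the label from a BB identifier by modulo is an assumption; the text only says that the correspondence between a BB and its lock label is stored.

```python
import threading


class LockPool:
    """Fixed pool of locks; several BBs may map to the same lock."""

    def __init__(self, size: int):
        # Stand-in: a mutex per slot instead of a true read-write lock.
        self._locks = [threading.Lock() for _ in range(size)]

    def label_for(self, bb_id: int) -> int:
        return bb_id % len(self._locks)  # assumed label assignment

    def lock_for(self, label: int) -> threading.Lock:
        return self._locks[label]


pool = LockPool(size=4)
bb_labels = {bb_id: pool.label_for(bb_id) for bb_id in (1, 2, 5)}

results = []
with pool.lock_for(bb_labels[1]):        # held during the value operation in BB 1
    results.append("value operation in BB 1")
# Leaving the with-block releases the lock for the other BBs that share it.
```

  • With four locks, BB 1 and BB 5 receive the same label and therefore share one lock, while BB 2 uses a different one.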
  • FIG. 6 is a schematic diagram of read-write locks of BT, BM, and BB shown in the embodiment of the present application.
  • BT corresponds to a BT read-write lock
  • each of the multiple BMs corresponds to a BM read-write lock
  • the BM read-write locks corresponding to each BM are different.
  • the read-write locks of multiple BBs can be obtained from the lock pool, and a read-write lock in the lock pool can be shared by multiple BBs, or a BB can use an independent read-write lock.
  • BB1 and BB3 share read-write lock 2
  • BB5 uses read-write lock 3 alone
  • BB8 uses read-write lock 4 alone.
  • The embodiment of the present application evaluates, in a simulation test, the index performance of a key-value storage database that uses this method.
  • the keys are divided into 6-byte lengths, and up to 64 values are stored in each BB.
  • Starting from the initial key-value pair (0, 0), 6 billion key-value pairs are generated by incrementing the key by 1 and the value by 0x2000 for each successive pair.
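  • The generation pattern of the test data can be sketched as follows. The function names are hypothetical, and the 8-byte big-endian encoding of the values is an assumption made only to show which byte positions change between neighboring values; the text does not state the value width.

```python
def generate_pairs(count: int, value_step: int = 0x2000):
    """Yield (key, value) pairs: key grows by 1, value by `value_step`."""
    for key in range(count):
        yield key, key * value_step


def changed_positions(value: int, reference: int, width: int = 8) -> list:
    """Byte positions at which `value` differs from `reference` (big-endian)."""
    a = value.to_bytes(width, "big")
    b = reference.to_bytes(width, "big")
    return [i for i in range(width) if a[i] != b[i]]


pairs = list(generate_pairs(4))
reference_value = pairs[0][1]
positions = [changed_positions(v, reference_value) for _, v in pairs]
```

  • Under this assumed encoding, the first few generated values differ from the initial value in a single byte position, which is the kind of redundancy the value compression table exploits.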
  • Table 1 The storage space occupied by each part of the index in the simulation test
  • the average delay of data insertion operation, the average delay of data deletion operation and the average delay of data query operation are shown in Table 2 below.
  • The delay of the various operations can, as far as possible, be kept within 1 microsecond.
  • In the embodiment of the present application, the key is divided into multiple key fields. The corresponding first bit is determined from the BT based on the leading first key field, and the target BB is then determined, based on the subsequent second key field, from the multiple BBs corresponding to the first bit.
  • In this way, for keys that share the same first key field, the same group of BBs can be located through the BT. That is, the embodiment of the present application merges the index for the common part of the keys, thereby reducing the proportion of space occupied by the index.
  • When the retrieval device also includes BMs, the retrieval device can further divide the second key field and retrieve the corresponding BM based on the leading first subfield and the first hash table, so as to obtain the second hash table.
  • The BBs corresponding to these keys can then be retrieved through one hash table (that is, the second hash table), which reduces the data size of each hash table and improves the operation efficiency of the hash table.
  • In addition, the index for a larger common part of the keys can be merged, further reducing the proportion of space occupied by the index.
  • When a data insertion operation is performed for a key that has not yet been stored, the corresponding data can be created and inserted in the BT, the top-level hash table, or the middle-level hash table during the retrieval process, after which the target value is inserted into the target BB. When a data deletion operation is performed, the corresponding elements recorded in the BT, BM, BB, or hash table can be deleted during the retrieval process. It can be seen that, in the embodiment of the present application, the index structure does not need to be rebuilt when data is inserted or deleted, so the dynamic update performance of the index is better.
  • Multiple values can be stored in a BB by storing a reference value together with the difference information between the other values and the reference value. This reduces the storage of redundant fields across the values, thereby reducing the storage space occupied by the values and improving the compression rate of the values.
  • Fig. 7 is a schematic structural diagram of a key-value pair retrieval device provided by an embodiment of the present application.
  • the key-value pair retrieval device 700 includes a first obtaining module 701, a second obtaining module 702, a determining module 703 and a processing module 704, wherein:
  • the first obtaining module 701 is configured to execute step 201 in the foregoing embodiment
  • the second obtaining module 702 is configured to execute step 202 in the foregoing embodiment
  • the processing module 704 is configured to execute step 204 in the foregoing embodiment.
  • each of the above-mentioned modules can be realized by the processor in the aforementioned key-value pair retrieval device executing the computer instructions stored in the memory.
  • the determining module 703 is mainly used for:
  • the target BB is determined among the plurality of BBs.
  • the second key field includes a first subfield and a second subfield
  • the determining module 703 is mainly used for:
  • the target BB is determined.
  • the target BB stores a reference value and a value compression table, and the value compression table is used to store difference information between other values and the reference value.
  • The value compression table includes multiple single difference entries. Each single difference entry stores one piece of difference information, which includes a mutation position, a mutation byte, and a corresponding bit vector. The mutation position indicates the position of the byte in which other values have changed relative to the reference value, and the mutation byte is the byte of those values that has changed relative to the reference value.
  • The bit vector includes multiple bits, each bit corresponds to one value, and the bits corresponding to those other values are the second value.
  • The value compression table further includes at least one aggregated difference entry. An aggregated difference entry is obtained by aggregating multiple single difference entries that include the same first mutation position and different mutation bytes.
  • The aggregated difference entry includes an aggregation identifier, the first mutation position, and a byte storage field.
  • The byte storage field stores, in the order of the values corresponding to the bits of the bit vector, the byte of each value at the first mutation position.
  • the processing request is a data insertion request
  • the data insertion request also includes a target value
  • the processing module 704 is mainly used for:
  • the difference information between the target value and the reference value is inserted into the value compression table.
  • the processing module 704 is mainly used for:
  • the bit value of the nth bit in the bit vector in the first single difference entry is updated to a second value, and n is determined based on the subscript index value
  • the first single difference entry is a single difference entry including the mutation position and the mutation byte of the target value.
  • the processing module 704 is mainly used for:
  • a bit vector of the target value is generated based on the subscript index value, where the bit value of the n-th bit in the bit vector of the target value is the second value, n is determined based on the subscript index value, and the second single difference entry is a single difference entry including the mutation position of the target value;
  • the processing module 704 is mainly used for:
  • when the value compression table does not include the mutation byte of the target value and the number of second single difference entries included in the value compression table is not less than the first threshold, aggregate the multiple second single difference entries, the mutation byte of the target value, and the subscript index value to obtain a target aggregated difference entry, where the second single difference entry is a single difference entry including the mutation position of the target value, the first threshold is greater than 1, and the storage space occupied by the target aggregated difference entry is not greater than the storage space occupied by the multiple second single difference entries.
  • the processing request is a data query request
  • the processing module 704 is mainly used for:
  • when the value compression table includes a third single difference entry, obtain the target value based on the mutation position and mutation byte in the third single difference entry, the n-th byte in the byte storage field included in each aggregated difference entry in the value compression table, and the reference value, where the third single difference entry is a single difference entry in whose bit vector the n-th bit is the second value, and n is determined based on the subscript index value.
  • the processing request is a data deletion request
  • the processing module 704 is mainly used for:
  • the value compression table includes a third single difference entry, and the bit value of the nth bit in the bit vector of the third single difference entry is updated to a third value
  • the third single difference entry is a single difference entry whose bit vector has the second value at its nth bit, n is determined based on the subscript index value, and the third value is different from the second value;
  • the value compression table includes the first aggregation difference entry
  • the nth byte in the byte storage field of the first aggregation difference entry is updated to the byte at the second mutation position in the reference value
  • the nth byte in the byte storage field of the first aggregation difference entry is the mutation byte at the second mutation position of the target value.
  • processing module 704 is also used for:
  • the first aggregated difference entry is split into multiple single difference entries.
  • the BT corresponds to a BT read-write lock
  • the BT read-write lock is used to indicate that the BT is read-write locked while the BT is accessed based on a processing request.
  • the device 700 is also used for:
  • the read-write lock corresponding to the target BB is obtained from the lock pool.
  • the read-write lock corresponding to the target BB is used to indicate that the target BB is read-write locked during the value operation performed in the target BB based on the subscript index value.
  • the lock pool includes multiple read-write locks, and at least one read-write lock among the multiple read-write locks corresponds to at least two BBs.
  • the device 700 is also used for:
  • the key is divided into multiple key fields; the corresponding first bit is determined from the BT based on the preceding first key field, and the target BB is then determined, based on the following second key field, from the multiple BBs corresponding to the first bit.
  • keys that have the same first key field can all be located to the same multiple BBs through the BT; that is, the embodiment of the present application consolidates the index for the common part of the keys, thereby reducing the proportion of space occupied by the index.
  • when the key-value pair retrieval device provided in the above embodiments performs retrieval, the division into the above functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the key-value pair retrieval device and the key-value pair retrieval method embodiment provided in the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiment, and will not be repeated here.
  • all or part may be implemented by software, hardware, firmware or any combination thereof.
  • software When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g. coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g. infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a Digital Versatile Disc (DVD)), or a semiconductor medium (for example, a Solid State Disk (SSD)), etc.
  • the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

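Embodiments of this application disclose a key-value pair retrieval method and apparatus, and a storage medium. In the embodiments, a key is divided into multiple key fields; a corresponding first bit is then determined in the BT based on the preceding first key field, and a target BB is then determined, based on the following second key field, from the multiple BBs corresponding to that first bit. In this way, all keys that have the same first key field can be located to the same multiple BBs through the BT; that is, the embodiments consolidate the index for the common part of the keys, which reduces the proportion of space occupied by the index.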

Description

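Key-value pair retrieval method and apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 202210188396.1, filed on February 28, 2022 and entitled "Key-value pair retrieval method and apparatus, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communications, and in particular to a key-value pair retrieval method and apparatus, and a storage medium.
Background
With the continuing development of network technologies, social media, and Internet of Things devices, massive amounts of data are being generated. Such data can be stored and managed by a key-value storage system, in which data is organized as key-value pairs. The value in a key-value pair can be data of any type, structure, and size, and the key corresponding to a value uniquely identifies that value. In a key-value storage system, retrieval performance depends mainly on efficient indexing. However, current indexing techniques such as vEB indexes, prefix-tree indexes, and full-text indexes all suffer from an index that takes up too large a share of the storage space.
Summary
Embodiments of this application provide a key-value pair retrieval method, apparatus, and device, a storage medium, and a program product, which can reduce the proportion of storage space occupied by the index in a key-value storage system. The technical solutions are as follows:
According to a first aspect, a key-value pair retrieval method is provided. The method includes: obtaining a processing request, where the processing request includes a target key and a subscript index value of a target value corresponding to the target key, the target key includes a first key field and a second key field, and the first key field precedes the second key field; obtaining, from a top-level bitmap (bitmap top, BT) based on the first key field, the bit value of a first bit corresponding to the first key field; when the bit value of the first bit is a first value, determining, based on the second key field, a target BB corresponding to the target key among multiple bottom-level bitmaps (bitmap bottom, BB) corresponding to the first bit, where each BB is used to store multiple values; and performing a value operation in the target BB based on the subscript index value.
In the embodiments of this application, a key is divided into multiple key fields; a corresponding first bit is then determined in the BT based on the preceding first key field, and a target BB is then determined, based on the following second key field, from the multiple BBs corresponding to that first bit. In this way, all keys that have the same first key field can be located to the same multiple BBs through the BT; that is, the embodiments consolidate the index for the common part of the keys, which reduces the proportion of space occupied by the index.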
可选地,基于所述第二键字段,在所述第一比特位对应的底层位图BB中确定所述目标键对应的目标BB的实现过程可以为:获取所述第一比特位对应的第一哈希表;基于所述第二键字段和所述第一哈希表,在所述多个BB中确定所述目标BB。
其中,在一种可能的情况中,第一哈希表的每个表项中存储有一个BB的指示信息,基于此,根据第二键字段确定出第一哈希表中对应的表项,从确定出的表项中获取目标BB的指示信息,进而根据获取到的目标BB的指示信息在多个BB中确定目标BB。
在另一种可能的情况中,所述第二键字段包括第一子字段和第二子字段,所述基于所述 第二键字段和所述第一哈希表,在所述多个BB中确定所述目标BB的实现过程为:基于所述第一子字段和所述第一哈希表,确定所述目标键对应的目标中层位图BM,所述目标BM中存储有第二哈希表的指示信息;基于所述第二哈希表的指示信息,获取所述第二哈希表,所述第二哈希表中存储有所述多个BB的指示信息;基于所述第二子字段和所述第二哈希表,确定所述目标BB。
在该种情况中,BT对应的第一哈希表中的每个表项中存储有一个BM的指示信息。每个BM中存储有相应BM对应的中层哈希表的指示信息,每个中层哈希表的表项中存储有一个BB的指示信息。基于此,在确定出目标BM之后,基于目标BM中的哈希表的指示信息确定该目标BM对应的中层哈希表,也即第二哈希表。之后,基于第二子字段在第二哈希表中确定对应的表项,基于确定出的表项中的BB的指示信息确定目标BB。
由此可见,在本申请中,还可以再对第二键字段进行进一步的划分,进而基于在前的第一子字段和第一哈希表检索对应的BM,从而获得该BM中的第二哈希表。这样,对于同样包含有第一键字段和第一子字段的前缀的键,这些键所对应的BB将能够通过一个哈希表(也即第二哈希表)来检索得到,降低了哈希表的数据规模,提升了哈希表的操作效率。并且,通过BT和BM,能够对键的更多共有部分的索引进行整合,进一步降低索引的空间占比。
可选地,所述目标BB存储有参考值和值压缩表,所述值压缩表用于存储其他值与所述参考值之间的差异信息。
在本申请中,在BB中可以通过存储参考值、其他值与参考值之间的差异信息来存储多个值,这样,可以减少多个值中的冗余字段的存储,从而能够降低值的存储空间占用,提高值压缩率。
在通过值压缩表存储多个值时,在一种可能的情况中,所述值压缩表包括多个单差异表项,一个单差异表项用于存储一项所述差异信息,所述差异信息包括突变位置、突变字节和对应的位向量,所述突变位置用于指示所述其他值相较于所述参考值发生变化的字节的位置,所述突变字节为所述其他值相较于所述参考值发生变化的字节,所述位向量包括多个比特位,一个比特位与一个值对应,且所述多个比特位中所述其他值对应的比特位上的比特值为第二数值。
在另一种可能的情况中,所述值压缩表还包括至少一个聚合差异表项,其中,一个聚合差异表项通过对包括有相同的第一突变位置、不同的突变字节的多个单差异表项聚合得到,且所述聚合差异表项包括聚合标识、所述第一突变位置以及字节存放字段,所述字节存放字段中按照位向量中的各个比特位所对应的值的顺序,依次存储有各个值中位于所述第一突变位置上的字节。也即,在本申请中,对多个包括有相同的突变位置、不同的突变字节的单差异表项可以进行聚合,从而得到一个聚合差异表项,以此来减少表项的存储开销。
可选地,前述获取到的处理请求可以为数据插入请求,所述数据插入请求还包括所述目标值,在此基础上,所述基于所述下标索引值,在所述目标BB中进行值操作的实现过程可以为:将所述目标值的各个字节与所述参考值中对应的字节进行比较,得到所述目标值的突变位置和突变字节;基于所述下标索引值、所述目标值的突变位置和突变字节,在所述值压缩表中插入所述目标值与所述参考值之间的差异信息。
其中,在一种可能的实现方式中,在所述值压缩表包括第一单差异表项的情况下,将所述第一单差异表项中的位向量中第n个比特位的比特值更新为所述第二数值,以此实现目标 值的插入,其中,所述n基于所述下标索引值确定得到,所述第一单差异表项为包括所述目标值的突变位置和突变字节的单差异表项。
在另一种可能的实现方式中,在所述值压缩表不包括所述目标值的突变字节,且所述值压缩表包括的第二单差异表项的数目小于第一阈值的情况下,基于所述下标索引值生成所述目标值的位向量,所述目标值的位向量中第n个比特位的比特值为所述第二数值,所述n基于所述下标索引值确定得到,所述第二单差异表项为包括所述目标值的突变位置的单差异表项;基于所述目标值的突变位置、突变字节和位向量生成一个单差异表项,并将生成的单差异表项插入至所述值压缩表,以此实现目标值的插入。
在又一种可能的实现方式中,在所述值压缩表不包括所述目标值的突变字节,且所述值压缩表包括的第二单差异表项的数目不小于第一阈值的情况下,对所述第二单差异表项、所述目标值的突变字节和下标索引值进行聚合,得到目标聚合差异表项,以此实现目标值的插入,所述第二单差异表项为包括所述目标值的突变位置的单差异表项,所述第一阈值大于1,所述目标聚合差异表项占用的存储空间不大于多个第二单差异表项所占用的存储空间。也即,在该种实现方式中,可以通过将包括有与目标值的突变位置相同的突变位置但突变字节均不相同的多个单差异表项与目标值进行聚合,得到目标聚合差异表项,以此来减少表项所占用的空间,进而减少值压缩表的存储开销。
可选地,上述获取到的处理请求可以为数据查询请求,在此基础上,所述基于所述下标索引值,在所述目标BB中进行值操作的实现过程可以为:在所述值压缩表包括第三单差异表项的情况下,基于所述第三单差异表项中的突变位置和突变字节、所述值压缩表中每个聚合差异表项包括的字节存放字段中的第n个字节和所述参考值,获取所述目标值,所述第三单差异表项为位向量的第n个比特位的比特值为所述第二数值的单差异表项,所述n基于所述下标索引值确定得到。
可选地,上述获取到的处理请求可以为数据删除请求,在此基础上,所述基于所述下标索引值,在所述目标BB中进行值操作的实现过程可以为:在所述值压缩表包括第三单差异表项的情况下,将所述第三单差异表项的位向量中的第n个比特位的比特值更新为第三数值,所述第三单差异表项为位向量的第n个比特位的比特值为所述第二数值的单差异表项,所述n基于所述下标索引值确定得到,所述第三数值与所述第二数值不同;在所述值压缩表包括第一聚合差异表项的情况下,基于所述第一聚合差异表项包括的第二突变位置,将所述第一聚合差异表项的字节存放字段中的第n个字节更新为所述参考值中所述第二突变位置上的字节,所述第一聚合差异表项的字节存放字段中的第n个字节为所述目标值在所述第二突变位置上的突变字节。
可选地,在将第一聚合差异表项进行更新之后,当更新后的第一聚合差异表项的字节存放字段中的突变字节的数量小于第一阈值时,还可以再次将所述第一聚合差异表项拆分为多个单差异表项。也即是,如果该突变字节的种类数量小于第一阈值,则说明即使将该更新后的聚合差异表项拆分,得到的单差异表项的数量也将小于第一阈值,少于该聚合差异表项所占用的存储空间。在这种情况下,计算设备可以将更新后的聚合差异表项恢复为多个单差异表项。
可选地,所述BT对应有BT读写锁,所述BT读写锁用于指示在基于所述处理请求访问所述BT时,对所述BT进行读写锁定。通过该BT读写锁,可以控制对该BT的并发访问。
可选地,在所述目标BB中进行值操作之前,还包括:基于所述目标BB的读写锁标号,从锁池中获取所述目标BB对应的读写锁,所述目标BB对应的读写锁用于指示在基于所述下标索引值,在所述目标BB中进行值操作的过程中,对所述目标BB进行读写锁定,所述锁池包括多个读写锁,且所述多个读写锁中存在至少一个读写锁对应有至少两个BB。也即,可以通过BB读写锁来控制对BB的并发访问。并且,由于BB的数量较多,所以多个BB可以共享同一个读写锁,这样,能够降低读写锁的空间消耗。
可选地,在基于所述下标索引值,在所述目标BB中进行值操作之后,还可以释放所述目标BB对应的读写锁。
第二方面,提供了一种键值对检索装置,所述键值对检索装置具有实现上述第一方面中键值对检索行为的功能。所述键值对检索装置包括至少一个模块,该至少一个模块用于实现上述第一方面所提供的键值对检索方法。
示例性地,该键值对检索装置包括第一获取模块、第二获取模块、确定模块和处理模块。
其中,第一获取模块,用于获取处理请求,所述处理请求包括目标键和所述目标键对应的目标值的下标索引值,所述目标键包括第一键字段和第二键字段,所述第一键字段位于所述第二键字段之前;第二获取模块,用于基于所述第一键字段,从顶层位图BT中获取所述第一键字段对应的第一比特位上的比特值;确定模块,用于在所述第一比特位上的比特值为第一数值的情况下,基于所述第二键字段,在所述第一比特位对应的多个底层位图BB中确定所述目标键对应的目标BB,每个BB用于存储多个值;处理模块,用于基于所述下标索引值,在所述目标BB中进行值操作。
可选地,所述确定模块主要用于:获取所述第一比特位对应的第一哈希表;基于所述第二键字段和所述第一哈希表,在所述多个BB中确定所述目标BB。
可选地,所述第二键字段包括第一子字段和第二子字段,所述确定模块主要用于:基于所述第一子字段和所述第一哈希表,确定所述目标键对应的目标中层位图BM,所述目标BM中存储有第二哈希表的指示信息;基于所述第二哈希表的指示信息,获取所述第二哈希表,所述第二哈希表中存储有所述多个BB的指示信息;基于所述第二子字段和所述第二哈希表,确定所述目标BB。
可选地,所述目标BB存储有参考值和值压缩表,所述值压缩表用于存储其他值与所述参考值之间的差异信息。
可选地,所述值压缩表包括多个单差异表项,一个单差异表项用于存储一项所述差异信息,所述差异信息包括突变位置、突变字节和对应的位向量,所述突变位置用于指示所述其他值相较于所述参考值发生变化的字节的位置,所述突变字节为所述其他值相较于所述参考值发生变化的字节,所述位向量包括多个比特位,一个比特位与一个值对应,且所述多个比特位中所述其他值对应的比特位上的比特值为第二数值。
可选地,所述值压缩表还包括至少一个聚合差异表项,其中,一个聚合差异表项通过对包括有相同的第一突变位置、不同的突变字节的多个单差异表项聚合得到,且所述聚合差异表项包括聚合标识、所述第一突变位置以及字节存放字段,所述字节存放字段中按照位向量中的各个比特位所对应的值的顺序,依次存储有各个值中位于所述第一突变位置上的字节。
可选地,所述处理请求为数据插入请求,所述数据插入请求还包括所述目标值,所述处 理模块主要用于:将所述目标值的各个字节与所述参考值中对应的字节进行比较,得到所述目标值的突变位置和突变字节;基于所述下标索引值、所述目标值的突变位置和突变字节,在所述值压缩表中插入所述目标值与所述参考值之间的差异信息。
可选地,所述处理模块主要用于:在所述值压缩表包括第一单差异表项的情况下,将所述第一单差异表项中的位向量中第n个比特位的比特值更新为所述第二数值,所述n基于所述下标索引值确定得到,所述第一单差异表项为包括所述目标值的突变位置和突变字节的单差异表项。
可选地,所述处理模块主要用于:在所述值压缩表不包括所述目标值的突变字节,且所述值压缩表包括的第二单差异表项的数目小于第一阈值的情况下,基于所述下标索引值生成所述目标值的位向量,所述目标值的位向量中第n个比特位的比特值为所述第二数值,所述n基于所述下标索引值确定得到,所述第二单差异表项为包括所述目标值的突变位置的单差异表项;基于所述目标值的突变位置、突变字节和位向量生成一个单差异表项,并将生成的单差异表项插入至所述值压缩表。
可选地,所述处理模块主要用于:在所述值压缩表不包括所述目标值的突变字节,且所述值压缩表包括的第二单差异表项的数目不小于第一阈值的情况下,对所述第二单差异表项、所述目标值的突变字节和下标索引值进行聚合,得到目标聚合差异表项,所述第二单差异表项为包括所述目标值的突变位置的单差异表项,所述第一阈值大于1,所述目标聚合差异表项占用的存储空间不大于多个第二单差异表项所占用的存储空间。
可选地,所述处理请求为数据查询请求,所述处理模块主要用于:在所述值压缩表包括第三单差异表项的情况下,基于所述第三单差异表项中的突变位置和突变字节、所述值压缩表中每个聚合差异表项包括的字节存放字段中的第n个字节和所述参考值,获取所述目标值,所述第三单差异表项为位向量的第n个比特位的比特值为所述第二数值的单差异表项,所述n基于所述下标索引值确定得到。
可选地,所述处理请求为数据删除请求,所述处理模块主要用于:在所述值压缩表包括第三单差异表项的情况下,将所述第三单差异表项的位向量中的第n个比特位的比特值更新为第三数值,所述第三单差异表项为位向量的第n个比特位的比特值为所述第二数值的单差异表项,所述n基于所述下标索引值确定得到,所述第三数值与所述第二数值不同;在所述值压缩表包括第一聚合差异表项的情况下,基于所述第一聚合差异表项包括的第二突变位置,将所述第一聚合差异表项的字节存放字段中的第n个字节更新为所述参考值中所述第二突变位置上的字节,所述第一聚合差异表项的字节存放字段中的第n个字节为所述目标值在所述第二突变位置上的突变字节。
可选地,所述处理模块还用于:当更新后的第一聚合差异表项的字节存放字段中的突变字节的数量小于第一阈值时,将所述第一聚合差异表项拆分为多个单差异表项。
可选地,所述BT对应有BT读写锁,所述BT读写锁用于指示在基于所述处理请求访问所述BT时,对所述BT进行读写锁定。
可选地,所述装置还用于:基于所述目标BB的读写锁标号,从锁池中获取所述目标BB对应的读写锁,所述目标BB对应的读写锁用于指示在基于所述下标索引值,在所述目标BB中进行值操作的过程中,对所述目标BB进行读写锁定,所述锁池包括多个读写锁,且所述多个读写锁中存在至少一个读写锁对应有至少两个BB。
可选地,所述装置还用于:释放所述目标BB对应的读写锁。
第三方面,提供了一种键值对检索设备,该键值对检索设备包括处理器和存储器,所述存储器用于存储支持所述键值对检索设备执行上述第一方面所提供的键值对检索方法的程序,以及存储用于实现上述第一方面所提供的键值对检索方法所涉及的数据。所述处理器被配置为执行所述存储器中存储的程序。
第四方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述第一方面所述的键值对检索方法。
第五方面,提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面所述的键值对检索方法。
上述第二方面至第五方面所获得的技术效果与第一方面中对应的技术手段获得的技术效果近似,在这里不再赘述。
本申请实施例提供的技术方案带来的有益效果至少包括:
在本申请实施例中,将键划分为多个键字段,之后,基于在前的第一键字段从BT中确定对应的第一比特位,之后,基于在后的第二键字段从该第一比特位对应的多个BB中确定目标BB。这样,对于具有相同的第一键字段的键,均能够通过BT定位到相同的多个BB,也即,本申请实施例对键的共有部分的索引进行了整合,从而降低了索引的空间占比。
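Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a key-value pair retrieval device according to an embodiment of this application;
FIG. 2 is a flowchart of a key-value pair retrieval method according to an embodiment of this application;
FIG. 3 is a schematic diagram of a value compression table according to an embodiment of this application;
FIG. 4 is a schematic diagram of another value compression table according to an embodiment of this application;
FIG. 5 is a schematic diagram of yet another value compression table according to an embodiment of this application;
FIG. 6 is a schematic diagram of the read-write locks of the BT, BM, and BB according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a key-value pair retrieval apparatus according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
Before the embodiments of this application are explained in detail, the implementation environment involved in the embodiments is introduced first.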
图1是本申请实施例提供的一种键值对检索设备的结构示意图。下文实施例中提供的键值对检索方法即可以通过该键值对检索设备来执行。如图1所示,该键值对检索设备可以包括一个或多个处理器101、通信总线102、存储器103以及一个或多个通信接口104。
处理器101可以是一个通用中央处理器(central processing unit,CPU)、网络处理器(network processor,NP)、微处理器、或者可以是一个或多个用于实现本申请方案的集成电 路,例如,专用集成电路(application-specific integrated circuit,ASIC)、可编程逻辑器件(programmable logic device,PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(complex programmable logic device,CPLD)、现场可编程逻辑门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)或其任意组合。
通信总线102用于在上述组件之间传送信息。通信总线102可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
存储器103可以是只读存储器(read-only memory,ROM),也可以是随机存取存储器(random access memory,RAM),也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、光盘(包括只读光盘(compact disc read-only memory,CD-ROM)、压缩光盘、激光盘、数字通用光盘、蓝光光盘等)、磁盘存储介质或者其它磁存储设备,或者是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其它介质,但不限于此。存储器103可以是独立存在,并通过通信总线102与处理器101相连接。或者,存储器103也可以和处理器101集成在一起。
通信接口104使用任何收发器一类的装置,用于与其它设备或通信网络通信。通信接口104包括有线通信接口,还可以包括无线通信接口。其中,有线通信接口例如可以为以太网接口。以太网接口可以是光接口,电接口或其组合。无线通信接口可以为无线局域网(wireless local area networks,WLAN)接口,蜂窝网络通信接口或其组合等。
在一些实施例中,键值对检索设备可以包括多个处理器,如图1中所示的处理器101和处理器105。这些处理器中的每一个可以是一个单核处理器,也可以是一个多核处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,键值对检索设备还可以包括输出设备106和输入设备107。输出设备106和处理器101通信,可以以多种方式来显示信息。例如,输出设备106可以是液晶显示器(liquid crystal display,LCD)、发光二级管(light emitting diode,LED)显示设备、阴极射线管(cathode ray tube,CRT)显示设备或投影仪(projector)等。输入设备107和处理器101通信,可以以多种方式接收用户的输入。例如,输入设备107可以是鼠标、键盘、触摸屏设备或传感设备等。
在一些实施例中,存储器103用于存储执行本申请方案的程序代码108,处理器101可以执行存储器103中存储的程序代码108。该程序代码中可以包括一个或多个软件模块,该键值对检索设备可以通过处理器101以及存储器103中的程序代码108,来实现下文图2实施例提供的键值对检索方法。
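It should be noted that, in the embodiments of this application, a key-value storage system may be deployed on the key-value pair retrieval device. The key-value storage system is a database that stores data as key-value pairs. The index of the key-value storage system may be stored in the memory, and the processor may use the key-value pair retrieval method described below to store, read, and delete data in the key-value storage system based on the index and other related data stored in the memory.
The key-value pair retrieval device may be a server, a terminal device, a cloud device, or the like, which is not limited in the embodiments of this application.
The key-value pair retrieval method provided in the embodiments of this application is described next.
FIG. 2 is a flowchart of a key-value pair retrieval method according to an embodiment of this application. The method can be applied to the key-value pair retrieval device described above, which is referred to as the retrieval device for short in the following embodiments. Referring to FIG. 2, the method includes the following steps:
Step 201: Obtain a processing request. The processing request includes a target key and a subscript index value of the target value corresponding to the target key. The target key includes a first key field and a second key field, and the first key field precedes the second key field.
In the embodiments of this application, the retrieval device may receive a processing request from outside the device or generate one according to a user operation. The processing request may be a data insertion request, a data query request, or a data deletion request. A data query request or data deletion request includes the target key and the subscript index value of the target value corresponding to the target key; the query request asks for the target value corresponding to the target key, and the deletion request asks for that target value to be deleted. A data insertion request includes the target key, the target value, and the subscript index value of the target value, where the target key and the target value form the key-value pair to be inserted, and the subscript index value indicates the bit that corresponds to the target value in the bit vector of the BB. It should be noted that the retrieval device includes multiple BBs, and each BB is used to store multiple values, for example 64 values. Each BB also stores a bit vector that includes multiple bits, each corresponding to one value stored in that BB. For example, when each BB can store 64 values, the bit vector in the BB includes 64 bits, each corresponding to one value in the BB. BBs are described in more detail later.
After obtaining the processing request, the retrieval device may divide the target key in the request into fields to obtain the first key field and the second key field, where the first key field precedes the second key field.
For example, when the target key is longer than 5 bytes, the retrieval device may use the first three bytes of the target key as the first key field and the remaining bytes as the second key field.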
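The key split in step 201 can be sketched in a few lines of Python. This is only an illustration: the function name `split_key` is hypothetical, and rejecting keys of 5 bytes or fewer is an assumption, since the description only covers longer keys.

```python
def split_key(key: bytes, first_len: int = 3) -> tuple[bytes, bytes]:
    """Split a key into (first key field, second key field)."""
    if len(key) <= 5:
        # Assumption: the description only covers keys longer than 5 bytes,
        # so this sketch rejects shorter keys instead of guessing a rule.
        raise ValueError("key must be longer than 5 bytes")
    return key[:first_len], key[first_len:]


first, second = split_key(b"\x00\x01\x02\x03\x04\x05\x06\x07")
```

All keys that share the same three leading bytes yield the same first key field and therefore map to the same BT bit.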
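Step 202: Obtain, from the BT based on the first key field, the bit value of the first bit corresponding to the first key field.
After obtaining the first key field and the second key field of the target key, the retrieval device may use the first key field to look up the corresponding first bit in the BT.
It should be noted that the BT includes multiple bits, and each bit corresponds to one key field and to multiple BBs; the keys of the values in the multiple BBs corresponding to a bit all have that bit's key field as their prefix. The bit value of each bit may be the first value or another value. When a bit has the first value, it indicates that the key-value storage system contains keys whose prefix is the key field corresponding to that bit; when a bit has the other value, it indicates that the system contains no such keys. The first value is 1 and the other value is 0, or the first value is 0 and the other value is 1.
On this basis, the retrieval device may determine, among the bits of the BT, the first bit corresponding to the first key field and obtain the bit value of that first bit. The retrieval device may then check whether that bit value is the first value and, depending on the processing request, decide whether to continue the retrieval based on the second key field.
It should be noted that the BT may also have a corresponding BT read-write lock, which indicates that the BT is read-write locked while it is accessed based on a processing request.
The BT read-write lock may be implemented with a semaphore. For example, when a thread in the retrieval device accesses the BT based on a processing request, the thread may acquire the semaphore, so that no other thread can access the BT while this thread is accessing it. When the thread has finished accessing the BT, it may release the semaphore so that other threads that need to access the BT can acquire it. The BT read-write lock thus controls concurrent access to the BT.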
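Step 203: When the bit value of the first bit is the first value, determine, based on the second key field, the target BB corresponding to the target key among the multiple BBs corresponding to the first bit, where each BB is used to store multiple values.
If the retrieval device determines that the bit value of the first bit is the first value, the key-value storage system contains keys whose prefix is the first key field. In this case, the retrieval device may continue, based on the second key field, to determine the target BB corresponding to the target key among the multiple BBs corresponding to the first bit.
As described in step 202, each bit in the BT corresponds to multiple BBs. For example, each bit in the BT may correspond to one hash table that can be used to look up multiple BBs, so that each bit corresponds to multiple BBs through its hash table. On this basis, in this step the retrieval device first obtains the first hash table corresponding to the first bit and then determines the target BB among the multiple BBs based on the second key field and the first hash table.
In one possible implementation, each entry of the hash table corresponding to a bit of the BT stores indication information of one BB, so that each bit corresponds, through its hash table, to the multiple BBs indicated by the indication information stored in that table. After obtaining the first hash table corresponding to the first bit, the retrieval device may hash the second key field with the hash function of the first hash table to obtain a hash value, and then obtain the BB indication information in the entry of the first hash table indicated by that hash value. The obtained indication information is that of the target BB, and the retrieval device can determine the target BB from it.
It should be noted that the BB indication information stored in the hash table corresponding to each bit of the BT may be a pointer to the BB or any other information that uniquely identifies the BB, which is not limited in the embodiments of this application.
In another possible implementation, the retrieval device may further divide the second key field into a first subfield and a second subfield. In this case, the retrieval device also includes multiple BMs. Each BM stores indication information of one middle-level hash table, and each entry of a middle-level hash table stores indication information of one BB. Correspondingly, each bit of the BT corresponds to one top-level hash table, and each entry of a top-level hash table stores indication information of one BM. Each bit in the BT thus corresponds to multiple BBs through its top-level hash table, the BMs indicated in that table, the middle-level hash tables of those BMs, and the BB indication information stored in those middle-level hash tables. On this basis, after obtaining the top-level hash table corresponding to the first bit, that is, the first hash table, the retrieval device may: determine, based on the first subfield and the first hash table, the target BM corresponding to the target key, where the target BM stores indication information of a second hash table; obtain the second hash table based on that indication information, where the second hash table stores indication information of multiple BBs; and determine the target BB based on the second subfield and the second hash table.
Specifically, the retrieval device first hashes the first subfield with the hash function of the first hash table to obtain a first hash value, and then obtains the BM indication information in the entry of the first hash table indicated by the first hash value; this is the indication information of the target BM. From it the retrieval device determines the target BM and obtains from the target BM the indication information of the second hash table. The retrieval device then obtains the second hash table, hashes the second subfield with the hash function of the second hash table to obtain a second hash value, and obtains the BB indication information in the entry of the second hash table indicated by the second hash value. This is the indication information of the target BB, from which the retrieval device can determine the target BB.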
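The BT → first hash table → BM → second hash table → BB chain can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented layout: Python dictionaries stand in for the hash tables, a set of prefixes stands in for the BT bits that hold the first value, and the 3/3/6-byte field split is taken from the simulation described later in the text. All names are hypothetical.

```python
def find_target_bb(bt_bits, top_tables, key):
    """Return the BB for `key`, or None if the lookup stops early."""
    first, sub1, sub2 = key[:3], key[3:6], key[6:]
    if first not in bt_bits:              # BT bit does not hold the first value
        return None
    bm = top_tables[first].get(sub1)      # first hash table: first subfield -> BM
    if bm is None:
        return None
    return bm["second_table"].get(sub2)   # second hash table: second subfield -> BB


key = bytes(range(12))
bb = {"values": {}}                       # placeholder for a bottom-level bitmap
bm = {"second_table": {key[6:]: bb}}      # a BM holds a reference to its hash table
bt_bits = {key[:3]}                       # prefixes whose BT bit is set
top_tables = {key[:3]: {key[3:6]: bm}}
```

Keys that share the first key field and first subfield reach their BBs through the same second hash table, which is what keeps each individual hash table small.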
需要说明的是,上述的顶层哈希表中存储的BM的指示信息以及BM中存储的中层哈希表的指示信息均可以为指向相应BM和中层哈希表的指针,或者是其他能够唯一确定BM和 中层哈希表的信息,本申请实施例对此不作限定。
另外,在本申请实施例中,检索设备中的每个BM也可以具有对应的BM读写锁,这样,在检索设备基于某个处理请求对某个BM进行访问的过程中,可以对该BM进行锁定,以此来避免检索设备基于其他处理请求对该BM进行访问。其中,BM的读写锁的实现方式可以参考前述介绍的BT读写锁的实现方式,本申请实施例在此不再赘述。
可选地,在一些可能的情况中,第一比特位上的比特值可能不为第一数值,也即该键值存储系统中还未存储有前缀为第一键字段的键。在这种情况下,如果处理请求为数据插入请求,则检索设备首先为第一比特位创建对应的第一哈希表,并将第一比特位上的比特值置为第一数值。之后,检索设备基于第一哈希表对应的哈希函数对第一子字段进行哈希运算,得到第一哈希值,在第一哈希表中第一哈希值所指示的表项中存储目标BM的指示信息,并基于目标BM的指示信息创建目标BM。之后,在目标BM中存储第二哈希表的指示信息,并基于第二哈希表的指示信息创建第二哈希表。之后,基于第二哈希表对应的哈希函数对第二子字段进行哈希运算得到第二哈希值,在第二哈希表中第二哈希值所指示的表项中存储一个当前还未存储有值的BB的指示信息,该BB即为目标BB。
当然,如果检索设备中不包括BM,则检索设备可以参考上述方式直接在第一哈希表中第一哈希值所指示的表项中存储一个当前还未存储有值的BB的指示信息,该BB即为目标BB。
可选地,如果处理请求为数据删除请求或数据查询请求,则在第一比特位上的比特值不为第一数值的情况下,检索设备结束操作。
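Step 204: Perform a value operation in the target BB based on the subscript index value.
After determining the target BB, the retrieval device may perform, in the target BB, the value operation corresponding to the processing request based on the subscript index value in the request.
As described above, a BB is used to store multiple values. In the embodiments of this application, a BB may store values in two different ways, and the way the retrieval device performs the value operation in the target BB based on the subscript index value differs accordingly.
In the first implementation, the BB stores multiple values by storing a reference value and a value compression table. The reference value is any one of the values stored in the BB, and the value compression table stores the difference information between the other values and the reference value, so that the multiple values can be recovered from the difference information in the value compression table and the reference value. In this case, the retrieval device may perform the value operation corresponding to the processing request in the value compression table based on the subscript index value, the reference value, and the value compression table.
For example, the value compression table may include multiple single difference entries, each storing one item of difference information. The difference information includes a mutation position, a mutation byte, and a corresponding bit vector. The mutation position indicates the position of the byte at which another value differs from the reference value, and the mutation byte is the byte of the other value that differs from the reference value. The bit vector includes multiple bits, one bit per value, and the bit corresponding to that other value has the second value. When the initial bit value of the bits is 0, the second value may be 1; when the initial bit value is 1, the second value may be 0. In addition, the value corresponding to the first bit of the bit vector may be the reference value.
It should be noted that all values stored in one BB have the same length. In addition, the bit vectors in all single difference entries of a BB's value compression table contain the same number of bits, equal to the number of values the BB can store. For example, when the target BB can store 64 values, the bit vector in each single difference entry of its value compression table includes 64 bits, each indicating one value. The bit corresponding to a value is indicated by that value's subscript index value: a subscript index value of 0 corresponds to the 1st bit of the bit vector, a subscript index value of 4 corresponds to the 5th bit, and so on. When a bit in the bit vector of a single difference entry has the second value, the byte at that entry's mutation position in the value corresponding to the bit differs from the byte at the same position in the reference value; that is, the byte at that position in the value is the mutation byte of the entry.
Because the difference information in a single difference entry includes a mutation position, a mutation byte, and a bit vector, a single difference entry in the value compression table may include three fields: a first field that holds the mutation position, a second field that holds the mutation byte, and a third field that holds the bit vector.
FIG. 3 shows an example of single difference entries in a value compression table. Assume that the 64 values stored in the target BB are all 8 bytes long. The byte at which another value differs from the reference value can then be any of the 1st to 8th bytes, so the mutation position in a single difference entry can be represented by an 8-bit value; that is, the first field is 8 bits long. This 8-bit value ranges from 0 to 7, where 0 indicates that the mutation position is the 1st byte, 7 indicates the 8th byte, and so on. The second field holds the mutation byte, which is also 8 bits long. The third field holds the bit vector; because the target BB can store 64 values, the bit vector includes 64 bits. A single difference entry therefore occupies 10 bytes. As shown in FIG. 3, single difference entry A has mutation position 0 and mutation byte 0x02. When a bit in the bit vector of entry A has the second value, for example 1, the first byte of the value corresponding to that bit differs from the first byte of the reference value and is 0x02. Similarly, single difference entry B has mutation position 3 and mutation byte 0x03, so when a bit in its bit vector is 1, the fourth byte of the corresponding value differs from the fourth byte of the reference value and is 0x03.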
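The 10-byte layout of a single difference entry (1-byte mutation position, 1-byte mutation byte, 64-bit bit vector) can be checked with Python's `struct` module. This is a sketch only: the big-endian byte order and the mapping of subscript index i to bit i counted from the least significant end are assumptions, since the description does not fix either.

```python
import struct


def pack_single_entry(position: int, mutation_byte: int, bit_vector: int) -> bytes:
    # ">BBQ": 1-byte position, 1-byte mutation byte, 8-byte (64-bit) bit vector,
    # packed without padding.
    return struct.pack(">BBQ", position, mutation_byte, bit_vector)


def set_bit(bit_vector: int, subscript_index: int) -> int:
    # Assumption: subscript index i maps to bit i from the least significant end.
    return bit_vector | (1 << subscript_index)


# Entry A of FIG. 3 (position 0, byte 0x02), marking the value with subscript index 4.
entry_a = pack_single_entry(0, 0x02, set_bit(0, 4))
```

With this layout, every additional value that shares the same mutation costs only one more set bit in an existing entry.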
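Optionally, in one possible implementation, the value compression table may include at least one aggregation difference entry in addition to single difference entries. In the embodiments of this application, "at least one" means one or more, and "at least n" means n or more. An aggregation difference entry is obtained by aggregating multiple single difference entries that have the same first mutation position but different mutation bytes. The aggregation difference entry includes an aggregation flag, the first mutation position, and a byte storage field. The byte storage field stores, in the order of the values corresponding to the bits of the bit vector, the byte at the first mutation position of each value. If the value corresponding to a bit has not yet been stored in the target BB, the byte stored for that bit is the byte of the reference value at the first mutation position.
It should be noted that, in some cases, values may have different mutation bytes at the same mutation position. For example, suppose the first byte of each of 8 values differs from the first byte of the reference value, and these 8 first bytes also differ from one another. The mutations of the 8 values then have to be stored as 8 single difference entries. When there are many single difference entries with the same mutation position but different mutation bytes, the storage overhead increases. For this reason, multiple single difference entries with the same mutation position and different mutation bytes can be aggregated into one aggregation difference entry to reduce the storage overhead of the entries.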
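Aggregation can be sketched as filling a byte storage field from the single difference entries that share a mutation position. The representation below (tuples of position, mutation byte, and bit vector; bit i from the least significant end for subscript index i) is an assumption made for illustration, and the byte values mirror the example the description gives for FIG. 4 and FIG. 5.

```python
def aggregate(entries, position, reference_byte, capacity=64):
    """Build the byte storage field for one mutation position."""
    # Slots with no stored or no differing value keep the reference byte.
    store = bytearray([reference_byte]) * capacity
    for pos, mutation_byte, bit_vector in entries:
        if pos != position:
            continue
        for index in range(capacity):
            if (bit_vector >> index) & 1:
                store[index] = mutation_byte
    return bytes(store)


entries = [
    (0, 0x02, 1 << 1),
    (0, 0x03, 1 << 2),
    (0, 0x04, 1 << 3),
    (0, 0x05, 1 << 7),
]
byte_store = aggregate(entries, position=0, reference_byte=0x01)
```

The first eight bytes of `byte_store` come out as 01 02 03 04 01 01 01 05, matching the byte storage field described for the first aggregation difference entry.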
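For example, for the single difference entries in the value compression table that have the same mutation position but different mutation bytes, the retrieval device may check whether their number is less than a first threshold. The first threshold can be determined from the number of bytes one aggregation difference entry occupies and the number of bytes one single difference entry occupies.
An aggregation difference entry may include an aggregation flag, a mutation position, and a byte storage field. The aggregation flag can be a 1-bit flag. The length of the mutation position can be determined from the byte length of the values stored in the target BB; for example, if the target BB stores 8-byte values, the mutation position may occupy 7 bits. The byte storage field holds, for each bit of the bit vector, the byte of the corresponding value at the entry's mutation position; for a bit whose value has not yet been stored in the target BB, the byte of the reference value at that position is stored. The byte storage field therefore occupies one byte for each value the target BB can store; for example, if the target BB can store 64 values, the byte storage field occupies 64 bytes. In a single difference entry, the length of the mutation position can likewise be determined from the byte length of the values stored in the target BB; for 8-byte values it may occupy 7 or 8 bits. The mutation byte occupies 8 bits, and the bit vector occupies as many bits as the number of values the target BB can store; if the target BB can store 64 values, the bit vector occupies 64 bits. The retrieval device can compute the ratio between the number of bytes an aggregation difference entry occupies and the number of bytes a single difference entry occupies, and determine the first threshold from that ratio. In the example above, an aggregation difference entry occupies 65 bytes and a single difference entry occupies 10 bytes, so the ratio is 6.5. The retrieval device can then use an integer greater than 6.5 as the first threshold, for example 7, 8, or another value. When the number of single difference entries with the same mutation position but different mutation bytes is not less than the first threshold, that is, when there are 7 or more such entries, aggregating them allows all mutations that can occur at that position to be recorded in 65 bytes, whereas recording them as single difference entries would take more than 65 bytes. Aggregation thus reduces the storage space occupied by the value compression table.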
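The threshold arithmetic above can be reproduced directly. The sketch below assumes the sizes used in the example (a 1-byte flag-plus-position field, a 1-byte mutation byte, one bit per value in the bit vector, one byte per value in the byte storage field) and picks the smallest integer greater than the ratio; the description also allows larger values such as 8.

```python
import math


def first_threshold(capacity: int = 64) -> int:
    """Smallest integer strictly greater than aggregated_size / single_size."""
    single_size = 1 + 1 + capacity // 8   # position + mutation byte + bit vector
    aggregated_size = 1 + capacity        # flag and position + byte storage field
    return math.floor(aggregated_size / single_size) + 1


threshold = first_threshold()             # 65 / 10 = 6.5 for a 64-value BB
```

For a BB that holds 64 values this gives 7: seven single difference entries take 70 bytes, which already exceeds the 65 bytes of one aggregation difference entry.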
需要说明的是,为了尽可能的不改变值压缩表的结构,在本申请实施例中,该聚合差异表项可以通过占用多个单差异表项包括的字段来实现。示例性地,该聚合差异表项可以通过占用的第一个单差异表项中的第一字段来存放聚合标识和突变位置。此时,第一个单差异表项中的第一字段的最高位比特可以作为聚合标识的标记位,当该标记位取值为1时,用于指示该表项为聚合差异表项。该第一字段的剩余比特可以用于指示突变位置。第一个单差异表项中的第二字段可以空置,或者,该第二字段中可以用于存放后续的字节存放字段中包括的突变字节的种类数量。由于字节存放字段中存放的字节所占用的长度可能远远大于单差异表项中的位向量所占用的字节长度,所以,该聚合差异表项中的字节存放字段可以通过第一个单差异表项的第三字段以及剩余的其他单差异表项包括的字段来实现。
例如,当目标BB中能够存储64个值且每个值的长度为8字节时,一个单差异表项的第一字段占用8比特、第二字段占用8比特,第三字段占用64比特,在这种情况下,参见图4,聚合差异表项所占用的第一个单差异表项的第一字段的最高位比特的比特值为1,用于标识该表项为聚合差异表项,第一字段的剩余比特用于指示第一突变位置为0,也即后续的字节存放字段中存放的是各个值或参考值的第一个字节。占用的第一个单差异表项的第二字段空置。从第一个单差异表项的第三字段开始按照位向量的各个比特位的顺序存储各个比特位对 应的值在该第一突变位置上的字节。例如,位向量的第1个比特位对应参考值,参考值的第一个字节为0x01。第2至4个比特位对应的值的第一个字节分别为0x02、0x03、0x04,第5至7个比特位对应的值还未存储至目标BB中,因此,这三个位置上均存放参考值的第一个字节,第8个比特位对应的值的第一个字节分别为0x05,由于第一个单差异表项中的第三字段最多放八个字节,所以,位向量的第9个比特位对应的值的第一个字节存放在占用的下一个单差异表项的第一字段中,例如,第9个比特位对应的值的第一个字节为0x06,则在下一个单差异表项的第一字段中存放0x06,第10个比特位对应的值的第一个字节存放在下一个单差异表项的第二字段中,以此类推,直至将位向量的各个比特位对应的值的第一个字节存放完为止。
在一种可能的实现方式中,上述对单差异表项进行聚合之后,也可以得到聚合差异表项组,也即,本申请实施例中的值压缩表除了包括单差异表项之外,还可以包括至少一个聚合差异表项组。其中,一个聚合差异表项组通过对多个包括有相同的突变位置、不同的突变字节的单差异表项聚合得到,一个聚合差异表项组包括多个聚合差异表项,每个聚合差异表项包括聚合标识、突变位置以及字节存放字段,该多个聚合差异表项中的突变位置均为第一突变位置,且该多个聚合差异表项中的多个字节存放字段中按照位向量中的各个比特位所对应的值的顺序,依次存储有各个值中位于第一突变位置上的字节。其中,如果某个比特位对应的值还未存储至该目标BB中,则该比特位对应的值在该第一突变位置上的字节为参考值在该第一突变位置上的字节。
示例性地,对于值压缩表中包括的突变位置相同但突变字节不同的多个单差异表项,检索设备可以统计该多个单差异表项的数量是否小于第一阈值,其中,该第一阈值可以为在目标BB能够存储的除参考值之外的全部值中相同位置上的突变字节均不同的情况下所需占用的聚合差异表项的数量,或者是大于该数量的值。例如,当该目标BB能够存储64个值时,在63个值中相同位置上的突变字节均不同的情况下,该突变位置将会对应有63个突变字节,为了保证值压缩表的结构不变,可以通过单差异表项原本的位向量的字段来存储这63个字节,由于一个表项中的一个位向量字段能够存储8个字节,这样,将需要8个表项中的位向量的字段来存储63个字节,也即,聚合得到的聚合差异表项组将包括8个聚合差异表项。此时,可以将第一阈值设置为8或者是大于8的值。这样,在包括的突变位置相同但突变字节不同的多个单差异表项的数量不小于第一阈值的情况下,通过对该多个单差异表项进行聚合之后,得到的聚合差异表项组包括的聚合差异表项的数量将不大于多个单差异表项的数量,所占的存储空间将小于多个单差异表项所占的存储空间,这样,有利于减少表项的存储开销。
例如,以目标BB中能够存储64个值,第一阈值为8为例,当存在8个包括的突变位置均为第一突变位置,但突变字节不同的单差异表项时,检索设备可以将这8个单差异表项聚合,得到一个聚合差异表项组,该聚合表项差异组包括8个聚合差异表项,其中,每个聚合差异表项包括聚合标识、突变位置以及字节存放字段。参见图5,为了保证值压缩表的结构不变,每个聚合差异表项也包括第一字段、第二字段和第三字段。不同的是,可以将第一字段的最高比特位作为标记位,通过该标记位上的值来指示该表项为聚合差异表项,例如,当该标记位的值为1时,指示该表项为聚合差异表项,当该标记位上的值为0时,指示该表项为单差异表项。该第一字段的剩余比特可以用于指示突变位置。每个聚合差异表项中的第二字段可以空置,第三字段则可以用于按照位向量中每个比特位所对应的值的顺序来存放各个值 在该突变位置上的字节,也即,该第三字段即为字节存放字段。例如,参见图5,在该聚合差异表项组的第一个聚合差异表项中,第一字段存放的第一突变位置为0,用于指示值的第一个字节与参考值不同,第二字段空置,第三字段也即字节存放字段存储有8个字节,这8个字节为单差异表项中的位向量中前8个比特位所对应的值中的第一个字节。需要说明的是,如果位向量的前8个比特位中某个比特位对应的值还未存储至该目标BB中,则在该字节存放字段中该比特位对应的位置上可以存储参考值的第一个字节。例如,参见图5,位向量的第1个比特位对应参考值,参考值的第一个字节为0x01,位向量的第2至4个比特位对应的值的第一个字节分别为0x02、0x03、0x04,第5至7个比特位对应的值还未存储至目标BB中,第8个比特位对应的值的第一个字节分别为0x05,这样,第一个聚合差异表项的字节存放字段中存放的字节将为0x0102030401010105。同理,该聚合差异表项组的第二个聚合差异表项中的第一字段仍然存放第一突变位置,第二字段仍然空置,第三字段仍为字节存放字段,用于存放位向量中接下来的8个比特位对应的值中的第一个字节,以此类推。
由此可见,当值压缩表中包括有相同突变位置不同突变字节的多个单差异表项的数量不小于第一阈值时,则最少需要第一阈值数量的单差异表项,后续如果再插入的值在该突变位置上的突变字节与该多个单差异表项都不同,则需要继续增加单差异表项,这样,包括有该突变位置但突变字节不同的多个单差异表项的数量将超过第一阈值。而通过本申请实施例提供的方法将多个单差异表项进行聚合之后,得到的聚合差异表项组中包括的聚合差异表项的数目最多等于第一阈值,即使后续再插入的值在该突变位置上的突变字节与该多个单差异表项都不同,也可以通过更改聚合差异表项组中的字节存放字段中该值对应位置上的字节来实现。由于一个单差异表项所占用的字节长度与一个聚合差异表项所占用的字节长度相同,所以,通过将多个单差异表项进行聚合后,在表项数目减少的情况下,表项所占用的存储空间也会减少,也即,有利于减少值压缩表的存储开销。
可选地,在一些可能的情况中,聚合差异表项组包括的聚合差异表项中还可以包括该聚合差异表项组包括的多个字节存放字段所存放的突变字节的种类数量。其中,该突变字节的种类数量可以存放在第二字段中。
例如,当某个聚合差异表项组包括的突变位置为位置A时,该聚合差异表项组中的多个字节存放字段所存放的多个字节中有m个字节与参考值在位置A上的字节不同,此时,该聚合差异表项组中存放的突变字节的数量即为m。在这种情况下,可以将该m存放在该聚合差异表项组中的第一个聚合差异表项的第二字段中,当然,也可以在该聚合差异表项组中的任一个或每个聚合差异表项的第二字段中存放m。
由前述步骤201中的介绍可知,处理请求可以为数据插入请求、数据查询请求和数据删除请求中的任一个,在此基础上,基于上述的值压缩表,检索设备对于不同的处理请求可以执行不同的值操作。
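Case 1: When the processing request is a data insertion request, the retrieval device compares each byte of the target value in the request with the corresponding byte of the reference value in the target BB to obtain the mutation positions and mutation bytes of the target value. Then, based on the subscript index value and the mutation positions and mutation bytes of the target value, it inserts the difference information between the target value and the reference value into the value compression table.
The retrieval device may compare the target value in the data insertion request with the reference value in the target BB byte by byte: the first byte of the target value with the first byte of the reference value, the second with the second, and so on. This yields the bytes of the target value that differ from the corresponding bytes of the reference value. These are the mutation bytes of the target value. The retrieval device records each mutation byte together with its position in the target value, which is the mutation position. The target value may have one or more mutation bytes and, correspondingly, one or more mutation positions.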
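The byte-by-byte comparison can be written as a short function. The name `mutations` and the sample values are hypothetical; positions are 0-based, as in the FIG. 3 example.

```python
def mutations(reference: bytes, target: bytes) -> list[tuple[int, int]]:
    """Return (mutation position, mutation byte) pairs of target vs. reference."""
    if len(reference) != len(target):
        # All values in one BB have the same length.
        raise ValueError("values in one BB must have the same length")
    return [(pos, t) for pos, (r, t) in enumerate(zip(reference, target)) if r != t]


reference = bytes([0x01, 0, 0, 0x00, 0, 0, 0, 0])
target = bytes([0x02, 0, 0, 0x03, 0, 0, 0, 0])
diff = mutations(reference, target)
```

Here the target value differs from the reference value at positions 0 and 3, so two items of difference information would be recorded for it.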
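After the mutation bytes and mutation positions of the target value are obtained, take any one mutation byte and its mutation position as an example; for ease of description, call the mutation position position A and the mutation byte byte a. The retrieval device may search the value compression table of the target BB for a first single difference entry that contains position A and byte a. If such an entry is found, some values already stored in the target BB exhibit this mutation relative to the reference value. In this case, the retrieval device may, based on the subscript index value, update the n-th bit of the bit vector in the first single difference entry to the second value. If the smallest subscript index value is 0, n equals the subscript index value plus 1; if the smallest subscript index value is 1, n equals the subscript index value.
It should be noted that the subscript index value of the target value indicates the bit corresponding to the target value in the bit vector. Based on it, the retrieval device can determine that the n-th bit of the bit vector in the first single difference entry is the bit corresponding to the target value, and update that bit to the second value to indicate that the byte at position A of the target value is byte a, which differs from the byte at position A of the reference value.
Optionally, if the retrieval device does not find a first single difference entry containing position A and byte a in the value compression table of the target BB, it may directly generate a bit vector for the target value, in which the bit corresponding to the target value has the second value and the remaining bits are 0. The retrieval device may then generate a single difference entry from position A, byte a, and this bit vector, and insert the entry into the value compression table.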
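The two insertion branches described so far (set a bit in an existing single difference entry, or create a new entry) can be sketched with a dictionary keyed by (mutation position, mutation byte). This is an illustration under assumptions: the second value is taken to be 1, subscript index i maps to bit i from the least significant end, and the aggregation branch that depends on the first threshold is deliberately left out.

```python
def insert_value(table, reference, target, subscript_index):
    """Record the differences between target and reference in `table`."""
    for pos, (r, t) in enumerate(zip(reference, target)):
        if r != t:
            key = (pos, t)   # (position A, byte a)
            # Existing entry: set the bit; missing entry: start from an empty vector.
            table[key] = table.get(key, 0) | (1 << subscript_index)


table = {}
reference = bytes(8)
insert_value(table, reference, b"\x02" + bytes(7), 1)
insert_value(table, reference, b"\x02" + bytes(2) + b"\x03" + bytes(4), 4)
```

After the two calls the entry for (0, 0x02) has bits 1 and 4 set, while (3, 0x03) has only bit 4 set, because only the second value differs from the reference at position 3.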
其中,在将该生成的单差异表项插入至值压缩表时,检索设备可以查找该目标BB中包括的突变位置为位置A的单差异表项,如果存在突变位置为位置A的单差异表项,则将生成的单差异表项插入至查找到的单差异表项之后。
可选地,如果检索设备在目标BB的值压缩表中未查找到包括有位置A和字节a的第一单差异表项,但是该值压缩表中存在包括位置A的第二单差异表项,检索设备也可以根据该值压缩表中包括的第二单差异表项的数量,来决定是直接基于位置A和字节a生成一个单差异表项,还是基于多个第二单差异表项、该位置A和字节a得到一个聚合表项组。
示例性地,检索设备可以将第二单差异表项的数目与第一阈值进行比较,在值压缩表包括的第二单差异表项的数目小于第一阈值的情况下,基于下标索引值生成目标值的位向量,该目标值的位向量中第n个比特位的比特值为第二数值,其中,当下标索引值的最小取值为0时,则n等于该下标索引值加1,如果下标索引值的最小取值为1,则n等于该下标索引值;基于目标值的突变位置、突变字节和位向量生成一个单差异表项,并将生成的单差异表项插入至该值压缩表。
需要说明的是,如果值压缩表包括的第二单差异表项的数目小于第一阈值,则说明该值压缩表中包括的突变位置为位置A但突变字节不同的单差异表项较少,此时,即使将多个第二单差异表项进行聚合,得到的聚合差异表项组中的聚合差异表项的数目可能也不会少于该多个第二单差异表项的数目,在这种情况下,将多个第二单差异表项进行聚合所带来的存储开销上的收益不是很明显甚至没有,基于此,检索设备可以直接基于下标索引值生成目标值 的位向量,之后,基于该位置A、字节a和该位向量生成一个单差异表项,并将生成的单差异表项插入至多个第二单差异表项中的最后一个第二单差异表项之后。
可选地,如果值压缩表包括的第二单差异表项的数目不小于第一阈值,则说明该值压缩表中包括的突变位置为位置A但突变字节不同的单差异表项较多,此时,检索设备可以对多个第二单差异表项、该目标值的突变字节和下标索引值进行聚合,得到包括的聚合差异表项的数量不大于第一阈值的目标聚合差异表项组,以此来提高表项的利用率,减少表项数目,进而减少值压缩表的存储开销。
需要说明的是,检索设备可以参考前述介绍的对多个单差异表项进行聚合的方法,生成目标聚合差异表项组,之后,该检索设备可以基于下标索引值,在该目标聚合差异表项组包括的多个字节存放字段存放的多个字节中确定第n个字节。将该第n个字节更改为目标值在该位置A上的突变字节,也即,更改为字节a。
可选地,如果检索设备在该值压缩表中未查找到包括的突变位置为该位置A的单差异表项,但是查找到了包括该位置A的聚合差异表项或聚合差异表项组,则该检索设备也可以直接基于该下标索引值,将该字节a插入到该聚合差异表项或聚合表项差异组中。其中,在将该字节a插入该聚合差异表项或聚合表项差异组时,检索设备可以在该聚合差异表项或聚合差异表项组包括的字节存放字段存放的多个字节中确定第n个字节。将该第n个字节更改为目标值在该位置A上的突变字节,也即,更改为字节a。
需要说明的是,在将第n个字节更改为目标值在该位置A上的突变字节之后,如果该聚合差异表项或聚合差异表项组中还包括有突变字节的种类数量,则检索设备可以重新统计字节存放字段中存储的多个字节中与参考值的位置A上的字节不同的突变字节的种类数量,若种类数量不变,则结束操作,如果种类数量增加,也即,之前字节存放字段中不存在字节a这种突变字节,则检索设备将包括的突变字节的种类数量加1,以完成更新。
上述主要介绍了当处理请求为数据插入请求时,检索设备如何基于值压缩表,将数据插入请求中的目标值插入至该值压缩表的过程。接下来介绍当处理请求为数据查询请求或数据删除请求时,检索设备如何获取该数据查询请求中的目标键所对应的目标值的过程。
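Case 2: When the processing request is a data query request and the value compression table in the target BB includes a third single difference entry, the target value is obtained based on the mutation position and mutation byte in the third single difference entry, the n-th byte of the byte storage fields included in each aggregation difference entry group in the value compression table, and the reference value. The third single difference entry is a single difference entry whose bit vector has the second value at its n-th bit.
The retrieval device first obtains the reference value and then traverses the value compression table starting from the first entry. For each entry, it determines from the most significant bit of the entry's first field whether the entry is a single difference entry or an aggregation difference entry. If it is a single difference entry, the retrieval device checks, based on the subscript index value, whether the n-th bit of the entry's bit vector has the second value. If it does, the byte of the target value at the entry's mutation position is the entry's mutation byte, and the retrieval device replaces the byte of the reference value at that mutation position with the mutation byte. If it does not, the target value is unchanged from the reference value at that mutation position, that is, the two have the same byte there; the retrieval device leaves that byte of the reference value unchanged and moves on to the next entry.
Optionally, when single difference entries in the value compression table are aggregated into aggregation difference entry groups, if the retrieval device determines from the most significant bit of an entry's first field that the entry is an aggregation difference entry, it may obtain the other aggregation difference entries that have the same mutation position to identify the group the entry belongs to. It then obtains, based on the subscript index value, the n-th byte stored in the byte storage fields of that group and replaces the byte of the reference value at the group's mutation position with the obtained byte. When single difference entries are aggregated into a single aggregation difference entry, if the retrieval device determines from the most significant bit of an entry's first field that the entry is an aggregation difference entry, it obtains the n-th byte stored in the byte storage field of that entry and replaces the byte of the reference value at the entry's mutation position with the obtained byte.
After all entries of the value compression table have been traversed in this way, the value obtained by modifying the reference value is the target value.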
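Reconstructing a value on query amounts to patching a copy of the reference value. In the sketch below, single difference entries are modeled as a dictionary from (position, mutation byte) to a bit vector and aggregation difference entries as a dictionary from position to byte storage field; both representations, and the choice of 1 as the second value, are assumptions for illustration.

```python
def query_value(reference, singles, aggregated, subscript_index):
    """Rebuild the value with the given subscript index from the reference value."""
    value = bytearray(reference)
    for (pos, mutation_byte), bit_vector in singles.items():
        if (bit_vector >> subscript_index) & 1:      # bit holds the second value
            value[pos] = mutation_byte
    for pos, byte_store in aggregated.items():
        value[pos] = byte_store[subscript_index]     # n-th byte of the storage field
    return bytes(value)


reference = bytes([0x01, 0, 0, 0, 0, 0, 0, 0])
singles = {(3, 0x03): 1 << 4}
aggregated = {0: bytes([0x01, 0x02, 0x03, 0x04, 0x01, 0x01, 0x01, 0x05]) + bytes([0x01]) * 56}
value_4 = query_value(reference, singles, aggregated, 4)
value_1 = query_value(reference, singles, aggregated, 1)
```

For subscript index 4 only the single difference entry applies, while for subscript index 1 the byte comes from the aggregated byte storage field.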
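Case 3: When the processing request is a data deletion request and the value compression table includes a third single difference entry, the n-th bit of the bit vector of the third single difference entry is updated to a third value. The third single difference entry is a single difference entry whose bit vector has the second value at its n-th bit, and the third value differs from the second value. When the value compression table includes a first aggregation difference entry group, the n-th byte in the byte storage fields of the first aggregation difference entry group is updated, based on the second mutation position included in the group, to the byte of the reference value at the second mutation position; before the update, that n-th byte is the mutation byte of the target value at the second mutation position.
The retrieval device may traverse the entries of the value compression table in the same way as for a data query request. If an entry is a single difference entry and the n-th bit of its bit vector has the second value, the byte of the target value at the entry's mutation position is the entry's mutation byte, that is, it differs from the byte of the reference value at that position. The retrieval device may then change the n-th bit of the entry's bit vector to the third value, which deletes the difference between the target value and the reference value at that mutation position. When the second value is 1, the third value is 0; when the second value is 0, the third value is 1.
Optionally, if no bit in the bit vector has the second value after the n-th bit has been changed to the third value, the retrieval device may delete the single difference entry.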
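Deletion from the single difference entries can be sketched as clearing one bit per entry and dropping entries whose bit vector becomes empty. The dictionary representation and the choice of 1 and 0 as the second and third values are assumptions; aggregation difference entries are not handled in this sketch.

```python
def delete_value(singles, subscript_index):
    """Remove every recorded difference of the value with this subscript index."""
    for key in list(singles):                      # copy keys: entries may be deleted
        singles[key] &= ~(1 << subscript_index)    # second value (1) -> third value (0)
        if singles[key] == 0:
            del singles[key]                       # no value uses this entry any more


singles = {(0, 0x02): 0b10010, (3, 0x03): 0b10000}
delete_value(singles, 4)
```

The entry for (3, 0x03) disappears because the deleted value was the only one that used it, while (0, 0x02) keeps the bit of the value with subscript index 1.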
如果某个表项为聚合差异表项,则检索设备可以参考前述介绍的方式,基于下标索引值,获取该聚合差异表项或该聚合差异表项所属的第一聚合差异表项组的字节存放字段中的第n个字节,并基于该聚合表项包括的第二突变位置,获取参考值中的第二突变位置上的字节。如果获取的两个字节不同,则说明字节存放字段中的第n个字节即为目标值在第二突变位置上的突变字节,在这种情况下,检索设备可以将字节存放字段中的第n个字节更改为参考值中第二突变位置上的字节,以此来删除目标值与参考值在该第二突变位置上的差异。
需要说明的是,如果聚合差异表项还包括字节存放字段中包括的突变字节的种类数量,则在将字节存放字段中的第n个字节更改为参考值在第二突变位置上的字节之后,检索设备还可以参考前述介绍的方式重新统计字节存放字段中包括的突变字节的种类数量,从而决定 是否对当前存储的突变字节的种类数量进行更新。
如此,在通过上述方法将值压缩表的表项逐一遍历删除目标值与参考值在各字节上的差异之后,即完成了对目标值的删除。
需要说明的是,在本申请实施例中,检索设备在通过上述介绍的方法对聚合差异表项或聚合差异表项所属的第一聚合表项组中记录的目标值与参考值在第二突变位置上的差异进行删除之后,如果更新后的聚合差异表项或第一聚合差异表项组的字节存放字段中的突变字节的种类数量小于第一阈值,则将该聚合差异表项或该聚合差异表项所属的第一聚合差异表项组拆分为多个单差异表项。
其中,由前述介绍可知,聚合差异表项或聚合差异表项组可以包括字节存放字段中包括的突变字节的种类数量。在这种情况下,检索设备可以获取突变字节的种类数量,如果该突变字节的种类数量小于第一阈值,则说明即使将该更新后的聚合差异表项或第一聚合差异表项组拆分,得到的单差异表项的数量也将小于第一阈值,少于该聚合差异表项或第一聚合表项组所占用的存储空间。在这种情况下,检索设备可以将更新后的聚合差异表项或第一聚合差异表项组恢复为多个单差异表项。
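Taking an aggregation difference entry group as an example, the retrieval device may start from the first aggregation difference entry of the updated first aggregation difference entry group. It uses the mutation position of that aggregation difference entry as the mutation position of a single difference entry and the first mutation byte in the byte storage field as the mutation byte of the single difference entry. It then determines, from the position of that mutation byte among the bytes in the byte storage fields of the group, the bit corresponding to the mutation byte's value in the bit vector, and generates the bit vector of the single difference entry accordingly: the bit corresponding to that value has the second value, and the remaining bits have the third value.
After the first single difference entry has been split off, the retrieval device obtains the second mutation byte in the byte storage field of the first aggregation difference entry. If it is the same as the mutation byte of a single difference entry already split off, the retrieval device determines, from the position of the second mutation byte among the bytes in the byte storage fields of the group, the bit corresponding to its value in the bit vector, and updates that bit in the bit vector of the existing single difference entry to the second value. If it differs from the mutation bytes of the single difference entries already split off, a new single difference entry is split off for it in the way described above.
In this way, the retrieval device can traverse the mutation bytes in the byte storage fields and split off, one by one, multiple single difference entries whose number is less than the first threshold.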
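Splitting is the inverse of aggregation: every byte in the byte storage field that differs from the reference byte either creates a single difference entry or sets one more bit in an entry that already has the same mutation byte. The sketch below assumes a dictionary from (position, mutation byte) to bit vector, with 1 as the second value; the sample byte storage field is hypothetical.

```python
def split_aggregated(position, byte_store, reference_byte):
    """Turn one byte storage field back into single difference entries."""
    singles = {}
    for index, byte in enumerate(byte_store):
        if byte != reference_byte:                 # only mutation bytes produce entries
            key = (position, byte)
            singles[key] = singles.get(key, 0) | (1 << index)
    return singles


byte_store = bytes([0x01, 0x02, 0x01, 0x02, 0x05]) + bytes([0x01]) * 59
singles = split_aggregated(0, byte_store, reference_byte=0x01)
```

Two distinct mutation bytes remain in this field, so the split produces two single difference entries, one of which covers two values.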
如果值压缩表中多个单差异表项压缩得到的一个聚合差异表项,则检索设备同样可以参考上述方法,基于该聚合差异表项的字节存放字段中存放的多个字节,依次拆分得到多个单差异表项。
可选地,在一些可能的情况中,聚合差异表项可能未记录有突变字节的种类数量。在这种情况下,检索设备也可以获取参考值中该第二突变位置上的字节。之后,基于获取的字节,统计更新后的聚合差异表项或第一聚合差异表项组的字节存放字段存放的多个字节中与获取的字节不同的字节的种类数量,该种类数量即为更新后的聚合差异表项或第一聚合差异表项组中的突变字节的种类数量。如果突变字节的种类数量小于第一阈值,则检索设备可以参考上述方式对更新后的聚合差异表项或第一聚合差异表项组进行拆分,从而得到多个单差异表 项。
可选地,在一些可能的情况中,检索设备基于数据删除请求对目标BB中的目标值进行删除之后,该目标BB可能会变为空,也即,该目标BB中不再存储有值,在这种情况下,如果检索设备包括BM,则检索设备还可以将前述步骤203中确定的第二哈希表中第二哈希值所指示的表项中存储的目标BB的指示信息删除。进一步地,如果第二哈希表的所有表项为空,则检索设备可以将第二哈希表删除,并将目标BM中存储的第二哈希表的指示信息删除,之后,将第一哈希表中第一哈希值所指示的表项中存储的目标BM的指示信息删除。进一步地,如果第一哈希表中的所有表项为空,则检索设备删除第一哈希表,并将第一比特位的比特值设置为除第一数值之外的其他数值。
同理,如果检索设备不包括BM,则检索设备可以将步骤203中确定的第一比特位对应的哈希表中第二键字段的哈希值所指示的表项中存储的目标BB的指示信息删除。如果第一比特位对应的哈希表中的所有表项为空,则检索设备删除该哈希表,并将第一比特位的比特值设置为除第一数值之外的其他数值。
上文中主要介绍了目标BB通过参考值和值压缩表来存储多个值时,检索设备基于该值压缩表,在目标BB中进行处理请求对应的值操作的实现方式。
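Optionally, in another possible implementation, the target BB may directly store a bit vector and multiple values. The bit vector includes multiple bits, each indicating one value. When a bit has the second value, the target BB stores the value corresponding to that bit; when a bit has the third value, the target BB does not store a value for that bit.
On this basis, when the processing request is a data insertion request, the retrieval device first determines, based on the subscript index value in the request, the bit corresponding to the target value in the bit vector. If that bit has the second value, the target BB already stores a value for the target key, and the retrieval device may replace the value corresponding to that bit with the target value. If that bit has the third value, the target BB does not yet store a value for the target key, and the retrieval device may store the target value as the value corresponding to that bit and then change the bit to the second value.
Optionally, when the processing request is a data query request, the retrieval device first determines the corresponding bit in the bit vector based on the subscript index value in the request. If the bit has the second value, the retrieval device obtains the value corresponding to the bit, which is the target value. If the bit has the third value, it generates a query result indicating that no value was found for the target key.
Optionally, when the processing request is a data deletion request, the retrieval device first determines the corresponding bit in the bit vector based on the subscript index value in the request. If the bit has the second value, it deletes the value corresponding to the bit. If the bit has the third value, it generates a deletion failure message.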
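The uncompressed variant of a BB can be sketched as a small class. The class name `DirectBB`, the use of `None` and `False` to signal "not found" and "deletion failed", and the clearing of the bit on deletion are assumptions of this sketch; the description does not spell out the last point.

```python
class DirectBB:
    """A BB that stores a bit vector plus the values themselves."""

    def __init__(self, capacity=64):
        self.bit_vector = 0
        self.values = [None] * capacity

    def insert(self, index, value):
        # Same action whether the slot is new or being replaced.
        self.values[index] = value
        self.bit_vector |= 1 << index

    def query(self, index):
        if (self.bit_vector >> index) & 1:
            return self.values[index]
        return None                        # no value stored for this key

    def delete(self, index):
        if not (self.bit_vector >> index) & 1:
            return False                   # corresponds to a deletion failure message
        self.values[index] = None
        self.bit_vector &= ~(1 << index)   # assumption: the bit is reset on deletion
        return True


bb = DirectBB()
bb.insert(4, b"\x00" * 7 + b"\x2a")
```

Compared with the value compression table, this form trades storage space for constant-time access to each value.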
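Optionally, in the embodiments of this application, the retrieval device further includes a lock pool. The lock pool may include multiple read-write locks, among which at least one read-write lock corresponds to at least two BBs. Here, "at least one" means one or more, and "at least two" means two or more. For example, one of the read-write locks corresponds to two or more BBs, while another read-write lock corresponds to a single BB. That is, the multiple read-write locks include a read-write lock shared by multiple BBs. The read-write locks can be distinguished by read-write lock labels, so the retrieval device can store the correspondence between each BB and its read-write lock label. On this basis, before performing a value operation in the target BB, the retrieval device may obtain the read-write lock corresponding to the target BB from the lock pool based on the read-write lock label of the target BB, so as to read-write lock the target BB while the value operation is performed in the target BB. Subsequently, after the value operation in the target BB ends, the retrieval device may release the read-write lock corresponding to the target BB so that the other BBs corresponding to that read-write lock can use it.
For the implementation of the read-write lock, refer to the implementation of the BT read-write lock described above. Details are not repeated here.
It should be noted that, because the number of BBs is large, multiple BBs in the embodiments of this application may share the same read-write lock, which reduces the space consumed by read-write locks.
For example, FIG. 6 is a schematic diagram of the read-write locks of the BT, the BMs, and the BBs according to an embodiment of this application. As shown in FIG. 6, the BT corresponds to a BT read-write lock, each of the multiple BMs corresponds to a BM read-write lock, and the BM read-write locks of different BMs are different. The read-write locks of the multiple BBs are obtained from the lock pool. One read-write lock in the lock pool may be shared by multiple BBs, or one BB may use a dedicated read-write lock. For example, BB1 and BB3 share read-write lock 2, BB5 alone uses read-write lock 3, and BB8 alone uses read-write lock 4.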
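The lock pool of FIG. 6 might be sketched as follows. Because the Python standard library has no read-write lock, a plain `threading.Lock` is used here as a stand-in; this substitution, the class name `LockPool`, and the string identifiers of the BBs are assumptions of the sketch rather than part of the embodiments.

```python
import threading


class LockPool:
    """A fixed set of locks addressed by label and shared among BBs."""

    def __init__(self, size):
        self._locks = [threading.Lock() for _ in range(size)]
        self._label_of = {}  # BB identifier -> lock label

    def assign(self, bb_id, label):
        self._label_of[bb_id] = label

    def lock_for(self, bb_id):
        return self._locks[self._label_of[bb_id]]


# Assignment taken from the FIG. 6 example.
pool = LockPool(5)
pool.assign("BB1", 2)
pool.assign("BB3", 2)
pool.assign("BB5", 3)
pool.assign("BB8", 4)

with pool.lock_for("BB1"):
    pass  # perform the value operation in BB1; the lock is released on exit
```

Storing only a small label per BB, instead of a full lock object, is what keeps the per-BB overhead low in this arrangement.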
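Based on the key-value pair retrieval method described above, the embodiments of this application ran a simulation test of the indexing performance of a key-value store database that uses this method. In the test, keys are 12 bytes long and values are 8 bytes long. Each key is divided into a first key field of 3 bytes and a second key field whose first subfield is 3 bytes and whose second subfield is 6 bytes, and each BB stores at most 64 values. The initial key-value pair is (0, 0), and 6 billion key-value pairs are generated by incrementing the key by 1 and the value by 0x2000 each time. The 6 billion key-value pairs are inserted into the key-value store database, and the space actually occupied is measured. Then, 10000 key-value pairs are inserted sequentially to measure the average latency of the data insertion operation. Next, 10000 key-value pairs whose keys are spaced 64 apart are deleted to measure the average latency of the data deletion operation. In addition, 10000 key-value pairs are queried at random to measure the average latency of the data query operation.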
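The test data and the key division used in the simulation might be reproduced as follows. Big-endian byte order and the function names `generate_pairs` and `split_key` are assumptions of this sketch; the text fixes only the field lengths and the increments.

```python
def generate_pairs(count):
    """Yield (key, value) pairs: keys grow by 1, values grow by 0x2000."""
    for i in range(count):
        yield i.to_bytes(12, "big"), (i * 0x2000).to_bytes(8, "big")


def split_key(key):
    """Divide a 12-byte key into the 3-byte, 3-byte, and 6-byte fields."""
    if len(key) != 12:
        raise ValueError("expected a 12-byte key")
    return key[:3], key[3:6], key[6:]


pairs = list(generate_pairs(3))  # the simulation itself uses 6 billion pairs
first_field, first_sub, second_sub = split_key(pairs[1][0])
```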
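After the 6 billion key-value pairs are inserted into the key-value store database, the space actually occupied by the BT, the BMs, the BBs, and the top-level and middle-level hash tables corresponding to the BT and the BMs, together with the total space occupied by all of the above, is shown in Table 1 below.
Table 1. Storage space occupied by each part of the index in the simulation test
[Table 1 is provided in the original publication as the image PCTCN2022137906-appb-000001.]
It can be seen that, compared with the space actually occupied by the 6 billion key-value pairs, the space actually occupied by the index provided in the embodiments of this application corresponds to a compression rate of 23.8%.
The average latency of the data insertion operation, the data deletion operation, and the data query operation is shown in Table 2 below.
Table 2. Average latency of each operation in the simulation test
Operation       Data insertion   Data query   Data deletion
Latency (ns)    1132.8           511.5        875.5
It can be seen that the operation method provided in the embodiments of this application keeps the latency of each operation at about 1 microsecond: the data query and data deletion operations take less than 1 microsecond on average, and the data insertion operation takes about 1.13 microseconds.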
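In summary, first, in the embodiments of this application, a key is divided into multiple key fields. The corresponding first bit is determined from the BT based on the preceding first key field, and the target BB is then determined, based on the following second key field, among the multiple BBs corresponding to the first bit. In this way, all keys that have the same first key field are located to the same multiple BBs through the BT. That is, the embodiments of this application consolidate the index of the common part of the keys, which reduces the proportion of space taken by the index.
Second, when the retrieval device further includes BMs, the retrieval device may further divide the second key field and retrieve the corresponding BM based on the preceding first subfield and the first hash table, thereby obtaining the second hash table in that BM. In this way, for keys that share the prefix consisting of the first key field and the first subfield, the BBs corresponding to these keys can be retrieved through a single hash table (that is, the second hash table), which reduces the data scale of the hash table and improves the efficiency of hash table operations. In addition, with the BT and the BMs, the index of more common parts of the keys can be consolidated, which further reduces the proportion of space taken by the index.
Third, in the embodiments of this application, when a data insertion operation is performed for a key that is not yet stored, the corresponding data can be created and inserted in the BT, the top-level hash table, or the middle-level hash table during retrieval, and the target value is then inserted into the target BB. When a data deletion operation is performed, the corresponding elements recorded in the BT, the BMs, the BBs, or the hash tables can be deleted during retrieval. It can be seen that, in the embodiments of this application, the index structure does not need to be rebuilt when data is inserted or deleted, so the dynamic update performance of the index is good.
Finally, in the embodiments of this application, a BB can store multiple values by storing a reference value and the difference information between the other values and the reference value. This reduces the storage of redundant fields across the multiple values, and thereby reduces the storage space occupied by the values and improves the value compression rate.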
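The key-value pair retrieval apparatus provided in the embodiments of this application is described next.
FIG. 7 is a schematic structural diagram of a key-value pair retrieval apparatus according to an embodiment of this application. As shown in FIG. 7, the key-value pair retrieval apparatus 700 includes a first obtaining module 701, a second obtaining module 702, a determining module 703, and a processing module 704, where:
the first obtaining module 701 is configured to perform step 201 in the foregoing embodiments;
the second obtaining module 702 is configured to perform step 202 in the foregoing embodiments;
the determining module 703 is configured to perform step 203 in the foregoing embodiments;
the processing module 704 is configured to perform step 204 in the foregoing embodiments.
It should be noted that each of the above modules may be implemented by the processor of the key-value pair retrieval device described above executing computer instructions stored in the memory.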
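Optionally, the determining module 703 is mainly configured to:
obtain the first hash table corresponding to the first bit;
determine the target BB among the multiple BBs based on the second key field and the first hash table.
Optionally, the second key field includes a first subfield and a second subfield, and the determining module 703 is mainly configured to:
determine, based on the first subfield and the first hash table, the target middle-level bitmap BM corresponding to the target key, where the target BM stores indication information of a second hash table;
obtain the second hash table based on the indication information of the second hash table, where the second hash table stores indication information of the multiple BBs;
determine the target BB based on the second subfield and the second hash table.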
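The three-level lookup performed by the determining module might be sketched as follows. In this illustrative sketch, dictionaries stand in for the BT and the hash tables, the first key field is assumed to map to its bit by interpreting it as a big-endian integer, the first value is assumed to be 1, and the name `find_bb` is hypothetical.

```python
def find_bb(bt, first_tables, first_field, first_sub, second_sub):
    """Walk BT -> first hash table -> BM -> second hash table -> BB."""
    bit = int.from_bytes(first_field, "big")  # assumed mapping of field to bit
    if bt.get(bit) != 1:
        return None  # no key with this first key field is indexed
    bm = first_tables[bit].get(first_sub)
    if bm is None:
        return None
    return bm["second_hash"].get(second_sub)


# Hypothetical index with a single BB, represented here by a label.
bt = {1: 1}
first_tables = {
    1: {b"\x00\x00\x02": {"second_hash": {bytes(5) + b"\x03": "BB-A"}}},
}
hit = find_bb(bt, first_tables, b"\x00\x00\x01", b"\x00\x00\x02", bytes(5) + b"\x03")
miss = find_bb(bt, first_tables, b"\x00\x00\x09", b"\x00\x00\x02", bytes(5) + b"\x03")
```

All keys sharing the first key field and the first subfield reach the same second hash table, which is what keeps each hash table small.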
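Optionally, the target BB stores a reference value and a value compression table, where the value compression table is used to store difference information between other values and the reference value.
Optionally, the value compression table includes multiple single-difference entries, and one single-difference entry is used to store one item of difference information. The difference information includes a mutation position, a mutated byte, and a corresponding bit vector. The mutation position indicates the position of a byte of the other values that has changed relative to the reference value, and the mutated byte is the byte of the other values that has changed relative to the reference value. The bit vector includes multiple bits, one bit corresponds to one value, and among the multiple bits, the bits corresponding to the other values have the second value.
Optionally, the value compression table further includes at least one aggregated difference entry. One aggregated difference entry is obtained by aggregating multiple single-difference entries that have the same first mutation position and different mutated bytes. The aggregated difference entry includes an aggregation flag, the first mutation position, and a byte storage field, and the byte storage field stores, in the order of the values corresponding to the bits of the bit vector, the byte located at the first mutation position in each value.
Optionally, the processing request is a data insertion request, the data insertion request further includes the target value, and the processing module 704 is mainly configured to:
compare each byte of the target value with the corresponding byte of the reference value to obtain the mutation position and mutated byte of the target value;
insert, based on the subscript index value and the mutation position and mutated byte of the target value, the difference information between the target value and the reference value into the value compression table.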
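The byte-by-byte comparison in the first step might be sketched as follows. The function name and the 8-byte big-endian sample values are assumptions chosen to match the simulation data described earlier.

```python
def diff_against_reference(reference, value):
    """Return (mutation_position, mutated_byte) for every byte that differs."""
    return [
        (pos, v)
        for pos, (r, v) in enumerate(zip(reference, value))
        if r != v
    ]


reference = (0).to_bytes(8, "big")
target = (0x2000).to_bytes(8, "big")
diffs = diff_against_reference(reference, target)
```

Here only one of the eight bytes differs, so a single (position, byte) pair is enough to describe the target value relative to the reference value.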
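Optionally, the processing module 704 is mainly configured to:
when the value compression table includes a first single-difference entry, update the bit value of the n-th bit in the bit vector of the first single-difference entry to the second value, where n is determined based on the subscript index value, and the first single-difference entry is a single-difference entry that includes the mutation position and mutated byte of the target value.
Optionally, the processing module 704 is mainly configured to:
when the value compression table does not include the mutated byte of the target value and the number of second single-difference entries included in the value compression table is less than the first threshold, generate a bit vector of the target value based on the subscript index value, where the bit value of the n-th bit in the bit vector of the target value is the second value, n is determined based on the subscript index value, and a second single-difference entry is a single-difference entry that includes the mutation position of the target value;
generate a single-difference entry based on the mutation position, mutated byte, and bit vector of the target value, and insert the generated single-difference entry into the value compression table.
Optionally, the processing module 704 is mainly configured to:
when the value compression table does not include the mutated byte of the target value and the number of second single-difference entries included in the value compression table is not less than the first threshold, aggregate the second single-difference entries, the mutated byte of the target value, and the subscript index value to obtain a target aggregated difference entry, where a second single-difference entry is a single-difference entry that includes the mutation position of the target value, the first threshold is greater than 1, and the storage space occupied by the target aggregated difference entry is not greater than the storage space occupied by the multiple second single-difference entries.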
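The three insertion cases above might be sketched as follows. This is a simplified illustration under several assumptions: single-difference entries are kept in a dictionary mapping `(position, byte)` to a bit vector, aggregated entries in a dictionary mapping a position to a byte storage field, the second value is 1, and the subscript index `n` currently holds no value at that position. Writing directly into an already aggregated position is also an assumption of this sketch.

```python
def insert_diff(singles, aggregates, reference, pos, byte, n, threshold, capacity):
    """Insert one (pos, byte) difference for the value with subscript index n."""
    if pos in aggregates:  # assumed handling of an already aggregated position
        aggregates[pos][n] = byte
        return
    if (pos, byte) in singles:  # case 1: a matching single-difference entry exists
        singles[(pos, byte)] |= 1 << n
        return
    same_pos = [key for key in singles if key[0] == pos]
    if len(same_pos) < threshold:  # case 2: create a new single-difference entry
        singles[(pos, byte)] = 1 << n
        return
    # Case 3: aggregate all entries at this position into one byte storage field.
    field = bytearray([reference[pos]] * capacity)
    for key in same_pos:
        bits = singles.pop(key)
        for i in range(capacity):
            if (bits >> i) & 1:
                field[i] = key[1]
    field[n] = byte
    aggregates[pos] = field


reference = bytes(8)
singles, aggregates = {}, {}
for byte, n in [(0x20, 1), (0x20, 2), (0x40, 3)]:
    insert_diff(singles, aggregates, reference, 6, byte, n, threshold=2, capacity=4)
snapshot = dict(singles)  # two single-difference entries before aggregation
insert_diff(singles, aggregates, reference, 6, 0x60, 0, threshold=2, capacity=4)
```

With a threshold of 2, the third distinct mutated byte at position 6 triggers aggregation, and the two earlier entries are folded into one byte storage field.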
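Optionally, the processing request is a data query request, and the processing module 704 is mainly configured to:
when the value compression table includes a third single-difference entry, obtain the target value based on the mutation position and mutated byte in the third single-difference entry, the n-th byte in the byte storage field of each aggregated difference entry in the value compression table, and the reference value, where the third single-difference entry is a single-difference entry in which the bit value of the n-th bit of the bit vector is the second value, and n is determined based on the subscript index value.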
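Reconstructing the target value for a query might be sketched as follows, using the same assumed in-memory layout as an illustration: single-difference entries as a dictionary from `(position, byte)` to a bit vector, and aggregated entries as a dictionary from a position to a byte storage field. The function name is hypothetical.

```python
def query_value(reference, singles, aggregates, n):
    """Rebuild the value with subscript index n from the reference and its differences."""
    value = bytearray(reference)
    for (pos, byte), bits in singles.items():
        if (bits >> n) & 1:  # this entry applies to the n-th value
            value[pos] = byte
    for pos, field in aggregates.items():
        value[pos] = field[n]  # the n-th byte of each byte storage field
    return bytes(value)


reference = bytes(8)
singles = {(6, 0x20): 0b0010}
aggregates = {7: bytearray([1, 2, 3, 4])}
target = query_value(reference, singles, aggregates, 1)
```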
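Optionally, the processing request is a data deletion request, and the processing module 704 is mainly configured to:
when the value compression table includes a third single-difference entry, update the bit value of the n-th bit in the bit vector of the third single-difference entry to the third value, where the third single-difference entry is a single-difference entry in which the bit value of the n-th bit of the bit vector is the second value, n is determined based on the subscript index value, and the third value is different from the second value;
when the value compression table includes a first aggregated difference entry, update, based on the second mutation position included in the first aggregated difference entry, the n-th byte in the byte storage field of the first aggregated difference entry to the byte at the second mutation position in the reference value, where the n-th byte in the byte storage field of the first aggregated difference entry is the mutated byte of the target value at the second mutation position.
Optionally, the processing module 704 is further configured to:
when the number of mutated bytes in the byte storage field of the updated first aggregated difference entry is less than the first threshold, split the first aggregated difference entry into multiple single-difference entries.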
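The deletion steps, including the optional split, might be sketched as follows with the same assumed layout (single-difference entries as a dictionary from `(position, byte)` to a bit vector, aggregated entries as a dictionary from a position to a byte storage field). Dropping a single-difference entry whose bit vector becomes empty, and counting distinct kinds of mutated bytes for the threshold check, are assumptions of this sketch.

```python
def delete_value(reference, singles, aggregates, n, threshold):
    """Remove the differences of the value with subscript index n."""
    for key in list(singles):
        singles[key] &= ~(1 << n)  # set the n-th bit to the third value (0)
        if singles[key] == 0:
            del singles[key]  # assumed: an entry that covers no value is dropped
    for pos in list(aggregates):
        field = aggregates[pos]
        field[n] = reference[pos]  # restore the reference byte
        kinds = {b for b in field if b != reference[pos]}
        if len(kinds) < threshold:
            # Split the aggregated entry back into single-difference entries.
            del aggregates[pos]
            for i, b in enumerate(field):
                if b != reference[pos]:
                    singles[(pos, b)] = singles.get((pos, b), 0) | (1 << i)


reference = bytes(8)
singles = {(5, 0x01): 0b0011}
aggregates = {6: bytearray([0x60, 0x20, 0x20, 0x40])}
delete_value(reference, singles, aggregates, 0, threshold=3)
```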
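Optionally, the BT corresponds to a BT read-write lock, and the BT read-write lock is used to indicate that the BT is read-write locked when the BT is accessed based on the processing request.
Optionally, the apparatus 700 is further configured to:
obtain, based on the read-write lock label of the target BB, the read-write lock corresponding to the target BB from the lock pool, where the read-write lock corresponding to the target BB is used to indicate that the target BB is read-write locked while the value operation is performed in the target BB based on the subscript index value, the lock pool includes multiple read-write locks, and among the multiple read-write locks at least one read-write lock corresponds to at least two BBs.
Optionally, the apparatus 700 is further configured to:
release the read-write lock corresponding to the target BB.
In summary, in the embodiments of this application, a key is divided into multiple key fields. The corresponding first bit is determined from the BT based on the preceding first key field, and the target BB is then determined, based on the following second key field, among the multiple BBs corresponding to the first bit. In this way, all keys that have the same first key field are located to the same multiple BBs through the BT. That is, the embodiments of this application consolidate the index of the common part of the keys, which reduces the proportion of space taken by the index.
It should be noted that, when the key-value pair retrieval apparatus provided in the foregoing embodiments performs retrieval, the division into the functional modules described above is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the key-value pair retrieval apparatus provided in the foregoing embodiments and the key-value pair retrieval method embodiments belong to the same concept. For the specific implementation process, refer to the method embodiments. Details are not repeated here.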
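The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a Digital Versatile Disc (DVD)), or a semiconductor medium (for example, a Solid State Disk (SSD)).
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is not intended to limit the embodiments of this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of this application shall fall within the protection scope of the embodiments of this application.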

Claims (34)

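  1. A key-value pair retrieval method, characterized in that the method comprises:
    obtaining a processing request, wherein the processing request comprises a target key and a subscript index value of a target value corresponding to the target key, the target key comprises a first key field and a second key field, and the first key field precedes the second key field;
    obtaining, based on the first key field, a bit value of a first bit corresponding to the first key field from a top-level bitmap BT;
    when the bit value of the first bit is a first value, determining, based on the second key field, a target BB corresponding to the target key among multiple bottom-level bitmaps BBs corresponding to the first bit, wherein each BB is used to store multiple values;
    performing a value operation in the target BB based on the subscript index value.
  2. The method according to claim 1, characterized in that the determining, based on the second key field, a target BB corresponding to the target key among the bottom-level bitmaps BBs corresponding to the first bit comprises:
    obtaining a first hash table corresponding to the first bit;
    determining the target BB among the multiple BBs based on the second key field and the first hash table.
  3. The method according to claim 2, characterized in that the second key field comprises a first subfield and a second subfield, and the determining the target BB among the multiple BBs based on the second key field and the first hash table comprises:
    determining, based on the first subfield and the first hash table, a target middle-level bitmap BM corresponding to the target key, wherein the target BM stores indication information of a second hash table;
    obtaining the second hash table based on the indication information of the second hash table, wherein the second hash table stores indication information of the multiple BBs;
    determining the target BB based on the second subfield and the second hash table.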
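  4. The method according to any one of claims 1 to 3, characterized in that the target BB stores a reference value and a value compression table, and the value compression table is used to store difference information between other values and the reference value.
  5. The method according to claim 4, characterized in that the value compression table comprises multiple single-difference entries, one single-difference entry is used to store one item of the difference information, the difference information comprises a mutation position, a mutated byte, and a corresponding bit vector, the mutation position indicates the position of a byte of the other values that has changed relative to the reference value, the mutated byte is the byte of the other values that has changed relative to the reference value, the bit vector comprises multiple bits, one bit corresponds to one value, and among the multiple bits, the bits corresponding to the other values have a second value.
  6. The method according to claim 5, characterized in that the value compression table further comprises at least one aggregated difference entry, wherein one aggregated difference entry is obtained by aggregating multiple single-difference entries that have the same first mutation position and different mutated bytes, the aggregated difference entry comprises an aggregation flag, the first mutation position, and a byte storage field, and the byte storage field stores, in the order of the values corresponding to the bits of the bit vector, the byte located at the first mutation position in each value.
  7. The method according to claim 5 or 6, characterized in that the processing request is a data insertion request, the data insertion request further comprises the target value, and the performing a value operation in the target BB based on the subscript index value comprises:
    comparing each byte of the target value with the corresponding byte of the reference value to obtain a mutation position and a mutated byte of the target value;
    inserting, based on the subscript index value and the mutation position and mutated byte of the target value, difference information between the target value and the reference value into the value compression table.
  8. The method according to claim 7, characterized in that the inserting, based on the subscript index value and the mutation position and mutated byte of the target value, difference information between the target value and the reference value into the value compression table comprises:
    when the value compression table comprises a first single-difference entry, updating the bit value of the n-th bit in the bit vector of the first single-difference entry to the second value, wherein n is determined based on the subscript index value, and the first single-difference entry is a single-difference entry that comprises the mutation position and mutated byte of the target value.
  9. The method according to claim 7, characterized in that the inserting, based on the subscript index value and the mutation position and mutated byte of the target value, difference information between the target value and the reference value into the value compression table comprises:
    when the value compression table does not comprise the mutated byte of the target value and the number of second single-difference entries comprised in the value compression table is less than a first threshold, generating a bit vector of the target value based on the subscript index value, wherein the bit value of the n-th bit in the bit vector of the target value is the second value, n is determined based on the subscript index value, and a second single-difference entry is a single-difference entry that comprises the mutation position of the target value;
    generating a single-difference entry based on the mutation position, mutated byte, and bit vector of the target value, and inserting the generated single-difference entry into the value compression table.
  10. The method according to claim 7, characterized in that the inserting, based on the subscript index value and the mutation position and mutated byte of the target value, difference information between the target value and the reference value into the value compression table comprises:
    when the value compression table does not comprise the mutated byte of the target value and the number of second single-difference entries comprised in the value compression table is not less than a first threshold, aggregating the second single-difference entries, the mutated byte of the target value, and the subscript index value to obtain a target aggregated difference entry, wherein a second single-difference entry is a single-difference entry that comprises the mutation position of the target value, the first threshold is greater than 1, and the storage space occupied by the target aggregated difference entry is not greater than the storage space occupied by the multiple second single-difference entries.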
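  11. The method according to claim 6, characterized in that the processing request is a data query request, and the performing a value operation in the target BB based on the subscript index value comprises:
    when the value compression table comprises a third single-difference entry, obtaining the target value based on the mutation position and mutated byte in the third single-difference entry, the n-th byte in the byte storage field of each aggregated difference entry in the value compression table, and the reference value, wherein the third single-difference entry is a single-difference entry in which the bit value of the n-th bit of the bit vector is the second value, and n is determined based on the subscript index value.
  12. The method according to claim 6, characterized in that the processing request is a data deletion request, and the performing a value operation in the target BB based on the subscript index value comprises:
    when the value compression table comprises a third single-difference entry, updating the bit value of the n-th bit in the bit vector of the third single-difference entry to a third value, wherein the third single-difference entry is a single-difference entry in which the bit value of the n-th bit of the bit vector is the second value, n is determined based on the subscript index value, and the third value is different from the second value;
    when the value compression table comprises a first aggregated difference entry, updating, based on a second mutation position comprised in the first aggregated difference entry, the n-th byte in the byte storage field of the first aggregated difference entry to the byte at the second mutation position in the reference value, wherein the n-th byte in the byte storage field of the first aggregated difference entry is the mutated byte of the target value at the second mutation position.
  13. The method according to claim 12, characterized in that the method further comprises:
    when the number of mutated bytes in the byte storage field of the updated first aggregated difference entry is less than a first threshold, splitting the first aggregated difference entry into multiple single-difference entries.
  14. The method according to any one of claims 1 to 13, characterized in that the BT corresponds to a BT read-write lock, and the BT read-write lock is used to indicate that the BT is read-write locked when the BT is accessed based on the processing request.
  15. The method according to any one of claims 1 to 14, characterized in that, before the value operation is performed in the target BB, the method further comprises:
    obtaining, based on a read-write lock label of the target BB, a read-write lock corresponding to the target BB from a lock pool, wherein the read-write lock corresponding to the target BB is used to indicate that the target BB is read-write locked while the value operation is performed in the target BB based on the subscript index value, the lock pool comprises multiple read-write locks, and among the multiple read-write locks at least one read-write lock corresponds to at least two BBs.
  16. The method according to claim 15, characterized in that, after the value operation is performed in the target BB based on the subscript index value, the method further comprises:
    releasing the read-write lock corresponding to the target BB.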
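  17. A key-value pair retrieval apparatus, characterized in that the apparatus comprises:
    a first obtaining module, configured to obtain a processing request, wherein the processing request comprises a target key and a subscript index value of a target value corresponding to the target key, the target key comprises a first key field and a second key field, and the first key field precedes the second key field;
    a second obtaining module, configured to obtain, based on the first key field, a bit value of a first bit corresponding to the first key field from a top-level bitmap BT;
    a determining module, configured to: when the bit value of the first bit is a first value, determine, based on the second key field, a target BB corresponding to the target key among multiple bottom-level bitmaps BBs corresponding to the first bit, wherein each BB is used to store multiple values;
    a processing module, configured to perform a value operation in the target BB based on the subscript index value.
  18. The apparatus according to claim 17, characterized in that the determining module is mainly configured to:
    obtain a first hash table corresponding to the first bit;
    determine the target BB among the multiple BBs based on the second key field and the first hash table.
  19. The apparatus according to claim 18, characterized in that the second key field comprises a first subfield and a second subfield, and the determining module is mainly configured to:
    determine, based on the first subfield and the first hash table, a target middle-level bitmap BM corresponding to the target key, wherein the target BM stores indication information of a second hash table;
    obtain the second hash table based on the indication information of the second hash table, wherein the second hash table stores indication information of the multiple BBs;
    determine the target BB based on the second subfield and the second hash table.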
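  20. The apparatus according to any one of claims 17 to 19, characterized in that the target BB stores a reference value and a value compression table, and the value compression table is used to store difference information between other values and the reference value.
  21. The apparatus according to claim 20, characterized in that the value compression table comprises multiple single-difference entries, one single-difference entry is used to store one item of the difference information, the difference information comprises a mutation position, a mutated byte, and a corresponding bit vector, the mutation position indicates the position of a byte of the other values that has changed relative to the reference value, the mutated byte is the byte of the other values that has changed relative to the reference value, the bit vector comprises multiple bits, one bit corresponds to one value, and among the multiple bits, the bits corresponding to the other values have a second value.
  22. The apparatus according to claim 21, characterized in that the value compression table further comprises at least one aggregated difference entry, wherein one aggregated difference entry is obtained by aggregating multiple single-difference entries that have the same first mutation position and different mutated bytes, the aggregated difference entry comprises an aggregation flag, the first mutation position, and a byte storage field, and the byte storage field stores, in the order of the values corresponding to the bits of the bit vector, the byte located at the first mutation position in each value.
  23. The apparatus according to claim 21 or 22, characterized in that the processing request is a data insertion request, the data insertion request further comprises the target value, and the processing module is mainly configured to:
    compare each byte of the target value with the corresponding byte of the reference value to obtain a mutation position and a mutated byte of the target value;
    insert, based on the subscript index value and the mutation position and mutated byte of the target value, difference information between the target value and the reference value into the value compression table.
  24. The apparatus according to claim 23, characterized in that the processing module is mainly configured to:
    when the value compression table comprises a first single-difference entry, update the bit value of the n-th bit in the bit vector of the first single-difference entry to the second value, wherein n is determined based on the subscript index value, and the first single-difference entry is a single-difference entry that comprises the mutation position and mutated byte of the target value.
  25. The apparatus according to claim 23, characterized in that the processing module is mainly configured to:
    when the value compression table does not comprise the mutated byte of the target value and the number of second single-difference entries comprised in the value compression table is less than a first threshold, generate a bit vector of the target value based on the subscript index value, wherein the bit value of the n-th bit in the bit vector of the target value is the second value, n is determined based on the subscript index value, and a second single-difference entry is a single-difference entry that comprises the mutation position of the target value;
    generate a single-difference entry based on the mutation position, mutated byte, and bit vector of the target value, and insert the generated single-difference entry into the value compression table.
  26. The apparatus according to claim 23, characterized in that the processing module is mainly configured to:
    when the value compression table does not comprise the mutated byte of the target value and the number of second single-difference entries comprised in the value compression table is not less than a first threshold, aggregate the second single-difference entries, the mutated byte of the target value, and the subscript index value to obtain a target aggregated difference entry, wherein a second single-difference entry is a single-difference entry that comprises the mutation position of the target value, the first threshold is greater than 1, and the storage space occupied by the target aggregated difference entry is not greater than the storage space occupied by the multiple second single-difference entries.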
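  27. The apparatus according to claim 22, characterized in that the processing request is a data query request, and the processing module is mainly configured to:
    when the value compression table comprises a third single-difference entry, obtain the target value based on the mutation position and mutated byte in the third single-difference entry, the n-th byte in the byte storage field of each aggregated difference entry in the value compression table, and the reference value, wherein the third single-difference entry is a single-difference entry in which the bit value of the n-th bit of the bit vector is the second value, and n is determined based on the subscript index value.
  28. The apparatus according to claim 22, characterized in that the processing request is a data deletion request, and the processing module is mainly configured to:
    when the value compression table comprises a third single-difference entry, update the bit value of the n-th bit in the bit vector of the third single-difference entry to a third value, wherein the third single-difference entry is a single-difference entry in which the bit value of the n-th bit of the bit vector is the second value, n is determined based on the subscript index value, and the third value is different from the second value;
    when the value compression table comprises a first aggregated difference entry, update, based on a second mutation position comprised in the first aggregated difference entry, the n-th byte in the byte storage field of the first aggregated difference entry to the byte at the second mutation position in the reference value, wherein the n-th byte in the byte storage field of the first aggregated difference entry is the mutated byte of the target value at the second mutation position.
  29. The apparatus according to claim 28, characterized in that the processing module is further configured to:
    when the number of mutated bytes in the byte storage field of the updated first aggregated difference entry is less than a first threshold, split the first aggregated difference entry into multiple single-difference entries.
  30. The apparatus according to any one of claims 17 to 29, characterized in that the BT corresponds to a BT read-write lock, and the BT read-write lock is used to indicate that the BT is read-write locked when the BT is accessed based on the processing request.
  31. The apparatus according to any one of claims 17 to 30, characterized in that the apparatus is further configured to:
    obtain, based on a read-write lock label of the target BB, a read-write lock corresponding to the target BB from a lock pool, wherein the read-write lock corresponding to the target BB is used to indicate that the target BB is read-write locked while the value operation is performed in the target BB based on the subscript index value, the lock pool comprises multiple read-write locks, and among the multiple read-write locks at least one read-write lock corresponds to at least two BBs.
  32. The apparatus according to claim 31, characterized in that the apparatus is further configured to:
    release the read-write lock corresponding to the target BB.
  33. A key-value pair retrieval device, characterized in that the key-value pair retrieval device comprises a processor and a memory, the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the key-value pair retrieval method according to any one of claims 1 to 16.
  34. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to perform the key-value pair retrieval method according to any one of claims 1 to 16.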
PCT/CN2022/137906 2022-02-28 2022-12-09 Key-value pair retrieval method, apparatus, and storage medium WO2023160115A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210188396.1 2022-02-28
CN202210188396.1A CN116701386A (zh) 2022-02-28 2022-02-28 Key-value pair retrieval method, apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2023160115A1 true WO2023160115A1 (zh) 2023-08-31

Family

ID=87764608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137906 WO2023160115A1 (zh) 2022-02-28 2022-12-09 Key-value pair retrieval method, apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN116701386A (zh)
WO (1) WO2023160115A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
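CN117493342A (zh) * 2023-11-06 2024-02-02 广州方舟信息科技有限公司 Commodity data update method, apparatus, electronic device, and storage medium
CN117493342B (zh) * 2023-11-06 2024-05-31 北京方易行信息科技有限公司 Commodity data update method, apparatus, electronic device, and storage medium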

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6070164A (en) * 1998-05-09 2000-05-30 Information Systems Corporation Database method and apparatus using hierarchical bit vector index structure
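CN102722531A (zh) * 2012-05-17 2012-10-10 北京大学 Query method based on a sharded bitmap index in a cloud environment
CN107491487A (zh) * 2017-07-17 2017-12-19 中国科学院信息工程研究所 Full-text database architecture, bitmap index creation and data query methods, server, and medium
CN110019292A (zh) * 2017-09-06 2019-07-16 华为技术有限公司 Data query method and apparatus
CN113157689A (zh) * 2020-01-22 2021-07-23 腾讯科技(深圳)有限公司 Data indexing method and apparatus, and electronic device
CN114090575A (zh) * 2021-10-27 2022-02-25 北京搜狗科技发展有限公司 Data storage method and retrieval method based on a key-value database, and corresponding apparatus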

Also Published As

Publication number Publication date
CN116701386A (zh) 2023-09-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928374

Country of ref document: EP

Kind code of ref document: A1