WO2018120109A1 - Method and apparatus for data processing - Google Patents

Method and apparatus for data processing

Info

Publication number
WO2018120109A1
Authority
WO
WIPO (PCT)
Prior art keywords
hash
hash table
location
data
level
Prior art date
Application number
PCT/CN2016/113705
Other languages
English (en)
French (fr)
Inventor
张丰伟
张学仓
伊戈尔德鲁日宁
王元钢
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201680058640.5A priority Critical patent/CN109076021B/zh
Priority to PCT/CN2016/113705 priority patent/WO2018120109A1/zh
Publication of WO2018120109A1 publication Critical patent/WO2018120109A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3239 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD

Definitions

  • the present application relates to the field of storage, and in particular, to a method and apparatus for data processing.
  • a hash table can be used to store data in a storage system.
  • a hash table is an indexing method for data storage.
  • For example, the storage system can calculate a hash value from the unique index of the data to be stored (such as the key of a data item in a KV (Key Value) storage system), a given hash function, and the capacity of the hash table, and then store the data (such as the value in a KV storage system) at the storage location indicated by that hash value (hereinafter, unless otherwise specified, "storage location" refers to the location in the storage medium indicated by the hash value).
  • On a query, the hash value is calculated from the given index, the given hash function, and the table capacity, and the lookup is then performed at the storage location indicated by that hash value.
  • Because the capacity of the hash table is limited, two different indexes may produce hash values that fall on the same position in the hash table. This situation is called a hash collision.
  • When a hash collision occurs, in one case the capacity of the hash table is readjusted.
  • In another case, the conflicting index can be stored in another location in the hash table, and a pointer is then placed in the conflicting location to indicate where the conflicting index is stored in the hash table.
  • As the amount of data grows, there are more and more conflicting elements (indexes or pointers) in the hash table, so its read and write efficiency becomes worse and worse.
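The index calculation and the collision described above can be illustrated with a minimal Python sketch. The function names `slot_for` and `find_collision`, and the use of SHA-256 as the "given hash function", are assumptions made for illustration; the application does not name a specific hash function.

```python
import hashlib


def slot_for(key: str, capacity: int) -> int:
    """Map a key to a storage location: hash(key) modulo the table capacity."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % capacity


def find_collision(keys, capacity):
    """Return (first_key, second_key, slot) for the first two distinct keys
    that land on the same slot, or None if no two keys collide."""
    seen = {}
    for key in keys:
        slot = slot_for(key, capacity)
        if slot in seen and seen[slot] != key:
            return seen[slot], key, slot
        seen[slot] = key
    return None
```

With five distinct keys and a capacity of four, at least two keys must share a slot, which is the hash collision the background section refers to.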
  • the present application provides a data processing method and apparatus, which can improve the reading and writing efficiency of a hash table.
  • In one aspect, a method of data processing is provided. The method is applied to a storage system that comprises a multi-level hash table for storing data. The multi-level hash table includes an Nth-level hash table and an (N+1)th-level hash table, where N ≥ 0 and N is an integer. The method includes: determining, from a plurality of locations included in the (N+1)th-level hash table, the location with the minimum load as a target location, where the plurality of locations are the locations in the (N+1)th-level hash table that correspond to the candidate locations of first hash data in the Nth-level hash table, and the first hash data is hash data to be inserted into the Nth-level hash table; migrating second hash data in the Nth-level hash table to the target location, where the second hash data is the hash data stored at the location in the Nth-level hash table that corresponds to the target location; and inserting the first hash data into the candidate location.
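  • According to the data processing method provided by the embodiment of the present application, when hash data is inserted into the (N+1)th-level hash table, the location with the smallest load is determined from the (N+1)th-level hash table as the target location, and the hash data in the Nth-level hash table is inserted into the target location. This improves the load balance of the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table and improving its read and write efficiency.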
  • Optionally, determining, from the plurality of locations included in the (N+1)th-level hash table, the location with the minimum load as the target location includes: determining a first location from the plurality of locations, where the first location is the location with the largest load among the plurality of locations; migrating third hash data stored in the first location to the (N+2)th-level hash table in the multi-level hash table; and determining the first location as the target location.
  • the hash data stored in the location with the largest load in the N+1th hash table is migrated to the N+2th hash table, and then the location is used as the target location.
  • the hash data in the N+1th hash table is reduced, thereby further reducing the probability of hash collision in the N+1th hash table, and improving the reading and writing efficiency of the N+1th hash table.
  • Optionally, determining the first location from the plurality of locations comprises: when the second hash data cannot be inserted into a second location of the plurality of locations, determining the first location from the plurality of locations, where the second location is the location with the smallest load among the plurality of locations before the third hash data is migrated to the (N+2)th-level hash table.
  • According to the data processing method provided by the embodiment of the present application, if the second hash data can be inserted into the least loaded location in the (N+1)th-level hash table, that location is directly determined as the target location. If the second hash data cannot be inserted into the least loaded location in the (N+1)th-level hash table, the hash data stored in the most loaded location in the (N+1)th-level hash table is migrated to the (N+2)th-level hash table, and that location is then used as the target location. This improves the load balance of the (N+1)th-level hash table and thereby the read and write efficiency of the hash table.
  • Optionally, migrating the second hash data in the Nth-level hash table to the target location comprises: merging the second hash data with fourth hash data stored in the target location, and inserting the merged data into the target location.
  • The data processing method provided by the embodiment of the present application can reduce the hash data in the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table and improving the read and write efficiency of the (N+1)th-level hash table.
  • the method further includes: updating a load value of the target location.
  • When hash data continues to be written to the (N+1)th-level hash table, the updated load value of the target location can be used to determine whether the hash data in the Nth-level hash table can be inserted into the target location, which improves the load balance of the (N+1)th-level hash table and its read and write efficiency.
  • an apparatus for data processing which can implement the functions performed by an execution body of the method related to the above aspects, and the functions can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more corresponding units or modules of the above functions.
  • In one possible design, the apparatus includes a processor and a memory. The processor is configured to support the apparatus in performing the corresponding functions of the above method. The memory is coupled to the processor and stores the program instructions and data necessary for the apparatus.
  • the apparatus can also include a communication interface for supporting communication between the apparatus and other network elements.
  • an embodiment of the present application provides a computer storage medium for storing computer software instructions for use in the foregoing apparatus, including a program designed to perform the above aspects.
  • The location with the smallest load is determined as the target location from the (N+1)th-level hash table in the multi-level hash table, and the hash data in the Nth-level hash table in the multi-level hash table is migrated to the target location. This improves the load balance of the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table and improving its read and write efficiency.
  • FIG. 1 is a schematic structural diagram of a hash table to which an embodiment of the present application is applied;
  • FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another data processing method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a possible data processing apparatus according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another possible data processing apparatus provided by an embodiment of the present application.
  • a hash table is a data structure that implements an associative array and is widely used for fast data lookup.
  • a hash table has two important operations, one is a put operation, the element is inserted into the hash table by a write operation, and the other is a get operation to quickly find the element from the hash table.
  • a hash table can include the following structural elements:
  • Entry: an element stored in a hash table is called an item or an entry.
  • Bucket: each item in the hash table is hashed into a bucket. A hash function is used to calculate the hash value of the keyword, and the hash value can be used to find the position of the keyword in the hash table.
  • Metadata: used to indicate the load of the current bucket (for example, the number of items included in the bucket), where the load of the current bucket does not include the load in the lower-level hash table.
  • When a location in the bucket is occupied, the location is marked as occupied; when a location in the bucket is not occupied, the location is marked as free.
  • hash data refers to data included in a hash table, such as the above pointers, metadata, and keywords.
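The structural elements listed above can be pictured with a small Python sketch. The class name `Bucket`, its fields, and the fixed per-bucket capacity are hypothetical choices for illustration, not the layout used by the application. A free slot is modeled as `None`, and the metadata (the load) counts only the entries held in this bucket, not those in lower-level tables.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Bucket:
    capacity: int
    # Each slot holds one (keyword, value) entry, or None when the slot is free.
    slots: List[Optional[Tuple[Any, Any]]] = field(default_factory=list)
    # Pointer to the corresponding position in the next-level hash table, if any.
    next_level: Optional[int] = None

    def __post_init__(self) -> None:
        if not self.slots:
            self.slots = [None] * self.capacity

    @property
    def load(self) -> int:
        """Metadata: number of entries in this bucket only."""
        return sum(1 for entry in self.slots if entry is not None)

    def is_free(self, index: int) -> bool:
        return self.slots[index] is None

    def put(self, key: Any, value: Any) -> bool:
        """Overwrite an existing keyword or fill the first free slot.
        Returns False when the bucket is full."""
        for i, entry in enumerate(self.slots):
            if entry is not None and entry[0] == key:
                self.slots[i] = (key, value)
                return True
        for i, entry in enumerate(self.slots):
            if entry is None:
                self.slots[i] = (key, value)
                return True
        return False
```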
  • any one-level hash table may include multiple hash tables.
  • The "position" in a hash table may refer to a bucket, to a hash table in the next-level hash table, or to something else; its specific meaning is determined by the context. For example, assuming that the level 1 hash table is the upper-level hash table of the level 2 hash table, in "migrating the hash data stored in the X position in the level 1 hash table to the Y position in the level 2 hash table", the X position refers to a bucket, and the Y position refers to the hash table in the level 2 hash table that corresponds to that bucket.
  • FIG. 1 is a schematic structural diagram of a hash table to which an embodiment of the present application is applied.
  • The hash table includes a level 0 (height 0) hash table and a level 1 (height 1) hash table. The level 0 hash table includes bucket n, and the level 1 hash table includes segment n.
  • The segment n can also be referred to as hash table n.
  • When data is written to the level 0 hash table and the positions in bucket n are full, continuing to write hash data to bucket n causes the hash data in bucket n to overflow (spill) to the next-level hash table (i.e., the level 1 hash table). Specifically, the overflowed hash data is inserted into segment n of the level 1 hash table. Segment n is the position in the level 1 hash table that corresponds to bucket n, and the specific position of segment n may be indicated by a pointer in bucket n.
  • The hash data in bucket n may also be actively migrated to segment n before the hash data in bucket n overflows.
  • When querying, the position of the index in the level 0 hash table is determined first (for example, the position is in bucket n), and the index is looked up at that position. If the index is found there, the query result is returned. If not, the query continues in the segment of the level 1 hash table that corresponds to the position (i.e., segment n). If the index is found there, the query result is returned; otherwise, the query continues in the next-level hash table.
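The spill-on-write and level-by-level query behaviour of FIG. 1 can be sketched as follows. This is a simplified model under stated assumptions: each bucket and segment is a Python dict, position n at every level corresponds to the same index n, and `BUCKET_CAPACITY`, `put`, and `get` are hypothetical names that do not come from the application.

```python
BUCKET_CAPACITY = 2  # assumed number of entries a level 0 bucket can hold


def put(levels, n, key, value):
    """Write into bucket n of level 0. When the bucket is full, its contents
    spill to segment n of level 1 before the new entry is written."""
    bucket, segment = levels[0][n], levels[1][n]
    if key not in bucket and len(bucket) >= BUCKET_CAPACITY:
        segment.update(bucket)  # spilled entries overwrite older copies
        bucket.clear()
    bucket[key] = value


def get(levels, n, key):
    """Query level 0 first, then the corresponding position at each lower level."""
    for table in levels:
        if key in table[n]:
            return table[n][key]
    return None
```

Because upper levels are checked first, a newer value written to the bucket takes precedence over an older copy that was spilled earlier.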
  • the hash table shown in FIG. 1 is only an example.
  • the hash table applicable to the embodiment of the present application is not limited thereto.
  • the number of the hash table and the type of the hash table are not limited in the present application.
  • a specific object may have a different name depending on the habit, but this is not to be construed as limiting the scope of application of the embodiments of the present application.
  • FIG. 2 is a schematic flowchart of a method 200 for data processing provided by an embodiment of the present application.
  • The method 200 can be performed, for example, by a processor. As shown in FIG. 2, the method 200 includes:
  • S210: Determine, from a plurality of locations included in the (N+1)th-level hash table, the location with the smallest load as the target location, where the plurality of locations are the locations in the (N+1)th-level hash table that correspond to the candidate locations of the first hash data in the Nth-level hash table, and the first hash data is hash data to be inserted into the Nth-level hash table.
  • the processor may determine the candidate location (bucket) based on the calculated location, and may also determine the candidate location according to other methods.
  • For example, the hash data in one of the two positions is kicked out and the keyword is inserted in its place (i.e., a shift operation). The kicked-out hash data then looks for a new insertion position according to another hash function (i.e., the next shift operation), until an insertion position is found or the maximum number of searches is reached. The buckets involved in all the shift operations are the candidate positions of the given keyword.
  • the shift operation is not performed until the hash data in the candidate location has not migrated to the next-level hash table.
  • The positions (hash tables, or segments) in the (N+1)th-level hash table that correspond to the candidate positions are the plurality of locations described in S210. The least loaded of these locations may be, for example, the location that stores the least hash data; the least loaded location can also be determined according to other strategies.
  • For example, the load of a hash table can be determined by dividing the number of hash data items by the hash table capacity, and the location with the least load among the plurality of locations is determined accordingly.
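The load calculation described above can be written as a short Python sketch. The names `load_factor` and `least_loaded`, and the representation of each location as a `(count, capacity)` pair, are assumptions for illustration only.

```python
def load_factor(count: int, capacity: int) -> float:
    """Load of a location: number of stored hash data items divided by capacity."""
    return count / capacity


def least_loaded(locations):
    """locations maps a location id to a (count, capacity) pair.
    Returns the id of the location with the smallest load."""
    return min(locations, key=lambda loc: load_factor(*locations[loc]))
```

For example, among locations holding 6 of 8, 2 of 8, and 3 of 4 items, the second one has the smallest load (0.25) and would be chosen. Comparing ratios rather than raw counts matters when the locations have different capacities.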
  • S220: Migrate the second hash data in the Nth-level hash table to the target location, where the second hash data is the hash data stored at the location in the Nth-level hash table that corresponds to the target location.
  • For example, a flushing flag may be set at the location where the second hash data is stored, and the second hash data is then migrated to the target location. The second hash data may also be migrated according to other methods.
  • The migration method of hash data is not limited in the present application.
  • S230: Insert the first hash data into the candidate location. The first hash data may be inserted into the candidate location by a shift operation, or inserted into the candidate location directly.
  • the hash data will be migrated to other locations in the N+1th hash table, and a pointer is placed in the above overloaded position, so that the insertion position needs to be re-searched when the hash data is inserted.
  • When fetching data, it is necessary to read the pointer first and then read the information indicated by the pointer, which reduces the read and write efficiency of the hash table.
  • In the method 200, the location with the smallest load is determined from the (N+1)th-level hash table as the target location, and the hash data in the Nth-level hash table is inserted into the target location. This improves the load balance of the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table and improving its read and write efficiency.
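Steps S210 to S230 can be sketched in Python under simplifying assumptions: each bucket of the Nth-level table and each segment of the (N+1)th-level table is a dict, bucket i corresponds to segment i, load is measured by item count, and the first hash data is inserted into the candidate bucket that was just emptied. The function name `insert_with_migration` is hypothetical, and this is one possible reading of the method rather than the applicant's implementation.

```python
def insert_with_migration(level_n, level_n1, candidates, key, value):
    """Insert (key, value) into level N after migrating one candidate bucket
    to its least-loaded counterpart in level N+1. Returns the chosen index."""
    # S210: among the segments that correspond to the candidate buckets,
    # pick the one with the smallest load as the target location.
    target = min(candidates, key=lambda i: len(level_n1[i]))
    # S220: migrate the hash data of the corresponding bucket to the target.
    level_n1[target].update(level_n[target])
    level_n[target].clear()
    # S230: insert the first hash data into the freed candidate location.
    level_n[target][key] = value
    return target
```

Choosing the least-loaded segment as the migration destination spreads entries more evenly across the (N+1)th level, which is the load-balancing effect the paragraph above describes.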
  • determining, from the plurality of locations included in the (N+1)th hash table, a location with a minimum load as a target location including:
  • S211 Determine a first location from the plurality of locations, wherein the first location is a location where a load is the largest among the plurality of locations.
  • the third hash data stored in the first location is migrated to the N+2th hash table in the multi-level hash table.
  • When determining the target location, the location with the largest load among the plurality of locations (i.e., the first location) is determined first, and the third hash data stored in the first location is migrated to the (N+2)th-level hash table of the multi-level hash table. The (N+2)th-level hash table is the next-level hash table of the (N+1)th-level hash table. At this point, the first location is the location with the least load among the plurality of locations in the (N+1)th-level hash table, that is, the target location.
  • the hash data stored in the location with the largest load in the N+1th hash table is migrated to the N+2th hash table, and then the location is used as the target location.
  • the hash data in the N+1th hash table is reduced, thereby further reducing the probability of hash collision in the N+1th hash table, and improving the reading and writing efficiency of the N+1th hash table.
  • the determining the first location from the plurality of locations includes:
  • The least loaded location among the plurality of locations (i.e., the second location) may be determined first. If the second hash data can be inserted directly into the second location, it is inserted there. If the second hash data cannot be inserted into the second location, the location with the largest load (i.e., the first location) is determined from the plurality of locations, and the third hash data stored in the first location is migrated to the (N+2)th-level hash table in the multi-level hash table. At this point, the first location is the location with the smallest load among the plurality of locations, that is, the target location.
  • According to the data processing method provided by the embodiment of the present application, if the second hash data can be inserted into the least loaded location in the (N+1)th-level hash table, that location is directly determined as the target location, and the hash data in the (N+1)th-level hash table does not need to be migrated. If the second hash data cannot be inserted into the least loaded location in the (N+1)th-level hash table, the hash data stored in the most loaded location in the (N+1)th-level hash table is migrated to the (N+2)th-level hash table, and that location is then used as the target location. This improves the load balance of the (N+1)th-level hash table and thereby the read and write efficiency of the hash table.
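The two-branch target selection described above can be sketched as follows. The criterion for "cannot insert" is not specified in the text; this sketch assumes a fixed `SEGMENT_CAPACITY` and treats a location as unable to accept the second hash data when the item counts would exceed it. The function name `choose_target` and the dict-based tables are likewise assumptions.

```python
SEGMENT_CAPACITY = 4  # assumed capacity of each level N+1 location


def choose_target(level_n1, level_n2, candidates, incoming):
    """Return the index of the target location in level N+1.
    `incoming` is the number of items in the second hash data."""
    # Second location: the least-loaded candidate before any migration.
    second = min(candidates, key=lambda i: len(level_n1[i]))
    if len(level_n1[second]) + incoming <= SEGMENT_CAPACITY:
        return second
    # First location: the most-loaded candidate. Its contents (the third
    # hash data) are migrated to level N+2, which leaves it empty.
    first = max(candidates, key=lambda i: len(level_n1[i]))
    level_n2[first].update(level_n1[first])
    level_n1[first].clear()
    return first
```

Flushing the most-loaded location rather than the least-loaded one moves the largest batch of data down a level in a single migration.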
  • the migrating the second hash data in the Nth-level hash table to the target location includes:
  • If the target location stores no hash data, the second hash data may be directly inserted into the target location. If the target location already stores hash data (i.e., the fourth hash data), the second hash data may be merged with the fourth hash data and then inserted into the target location.
  • The second hash data and the fourth hash data may be merged (updated, deleted, or overwritten) according to the corresponding semantics, and the merged hash data forms one data set (which includes hash data for different index values).
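One way to picture the merge is the Python sketch below. The text only says that the data is updated, deleted, or overwritten "according to the corresponding semantics", so the rules used here are assumptions: the second hash data is treated as newer than the fourth hash data, and a hypothetical `TOMBSTONE` marker stands for a pending delete.

```python
TOMBSTONE = object()  # hypothetical marker meaning "this index was deleted"


def merge(fourth, second):
    """Merge the migrated (second) hash data into the data already stored
    at the target location (fourth). Newer entries win; tombstones delete."""
    merged = dict(fourth)
    for key, value in second.items():
        if value is TOMBSTONE:
            merged.pop(key, None)  # delete semantics
        else:
            merged[key] = value    # update or overwrite semantics
    return merged
```

Merging before insertion means the target location holds at most one entry per index value, so the migrated data does not inflate the load of the (N+1)th-level table with stale copies.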
  • The data processing method provided by the embodiment of the present application can reduce the hash data in the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table and improving the read and write efficiency of the (N+1)th-level hash table.
  • the method 200 further includes:
  • The load value of the target location is updated. With the updated load value, when hash data continues to be written to the (N+1)th-level hash table, it can be determined whether the hash data in the Nth-level hash table can be inserted into the target location, which improves the load balance of the (N+1)th-level hash table and its read and write efficiency.
  • FIG. 3 is a schematic flowchart of another method for data processing provided by an embodiment of the present application. As shown in FIG. 3, method 300 includes:
  • The position at which the hash data may be inserted in the 0th level hash table is calculated from the hash function and the keyword corresponding to the hash data. If the hash data can be inserted at that position, the write operation ends. If the hash data cannot be inserted at that position, the next step is performed.
  • S304: Among the optional positions of the first level hash table, select the position with the largest load (i.e., the B position) and set the refresh flag at the B position. Merge the hash data of the B position with the hash data of the B2 position (the position in the level 2 hash table that corresponds to the B position), insert the merged hash data into the B2 position, delete the hash data of the B position, update the load of the B2 position, store a pointer indicating the B2 position in the B position, and perform the next step.
  • At this point, the B position is the position where the load is the smallest.
  • the refresh flag may also be set directly at the A position.
  • a refresh flag is set in a position (B0 position) corresponding to the B position in the 0th level hash table, and the hash data in the B0 position is migrated to the B position, and the next step is performed.
  • The method 300 for data processing determines the position with the smallest load (the A position) in the first level hash table. If the hash data to be inserted can be inserted at that position, the position is directly determined as the target location, and the hash data in the level 1 hash table does not need to be migrated. If the hash data to be inserted cannot be inserted at that position, the hash data stored at the position with the largest load in the first level hash table (the B position) is migrated to the level 2 hash table, and that position is then used as the target location. This improves the load balance of the first level hash table and thereby the read and write efficiency of the hash table.
  • FIG. 4 shows an apparatus 400 for data processing provided by an embodiment of the present application. As shown in FIG. 4, the apparatus 400 includes:
  • The storage unit 420 is configured to store a multi-level hash table, where the multi-level hash table includes an Nth-level hash table and an (N+1)th-level hash table, N ≥ 0, and N is an integer.
  • The processing unit 410 is configured to: determine, from the plurality of locations included in the (N+1)th-level hash table, the location with the minimum load as a target location, where the plurality of locations are the locations in the (N+1)th-level hash table that correspond to the candidate locations of first hash data in the Nth-level hash table, and the first hash data is hash data to be inserted into the Nth-level hash table; migrate second hash data in the Nth-level hash table to the target location, where the second hash data is the hash data stored at the location in the Nth-level hash table that corresponds to the target location; and insert the first hash data into the candidate location.
  • The storage unit 420 is also used to store program code and data of the device 400 to support the processing unit 410 in performing the above-described processes and/or other processes of the techniques described herein. Optionally, the device 400 further includes a communication unit 430 for supporting communication between the device 400 and other devices, for example, for supporting the processing unit 410 in acquiring the first hash data.
  • The apparatus 400 for data processing may correspond to the execution body of the method of the embodiment of the present application, and the above and other operations and/or functions of the respective modules in the apparatus 400 are intended to implement the corresponding processes of the method described above. For brevity, they are not described again here.
  • The apparatus 400 for data processing determines the location with the smallest load in the (N+1)th-level hash table of the multi-level hash table as the target location, and migrates the hash data in the Nth-level hash table of the multi-level hash table to the target location. This improves the load balance of the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table.
  • Optionally, the processing unit 410 is specifically configured to: determine a first location from the plurality of locations, where the first location is the location with the largest load among the plurality of locations; migrate the third hash data stored in the first location to the (N+2)th-level hash table in the multi-level hash table; and determine the first location as the target location.
  • the device 400 for data processing migrates the hash data stored in the location with the largest load in the N+1th hash table to the N+2th hash table, and then uses the location as the target location.
  • the hash data in the N+1th hash table is reduced, thereby further reducing the probability of hash collision in the N+1th hash table, and improving the reading and writing efficiency of the N+1th hash table.
  • Optionally, the processing unit 410 is specifically configured to: when the second hash data cannot be inserted into a second location of the plurality of locations, determine the first location from the plurality of locations, where the second location is the location with the smallest load among the plurality of locations before the third hash data is migrated to the (N+2)th-level hash table.
  • With the device 400 for data processing, if the second hash data can be inserted into the least loaded location in the (N+1)th-level hash table, that location is directly determined as the target location, and the hash data in the (N+1)th-level hash table does not need to be migrated. If the second hash data cannot be inserted into the least loaded location in the (N+1)th-level hash table, the hash data stored in the most loaded location in the (N+1)th-level hash table is migrated to the (N+2)th-level hash table, and that location is then used as the target location. This improves the load balance of the (N+1)th-level hash table and thereby the read and write efficiency of the hash table.
  • Optionally, the processing unit 410 is specifically configured to: merge the second hash data with the fourth hash data stored in the target location, and insert the merged data into the target location.
  • The apparatus 400 for data processing provided by the embodiment of the present application can reduce the hash data in the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table and improving the read and write efficiency of the (N+1)th-level hash table.
  • processing unit 410 is further configured to: update a load value of the target location.
  • The load value of the target location is updated. With the updated load value, when hash data continues to be written to the (N+1)th-level hash table, it can be determined whether the hash data in the Nth-level hash table can be inserted into the target location, which improves the load balance of the (N+1)th-level hash table and its read and write efficiency.
  • each module includes a corresponding hardware structure and/or software module for performing each function in order to implement the above functions.
  • the division of the module in the embodiment of the present application is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • The processing module 410 may be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination of computing functions, for example, including one or more microprocessor combinations, a combination of a DSP and a microprocessor, and the like.
  • the communication module 430 can be a communication interface, a transceiver, a transceiver circuit, etc., wherein the communication interface is a collective name and can include one or more interfaces.
  • the storage module 420 can be a memory.
  • When the processing unit 410 is a processor, the communication unit 430 is a communication interface, and the storage unit 420 is a memory, the device for data processing according to the embodiment of the present application may be the device 500 shown in FIG. 5.
  • the apparatus 500 includes a processor 510, a communication interface 520, and a memory 530.
  • the communication interface 520, the processor 510, and the memory 530 can communicate with each other through an internal connection path to transfer control and/or data signals.
  • the communication interface 520, the processor 510, and the memory 530 can be connected by a bus.
  • When hash data is inserted into the (N+1)th-level hash table, the apparatus 500 for data processing determines the location with the smallest load in the (N+1)th-level hash table as the target location, and inserts the hash data in the Nth-level hash table into the target location. This improves the load balance of the (N+1)th-level hash table, thereby reducing the probability of a hash collision in the (N+1)th-level hash table and improving its read and write efficiency.
  • The size of the sequence number of each process does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

Abstract

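A method (200) and an apparatus for data processing. The method (200) includes: determining, from a plurality of locations included in an (N+1)th-level hash table, the location with the minimum load as a target location, where the plurality of locations are the locations in the (N+1)th-level hash table that correspond to the candidate locations of first hash data in the Nth-level hash table, and the first hash data is hash data to be inserted into the Nth-level hash table (S210); migrating second hash data in the Nth-level hash table to the target location, where the second hash data is the hash data stored at the location in the Nth-level hash table that corresponds to the target location (S220); and inserting the first hash data into the candidate location (S230). The method and apparatus can reduce the probability of a hash collision in the (N+1)th-level hash table and improve the read and write efficiency of the (N+1)th-level hash table.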

Description

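Method and apparatus for data processing
Technical Field
The present application relates to the storage field, and in particular to a method and apparatus for data processing.
Background
A storage system may use a hash table to store data; a hash table is a way of indexing stored data. For example, a storage system may compute a hash value from the unique index of the data to be stored (such as the key of a data item in a key-value (KV) storage system), a given hash function, and the capacity of the hash table, and then store the data (such as the value in a KV storage system) at the storage location indicated by that hash value (hereinafter, unless otherwise specified, "storage location" refers to the location in the storage medium indicated by a hash value). For a query, the hash value is computed from the given index, the given hash function, and the table capacity, and the lookup is then performed at the storage location indicated by that hash value.
Because the capacity of a hash table is limited, two different indexes may hash to the same location in the table; this is called a hash collision. When a hash collision occurs, one option is to resize the hash table. Another option is to store the colliding index at another location in the hash table and place a pointer at the collision location that indicates where the colliding index is stored. As the amount of data grows, the number of colliding elements (indexes or pointers) in the hash table also grows, and the read/write efficiency of the hash table degrades.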
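Summary
In view of this, the present application provides a method and apparatus for data processing that can improve the read/write efficiency of a hash table.
In one aspect, a method for data processing is provided. The method is applied to a storage system that includes a multi-level hash table used to store data; the multi-level hash table includes a level-N hash table and a level-(N+1) hash table, where N ≥ 0 and N is an integer. The method includes: determining, from a plurality of locations in the level-(N+1) hash table, the location with the smallest load as a target location, where the plurality of locations are the locations in the level-(N+1) hash table that correspond to candidate locations of first hash data in the level-N hash table, and the first hash data is hash data to be inserted into the level-N hash table; migrating second hash data in the level-N hash table to the target location, where the second hash data is the hash data stored at the location in the level-N hash table that corresponds to the target location; and inserting the first hash data into the candidate location.
According to the method for data processing provided by the embodiments of the present application, when hash data is inserted into the level-(N+1) hash table, the least-loaded location in the level-(N+1) hash table is determined as the target location, and hash data from the level-N hash table is inserted into that target location. This improves the load balance of the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, determining the location with the smallest load from the plurality of locations in the level-(N+1) hash table as the target location includes: determining a first location from the plurality of locations, where the first location is the most heavily loaded of the plurality of locations; migrating third hash data stored at the first location to a level-(N+2) hash table of the multi-level hash table; and determining the first location as the target location.
According to the method for data processing provided by the embodiments of the present application, the hash data stored at the most heavily loaded location in the level-(N+1) hash table is migrated to the level-(N+2) hash table, and that location is then used as the target location. This reduces the amount of hash data in the level-(N+1) hash table, which further reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, determining the first location from the plurality of locations includes: when the second hash data cannot be inserted into a second location among the plurality of locations, determining the first location from the plurality of locations, where the second location is the location that has the smallest load among the plurality of locations before the third hash data is migrated to the level-(N+2) hash table.
According to the method for data processing provided by the embodiments of the present application, if the second hash data can be inserted into the least-loaded location in the level-(N+1) hash table, that location is directly determined as the target location. If it cannot, the hash data stored at the most heavily loaded location in the level-(N+1) hash table is migrated to the level-(N+2) hash table, and that location is then used as the target location. This improves the load balance of the level-(N+1) hash table and thereby the read/write efficiency of the hash table.
Optionally, migrating the second hash data in the level-N hash table to the target location includes: merging the second hash data with fourth hash data stored at the target location and inserting the merged data into the target location.
The method for data processing provided by the embodiments of the present application can reduce the amount of hash data in the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, the method further includes: updating the load value of the target location.
When more hash data is later written to the level-(N+1) hash table, the updated load value of the target location can be used to decide whether hash data from the level-N hash table can be inserted into that target location. This improves the load balance of the level-(N+1) hash table and its read/write efficiency.
In another aspect, an apparatus for data processing is provided. The apparatus can implement the functions performed by the executing entity of the method in the above aspect; these functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units or modules corresponding to the above functions.
In one possible design, the apparatus includes a processor and a memory. The processor is configured to support the apparatus in performing the corresponding functions of the above method, and the memory is coupled to the processor and stores the program instructions and data necessary for the apparatus. The apparatus may further include a communication interface for supporting communication between the apparatus and other network elements.
In yet another aspect, an embodiment of the present application provides a computer storage medium for storing the computer software instructions used by the above apparatus, including a program designed to carry out the above aspects.
Compared with the prior art, the method and apparatus for data processing provided by the embodiments of the present application determine the least-loaded location in the level-(N+1) hash table of a multi-level hash table as the target location and migrate hash data from the level-N hash table of the multi-level hash table to that target location. This improves the load balance of the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.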
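Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a hash table to which the embodiments of the present application are applicable;
FIG. 2 is a schematic flowchart of a method for data processing provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of another method for data processing provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a possible apparatus for data processing provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another possible apparatus for data processing provided by an embodiment of the present application.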
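Detailed Description
To facilitate understanding of the embodiments of the present application, the concepts involved are briefly described first.
A hash table is a data structure that implements an associative array and is widely used for fast data lookup. A hash table has two important operations: the write (put) operation, which inserts an element into the hash table, and the read (get) operation, which quickly finds an element in the hash table.
A hash table may include the following structural elements:
Entry: an element stored in the hash table is called an entry or item.
Bucket: each entry in the hash table is hashed into a bucket. A hash function computes the hash value of a key, and the hash value gives the location of the key in the hash table. For example, in a cuckoo hash table, a bucket is a collection of arrays; an array may be organized as (s2, pointer), (s1, s2, pointer), (key, pointer), or in another form, but it must include at least the pointer, which indicates the storage location of the data that corresponds to the key. s1 and s2 are the two hash values computed from the key with two hash functions: s1 = h1(key) and s2 = h2(key), where h1 and h2 are two hash functions (for example, cityhash). To insert an element into a cuckoo hash table, s1 and s2 are computed first, and each is then taken modulo the capacity (size) of the cuckoo hash table; the results are the candidate locations of the element in the cuckoo hash table.
The hash data in a bucket is of three types:
Pointer (directory), which indicates the location in the lower-level hash table that corresponds to this location;
Metadata, which indicates the load of the current bucket (for example, the number of entries in the bucket); the load of the current bucket does not include the load in lower-level hash tables;
Other hash data, such as keys.
When a location in a bucket is in use, it is marked as occupied; when a location in a bucket is not in use, it is marked as free.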
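As a rough illustration of the candidate-location computation described above, the following Python sketch derives two hash values from a key and reduces each modulo the table size. The names `h` and `candidate_locations`, and the use of salted BLAKE2b in place of cityhash, are assumptions made only for this example; the application does not prescribe them.

```python
import hashlib


def h(key: str, salt: bytes) -> int:
    """Hypothetical stand-in for h1/h2: a salted 64-bit BLAKE2b hash."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def candidate_locations(key: str, size: int) -> tuple:
    """Return (s1 mod size, s2 mod size) for the given key."""
    s1 = h(key, b"h1")  # s1 = h1(key)
    s2 = h(key, b"h2")  # s2 = h2(key)
    return (s1 % size, s2 % size)
```

With a table of 8 buckets, `candidate_locations("user:42", 8)` returns two bucket indexes in the range 0 to 7. The two indexes may coincide, in which case the key has a single candidate bucket.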
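In the embodiments of the present application, hash data refers to the data contained in a hash table, such as the pointers, metadata, and keys described above. In addition, a hash table at any level may comprise multiple hash tables, and a location in a hash table may refer to a bucket, to one of the hash tables in the next level, or to something else; the specific meaning is determined by the context. For example, suppose the level-1 hash table is the upper-level hash table of the level-2 hash table. In the statement "migrate the hash data stored at location X in the level-1 hash table to location Y in the level-2 hash table", location X refers to a bucket and location Y refers to the hash table in level 2 that corresponds to that bucket.
FIG. 1 is a schematic structural diagram of a hash table to which the embodiments of the present application are applicable. As shown in FIG. 1, the hash table includes a level-0 hash table and a level-1 hash table; the level-0 hash table includes bucket n, and the level-1 hash table includes segment n, which may also be called hash table n.
During a write to the level-0 hash table, once the locations in bucket n are full, further writes to bucket n cause its hash data to spill into the next-level hash table (that is, the level-1 hash table). Specifically, the spilled hash data is inserted into segment n of the level-1 hash table; segment n is the location in the level-1 hash table that corresponds to bucket n, and its exact location may be indicated by the pointer in bucket n. Alternatively, the hash data in bucket n may be proactively migrated to segment n before bucket n overflows.
During a read from the level-0 hash table, for a given index, the location of the index in the level-0 hash table is computed first (for example, the location is in bucket n), and that location is checked for the index. If the index is found there, the query result is returned; if not, the query continues in the segment of the level-1 hash table that corresponds to that location (that is, segment n). If the index is found there, the query result is returned; otherwise the query continues in the next-level hash table.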
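The level-by-level read described above can be sketched in Python as follows. This is a simplified model rather than the layout used in the application: each location is a hypothetical `Location` object whose `entries` dictionary stands in for a bucket or segment and whose `child` attribute plays the role of the pointer to the corresponding location one level down. The `index_of` callback, which maps a key to a level-0 bucket index, is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Location:
    """A bucket (level 0) or segment (lower levels) in a simplified model."""
    entries: dict = field(default_factory=dict)
    child: Optional["Location"] = None  # pointer to the corresponding lower-level location


def lookup(level0: list, key, index_of):
    """Search the level-0 bucket first, then follow pointers downward."""
    loc = level0[index_of(key)]
    while loc is not None:
        if key in loc.entries:
            return loc.entries[key]  # found at this level
        loc = loc.child  # continue in the next-level hash table
    return None  # not present at any level
```

Because upper levels are checked first, a key that exists at several levels resolves to the copy nearest the top, which in this model is the most recently written one.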
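The hash table shown in FIG. 1 is only an example; the hash tables to which the embodiments of the present application are applicable are not limited to it, and the present application limits neither the number nor the type of hash tables. In addition, a given object may be known by different names depending on convention, but this should not be understood as limiting the scope of the embodiments of the present application.
FIG. 2 is a schematic flowchart of a method for data processing provided by an embodiment of the present application. Method 200 may be executed, for example, by a processor. As shown in FIG. 2, method 200 includes:
S210: Determine, from a plurality of locations in the level-(N+1) hash table, the location with the smallest load as the target location, where the plurality of locations are the locations in the level-(N+1) hash table that correspond to candidate locations of first hash data in the level-N hash table, and the first hash data is hash data to be inserted into the level-N hash table.
When the processor is about to insert the first hash data into the level-N hash table, it may determine the candidate locations (buckets) from the computed locations, or by other methods.
For example, for a given key, two hash functions are used to compute two locations in the hash table, and one of the two is chosen for inserting the key. If the key cannot be inserted at either location, the hash data at one of the two locations is kicked out and the key is inserted in its place (a displacement operation). The kicked-out hash data can then look for a new insertion location using its other hash function (the next displacement operation), and so on until all hash data has found an insertion location or the maximum number of attempts is reached. The buckets involved in all of these displacement operations are the candidate locations of the given key.
The above is only one possible way of determining candidate locations. In practice, no displacement operation is actually performed until the hash data in the candidate locations has been migrated to the next-level hash table.
The locations in the level-(N+1) hash table (hash tables, also called segments) that correspond to these candidate locations are the plurality of locations referred to in S210. The least-loaded of them may be, for example, the location that stores the least hash data. Other policies may also be used; for example, the load of a hash table may be determined as the number of hash data items divided by the capacity of the hash table, and the least-loaded of the plurality of locations determined accordingly.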
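The two load policies mentioned above (item count, and item count divided by capacity) can pick different locations. The short Python sketch below makes this concrete; the function names and the `(count, capacity)` tuple representation of a segment are illustrative assumptions.

```python
def load_by_count(count: int, capacity: int) -> float:
    """Load measured as the number of stored hash data items."""
    return count


def load_by_ratio(count: int, capacity: int) -> float:
    """Load measured as items divided by the capacity of the hash table."""
    return count / capacity


def least_loaded(segments: dict, load=load_by_count) -> str:
    """Return the name of the segment with the smallest load.

    `segments` maps a segment name to a (count, capacity) pair.
    """
    return min(segments, key=lambda name: load(*segments[name]))
```

For `{"seg3": (6, 16), "seg7": (5, 8)}`, the count policy selects `seg7` (5 items versus 6), while the ratio policy selects `seg3` (0.375 versus 0.625). Which policy is preferable depends on whether segments share the same capacity.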
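S220: Migrate second hash data in the level-N hash table to the target location, where the second hash data is the hash data stored at the location in the level-N hash table that corresponds to the target location.
For example, a flushing flag may be set at the location where the second hash data is stored, and the second hash data is then migrated to the target location. The second hash data may also be migrated by other methods; the embodiments of the present application do not limit how hash data is migrated.
S230: Insert the first hash data into the candidate location.
After the second hash data has been migrated to the target location, the first hash data can be inserted into the candidate location through a displacement operation, or inserted directly.
If a location in the level-(N+1) hash table is too heavily loaded to accept hash data from the level-N hash table (for example, because the location holds so much hash data that hash collisions are excessive), that hash data is migrated to another location in the level-(N+1) hash table and a pointer is placed at the overloaded location. As a result, inserting hash data requires searching for a new insertion location, and reading hash data requires reading the pointer first and then the information it points to, which lowers the read/write efficiency of the hash table.
Therefore, according to the method for data processing provided by the embodiments of the present application, when hash data is inserted into the level-(N+1) hash table, the least-loaded location in the level-(N+1) hash table is determined as the target location, and hash data from the level-N hash table is inserted into that target location. This improves the load balance of the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.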
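Steps S210 to S230 can be sketched in a few lines of Python. The model is deliberately minimal and is an assumption of this example: each level is a list of dictionaries, bucket `i` at level N corresponds to segment `i` at level N+1, load is measured as the number of stored items, and the flushing flag is omitted because the sketch is single-threaded. The function name `insert_with_migration` is hypothetical.

```python
def insert_with_migration(level_n: list, level_n1: list, candidates: list, key, value) -> int:
    """Insert (key, value) at level N after flushing one candidate bucket.

    Returns the index of the bucket that received the new entry.
    """
    # S210: among the level-(N+1) locations that correspond to the
    # candidate buckets, pick the one with the smallest load.
    target = min(candidates, key=lambda i: len(level_n1[i]))
    # S220: migrate the second hash data (the content of the
    # corresponding level-N bucket) to the target location.
    level_n1[target].update(level_n[target])
    level_n[target].clear()
    # S230: the bucket is now free, so insert the first hash data.
    level_n[target][key] = value
    return target
```

Choosing the bucket to flush by looking at the load one level down, rather than at the bucket itself, is what steers migrated data toward lightly loaded segments.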
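Optionally, determining the location with the smallest load from the plurality of locations in the level-(N+1) hash table as the target location includes:
S211: Determine a first location from the plurality of locations, where the first location is the most heavily loaded of the plurality of locations.
S212: Migrate third hash data stored at the first location to the level-(N+2) hash table of the multi-level hash table.
S213: Determine the first location as the target location.
When determining the target location, the most heavily loaded of the plurality of locations, that is, the first location, may be determined first, and the third hash data stored at the first location is migrated to the level-(N+2) hash table of the multi-level hash table, which is the next level below the level-(N+1) hash table. After the migration, the first location is the least-loaded of the plurality of locations in the level-(N+1) hash table, that is, the target location.
According to the method for data processing provided by the embodiments of the present application, the hash data stored at the most heavily loaded location in the level-(N+1) hash table is migrated to the level-(N+2) hash table, and that location is then used as the target location. This reduces the amount of hash data in the level-(N+1) hash table, which further reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, determining the first location from the plurality of locations includes:
S214: When the second hash data cannot be inserted into a second location among the plurality of locations, determine the first location from the plurality of locations, where the second location is the location that has the smallest load among the plurality of locations before the third hash data is migrated to the level-(N+2) hash table.
When determining the target location, the least-loaded of the plurality of locations (the second location) may be determined first. If the second hash data can be inserted directly into the second location, it is inserted there. If it cannot, the most heavily loaded of the plurality of locations (the first location) is determined, and the third hash data stored at the first location is migrated to the level-(N+2) hash table of the multi-level hash table. After the migration, the first location is the least-loaded of the plurality of locations, that is, the target location.
Therefore, according to the method for data processing provided by the embodiments of the present application, if the second hash data can be inserted into the least-loaded location in the level-(N+1) hash table, that location is directly determined as the target location, and no hash data in the level-(N+1) hash table needs to be migrated. If it cannot, the hash data stored at the most heavily loaded location in the level-(N+1) hash table is migrated to the level-(N+2) hash table, and that location is then used as the target location. This improves the load balance of the level-(N+1) hash table and thereby the read/write efficiency of the hash table.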
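The two-stage choice of target location (S214, then S211 to S213) might look like the Python sketch below. It assumes that each level is a list of dictionaries indexed consistently across levels, that load is the item count, and that "can be inserted" means the incoming items fit within a fixed `capacity`; the application leaves that criterion open, so this test is an assumption, as is the function name `choose_target`.

```python
def choose_target(candidates: list, level_n1: list, level_n2: list,
                  incoming: int, capacity: int) -> int:
    """Pick the level-(N+1) location that will receive `incoming` items."""
    # S214: the second location is the least-loaded candidate.
    second = min(candidates, key=lambda i: len(level_n1[i]))
    if len(level_n1[second]) + incoming <= capacity:
        return second  # enough room, no spill needed
    # S211: the first location is the most heavily loaded candidate.
    first = max(candidates, key=lambda i: len(level_n1[i]))
    # S212: migrate the third hash data down to level N+2.
    level_n2[first].update(level_n1[first])
    level_n1[first].clear()
    # S213: the emptied location is now the least loaded one.
    return first
```

Spilling the fullest location, rather than the emptiest, moves the largest batch of data down in one migration and leaves a completely empty location for the incoming data.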
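Optionally, migrating the second hash data in the level-N hash table to the target location includes:
S221: Merge the second hash data with fourth hash data stored at the target location and insert the merged data into the target location.
If the target location stores no hash data, the second hash data can be inserted into the target location directly. If the target location already stores hash data (the fourth hash data), the second hash data can be merged with the fourth hash data and the result inserted into the target location.
For example, for hash data with the same index value, the second hash data and the fourth hash data may be merged according to the applicable semantics (update, delete, or overwrite); the merged hash data as a whole forms one data set (containing hash data with different index values).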
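One possible reading of the merge step is sketched below in Python. It assumes that the second hash data (coming from level N) is newer than the fourth hash data (already at the target location), so it wins on equal index values, and it models deletion with a `TOMBSTONE` marker. Both the marker and the `drop_tombstones` option are assumptions of this example and are not described in the application.

```python
TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


def merge(second: dict, fourth: dict, drop_tombstones: bool = False) -> dict:
    """Merge newer `second` data into older `fourth` data.

    For equal keys the newer value overwrites the older one. Deletions are
    carried as tombstones unless `drop_tombstones` is set.
    """
    merged = dict(fourth)
    merged.update(second)  # update / overwrite semantics: newer wins
    if drop_tombstones:
        # delete semantics: remove keys whose latest state is "deleted"
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return merged
```

Dropping tombstones is only safe when no lower level can still hold an older copy of the same key; otherwise the marker has to be kept so that a later lookup does not resurrect the deleted value.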
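Therefore, the method for data processing provided by the embodiments of the present application can reduce the amount of hash data in the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, method 200 further includes:
S240: Update the load value of the target location.
After the second hash data is inserted at the target location, the load value of the target location is updated. When more hash data is later written to the level-(N+1) hash table, the updated load value of the target location can be used to decide whether hash data from the level-N hash table can be inserted into that target location. This improves the load balance of the level-(N+1) hash table and its read/write efficiency.
FIG. 3 is a schematic flowchart of another method for data processing provided by an embodiment of the present application. As shown in FIG. 3, method 300 includes:
S301: After the write operation starts, for the hash data to be inserted, compute from the hash function and the key of the hash data the location in the level-0 hash table at which the hash data might be inserted. If the hash data can be inserted at that location, the write operation ends; if not, proceed to the next step.
S302: Based on the location computed in S301, find all possible insertion locations of the hash data in the level-0 hash table.
S303: Among the locations in the level-1 hash table that correspond to all possible insertion locations of the hash data in the level-0 hash table, determine the location that has the smallest load and has no flushing flag set (location A). If location A has enough space, set a flushing flag at the location in the level-0 hash table that corresponds to location A (location A0), migrate the hash data at location A0 to location A, merge the hash data from location A0 with the hash data at location A, update the load of location A0, insert the hash data to be inserted at location A0, and delete the flushing flag of location A0; the write operation ends. If location A does not have enough space, proceed to the next step.
S304: Among the eligible locations in the level-1 hash table, select the most heavily loaded location (location B) and set a flushing flag on location B. Merge the hash data at location B with the hash data at location B2 (the location in the level-2 hash table that corresponds to location B), insert the merged hash data into location B2, delete the hash data at location B, update the load of location B2, store a pointer to location B2 at location B, and proceed to the next step. At this point, location B is the least-loaded location. Optionally, in S304, the flushing flag may instead be set directly on location A.
S305: Set a flushing flag on the location in the level-0 hash table that corresponds to location B (location B0), migrate the hash data at location B0 to location B, and proceed to the next step.
S306: Insert the hash data to be inserted at location B0 and delete the flushing flag of location B0; the write operation ends.
In method 300 for data processing provided by the embodiments of the present application, the least-loaded location in the level-1 hash table (location A) is determined. If the hash data to be inserted can be inserted at that location, the location is directly determined as the target location, and no hash data in the level-1 hash table needs to be migrated. If it cannot, the hash data stored at the most heavily loaded location in the level-1 hash table (location B) is migrated to the level-2 hash table, and that location is then used as the target location. This improves the load balance of the level-1 hash table and thereby the read/write efficiency of the hash table.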
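The write path of method 300 can be condensed into the Python sketch below. It is a single-threaded approximation under several assumptions: three levels modelled as lists of dictionaries with matching indexes, load measured as item count, fixed capacities `bucket_cap` and `seg_cap` (with `bucket_cap` no larger than `seg_cap`), and no flushing flags or stored pointers, since those matter mainly for concurrent access and physical layout. The function name `put` and its parameters are hypothetical.

```python
def put(levels: tuple, candidates: list, key, value,
        bucket_cap: int, seg_cap: int) -> int:
    """Insert (key, value) at level 0; returns the bucket index used."""
    l0, l1, l2 = levels
    # S301/S302: insert directly if some candidate bucket has room.
    for i in candidates:
        if key in l0[i] or len(l0[i]) < bucket_cap:
            l0[i][key] = value
            return i
    # S303: A is the least-loaded level-1 location among the candidates.
    target = min(candidates, key=lambda i: len(l1[i]))
    if len(l1[target]) + len(l0[target]) > seg_cap:
        # S304: B is the most loaded level-1 location; spill it to level 2.
        target = max(candidates, key=lambda i: len(l1[i]))
        l2[target] = {**l2[target], **l1[target]}  # level-1 data is newer
        l1[target] = {}
    # S303/S305: flush the level-0 bucket into the chosen level-1 location.
    l1[target] = {**l1[target], **l0[target]}  # level-0 data is newer
    # S306: the bucket is now empty, so the pending entry fits.
    l0[target] = {key: value}
    return target
```

In a concurrent implementation, the flushing flags described in S303 to S306 would guard the bucket and segment while their contents are being moved; this sketch omits them for brevity.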
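The method for data processing provided by the embodiments of the present application has been described in detail above with reference to FIG. 2 and FIG. 3. The apparatus for data processing provided by the embodiments of the present application is described in detail below with reference to FIG. 4 and FIG. 5.
FIG. 4 shows an apparatus 400 for data processing provided by an embodiment of the present application. As shown in FIG. 4, the apparatus 400 includes:
a storage unit 420, configured to store a multi-level hash table, where the multi-level hash table includes a level-N hash table and a level-(N+1) hash table, N ≥ 0, and N is an integer; and
a processing unit 410, configured to: determine, from a plurality of locations in the level-(N+1) hash table, the location with the smallest load as a target location, where the plurality of locations are the locations in the level-(N+1) hash table that correspond to candidate locations of first hash data in the level-N hash table, and the first hash data is hash data to be inserted into the level-N hash table; migrate second hash data in the level-N hash table to the target location, where the second hash data is the hash data stored at the location in the level-N hash table that corresponds to the target location; and insert the first hash data into the candidate location.
The storage unit 420 is further configured to store the program code and data of the apparatus 400, to support the processing unit 410 in carrying out the above processing and/or other processes of the techniques described herein. Optionally, the apparatus 400 may further include a communication unit 430, configured to support communication between the apparatus 400 and other apparatuses, for example to support the processing unit 410 in obtaining the first hash data.
The apparatus 400 for data processing according to the embodiments of the present application may correspond to the executing entity of the method of the embodiments of the present application, and the above and other operations and/or functions of the modules in the apparatus 400 serve to implement the corresponding procedures of the method in FIG. 2. For brevity, they are not repeated here.
Therefore, the apparatus 400 for data processing provided by the embodiments of the present application determines the least-loaded location in the level-(N+1) hash table of the multi-level hash table as the target location and migrates hash data from the level-N hash table of the multi-level hash table to that target location. This improves the load balance of the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, the processing unit 410 is specifically configured to: determine a first location from the plurality of locations, where the first location is the most heavily loaded of the plurality of locations; migrate third hash data stored at the first location to the level-(N+2) hash table of the multi-level hash table; and determine the first location as the target location.
The apparatus 400 for data processing provided by the embodiments of the present application migrates the hash data stored at the most heavily loaded location in the level-(N+1) hash table to the level-(N+2) hash table and then uses that location as the target location. This reduces the amount of hash data in the level-(N+1) hash table, which further reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, the processing unit 410 is specifically configured to: when the second hash data cannot be inserted into a second location among the plurality of locations, determine the first location from the plurality of locations, where the second location is the location that has the smallest load among the plurality of locations before the third hash data is migrated to the level-(N+2) hash table.
With the apparatus 400 for data processing provided by the embodiments of the present application, if the second hash data can be inserted into the least-loaded location in the level-(N+1) hash table, that location is directly determined as the target location, and no hash data in the level-(N+1) hash table needs to be migrated. If it cannot, the hash data stored at the most heavily loaded location in the level-(N+1) hash table is migrated to the level-(N+2) hash table, and that location is then used as the target location. This improves the load balance of the level-(N+1) hash table and thereby the read/write efficiency of the hash table.
Optionally, the processing unit 410 is specifically configured to: merge the second hash data with fourth hash data stored at the target location and insert the merged data into the target location.
The apparatus 400 for data processing provided by the embodiments of the present application can reduce the amount of hash data in the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.
Optionally, the processing unit 410 is further configured to: update the load value of the target location.
After the second hash data is inserted at the target location, the load value of the target location is updated. When more hash data is later written to the level-(N+1) hash table, the updated load value of the target location can be used to decide whether hash data from the level-N hash table can be inserted into that target location. This improves the load balance of the level-(N+1) hash table and its read/write efficiency.
The apparatus for data processing provided by the embodiments of the present application has been described above mainly in terms of its functional division. It can be understood that, to implement the above functions, each module includes the corresponding hardware structure and/or software module for performing each function. It should be noted that the division into modules in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation.
In the apparatus for data processing provided by the embodiments of the present application, the processing module 410 may be a processor or controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various example logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 430 may be a communication interface, a transceiver, a transceiver circuit, or the like, where "communication interface" is a collective term and may include one or more interfaces. The storage module 420 may be a memory.
When the processing unit 410 is a processor, the communication unit 430 is a communication interface, and the storage unit 420 is a memory, the apparatus for data processing in the embodiments of the present application may be the apparatus 500 shown in FIG. 5.
Referring to FIG. 5, the apparatus 500 includes a processor 510, a communication interface 520, and a memory 530. The communication interface 520, the processor 510, and the memory 530 can communicate with one another through an internal connection path to transfer control and/or data signals; for example, they may be connected by a bus.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
Therefore, the apparatus 500 for data processing provided by the embodiments of the present application, when inserting hash data into the level-(N+1) hash table, determines the least-loaded location in the level-(N+1) hash table as the target location and inserts hash data from the level-N hash table into that target location. This improves the load balance of the level-(N+1) hash table, which reduces the probability of hash collisions in it and improves its read/write efficiency.
In the embodiments of the present application, the sequence numbers of the processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
A person skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present application in detail. It should be understood that the above are merely specific embodiments of the present application and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present application shall fall within the scope of protection of the present application.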

Claims (10)

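  1. A method for data processing, characterized in that the method is applied to a storage system, the storage system includes a multi-level hash table, the multi-level hash table is used to store data, the multi-level hash table includes a level-N hash table and a level-(N+1) hash table, N ≥ 0 and N is an integer, and the method includes:
    determining, from a plurality of locations in the level-(N+1) hash table, the location with the smallest load as a target location, where the plurality of locations are the locations in the level-(N+1) hash table that correspond to candidate locations of first hash data in the level-N hash table, and the first hash data is hash data to be inserted into the level-N hash table;
    migrating second hash data in the level-N hash table to the target location, where the second hash data is the hash data stored at the location in the level-N hash table that corresponds to the target location; and
    inserting the first hash data into the candidate location.
  2. The method according to claim 1, characterized in that determining the location with the smallest load from the plurality of locations in the level-(N+1) hash table as the target location includes:
    determining a first location from the plurality of locations, where the first location is the most heavily loaded of the plurality of locations;
    migrating third hash data stored at the first location to a level-(N+2) hash table of the multi-level hash table; and
    determining the first location as the target location.
  3. The method according to claim 2, characterized in that determining the first location from the plurality of locations includes:
    when the second hash data cannot be inserted into a second location among the plurality of locations, determining the first location from the plurality of locations, where the second location is the location that has the smallest load among the plurality of locations before the third hash data is migrated to the level-(N+2) hash table.
  4. The method according to any one of claims 1 to 3, characterized in that migrating the second hash data in the level-N hash table to the target location includes:
    merging the second hash data with fourth hash data stored at the target location and inserting the merged data into the target location.
  5. The method according to any one of claims 1 to 4, characterized in that the method further includes:
    updating the load value of the target location.
  6. An apparatus for data processing, characterized in that the apparatus is configured in a storage system, the storage system includes a multi-level hash table, the multi-level hash table is used to store data, the multi-level hash table includes a level-N hash table and a level-(N+1) hash table, N ≥ 0 and N is an integer, the apparatus includes a processing unit and a storage unit, the storage unit is configured to store the multi-level hash table, and the processing unit is configured to:
    determine, from a plurality of locations in the level-(N+1) hash table, the location with the smallest load as a target location, where the plurality of locations are the locations in the level-(N+1) hash table that correspond to candidate locations of first hash data in the level-N hash table, and the first hash data is hash data to be inserted into the level-N hash table;
    migrate second hash data in the level-N hash table to the target location, where the second hash data is the hash data stored at the location in the level-N hash table that corresponds to the target location; and
    insert the first hash data into the candidate location.
  7. The apparatus according to claim 6, characterized in that the processing unit is specifically configured to:
    determine a first location from the plurality of locations, where the first location is the most heavily loaded of the plurality of locations;
    migrate third hash data stored at the first location to a level-(N+2) hash table of the multi-level hash table; and
    determine the first location as the target location.
  8. The apparatus according to claim 7, characterized in that the processing unit is specifically configured to:
    when the second hash data cannot be inserted into a second location among the plurality of locations, determine the first location from the plurality of locations, where the second location is the location that has the smallest load among the plurality of locations before the third hash data is migrated to the level-(N+2) hash table.
  9. The apparatus according to any one of claims 6 to 8, characterized in that the processing unit is specifically configured to:
    merge the second hash data with fourth hash data stored at the target location and insert the merged data into the target location.
  10. The apparatus according to any one of claims 6 to 9, characterized in that the processing unit is further configured to:
    update the load value of the target location.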
PCT/CN2016/113705 2016-12-30 2016-12-30 Method and apparatus for data processing WO2018120109A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680058640.5A CN109076021B (zh) 2016-12-30 2016-12-30 Method and apparatus for data processing
PCT/CN2016/113705 WO2018120109A1 (zh) 2016-12-30 2016-12-30 Method and apparatus for data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/113705 WO2018120109A1 (zh) 2016-12-30 2016-12-30 Method and apparatus for data processing

Publications (1)

Publication Number Publication Date
WO2018120109A1 true WO2018120109A1 (zh) 2018-07-05

Family

ID=62706806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/113705 WO2018120109A1 (zh) 2016-12-30 2016-12-30 Method and apparatus for data processing

Country Status (2)

Country Link
CN (1) CN109076021B (zh)
WO (1) WO2018120109A1 (zh)


Also Published As

Publication number Publication date
CN109076021A (zh) 2018-12-21
CN109076021B (zh) 2020-09-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16925547

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16925547

Country of ref document: EP

Kind code of ref document: A1