WO2015176315A1 - Hash join method, apparatus, and database management system - Google Patents

Info

Publication number
WO2015176315A1
WO2015176315A1 PCT/CN2014/078304 CN2014078304W WO2015176315A1 WO 2015176315 A1 WO2015176315 A1 WO 2015176315A1 CN 2014078304 W CN2014078304 W CN 2014078304W WO 2015176315 A1 WO2015176315 A1 WO 2015176315A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
hash
data
preset
original data
Prior art date
Application number
PCT/CN2014/078304
Other languages
English (en)
Chinese (zh)
Inventor
桑永嘉
李俊
施会华
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2014/078304 priority Critical patent/WO2015176315A1/fr
Priority to CN201480037464.8A priority patent/CN105359142B/zh
Publication of WO2015176315A1 publication Critical patent/WO2015176315A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • Hash connection method device and database management system
  • the present invention relates to the field of database technologies, and more particularly to a hash connection method, apparatus, and database management system.
  • BACKGROUND With the development and application of database technology, the amount of data stored in a database has grown from megabytes (M) and gigabytes (G) to today's terabytes (T) and petabytes (P). Consequently, a query may have to process data on the G, T, or even P scale. Responding quickly to queries over such volumes places heavy demands on the processing performance of the database, so database performance is crucial during querying.
  • The basic methods for implementing Join operations in a database are mainly Hash Join, Merge Join, and Radix Join, an improved algorithm based on Grace Join.
  • These methods mainly consist of a grouping phase and a Join phase.
  • Grouping can cause severe TLB misses, where the required page table entry is not in the TLB (TLB: Translation Lookaside Buffer, also called the page table buffer; a TLB entry is a page table entry cached in the TLB).
  • Existing queries use multi-pass grouping to reduce TLB misses in the grouping phase.
  • The most common query process is as follows: first, the data is grouped over multiple passes, and the original data is hashed in every pass; then, once the final groups are obtained, the Join operation is performed.
  • an object of the embodiments of the present invention is to provide a hash connection method, apparatus, and database management system, which overcomes the problem of wasting computing resources in the existing database query process.
  • the embodiment of the present invention provides the following technical solutions:
  • a first aspect of the embodiments of the present invention provides a hash connection method, which is applied to a database, and includes: receiving a structured query language SQL statement including a connection Join operation, and parsing and acquiring at least two target data groups to be connected;
  • performing N hash groupings in sequence on each data segment in each target data group based on a preset grouping rule, wherein the hash value of each piece of original data in the data segment is calculated in the first hash grouping and represented in bits;
  • in each hash grouping, the original data whose hash values have the same value at the bit positions specified for the current hash grouping is divided into the same group, and the original data in the same group is sorted by the position of each piece of original data in the target data group and saved in that group, where N is a positive integer greater than or equal to 1;
  • sorting the groups in ascending order of the hash value corresponding to the original data contained in each group;
  • performing, in the sorted order, the Join operation on the original data in the groups obtained after the N hash groupings of the target data groups to be connected.
  • Performing the first of the N hash groupings on each data segment in each target data group based on the preset grouping rule includes: calculating the hash value of the original data contained in the data segment, and representing the calculated hash value in bits;
  • dividing the original data whose hash values have the same value at the specified bit positions into the same group, and sorting the original data in the same group by the position of each piece of original data in the target data group and saving it in that group;
  • associating the unspecified bits of the hash value corresponding to each piece of original data with that original data and saving them;
  • Performing the second to nth of the N hash groupings in sequence on the data segments in each target data group based on the preset grouping rule includes hash grouping the original data in any group obtained after the previous hash grouping, where n is a positive integer greater than 2 and is included in N, which includes:
  • based on the bits left unspecified by the previous hash grouping and associated with the original data in the current group, dividing the original data whose hash values have the same value at the bit positions specified for the current hash grouping into the same group, and sorting the original data in the same group by the position of each piece of original data in the target data group and saving it in that group;
  • The first type of preset grouping rule in the first aspect of the embodiments of the present invention includes: a preset number of hash groupings N, or a preset total number of groups S, or both a preset number of hash groupings N and a preset total number of groups S;
  • when the preset grouping rule is the preset number of hash groupings N, the data segments in each target data group are hash-grouped in turn until the Nth hash grouping is completed;
  • when the preset grouping rule is the preset total number of groups S, the data segments in each target data group are hash-grouped in turn until the number of groups of each target data group equals S;
  • when the preset grouping rule is both N and S, and N has a higher priority than S, the data segments in each target data group are hash-grouped in turn until N hash groupings are completed;
  • when the preset grouping rule is both N and S, and S has a higher priority than N, the data segments in each target data group are hash-grouped in turn until the number of groups of each target data group equals S;
  • when the preset grouping rule is both N and S, and N and S have the same priority, the data segments in each target data group are hash-grouped in turn until N hash groupings are completed and the number of groups of each target data group equals S;
  • N is determined by the storage size of the page table buffer (TLB) and is a positive integer greater than or equal to 1, and N contains n;
  • S is determined by the size of the database cache and is a positive integer greater than or equal to 2;
  • the priority of N and S is determined by the storage size of the TLB and the size of the cache.
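The stopping conditions listed above can be summarized as a small predicate. The following Python sketch is illustrative only: the function name `should_stop`, its parameters, and the `priority` labels are hypothetical, and it treats "the number of groups equals S" as "has reached S", which is an assumption rather than something the text states.

```python
def should_stop(rounds_done, group_count, n=None, s=None, priority="equal"):
    """Decide whether hash grouping should stop under the first type of rule.

    n is the preset number of hash groupings N, s is the preset total number
    of groups S. When both are set, priority is "N", "S", or "equal".
    Assumption: reaching S is tested with >= rather than strict equality.
    """
    if n is None and s is None:
        raise ValueError("at least one of n or s must be set")
    n_met = n is not None and rounds_done >= n
    s_met = s is not None and group_count >= s
    if n is not None and s is not None:
        if priority == "N":
            return n_met          # N outranks S: only the round count matters
        if priority == "S":
            return s_met          # S outranks N: only the group count matters
        return n_met and s_met    # same priority: both conditions must hold
    return n_met or s_met         # only one of the two limits was configured
```

For instance, with `n=4`, `s=8`, and equal priority, grouping continues after four rounds if only two groups exist so far.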
  • The second type of preset grouping rule in the first aspect of the embodiments of the present invention includes: a preset number of hash groupings N, a preset number of groups m for each hash grouping, and a preset total number of groups S, where the value of N is determined by the storage size of the page table buffer (TLB) and is a positive integer greater than or equal to 1, and m is less than N; the value of S is determined by the size of the database cache and is a positive integer greater than or equal to 2. When the data segments in each target data group are hash-grouped, they are grouped according to the preset number of groups for each hash grouping, so that the final number of hash groupings equals the preset number of hash groupings and the total number of groups equals the preset total number of groups.
  • Dividing each target data group into a plurality of data segments with a vector as the quantity unit includes:
  • the vector is a quantity unit and one vector corresponds to one data segment; each target data group is divided in sequence into M data segments, where the value of M is determined by the number of pieces of original data in the target data group, the size of the database cache, and the storage size of the page table buffer (TLB);
  • the first to (M-1)th data segments contain the same number of pieces of original data, and the Mth data segment contains no more pieces of original data than each of the first to (M-1)th data segments.
  • Dividing the original data whose hash values have the same value at the specified bit positions into the same group, and sorting the original data in the same group by the position of each piece of original data in the target data group and saving it in that group, includes: finding the original data whose hash values have the same value at the bit positions specified for the current hash grouping and dividing that original data into the same group, where the bits required for the current hash grouping are specified according to the size of the database cache and the storage size of the page table buffer (TLB); traversing the subscripts of the original data divided into the same group, where the subscript of each piece of original data identifies its position in the target data group;
  • arranging the original data in ascending order of subscript;
  • writing the original data into the same group in that ascending order and saving it.
  • Dividing the original data whose hash values have the same value at the bit positions specified for the current hash grouping into the same group, and sorting the original data in the same group by the position of each piece of original data in the target data group and saving it in that group, includes:
  • the storage size of the TLB is determined;
  • the subscripts of the respective original data are used to identify the locations of the respective original data in the target data group;
  • the original data corresponding to each subscript is arranged from small to large;
  • Each raw data is written into the same group and saved in the order from small to large.
  • the two groups work as a pair of raw data join operations, and perform the Join operation on the original data in each of the two target data groups;
  • the manner in which the two groups are a pair of original data join operations includes:
  • A second aspect of the embodiments of the present invention provides a hash connection apparatus, which is applied to a database and includes: a receiving unit, configured to receive a structured query language (SQL) statement including a Join operation, and to parse it and obtain at least two target data groups to be connected;
  • a dividing unit, configured to divide each target data group into a plurality of data segments with a vector as the quantity unit;
  • a grouping unit, configured to perform N hash groupings in sequence on each data segment in each target data group based on a preset grouping rule, wherein the hash value of the original data in the data segment is calculated in the first hash grouping and represented in bits; in each hash grouping, the original data whose hash values have the same value at the bit positions specified for the current hash grouping is divided into the same group, and the original data in the same group is sorted by the position of each piece of original data in the target data group and saved in that group, where N is a positive integer greater than or equal to 1;
  • a sorting unit, configured to sort, for each target data group, the groups obtained after the N hash groupings in ascending order of the hash value corresponding to the original data contained in each group;
  • a connecting unit, configured to perform, in the sorted order, the Join operation on the original data in the groups obtained after the N hash groupings of the target data groups to be connected.
  • a third aspect of the embodiments of the present invention provides a database management system, which is applied to a database, and includes:
  • a memory having a storage medium, wherein the memory stores a program for performing a database query; and a processor connected to the memory via a bus, when the database query is executed, the processor invokes a database query program stored in the memory And executing the database query procedure according to a hash connection method provided by the first aspect of the embodiments of the present invention described above.
  • Compared with the prior art, the embodiments of the present invention disclose a hash connection method, apparatus, and database management system.
  • Each target data group to be connected is divided into a plurality of data segments with a vector as the quantity unit.
  • In each hash grouping, the original data whose hash values have the same value at the bit positions specified for the current hash grouping is divided into the same group, and the original data in the same group is sorted by the position of each piece of original data in the target data group and saved in that group.
  • By using a vector as the quantity unit and grouping on specified bits, the embodiments can hash-group multiple pieces of original data at the same time without recalculating their hash values across the multiple hash groupings, which reduces cache misses and avoids wasting computing resources on repeated hash calculation.
  • Because the original data within each group is locally ordered, the sorting complexity when this data is joined is lower than when randomly distributed original data is joined.
  • FIG. 1 is a flowchart of a hash connection method according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic diagram of a third-time hash packet disclosed in Example 4 of the third embodiment of the present invention
  • FIG. 3 is a schematic diagram of the same original data included in each data segment disclosed in Embodiment 4 of the present invention
  • FIG. 5 is a schematic diagram of a grouping of raw data in a data segment according to Embodiment 4 of the present invention
  • FIG. 6 is a flowchart of dividing a group in a second to Nth hash grouping process according to Embodiment 4 of the present invention.
  • FIG. 8 is a schematic structural diagram of a hash connection apparatus according to Embodiment 5 of the present invention.
  • FIG. 9 is a schematic structural diagram of a database management system according to Embodiment 5 of the present invention.
  • DETAILED DESCRIPTION For reference and clarity, the technical terms and abbreviations used below are summarized as follows:
  • TLB: Translation Lookaside Buffer, the page table buffer
  • TLB entry: a page table entry cached in the TLB
  • Cache miss means that the requested data is not in the memory layer to be accessed.
  • An embodiment of the present invention provides a hash connection method, apparatus, and database management system that use a vector as the quantity unit and, in each hash grouping, group on the bit positions specified for that grouping.
  • In each hash grouping the original data within each group is sorted, so that the original data in each group obtained after grouping the data segments is locally ordered.
  • When this locally ordered original data is joined, the sorting complexity is lower than when randomly distributed original data is joined.
  • the first embodiment of the present invention discloses a hash connection method, and the method is applied to a database.
  • the process is as shown in step S101 to step S105 in FIG.
  • Step S101 Receive a structured query language SQL statement including a connection Join operation, and parse and obtain at least two target data groups to be connected;
  • In step S101, the database parses the received SQL query statement containing the Join operation and obtains at least two target data groups to be connected.
  • That is, the target data groups to be connected are parsed in pairs, so at least two of them are obtained during parsing.
  • Step S102: divide each target data group into a plurality of data segments of a determined size with a vector as the quantity unit;
  • In step S102, the same operation is performed on each of the parsed target data groups to be connected; the division into data segments is described below for one target data group.
  • The current target data group is divided using the vector.
  • As a quantity unit, the vector is a fixed unit that specifies how many pieces of original data one vector contains.
  • The target data group is divided into a plurality of data segments by this unit, so that one data segment corresponds to one vector.
  • A data segment can contain at most one vector unit of original data. The data segments obtained by dividing the target data group usually contain the same number of pieces of original data, and that number is limited by the quantity unit vector.
  • In this way each target data group to be connected can be divided into a plurality of data segments.
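The division described in step S102 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the name `split_into_segments` and the parameter `vector_size` are hypothetical, and the target data group is modeled as a plain list.

```python
def split_into_segments(rows, vector_size):
    """Split a target data group into consecutive data segments.

    Every segment holds vector_size items except possibly the last one,
    which holds the remainder (never more than vector_size).
    """
    if vector_size < 1:
        raise ValueError("vector_size must be >= 1")
    return [rows[i:i + vector_size] for i in range(0, len(rows), vector_size)]


# Ten pieces of original data with a vector of four items give three segments.
segments = split_into_segments(list(range(10)), 4)
```

Here the first two segments hold four items each and the last holds the remaining two, matching the rule that the final segment is no larger than the others.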
  • A vectorized method is used:
  • with a vector as the quantity unit, the hash values of all the original data in the vector are calculated together, and the pieces of original data that fall into the same group are then written to that group in a single operation, which reduces cache misses and improves Join performance.
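As a rough illustration of the batching idea, the sketch below hashes a whole segment first, stages the items per group, and then appends each staged batch to its group once. The names `scatter_segment` and `HASH_BITS` are hypothetical, `zlib.crc32` is only a stand-in 32-bit hash (the text does not name a hash function), and plain Python lists cannot demonstrate the cache behavior itself.

```python
import zlib

HASH_BITS = 32  # assumed hash width for this sketch


def scatter_segment(segment, groups, bits=2):
    """Hash one data segment as a batch and write each group once.

    segment is a list of bytes values; groups maps a group key to a list.
    """
    # Step 1: compute the hash values for the whole segment in one pass.
    hashes = [zlib.crc32(value) for value in segment]
    # Step 2: stage the items of this segment by their top `bits` hash bits.
    staged = {}
    for value, h in zip(segment, hashes):
        staged.setdefault(h >> (HASH_BITS - bits), []).append(value)
    # Step 3: one write per group per segment instead of one write per item.
    for key, batch in staged.items():
        groups.setdefault(key, []).extend(batch)
    return groups


groups = scatter_segment([b"a", b"b", b"c", b"d"], {})
```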
  • Step S103: perform N hash groupings in sequence on the data segments in each target data group according to the preset grouping rule, wherein the hash value of each piece of original data in the data segment is calculated in the first hash grouping and represented in bits;
  • in each hash grouping, the original data whose hash values have the same value at the bit positions specified for the current hash grouping is divided into the same group, and the original data in the same group is sorted by the position of each piece of original data in the target data group and saved in that group, where N is a positive integer greater than or equal to 1;
  • In step S103, the data segments in each target data group are hash-grouped N times in sequence based on the preset grouping rule.
  • Hash grouping proceeds through the data segments until the last data segment has been processed.
  • In the first hash grouping, the hash values of all the original data contained in the data segment are calculated together, and the hash value of each piece of original data is represented in bits.
  • The number of bits relates to the bit width of the computer on which the database is installed and is determined by the maximum bit width of that computer's current CPU.
  • If the computer on which the database is installed is 32-bit, the hash value calculated for the original data in the first hash grouping is represented with 32 bits. If the computer on which the database is currently installed is 64-bit, the hash value is represented with 64 bits.
  • During grouping, the specified bits of each hash value are compared, traversed, or searched to find
  • the hash values that have the same value at the specified bit positions, and the original data corresponding to those hash values is divided into the same group. For example, if the first hash grouping requires 2 bits, the two bits starting from the highest bit of each hash value are compared or traversed.
  • Within each group, the original data is sorted by its position in the target data group; this position can also be regarded as the position of the original data in the data segment.
  • the original data A, B, and C are divided into the same group. If A is ranked 3rd in the target data group, B is ranked 1st in the target data group, and C is ranked 6th in the target data group. After sorting, the actual storage order of A, B, and C in the group is: B, A, C.
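The A, B, C example can be reproduced with a short Python sketch of the first hash grouping. Everything named here is an assumption for illustration: `group_by_top_bits`, the 32-bit width, and the `toy_hash` table, which is hand-picked so that A, B, and C share the same two highest bits.

```python
HASH_BITS = 32  # assumed hash width


def group_by_top_bits(items, hash_fn, bits):
    """Group (position, value) pairs by the top `bits` bits of their hash.

    Each group is then ordered by position in the target data group.
    """
    groups = {}
    shift = HASH_BITS - bits
    for position, value in items:
        key = hash_fn(value) >> shift
        groups.setdefault(key, []).append((position, value))
    for members in groups.values():
        members.sort(key=lambda pv: pv[0])  # order by original position
    return groups


# Hypothetical hash values: A, B and C share the top bits 0b11, D has 0b01.
toy_hash = {"A": 0xC0000001, "B": 0xC0000002, "C": 0xC0000003, "D": 0x40000000}
items = [(3, "A"), (1, "B"), (6, "C"), (2, "D")]
groups = group_by_top_bits(items, toy_hash.get, bits=2)
stored_order = [value for _, value in groups[0b11]]
```

With A at position 3, B at position 1, and C at position 6, `stored_order` comes out as B, A, C, the storage order given in the example.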
  • The first hash grouping is performed in the same way on every data segment from first to last, and starting from the first hash grouping the specified bits are taken in sequence from the highest bit not yet specified.
  • Only the first of the N hash groupings needs to calculate the hash values of the original data; the subsequent hash groupings use only the not yet specified bits of the hash values corresponding to the original data.
  • The preset grouping rule mentioned in step S103 is either: a preset number of hash groupings N, or a preset total number of groups S, or both N and S; or: a preset number of hash groupings N, a preset number of groups m for each hash grouping, and a preset total number of groups S.
  • The value of N is determined by the storage size of the page table buffer (TLB) and is a positive integer greater than or equal to 1, and m is less than N; the value of S is determined by the size of the database cache and is a positive integer greater than or equal to 2.
  • Step S104: for the groups obtained after each target data group has been hash-grouped N times, sort the groups within the target data group in ascending order of the hash value corresponding to the original data contained in each group;
  • In step S104, the groups in the target data group obtained after the N hash groupings under the preset grouping rule are reordered.
  • The groups are sorted by the hash value of the original data they contain. For example, suppose grouping the target data group yields group 1, group 2, and group 3, where the original data in group 1 has a hash value of 3, the original data in group 2 has a hash value of 5, and the original data in group 3 has a hash value of 0. After sorting, the order of the groups in the target data group is: group 3, group 1, group 2.
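The group ordering in this example amounts to sorting group labels by their hash value. A minimal Python sketch, in which the dictionary `group_hash` is a hypothetical representation of the three groups and their hash values:

```python
# Hash value of the original data held by each group, as in the example.
group_hash = {"group 1": 3, "group 2": 5, "group 3": 0}

# Sort the group labels in ascending order of their hash value.
ordered = sorted(group_hash, key=group_hash.get)
```

The resulting order is group 3, group 1, group 2.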
  • In each group obtained after the N hash groupings under the preset grouping rule, the original data that ends up in the same group usually corresponds to the same hash value.
  • Step S105: perform, in the sorted order, the Join operation on the original data in the groups obtained after the N hash groupings of the target data groups to be connected.
  • In step S105, for the target data groups to be connected, whose groups and in-group original data were ordered by the hash grouping of steps S102 to S104,
  • the ordered groups are taken in sequence: a group in one target data group to be connected is joined with a group in the other target data group to be connected, and the Join operation is performed on the ordered original data in each group.
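One way to picture step S105 is to walk the group keys in ascending order and join only groups that carry the same key. The sketch below is an assumption-laden illustration: `join_grouped` is a hypothetical name, groups are modeled as dictionaries from group key to lists of `(join_key, payload)` pairs, and the in-group matching uses a simple lookup table because the text does not spell out the in-group join procedure.

```python
def join_grouped(left_groups, right_groups):
    """Join two grouped inputs, visiting matching groups in key order."""
    result = []
    # Only groups present on both sides can produce matches.
    for key in sorted(left_groups.keys() & right_groups.keys()):
        lookup = {}
        for join_key, payload in right_groups[key]:
            lookup.setdefault(join_key, []).append(payload)
        for join_key, payload in left_groups[key]:
            for other in lookup.get(join_key, []):
                result.append((join_key, payload, other))
    return result


left = {0: [(10, "a")], 1: [(11, "b"), (12, "c")]}
right = {1: [(11, "x"), (11, "y")], 2: [(13, "z")]}
joined = join_grouped(left, right)
```

Only group 1 exists on both sides here, so the single left row with join key 11 pairs with both right rows that carry the same key.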
  • In this method, hash values are calculated in batches, one vector at a time, and
  • the entries for the several pieces of original data that fall into the same group are then written to that group in a single operation.
  • Hash grouping in units of a vector avoids unnecessary cache thrashing, which reduces cache misses and improves Join performance.
  • The hash value of each piece of original data is calculated only in the first grouping, and the bits to be used later are recorded at the associated position of the corresponding original data for use in subsequent groupings, which eliminates the cost of recalculating hash values and avoids wasting resources.
  • After each hash grouping, the original data is sorted as it is written into its group.
  • When the final sort is performed, the original data within each group is already locally ordered, because it was partially sorted during the multi-pass grouping disclosed in the embodiment of the present invention, so only the groups themselves need to be sorted. Compared with the prior art, which sorts both the original data within each group and the groups after grouping is complete, this greatly reduces sorting complexity and sorting time. When this locally ordered original data is joined, the sorting complexity is also lower than when randomly distributed original data is joined.
  • Based on the hash connection method disclosed in the first embodiment of the present invention, the second embodiment of the present invention describes in detail the N hash groupings mentioned in step S103 shown in FIG.
  • The process of performing, in sequence, the first of the N hash groupings on each data segment in each target data group based on the preset grouping rule includes:
  • Step S1031: calculate the hash values of the original data contained in the current data segment, and represent the calculated hash values in bits.
  • The target data group has been divided by the vector quantity unit in step S102. Taking any one target data group as an example, step S1031 calculates the hash values of all the original data contained in the same data segment together and represents each resulting hash value in bits. As described in the first embodiment of the present application, the number of bits relates to the bit width of the computer on which the database is installed and is determined by the maximum bit width of that computer's current CPU.
  • Step S1032: in the first hash grouping, the bits required for the current hash grouping are determined according to the size of the database cache and the storage size of the page table buffer (TLB),
  • and, among the hash values corresponding to the original data in the data segment, the original data whose hash values have the same value at those bit positions is divided into the same group.
  • For example, if the bits are specified from the highest bit toward the lowest bit of the hash value and two bits are used, the original data whose hash values share the same first two bits
  • is divided into the same group.
  • Within the current group, the original data is sorted by its position in the target data group.
  • For example, suppose the same group contains the original data A, B, and C, where A is at the 6th position of the target data group, B is at the 1st position, and C is at a position after B and before A.
  • The order of the original data in the saved group obtained after executing step S1033 is then: B, C, A, so the original data in each group obtained by each division is ordered.
  • Step S1033: associate the unspecified bits of the hash value corresponding to each piece of original data with that original data, and save them at the associated position of the original data;
  • After step S1032 divides the groups, step S1033 saves the bits of each hash value that were not used in the current hash grouping, that is, the unspecified bits, at the associated position of the corresponding original data.
  • The associated position may be a storage space adjacent to the original data, or another storage space associated with the original data.
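Separating the bits used for the current grouping from the bits kept for later can be sketched as a pair of bit operations. The function name `split_hash` and the 8-bit width in the example are hypothetical and chosen only to keep the numbers readable.

```python
def split_hash(hash_value, total_bits, used_bits):
    """Split a hash into the bits used now and the bits kept for later.

    Returns (group_key, remaining_bits, remaining_width). The remaining
    bits are what would be stored next to the original data.
    """
    remaining_width = total_bits - used_bits
    group_key = hash_value >> remaining_width            # highest used_bits bits
    remaining = hash_value & ((1 << remaining_width) - 1)  # everything below them
    return group_key, remaining, remaining_width


# An 8-bit toy hash 0b10110100 grouped on its two highest bits.
key, rest, width = split_hash(0b10110100, total_bits=8, used_bits=2)
```

In this toy case the group key is `0b10` and the six remaining bits `0b110100` are kept with the data for the next grouping.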
  • If the preset grouping rule is satisfied, grouping stops. If the preset grouping rule is not satisfied, the original data in each group obtained by the first hash grouping is grouped again.
  • In the second to nth hash groupings, the original data in any group obtained after the previous hash grouping is hash-grouped, where n is a positive integer greater than 2 and is included in N.
  • The process of performing, in sequence, the second to nth of the N hash groupings on the data segments in each target data group based on the preset grouping rule includes:
  • Step S1034: based on the bits left unspecified by the previous hash grouping and saved at the associated position of the original data in the current group, divide the original data whose hash values have the same value at the bit positions specified for the current hash grouping into the same group, and sort the original data in the same group by the position of each piece of original data in the target data group and save it in that group;
  • In step S1034, the bits required for the current hash grouping are specified among the bits saved at the associated position of the original data, and the original data whose hash values have the same value at those bit positions is placed in the same group;
  • the original data placed in the same group is then sorted within the current group by its position in the target data group.
  • Step S1035: save the remaining unspecified bits associated with each piece of original data at the associated position of that original data again;
  • In step S1035, the remaining unspecified bits are saved again at the associated position of the original data for use in subsequent groupings.
  • For example, the bits currently held at the associated position of the original data are the unused bits remaining after step S1032. If the current hash grouping still uses two bits, they are the two bits taken from the highest of the remaining bits toward the lowest.
  • After steps S1034 and S1035 are performed, if the current grouping does not satisfy the preset grouping rule, steps S1034 and S1035 are repeated until the rule is satisfied and grouping of the target data group stops.
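The loop over steps S1034 and S1035 can be sketched as repeated regrouping on the bits left over from the previous round. This is a simplified model under stated assumptions: `multi_pass_group` is a hypothetical name, every round uses the same number of bits, the stopping rule is a fixed number of rounds, and records carry a precomputed hash so that no hash is recalculated after the first round.

```python
def multi_pass_group(records, total_bits, bits_per_round, rounds):
    """Group records over several rounds, reusing leftover hash bits.

    records is a list of (position, hash_value, value). The result maps a
    tuple of per-round keys to members ordered by position.
    """
    if rounds * bits_per_round > total_bits:
        raise ValueError("not enough hash bits for the requested rounds")
    groups = {(): list(records)}
    width = total_bits
    for _ in range(rounds):
        width -= bits_per_round            # bits that stay unused after this round
        mask = (1 << width) - 1
        next_groups = {}
        for prefix, members in groups.items():
            for position, remaining, value in members:
                key = remaining >> width   # highest not-yet-used bits
                leftover = remaining & mask  # saved alongside the data
                next_groups.setdefault(prefix + (key,), []).append(
                    (position, leftover, value))
        for members in next_groups.values():
            members.sort(key=lambda m: m[0])  # keep each group ordered by position
        groups = next_groups
    return groups


records = [(0, 0b1101, "a"), (1, 0b1100, "b"), (2, 0b1101, "c"), (3, 0b0111, "d")]
result = multi_pass_group(records, total_bits=4, bits_per_round=2, rounds=2)
```

With 4-bit toy hashes and two bits per round, records "a" and "c" share both keys and end up in the same final group, still in their original order.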
  • The target data group is hash-grouped until the preset grouping rule is satisfied, and the original data divided into the same group is sorted during each grouping pass, so that although the result of each hash grouping is unordered as a whole, the original data within each resulting group is ordered.
  • The resulting sorting complexity is lower than the sorting complexity when randomly distributed original data is joined.
  • The hash value of each original data item is calculated only in the first grouping pass, and the bits to be used later are recorded at the associated position of the corresponding original data so that they can be used directly in subsequent grouping passes. This eliminates the cost of repeatedly calculating the hash value and avoids wasting resources.
  • In each hash grouping pass the original data in each group is sorted, so that after the last hash grouping is completed the original data in each group is already locally ordered, and only the groups obtained from hash-grouping the target data group still need to be sorted. Compared with the prior art, in which both the original data within each group and the groups themselves are sorted after grouping is completed, this reduces the sorting complexity and the time consumed by sorting.
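  • The reuse of saved bits across grouping passes can be illustrated with a minimal sketch. It assumes an 8-bit hash value and two bits per pass; the function name `take_high_bits` and the choice of bit width are hypothetical and are not taken from the specification. The hash value is computed once, and each pass only splits off the highest of the bits that remain.

```python
from typing import Tuple


def take_high_bits(remaining: int, width: int, k: int) -> Tuple[int, int, int]:
    """Split off the k highest of the `width` bits that are still unused.

    Returns (key, new_remaining, new_width): `key` selects the group for the
    current pass, and `new_remaining` is what would be saved again at the
    position associated with the original data (step S1035).
    """
    if k > width:
        raise ValueError("not enough unused bits left")
    new_width = width - k
    key = remaining >> new_width                      # bits used by this pass
    new_remaining = remaining & ((1 << new_width) - 1)  # bits kept for later
    return key, new_remaining, new_width


# Assumed example: an 8-bit hash value computed once in the first pass.
hash_value = 0b10110100
key1, rest, width = take_high_bits(hash_value, 8, 2)  # first hash grouping
key2, rest, width = take_high_bits(rest, width, 2)    # second hash grouping
```

  • In this sketch no hash function is called after the first pass; later passes only shift and mask the saved bits.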
  • Building on the hash connection method of Embodiment 1 and Embodiment 2 of the present invention, this embodiment describes in detail the preset grouping rule mentioned in step S103 shown in FIG. 1.
  • When the preset grouping rule is the preset number of hash groupings N, grouping of the target data group stops after N hash groupings have been completed.
  • The value of N is determined by the storage size of the page table buffer (translation lookaside buffer, TLB) and is a positive integer greater than or equal to 1.
  • the value of N is 4.
  • In the second through Nth hash groupings, the process disclosed in Embodiment 1 of the present invention for hash-grouping the original data in any group obtained from the previous hash grouping is performed.
  • Hash grouping of the target data group then stops, and the number of groups obtained at that point is the number of groups of the target data group.
  • When the preset grouping rule is the preset total number of groups S, the value of S is determined by the size of the database cache and is a positive integer greater than or equal to 2.
  • Example 2: When the preset total number of groups into which the current target data group may be divided, as determined by the size of the database cache, is 10, the first hash grouping is performed on the current target data group. If the number of groups obtained after the first hash grouping is less than 10, hash grouping continues until the number of groups of the current target data group reaches 10, and then stops.
  • When the preset grouping rule is the preset number of hash groupings N together with the preset total number of groups S, and N has a higher priority than S, the data segments in each target data group are hash-grouped in sequence until N hash groupings have been completed;
  • When the preset grouping rule is N together with S, and S has a higher priority than N, the data segments in each target data group are hash-grouped in sequence until the number of groups of each target data group equals the preset total number of groups S;
  • When the preset grouping rule is N together with S, and N and S have the same priority, the data segments in each target data group are hash-grouped in sequence until N hash groupings have been completed and the number of groups of each target data group equals the preset total number of groups S;
  • The priorities of the preset number of hash groupings N and the preset total number of groups S are determined by the storage size of the TLB and the size of the cache.
  • Example 3: The preset number of hash groupings determined by the storage size of the TLB is 3, and the preset total number of groups determined by the size of the database cache is 16.
  • When the priority of N is the same as that of S, the total number of groups obtained after the target data group has been grouped 3 times is exactly 16;
  • When the priority of N is higher than that of S, the total number of groups obtained after the target data group has been grouped 3 times may be less than, equal to, or greater than 16;
  • When the priority of S is higher than that of N, the number of grouping passes performed on the target data group by the time the total number of groups reaches 16 may be greater than, less than, or equal to 3.
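  • The three priority cases can be summarised as a stop condition evaluated after each hash grouping. The sketch below is only an illustration: the function name `should_stop` and the string labels for the priority are hypothetical, and it uses "greater than or equal to" as a guard where the text says the counts become equal.

```python
def should_stop(passes_done: int, group_count: int,
                n: int, s: int, priority: str) -> bool:
    """Decide whether hash grouping of a target data group should stop.

    n        -- preset number of hash groupings N
    s        -- preset total number of groups S
    priority -- "N" (N outranks S), "S" (S outranks N) or "equal"
    """
    if priority == "N":
        return passes_done >= n            # only the pass count matters
    if priority == "S":
        return group_count >= s            # only the group count matters
    if priority == "equal":
        return passes_done >= n and group_count >= s
    raise ValueError(f"unknown priority: {priority!r}")


# Values of Example 3: N = 3 and S = 16.
stop_by_n = should_stop(3, 8, 3, 16, "N")          # 3 passes done, 8 groups
stop_by_s = should_stop(3, 8, 3, 16, "S")          # still fewer than 16 groups
stop_equal = should_stop(3, 16, 3, 16, "equal")    # both conditions met
```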
  • The preset grouping rule may also include a preset number of hash groupings N, a preset number of groups m for each hash grouping, and a preset total number of groups S, where the value of N is determined by the storage size of the TLB and is a positive integer greater than or equal to 1, m is less than N, and the value of S is determined by the size of the database cache and is a positive integer greater than or equal to 2. When the data segments in each target data group are hash-grouped in sequence, each hash grouping is performed according to the preset number of groups m, so that the final number of hash groupings equals N and the total number of groups obtained equals the preset total number of groups S.
  • Example 4: As shown in FIG. 2, the preset number of hash groupings determined by the storage size of the TLB is 3, the number of groups per hash grouping is 2, and the preset total number of groups determined by the size of the database cache is 16.
  • In the first hash grouping, each data segment is divided into two groups and its data is written into the corresponding groups;
  • In each subsequent hash grouping, each group obtained from the previous grouping is again divided into two and written into the corresponding groups, and so on until hash grouping of the target data group has been completed and 16 groups have been obtained.
  • The above mainly explains the preset grouping rule on which the hash grouping process mentioned in step S103 shown in FIG. 1 is based.
  • The preset grouping rule is determined mainly from the storage size of the TLB of the computer on which the database runs and from the size of the database cache. Grouping under this rule can avoid cache misses during the grouping process, thereby improving the performance of the subsequent Join.
  • Based on the hash connection method of Embodiment 1 to Embodiment 3 of the present invention, in step S102 shown in FIG. 1 the target data group is divided into a plurality of data segments with a vector as the quantity unit.
  • The specific process includes:
  • With a vector as the quantity unit and one vector corresponding to one data segment, each target data group is divided in sequence into M data segments, where the value of M is determined by the number of original data items in the target data group, the size of the database cache, and the storage size of the TLB;
  • The first through (M-1)th data segments contain the same number of original data items, and the Mth data segment contains a number of original data items less than or equal to that of each of the first through (M-1)th data segments.
  • For example, the target data group to be hash-grouped contains 25 original data items in total. With a vector as the quantity unit and each vector containing 5 original data items, every 5 original data items constitute one data segment.
  • Dividing by the vector quantity unit splits the target data group of 25 original data items into five data segments.
  • The 1st through 5th data segments contain the same number of original data items; FIG. 3 shows this case, in which every data segment contains the same number of original data items.
  • As another example, the target data group to be hash-grouped contains 28 original data items in total. With a vector as the quantity unit and each vector containing 5 original data items, every 5 original data items constitute one data segment.
  • Dividing by the vector quantity unit splits the target data group of 28 original data items into six data segments.
  • The 1st through 5th data segments contain the same number of original data items, and the 6th data segment contains 3 original data items, fewer than each of the 1st through 5th data segments.
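  • The two examples above can be reproduced with a short sketch. The function name `split_into_segments` is hypothetical, and the vector size is passed in directly rather than derived from the cache and TLB sizes, which the specification does not quantify.

```python
from typing import List, Sequence


def split_into_segments(target_group: Sequence, vector_size: int) -> List[list]:
    """Divide a target data group into data segments of one vector each.

    All segments hold `vector_size` items except possibly the last one,
    which holds the remainder.
    """
    if vector_size <= 0:
        raise ValueError("vector_size must be positive")
    return [list(target_group[i:i + vector_size])
            for i in range(0, len(target_group), vector_size)]


# 25 items with 5 per vector, then 28 items with 5 per vector.
sizes_25 = [len(seg) for seg in split_into_segments(range(25), 5)]
sizes_28 = [len(seg) for seg in split_into_segments(range(28), 5)]
```

  • Here `sizes_25` lists five segments of 5 items, and `sizes_28` lists five segments of 5 items followed by one segment of 3.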
  • Based on the hash connection method of Embodiment 2 of the present invention, the step disclosed above divides the original data into the same group and then sorts and saves the original data in that group according to the position of each original data item in the target data group.
  • The specific process, shown in FIG. 4, includes: calculating the hash value of the original data in the current data segment and representing the hash value in bits;
  • Step S202: Find the original data whose hash values have the same value at the bits specified for the current hash grouping, and divide that original data into the same group;
  • The hash value is represented in bits.
  • In the current hash grouping, the value of the hash value at the specified bits is looked up.
  • The specified bits may be specified, according to the size of the database cache and the storage size of the TLB, before the current grouping is performed. Alternatively, when the need for hash grouping is received, the bits to be used in each subsequent grouping pass may be specified in advance according to the size of the database cache and the storage size of the TLB; in that case no re-specification is needed when the current grouping is performed, and the lookup is made directly at the bits required for the current hash grouping.
  • Step S203: Traverse the subscripts of the original data to be divided into the same group, where the subscript of each original data item identifies its position in the target data group;
  • Step S204: Arrange the original data corresponding to the subscripts in ascending order of subscript;
  • Step S205: Write the original data into the same group and save it in that ascending order.
  • Steps S203 to S205 sort the original data divided into the same group and write it into that group during the grouping process, so that the target data group becomes locally ordered as it is grouped.
  • A data segment in units of a vector (shown by the dashed box in FIG. 5) is taken, and the hash values of the original data in the data segment are calculated together.
  • In FIG. 5, value is the actual value participating in the Join, position is the position of each original data item in the entire data segment, position-1 is the subscript of each original data item after it has been sorted within the same group, and hash Value is the hash value corresponding to the original data.
  • The original data to be written into the current group is sorted as it is written into the current group.
  • The next adjacent vector is then processed in the same way, until all vectors in the target data group have completed the current hash grouping.
  • This yields locally ordered groups after the first hash grouping of the target data group, which shares the burden of sorting the original data within each group during the final sorting and thereby reduces the sorting complexity.
  • After the first hash grouping has been performed in the above manner on each data segment in the target data group being grouped, if the current grouping satisfies the preset grouping rule, no further grouping is performed.
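  • The first hash grouping of a single vector can be sketched as follows. Everything concrete here is an assumption made for illustration: the 8-bit `toy_hash` function, the use of the two highest bits as the specified bits, and the tuple layout `(position, value, saved bits)`. The sketch computes each hash value once, groups by the specified bits, orders each group by subscript, and keeps the unused bits next to the data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

HASH_BITS = 8        # assumed width of the hash value
BITS_PER_PASS = 2    # assumed number of bits specified for one grouping

Entry = Tuple[int, int, int]   # (position in target data group, value, saved bits)


def toy_hash(value: int) -> int:
    """Hypothetical 8-bit hash, used only to make the sketch runnable."""
    return (value * 37) & 0xFF


def first_grouping(vector: List[Tuple[int, int]]) -> Dict[int, List[Entry]]:
    """Group one vector of (position, value) pairs by the specified hash bits."""
    rest_width = HASH_BITS - BITS_PER_PASS
    groups: Dict[int, List[Entry]] = defaultdict(list)
    for position, value in vector:
        h = toy_hash(value)                      # computed only in this pass
        key = h >> rest_width                    # value at the specified bits
        saved = h & ((1 << rest_width) - 1)      # unspecified bits, kept for later
        groups[key].append((position, value, saved))
    for entries in groups.values():
        entries.sort(key=lambda e: e[0])         # ascending subscript (S203 to S205)
    return dict(groups)


groups = first_grouping([(3, 5), (0, 7), (2, 12), (1, 5)])
```

  • With this toy hash, the values 5 and 12 share the same two highest bits and land in one group ordered by position, while 7 forms a group of its own.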
  • Based on the hash connection method of Embodiment 2 of the present invention, step S1034 disclosed above uses the bits left unspecified in the previous hash grouping, saved at the associated position of each original data item in the current group, to divide the original data whose hash values have the same value at the bits specified for the current hash grouping into the same group, and sorts and saves the original data in that group according to the position of each original data item in the target data group.
  • The specific process, shown in FIG. 6, includes:
  • Step S301: Retrieve the bits left unspecified in the previous hash grouping that are saved at the associated position of each original data item in the group currently undergoing hash grouping;
  • The current group is any one of the groups obtained from the previous hash grouping. The unspecified bits saved at the associated positions of the original data in the current group are retrieved so that the current group can be hash-grouped again.
  • Step S302: From the retrieved unspecified bits, determine the bits required for the current hash grouping, where the bits required for the current hash grouping are determined according to the size of the database cache and the storage size of the TLB;
  • Step S303 Find each original data corresponding to the hash value with the same value in the specified bit position in the current hash grouping process, and divide each original data into the same group.
  • Step S304 traversing subscripts of each original data to be divided into the same group, and the subscripts of the respective original data are used to identify the location of each original data in the target data group;
  • Step S305 Arrange, according to the size of each subscript, each original data corresponding to each subscript from small to large;
  • Step S306 Write each original data into the same group and save according to the sequence from small to large.
  • The sorting of the original data divided into the same group in steps S304 to S306 is the same as in steps S203 to S205 of FIG. 4 and is not described again here.
  • Steps S301 to S306 are performed on each group obtained from the previous hash grouping of the target data group, yielding new groups whose original data is internally ordered after the further hash grouping.
  • If the preset grouping rule is satisfied, hash grouping stops. If it is not satisfied, steps S301 to S306 are performed again on the groups obtained from the previous hash grouping, until the preset grouping rule is satisfied.
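  • One further hash grouping of a single group (steps S301 to S306) might look like the sketch below. The entry layout `(subscript, value, saved bits)`, the function name `regroup`, and the sample bit patterns are assumptions for illustration only. Note that no hash function is called: the group key comes entirely from the bits saved by the previous pass.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, str, int]   # (subscript, value, saved unused bits)
Group = List[Entry]


def regroup(group: Group, width: int, k: int) -> Dict[int, Group]:
    """Split one group again using the k highest of its `width` saved bits."""
    new_width = width - k
    mask = (1 << new_width) - 1
    buckets: Dict[int, Group] = {}
    for subscript, value, saved in group:
        key = saved >> new_width                  # S302/S303: bits for this pass
        buckets.setdefault(key, []).append((subscript, value, saved & mask))
    for entries in buckets.values():
        entries.sort(key=lambda e: e[0])          # S304 to S306: ascending subscript
    return buckets


# A group left by an earlier pass; each entry still carries 6 unused bits.
group = [(4, "d", 0b100110), (1, "a", 0b100001), (6, "f", 0b010111)]
result = regroup(group, width=6, k=2)
```

  • The entries with subscripts 1 and 4 share the bits `10` and stay together in subscript order, while the entry with subscript 6 moves to the group for `01`; each entry now carries its 4 remaining bits.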
  • Based on the hash connection method of Embodiment 1 to Embodiment 3 of the present invention, in step S105 disclosed above the original data in each group obtained after N hash groupings of the two target data groups to be connected is taken in sequence and joined. The specific process includes:
  • Step S501: Acquire, in sequence, each group obtained after N hash groupings of each of the two target data groups to be connected;
  • Step S501 is performed to obtain the groups in the two target data groups to be connected.
  • Step S502: Perform the Join operation on the original data in the groups of the two target data groups by joining the original data of the groups two at a time, one group from each target data group.
  • The pairwise Join of the original data in the groups of the two target data groups to be connected, shown in FIG. 7, includes:
  • Step S503: Using a group in one target data group, traverse the groups in the other target data group in sequence;
  • Step S504: Determine whether the current group has reached an identical group in the other target data group; if so, perform step S505, and if not, perform step S507;
  • Step S505: If an identical group is reached, join the original data in the current group with the original data in the identical group in sequence, where an identical group is one whose stored original data has the same hash value as the original data stored in the group used for traversal;
  • Step S506: Determine whether all the original data in either of the two groups currently being joined has undergone the Join operation; if so, perform step S507; if not, continue the Join operation on the original data in the two groups and return to step S506;
  • Step S507: Move to the next group and return to step S503;
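  • The traversal in steps S503 to S507 can be approximated by the sketch below. It makes several assumptions that the specification does not state: each group is represented as a `(hash key, values)` pair, the groups of both target data groups are already sorted by hash key with one group per key, and the Join is an equality join on the values. Under those assumptions the scan over the other target data group never has to move backwards.

```python
from typing import List, Tuple

KeyedGroup = Tuple[int, List[int]]   # (hash key shared by the group, values)


def join_groups(left: List[KeyedGroup],
                right: List[KeyedGroup]) -> List[Tuple[int, int]]:
    """Join two lists of groups that are each sorted by hash key."""
    matches: List[Tuple[int, int]] = []
    j = 0
    for key, left_values in left:                     # S503: take one group
        while j < len(right) and right[j][0] < key:   # skip smaller hash keys
            j += 1
        if j < len(right) and right[j][0] == key:     # S504: identical group found
            for a in left_values:                     # S505/S506: join the data
                for b in right[j][1]:
                    if a == b:
                        matches.append((a, b))
        # S507: continue with the next group of the left target data group
    return matches


left_groups = [(0, [7]), (2, [5, 12])]
right_groups = [(1, [9]), (2, [12, 5, 5])]
pairs = join_groups(left_groups, right_groups)
```

  • Only groups with the same hash key are compared, so the value 7 is never checked against the group holding 9.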
  • The above describes the processes performed for grouping and for the Join in the hash connection.
  • In the first grouping pass the hash values of the original data in each vector unit are calculated together, and the hash values corresponding to the several original data items in the same group are then written into the corresponding group at one time.
  • The bits to be used later are recorded at the associated position of the corresponding original data for use in subsequent grouping passes, which eliminates the cost of repeatedly calculating the hash value and avoids wasting resources.
  • Before the original data is written into its corresponding group during hash grouping, the original data in that group is sorted, and the groups themselves are sorted after hash grouping is completed. The final sorting therefore no longer has to order both the groups and the data inside each group, which reduces the time consumed by sorting.
  • The hash connection apparatus is applied to a database and mainly includes a receiving unit 101, a dividing unit 102, a grouping unit 103, a sorting unit 104, and a connecting unit 105.
  • The receiving unit 101 is configured to receive a Structured Query Language (SQL) statement including a connection (Join) operation, and to parse it to obtain at least two target data groups to be connected;
  • The target data groups then pass through the dividing unit 102, the grouping unit 103, and the sorting unit 104 for division, grouping, and sorting, and then enter the connecting unit 105, where the Join operation is performed on the two grouped target data groups to be connected.
  • The dividing unit 102 is configured to divide each target data group into multiple data segments with a vector as the quantity unit;
  • The grouping unit 103 is configured to perform N hash groupings on the data segments in each target data group in sequence according to a preset grouping rule, where in each hash grouping, based on the hash values, represented in bits, that were calculated for the original data in the data segment during the first hash grouping, the original data whose hash values have the same value at the bits specified for the current hash grouping is divided into the same group, and the original data divided into the same group is sorted according to its position in the target data group;
  • N is a positive integer greater than or equal to 1;
  • The sorting unit 104 is configured to obtain the groups produced by the N hash groupings of each target data group and, within the target data group, to sort the groups in ascending order of the hash values corresponding to the original data contained in each group;
  • The connecting unit 105 is configured to take, in sorted order, the original data in each group obtained after the N hash groupings of the two target data groups to be connected, and to perform the Join operation.
  • The grouping unit 103 includes a first hash grouping module 1031, which performs the first hash grouping on the data segments in the target data group from top to bottom, and a multiple hash grouping module 1032, which performs further hash grouping on any group obtained from the previous hash grouping.
  • The first hash grouping module 1031 is configured to: calculate the hash value of the original data contained in the current data segment and represent the calculated hash value in bits; divide the original data whose hash values have the same value at the specified bits into the same group, and sort and save the original data in that group according to the position of each original data item in the target data group; and save the unspecified bits of the hash value of each original data item in association with that original data;
  • The multiple hash grouping module 1032 is configured to: based on the bits left unspecified in the previous hash grouping, which are saved in association with the original data in the current group, divide the original data whose hash values have the same value at the bits specified for the current hash grouping into the same group, and sort and save the original data in that group according to the position of each original data item in the target data group; and save the remaining unspecified bits again in association with the original data.
  • When the preset grouping rule is the preset number of hash groupings N, the grouping unit is configured to hash-group the data segments in each target data group in sequence until N hash groupings have been completed; when the preset grouping rule is the preset total number of groups S, the grouping unit is configured to hash-group the data segments in each target data group until the number of groups of each target data group equals the preset total number of groups S;
  • When the preset grouping rule is N together with S, and N has a higher priority than S, the grouping unit is configured to hash-group the data segments in each target data group in sequence until N hash groupings have been completed;
  • When the preset grouping rule is N together with S, and S has a higher priority than N, the grouping unit is configured to hash-group the data segments in each target data group until the number of groups of each target data group equals the preset total number of groups S;
  • When the preset grouping rule is N together with S, and N and S have the same priority, the grouping unit is configured to hash-group the data segments in each target data group in sequence until N hash groupings have been completed and the number of groups of each target data group equals the preset total number of groups S;
  • When the preset grouping rule includes N, a preset number of groups m for each hash grouping, and S, the grouping unit is configured to perform each hash grouping according to the preset number of groups m, so that the final number of hash groupings equals the preset number of hash groupings N and the total number of groups obtained equals the preset total number of groups S;
  • The value of N is determined by the storage size of the TLB and is a positive integer greater than or equal to 1, and m is less than N; the value of S is determined by the size of the database cache and is a positive integer greater than or equal to 2; the priorities of N and S are determined by the storage size of the TLB and the size of the cache.
  • The execution process and principle of the dividing unit 102 shown in FIG. 8 are the same as in the above description of dividing each target data group into multiple data segments with a vector as the quantity unit, and are not described again here. The dividing unit mainly includes:
  • A first dividing module, configured to divide each target data group in sequence into M data segments with a vector as the quantity unit and one vector corresponding to one data segment, where the value of M is determined by the number of original data items in the target data group, the size of the database cache, and the storage size of the TLB;
  • The first through (M-1)th data segments contain the same number of original data items, and the Mth data segment contains a number of original data items less than or equal to that of each of the first through (M-1)th data segments.
  • For the first hash grouping module 1031, which divides the original data whose hash values have the same value at the specified bits into the same group and sorts and saves the original data in that group according to the position of each original data item in the target data group, the specific execution process and principle can be found in the detailed description of the first hash grouping disclosed in Embodiment 3 of the present invention and are not described again here. It mainly includes: a sub-module that calculates the hash value represented in bits;
  • A first search sub-module, configured to find the original data whose hash values have the same value at the bits specified for the current hash grouping and to divide that original data into the same group, where the bits required for the current hash grouping are specified according to the size of the database cache and the storage size of the TLB;
  • A first traversal sub-module, configured to traverse the subscripts of the original data divided into the same group, where the subscript of each original data item identifies its position in the target data group;
  • A module configured to arrange the original data corresponding to the subscripts in ascending order of subscript;
  • A first sorting sub-module, configured to write the original data into the same group and save it in that ascending order.
  • For the multiple hash grouping module 1032, which divides the original data whose hash values have the same value at the bits specified for the current hash grouping into the same group and sorts and saves the original data in that group according to the position of each original data item in the target data group, the specific execution process and principle can be found in the detailed description of the multiple hash groupings disclosed in Embodiment 1 to Embodiment 4 of the present invention and are not described again here.
  • A determining sub-module, configured to determine, from the retrieved unspecified bits, the bits to be used in the current hash grouping, where the bits used in the current hash grouping are determined according to the size of the database cache and the storage size of the TLB;
  • A second search sub-module, configured to find the original data whose hash values have the same value at the bits specified for the current hash grouping and to divide that original data into the same group;
  • A second traversal sub-module, configured to traverse the subscripts of the original data divided into the same group, where the subscript of each original data item identifies its position in the target data group;
  • A module configured to arrange the original data corresponding to the subscripts in ascending order of subscript;
  • A second sorting sub-module, configured to write the original data into the same group and save it in that ascending order.
  • The execution process and principle of the connecting unit 105 can be found in the detailed description of the Join operation in Embodiment 4 of the present invention and are not described again here.
  • An obtaining module, configured to acquire, in sequence, each group obtained after the N hash groupings of each of the two target data groups to be connected;
  • A Join module, configured to perform the Join operation on the original data in the groups of the two target data groups by joining the original data of the groups two at a time;
  • The Join module includes:
  • A third traversal sub-module, configured to use a group in one target data group to traverse the groups in the other target data group in sequence; if an identical group is reached, the first Join sub-module is executed; if no identical group is reached, the traversal moves to the next group and returns to the third traversal sub-module, until all groups in the target data group have traversed the groups in the other target data group;
  • A first Join sub-module, configured to join the original data in the traversing group with the original data in the identical group in sequence, where an identical group is one whose stored original data has the same hash value as the original data stored in the group used for traversal; after the original data in the group has undergone the Join operation, the process moves to the next group and returns to the third traversal sub-module.
  • Embodiment 5 of the present invention discloses a hash connection apparatus corresponding to the hash connection method described above. Based on the units and modules disclosed above, when a target data group is hash-grouped, the hash values are calculated vector by vector, and the hash values corresponding to the several original data items in the same group are then written into the corresponding group at one time. Grouping by vector avoids unnecessary cache thrashing, which reduces cache misses and improves the performance of the Join. Moreover, the hash value of each original data item is calculated only in the first grouping pass, and the bits to be used later are recorded at the associated position of the corresponding original data for use in subsequent grouping passes, which eliminates the cost of repeatedly calculating the hash value and avoids wasting resources.
  • The hash connection method described in connection with the embodiments of the present disclosure can be implemented in a data management system directly in hardware, in a program stored in a memory and executed by a processor, or in a combination of the two. Accordingly, the present invention also discloses a data management system corresponding to the method and apparatus disclosed in the above embodiments. A specific embodiment is described in detail below.
  • The data management system 1 includes a memory 11 and a processor 13 connected to the memory 11 via a bus 12.
  • The memory 11 has a storage medium in which a program for performing a database query is stored.
  • The memory 11 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
  • The processor 13 is connected to the memory 11 via the bus 12, and the processor 13 calls the database query program stored in the memory 11 when performing a database query.
  • The database query program may include program code, and the program code includes a series of operation instructions arranged in a certain order.
  • The processor 13 may be a central processing unit (CPU), an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present invention.
  • the program for performing data scheduling invoked by the processor 13 may specifically include:
  • performing N hash groupings on the data segments in each target data group in sequence based on a preset grouping rule, where in each hash grouping, based on the hash values, represented in bits, calculated for the original data in the data segment during the first hash grouping, the original data whose hash values have the same value at the bits specified for the current hash grouping is divided into the same group, and the original data divided into the same group is sorted;
  • sorting the groups in ascending order of the hash values corresponding to the original data contained in each group;
  • performing the Join operation by taking, in order, the original data in each group obtained after the N hash groupings of the target data groups to be connected.
  • In the embodiment of the present invention, hash grouping is performed with a vector as the quantity unit, and each subsequent round of grouping uses bits that were not specified in the previous round. Several pieces of original data can therefore be hash-grouped at the same time, and the hash values of the original data need not be recalculated across the multiple rounds of grouping. This reduces cache misses and avoids the waste of computing resources caused by repeatedly calculating hash values.
  • During grouping, the original data divided into the same group is sorted, which reduces the complexity of subsequently sorting each group.
  • The resulting sorting complexity is lower than the sorting complexity incurred when the original data is randomly assigned before the join.
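  • The claimed savings can be made concrete with a small sketch, again under stated assumptions: `CountingHasher`, `hash_segment`, `bit_field`, and `partition` are hypothetical names, the vector size of 4 and the 2-bit field width are arbitrary, and the hash function is only illustrative. The sketch hashes a data segment one vector at a time, counts how many hash computations occur, and then runs two rounds of partitioning that only read stored bits.

```python
class CountingHasher:
    """Illustrative hash function that counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, key: int) -> int:
        self.calls += 1
        return (key * 2654435761) & 0xFFFFFFFF


def hash_segment(keys, hasher, vector_size=4):
    """Hash a data segment one vector (fixed-size batch) at a time."""
    hashes = []
    for start in range(0, len(keys), vector_size):
        vector = keys[start:start + vector_size]
        hashes.extend(hasher(k) for k in vector)
    return hashes


def bit_field(h: int, round_index: int, width: int) -> int:
    """Return the bits reserved for the given round; rounds never overlap."""
    return (h >> (round_index * width)) & ((1 << width) - 1)


def partition(positions, hashes, round_index, width):
    """Split positions into 2**width buckets using stored hash bits only."""
    buckets = [[] for _ in range(1 << width)]
    for pos in positions:               # scan order keeps buckets position-sorted
        buckets[bit_field(hashes[pos], round_index, width)].append(pos)
    return buckets


keys = list(range(16))
hasher = CountingHasher()
hashes = hash_segment(keys, hasher)
first_round = partition(range(len(keys)), hashes, 0, 2)
second_round = [partition(bucket, hashes, 1, 2) for bucket in first_round]
```

  • After both rounds, `hasher.calls` equals the number of keys rather than twice that number, and every bucket already lists its positions in ascending order, so no separate per-group sort by position is required in this sketch.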

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a hash join method, a device, and a database management system. The method comprises: when dividing a target data group during a database query, using a vector as the quantity unit to divide the data and calculate the hash values of the original data in a data segment, and representing each hash value in bits; in one round of hash grouping, dividing the original data whose hash values are identical at the specified bits into the same group based on a preset grouping rule; in the next round of grouping, continuing the hash grouping by using the bits not specified in the previous round; during grouping, sorting the original data in the same group according to the positions of the original data in the target data group; and performing a join operation on the grouped and sorted original data to be joined in the corresponding groups of the target data groups, thereby reducing the complexity of subsequently sorting each group.
PCT/CN2014/078304 2014-05-23 2014-05-23 Hash join method, device, and database management system WO2015176315A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/078304 WO2015176315A1 (fr) 2014-05-23 2014-05-23 Hash join method, device, and database management system
CN201480037464.8A CN105359142B (zh) 2014-05-23 2014-05-23 Hash join method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/078304 WO2015176315A1 (fr) 2014-05-23 2014-05-23 Hash join method, device, and database management system

Publications (1)

Publication Number Publication Date
WO2015176315A1 (fr) 2015-11-26

Family

ID=54553263

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/078304 WO2015176315A1 (fr) 2014-05-23 2014-05-23 Hash join method, device, and database management system

Country Status (2)

Country Link
CN (1) CN105359142B (fr)
WO (1) WO2015176315A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019070340A1 (fr) * 2017-10-06 2019-04-11 Microsoft Technology Licensing, Llc Join operation and interface for wildcards
CN111026720A (zh) * 2019-12-20 2020-04-17 深信服科技股份有限公司 File processing method, system, and related device
CN111125011A (zh) * 2019-12-20 2020-05-08 深信服科技股份有限公司 File processing method, system, and related device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
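CN106326475B (zh) * 2016-08-31 2019-12-27 中国科学院信息工程研究所 Efficient static hash table implementation method and system
CN108549666B (zh) * 2018-03-22 2021-05-04 上海达梦数据库有限公司 Data table sorting method, apparatus, device, and storage medium
CN117891414A (zh) * 2024-03-14 2024-04-16 支付宝(杭州)信息技术有限公司 Perfect-hash-based data storage method and related device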

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162410A1 (en) * 2006-12-27 2008-07-03 Motorola, Inc. Method and apparatus for augmenting the dynamic hash table with home subscriber server functionality for peer-to-peer communications
CN101593202A (zh) * 2009-01-14 2009-12-02 中国人民解放军国防科学技术大学 Database hash join method based on a shared-cache multi-core processor
CN102508924A (zh) * 2011-11-22 2012-06-20 上海达梦数据库有限公司 Method for implementing grace hash join using merge join
US20130173589A1 (en) * 2011-12-29 2013-07-04 Yu Xu Techniques for optimizing outer joins

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019070340A1 (fr) * 2017-10-06 2019-04-11 Microsoft Technology Licensing, Llc Join operation and interface for wildcards
US11010387B2 (en) 2017-10-06 2021-05-18 Microsoft Technology Licensing, Llc Join operation and interface for wildcards
CN111026720A (zh) * 2019-12-20 2020-04-17 深信服科技股份有限公司 File processing method, system, and related device
CN111125011A (zh) * 2019-12-20 2020-05-08 深信服科技股份有限公司 File processing method, system, and related device
CN111026720B (zh) * 2019-12-20 2023-05-12 深信服科技股份有限公司 File processing method, system, and related device
CN111125011B (zh) * 2019-12-20 2024-02-23 深信服科技股份有限公司 File processing method, system, and related device

Also Published As

Publication number Publication date
CN105359142B (zh) 2019-04-05
CN105359142A (zh) 2016-02-24

Similar Documents

Publication Publication Date Title
WO2015176315A1 (fr) Hash join method, device, and database management system
US8832350B2 (en) Method and apparatus for efficient memory bank utilization in multi-threaded packet processors
CN110399535B (zh) Data query method, apparatus, and device
US9871727B2 (en) Routing lookup method and device and method for constructing B-tree structure
CN108363621B (zh) Packet forwarding method and apparatus under NUMA architecture, storage medium, and electronic device
JP2005235228A5 (fr)
US8423499B2 (en) Search device and search method
US10049035B1 (en) Stream memory management unit (SMMU)
WO2019029236A1 (fr) Memory allocation method and server
Tang et al. A data skew oriented reduce placement algorithm based on sampling
Chen et al. Fpga-accelerated samplesort for large data sets
US20160132559A1 (en) Tcam-based table query processing method and apparatus
Li et al. High performance MPI datatype support with user-mode memory registration: Challenges, designs, and benefits
Jeong et al. REACT: Scalable and high-performance regular expression pattern matching accelerator for in-storage processing
CN110008030B (zh) Metadata access method, system, and device
López-Ortiz et al. Paging for multi-core shared caches
WO2013185660A1 (fr) Network processor instruction storage device and method
WO2015032214A1 (fr) High-speed routing lookup method and device simultaneously supporting IPv4 and IPv6
CN113377689A (zh) Routing table entry lookup and storage method, and network chip
CN111126619B (zh) Machine learning method and apparatus
US8332595B2 (en) Techniques for improving parallel scan operations
US20200097297A1 (en) System and method for dynamic determination of a number of parallel threads for a request
CN112506813B (zh) Memory management method and system
Que et al. Exploring network optimizations for large-scale graph analytics
US7302524B2 (en) Adaptive thread ID cache mechanism for autonomic performance tuning

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201480037464.8

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14892265

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14892265

Country of ref document: EP

Kind code of ref document: A1