CN110825921A

CN110825921A - Method for solving Hash collision

Info

Publication number: CN110825921A
Application number: CN201911107261.2A
Authority: CN
Inventors: 陈晖�; 张晓峰; 陈伟峰; 王东锋
Original assignee: Tianjin Optical Electrical Communication Technology Co Ltd
Current assignee: Tianjin Optical Electrical Communication Technology Co Ltd
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2020-02-21

Abstract

The invention discloses a method for resolving hash collisions. Each rule is hashed a second time with a different hash function and stored in a second hash table, which addresses the problem of rules overwriting one another in the original hash table when only a single hash calculation is performed. When a key is later searched and matched, the key is hashed with both functions, both hash tables are searched, and the two search results are ORed to obtain the final result. In this way, even if the original hash table fails to find the corresponding rule because of a collision, the new hash table finds it, so the OR of the two search results indicates that the expected rule was found, that is, the result is a hit. The method handles hash collisions well and provides a useful reference for implementing hash-based search and matching.

Description

Method for solving Hash collision

Technical Field

The invention relates to the field of data search and matching, and in particular to a method for resolving the hash collisions that arise when data search and matching is implemented with hashing.

Background

In data search and matching, received data is compared against the rules of a known database in order to match and filter it. Hash tables offer fast lookup and can store a large number of rules, so they are widely used in this field. However, hash collisions cannot be avoided when a hash table is used for lookup. A hash collision occurs when different rules or keys, processed by the same hash function, produce the same hash result, that is, they map to the same position in the hash table. A common way to handle collisions is to reserve a collision depth interval of a certain size for each hash result and, when a collision occurs, to store the colliding rules at different positions within that interval, so that no rule is overwritten. When a key is searched and matched, each rule in the collision depth interval is traversed and compared with the key to determine whether there is a hit. This implementation is the most straightforward, but its drawbacks are also obvious: because every rule in the collision depth interval must be traversed during a key search, search performance may degrade. In addition, if the collision depth interval is too large, the whole hash table becomes large and requires a large amount of storage, which is unacceptable for applications with tight storage budgets.
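The conventional collision depth scheme described above can be sketched roughly as follows. This is only an illustration: the names TABLE_SIZE, DEPTH, slot_hash and DepthTable are hypothetical, and CRC-32 is used purely as a stand-in hash function, not one named in the source.

```python
import zlib

TABLE_SIZE = 8   # number of hash addresses (assumed value)
DEPTH = 4        # slots reserved per address, the "collision depth" (assumed value)


def slot_hash(rule: bytes) -> int:
    """Map a rule to a table address; CRC-32 is only a stand-in hash."""
    return zlib.crc32(rule) % TABLE_SIZE


class DepthTable:
    """Hash table that reserves DEPTH slots for every hash address."""

    def __init__(self) -> None:
        # Storage grows as TABLE_SIZE * DEPTH entries.
        self.slots = [[None] * DEPTH for _ in range(TABLE_SIZE)]

    def insert(self, rule: bytes) -> bool:
        bucket = self.slots[slot_hash(rule)]
        for i, stored in enumerate(bucket):
            if stored is None or stored == rule:
                bucket[i] = rule
                return True
        return False  # collision depth exhausted, rule cannot be stored

    def lookup(self, key: bytes):
        """Return (hit, comparisons); a miss scans every slot at the address."""
        comparisons = 0
        for stored in self.slots[slot_hash(key)]:
            comparisons += 1
            if stored == key:
                return True, comparisons
        return False, comparisons
```

In this sketch a miss always costs DEPTH comparisons and the table holds TABLE_SIZE * DEPTH entries, which mirrors the two drawbacks named in the text.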

Disclosure of Invention

In view of the above problems, the present invention provides a method for resolving hash collisions. The invention aims to address the degraded search performance and excessive storage space of prior-art hash tables.

The technical solution adopted by the invention is a method for resolving hash collisions, implemented on an FPGA-based hardware platform, characterized by comprising the following steps:

suppose two rules, rule_0 and rule_1, need to be stored in a hash table whose hash function is hash; if hash(rule_0) and hash(rule_1) are equal, rule_1 overwrites rule_0, and a later attempt to match key_0, the key corresponding to rule_0, will fail;

therefore, another hash function, denoted hash_s, is applied to rule_0 and rule_1; the results hash_s(rule_0) and hash_s(rule_1) are different, so both rules are stored in a new hash table without one overwriting the other; when key_0, the key corresponding to rule_0, needs to be matched, the hash table corresponding to hash does not hit but the hash table corresponding to hash_s does, and the two hit results are ORed so that key_0 finally obtains the expected hit result.

The beneficial effects of the invention are as follows: each rule is hashed twice with different hash functions, so when rules collide under the first hash function, the probability that they also collide under the second is very low, because the two functions are computed differently. The method has broad application value in the field of data search and matching.
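As a rough illustration of why a double collision is unlikely, assume (the source does not state this) that the two hash functions behave as independent, uniformly distributed functions over a table of N addresses. For one given pair of distinct rules:

```latex
\begin{align*}
P(\text{collision under } \mathit{hash}) &= \frac{1}{N}, \\
P(\text{collision under both } \mathit{hash} \text{ and } \mathit{hash\_s})
  &= \frac{1}{N} \cdot \frac{1}{N} = \frac{1}{N^{2}}.
\end{align*}
```

For a hypothetical N = 1024, this gives about 9.8e-4 for a single table and about 9.5e-7 for both tables. Real hash functions and rule sets may deviate from this idealized model.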

Drawings

FIG. 1 is a schematic diagram of a hash collision;

FIG. 2 is a schematic diagram of how the present invention resolves a hash collision.

Detailed Description

The invention is further described below with reference to the accompanying drawings:

FIG. 1 is a schematic diagram of a hash collision. The rules rule_0 and rule_1 are stored in a hash table in advance, and key_0, the key corresponding to rule_0, is then used to search the hash table. As FIG. 1 shows, hash(rule_0) and hash(rule_1) have the same value, that is, both map to the same address addr_r in the hash table. A hash collision therefore occurs, and rule_1 overwrites rule_0. When key_0 is searched in the hash table, hash(key_0) maps to address addr_r, and the rule read from the hash table is rule_1 rather than the expected rule_0, so the search result is a miss, that is, no hit.
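The overwrite shown in FIG. 1 can be reproduced with a minimal sketch. Everything concrete here is an assumption made for illustration: the table size, the toy hash function (low bits of an integer rule), the rule values, and the simplification that a key matches a rule when the two are equal.

```python
TABLE_SIZE = 8  # hypothetical number of addresses in the hash table


def hash_fn(rule: int) -> int:
    """Toy stand-in for the patent's `hash`: the low bits of the rule."""
    return rule % TABLE_SIZE


RULE_0 = 0x13  # 19 % 8 == 3
RULE_1 = 0x2B  # 43 % 8 == 3, so it collides with RULE_0

# One entry per address: a later write to the same address overwrites.
table = [None] * TABLE_SIZE
for rule in (RULE_0, RULE_1):
    table[hash_fn(rule)] = rule


def lookup(key: int) -> bool:
    """Hit only if the rule stored at hash(key) equals the key."""
    return table[hash_fn(key)] == key


addr_r = hash_fn(RULE_0)   # the shared address from FIG. 1
key_0 = RULE_0             # assumed: the key equals the rule it should match

print("addr_r:", addr_r)
print("stored at addr_r:", hex(table[addr_r]))
print("key_0 hit:", lookup(key_0))
```

With these values the entry at addr_r holds RULE_1, so the lookup for key_0 reports a miss even though rule_0 was stored earlier.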

FIG. 2 illustrates how the present invention resolves the hash collision. The rules rule_0 and rule_1 are hashed again with another hash function, hash_s, and are therefore also stored in a second, new hash table, the hash_s table. As FIG. 2 shows, hash_s(rule_0) differs from hash_s(rule_1), that is, the two rules map to different locations in the hash_s table: rule_0 maps to addr_m and rule_1 maps to addr_n. No rule is overwritten by a collision in the new hash_s table. As before, searching for key_0 in the original hash table yields a miss. When key_0 is searched in the new hash_s table, however, hash_s(key_0) also maps to addr_m, so the rule read out is rule_0, the expected rule, and the search in the hash_s table yields a match, that is, a hit. The results of the two hash table searches are ORed, and the overall search result is a hit.
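The two-table lookup of FIG. 2 might be sketched as below. The toy hash functions, table size and rule values are assumptions chosen so that the rules collide under the first function but not under the second; the patent does not specify any of them. The two lookups run one after the other here, whereas the source describes them as simultaneous in FPGA hardware.

```python
TABLE_SIZE = 8  # hypothetical size, the same for both tables


def hash_fn(rule: int) -> int:
    """Toy stand-in for `hash`: low bits of the rule."""
    return rule % TABLE_SIZE


def hash_s(rule: int) -> int:
    """Toy stand-in for `hash_s`: the next group of bits of the rule."""
    return (rule // TABLE_SIZE) % TABLE_SIZE


RULE_0 = 0x13  # hash -> 3, hash_s -> 2 (plays the role of addr_m)
RULE_1 = 0x2B  # hash -> 3, hash_s -> 5 (plays the role of addr_n)

hash_table = [None] * TABLE_SIZE
hash_s_table = [None] * TABLE_SIZE


def store(rule: int) -> None:
    """Write the rule into both tables; collisions overwrite silently."""
    hash_table[hash_fn(rule)] = rule
    hash_s_table[hash_s(rule)] = rule


def search(key: int):
    """Return (hit in hash table, hit in hash_s table, ORed result)."""
    hit = hash_table[hash_fn(key)] == key
    hit_s = hash_s_table[hash_s(key)] == key
    return hit, hit_s, hit or hit_s


for rule in (RULE_0, RULE_1):
    store(rule)

print(search(RULE_0))  # key_0 is assumed equal to rule_0
```

In this sketch the search for key_0 misses in the original table, hits in the hash_s table, and the ORed result is a hit. A pair of rules that collides under both functions would still lose one rule, which is consistent with the source describing the collision problem as solved only to a certain extent.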

As the above description shows, the main idea of the invention is to use a different hash function to resolve collisions of the original hash function; because the probability that the same rules collide under both hash functions is very small, the hash collision problem is solved to a certain extent. As the explanation with reference to the drawings shows, using a second hash function requires one additional hash table of the same size, so the hash table storage is doubled, which is still much less than the storage typically consumed by the collision depth interval method. In addition, when a key is searched, the two hash tables are searched simultaneously, so search performance is the same as with a single hash table, that is, no extra clock cycles are spent on search and matching. The method handles hash collisions well and provides a useful reference for search and matching techniques based on hash tables.
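The storage comparison can be made concrete with a small calculation. The function names and the example figures (1024 addresses, a collision depth of 4) are hypothetical and only serve to illustrate the claim.

```python
def dual_table_entries(addresses: int) -> int:
    """Entries needed by the two-table scheme: two tables of equal size."""
    return 2 * addresses


def depth_table_entries(addresses: int, depth: int) -> int:
    """Entries needed by a single table with `depth` slots per address."""
    return depth * addresses


ADDRESSES = 1024  # assumed number of hash addresses
DEPTH = 4         # assumed collision depth

print(dual_table_entries(ADDRESSES))          # 2048
print(depth_table_entries(ADDRESSES, DEPTH))  # 4096
```

Under this simple count the two-table scheme uses less storage only when the collision depth is greater than 2; at a depth of exactly 2 the two approaches need the same number of entries.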

Claims

1. A method for resolving hash collisions, implemented on an FPGA-based hardware platform, characterized by comprising the following steps: