JP2020119363A

JP2020119363A - Retrieval device and method for creating hash table

Info

Publication number: JP2020119363A
Application number: JP2019011155A
Authority: JP
Inventors: Hitoshi Kaneko; Masayuki Nishiki
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2019-01-25
Filing date: 2019-01-25
Publication date: 2020-08-06
Also published as: WO2020153154A1

Abstract

To reduce the occurrence of hash collisions even when the original space of a hash method is large. SOLUTION: A retrieval device includes an input unit that receives a data group to be registered in a table; a division unit that divides each piece of input data into a plurality of sections to generate divided data of a predetermined number of bits; and a table creation unit. When the data group is registered, the table creation unit creates a hash table that associates each hash value of the predetermined number of bits with a NULL pointer. It then updates the pointer associated with the hash value of a piece of divided data from NULL to a pointer to the hash table for the divided data that follows it in the same source data. Building a tree whose nodes are these hash tables reduces the total memory used by the hash tables, which in turn reduces hash collisions in them. SELECTED DRAWING: Figure 1

Description

The present invention relates to a search device and a method for creating a hash table.

Conventionally, in packet forwarding and similar tasks, hashing has been used to look up which rule applies to which packet. Hashing is a technique that enables high-speed search.

Non-Patent Document 1: ALAXALA, [retrieved January 13, 2019], Internet <URL: https://www.nic.ad.jp/ja/materials/iw/2012/proceedings/d1/>

However, when the original space of the hash method is large, memory and other resource constraints force a higher reduction scale for the hash space (that is, the original space must be compressed into a much smaller hash space), and the risk of hash collisions grows with that reduction scale.

For example, suppose the hash space used for search is N bits, the identity function (y = x) is used as the hash function, and a 64-bit machine is used. The hash table then requires 2^N × 64 bits of memory. To keep the hash table within 16 GByte, the hash space must be 31 bits or less (2^31 × 64 bits = 2^34 bytes). Consequently, when the original space of the hash method is 64 bits, a hash function with a reduction scale of 1/2^33 must be chosen. Because each hash value then covers 2^33 (about 8.6 billion) original values, in the worst case all registered rules (data) may map to the same hash value if the number of rules is 1 billion or less, and every registered entry must then be searched linearly. Likewise, in the technique described in Non-Patent Document 1, the hash space is scaled down, so multiple entries are mapped to one hash value, a lookup cannot be resolved in a single step, and a further search is required.
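The arithmetic above can be checked with a short Python sketch. It assumes one 8-byte (64-bit) slot per hash value and reads "16 GByte" as 2^34 bytes, which is the reading that yields the 31-bit limit; the function names are illustrative only.

```python
BYTES_PER_SLOT = 8           # one 64-bit pointer per hash value (64-bit machine)
BUDGET_BYTES = 16 * 2**30    # "16 GByte", read here as 2**34 bytes (assumption)
KEY_BITS = 64                # width of the original space


def flat_table_bytes(n_bits: int) -> int:
    """Memory of a single flat table covering an n_bits-wide hash space."""
    return (2 ** n_bits) * BYTES_PER_SLOT


def max_hash_bits(budget: int = BUDGET_BYTES) -> int:
    """Largest N such that a 2**N-slot table still fits in the budget."""
    n = 0
    while flat_table_bytes(n + 1) <= budget:
        n += 1
    return n


if __name__ == "__main__":
    n = max_hash_bits()
    print(f"hash space: {n} bits")
    print(f"reduction scale: 1/2**{KEY_BITS - n}")
```

Under these assumptions the budget admits a 31-bit hash space, so a 64-bit key must be reduced by a factor of 2^33.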

An object of the present invention is therefore to solve the above problem and reduce the occurrence of hash collisions even when the original space of the hash method is large.

To solve the above problem, the present invention includes: an input unit that receives input of a data group to be registered in a table; a dividing unit that divides each piece of input data into a plurality of sections to generate divided data of a predetermined number of bits; and a table creation unit. When the data group is registered, the table creation unit creates a hash table in which each hash value of the predetermined number of bits is associated with a NULL pointer, and updates the pointer associated with the hash value of a piece of divided data from NULL to a pointer to the hash table for the divided data that follows it in the same source data. By building a tree whose nodes are these hash tables, the table creation unit reduces the total memory used by the hash tables and reduces hash collisions in them.

According to the present invention, the occurrence of hash collisions can be reduced even when the original space of the hash method is large.

FIG. 1 is a diagram illustrating an example of a hash table tree built by the search device according to the first embodiment.
FIG. 2 is a diagram illustrating a configuration example of the search device according to each embodiment.
FIG. 3 is a diagram for explaining the effect of the search device according to the second embodiment.
FIG. 4 is a diagram for explaining the effect of the search device according to the third embodiment.
FIG. 5 is a diagram for explaining the maximum memory used by the hash table tree created by the search device according to the fourth embodiment.
FIG. 6 is a diagram illustrating an example of a computer that executes a program implementing the functions of the search device according to each embodiment.

Hereinafter, modes for carrying out the present invention (embodiments) are described with reference to the drawings, divided into the first through fourth embodiments. The present invention is not limited to these embodiments.

[First Embodiment]
[Example of a Hash Table Tree]
First, an example of building a hash table tree according to the first embodiment is described with reference to FIG. 1. Here, the hash space (original space) used for data search is assumed to be 64 bits. The search device of this embodiment divides the 64-bit hash space used for search into, for example, four parts and builds a tree whose nodes are hash tables, each representing a 16-bit hash space.

When the search device receives input of a group of data to be registered in the hash table (hereinafter referred to as "rules" where appropriate), it creates only the hash tables needed for that registration. The description below assumes that the identity function (y = x) is used as the hash function of each hash table.

In this embodiment, a hash table associates each hash value with a pointer. In the hash tables 104, 106, 109, and 110 shown in FIG. 1, each hash value is associated with a bool (false/true) instead, but a pointer may be used there as well.

For example, the search device divides the data (1) to (4) shown in FIG. 1 into four sections (sections 111 to 114) and registers the resulting divided data in the 16-bit hash tables 101 to 110. Specifically, in the hash table 101 the search device updates the pointer for each divided data value (= hash value) that appears in data (1) to (4) from "NULL" to "NP" (next pointer). Then, in the hash table that this NP points to (for example, the hash table 102), it updates the pointer for the divided data that follows the above divided data from "NULL" to "NP".
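The dividing step can be sketched as follows. This assumes sections are taken most-significant bits first and that each section's value is used directly as its hash value (identity hash); `split_key`, `join_chunks`, and the sample rule are hypothetical and are not the values shown in FIG. 1.

```python
def split_key(key: int, key_bits: int = 64, sections: int = 4) -> list[int]:
    """Split key into `sections` equal-width chunks, most significant first."""
    if key_bits % sections:
        raise ValueError("key_bits must be divisible by sections")
    if not 0 <= key < (1 << key_bits):
        raise ValueError("key does not fit in key_bits")
    width = key_bits // sections
    mask = (1 << width) - 1
    return [(key >> (width * (sections - 1 - i))) & mask for i in range(sections)]


def join_chunks(chunks: list[int], width: int = 16) -> int:
    """Inverse of split_key: pack chunks (most significant first) into one int."""
    key = 0
    for chunk in chunks:
        key = (key << width) | chunk
    return key


# Hypothetical 64-bit rule whose 16-bit sections are 200, 300, 600, 900.
rule = join_chunks([200, 300, 600, 900])
print(split_key(rule))  # [200, 300, 600, 900]
```

Passing `sections=8` to `split_key` yields 8-bit chunks instead, which is the finer division used later in the third embodiment.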

For example, the search device creates the hash table 101 and updates the pointers for the divided data "200" and "300" in that table from "NULL" to "NP". Next, it creates the hash table 102 and, in that table, updates the pointers for the divided data "300" and "600", which follow the divided data "200", from "NULL" to "NP". By repeating this process for each of data (1) to (4), the search device registers data (1) to (4) in the hash tables 101 to 110. The result is a hash table tree whose nodes are the hash tables 101 to 110, linked to one another by NPs.

After building this hash table tree, when the search device receives, for example, a request to search for data (1), it searches for data (1) by following the hash table tree shown in FIG. 1. If data (1) is found, the search device returns true; otherwise it returns false.
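The registration and lookup described above can be sketched in Python. This is an illustrative sketch, not the patent's implementation: it assumes the identity hash on 16-bit sections, represents each hash table as a 65,536-slot list with `None` as NULL and a nested list as the NP, and stores `True` in the leaf tables. `HashTableTree`, `register`, `search`, and `make_key` are hypothetical names.

```python
WIDTH = 16                 # bits per section (64-bit key split into 4)
SECTIONS = 4
SLOTS = 1 << WIDTH         # 65,536 slots per hash table


def new_table() -> list:
    """A hash table whose every slot starts as NULL (None)."""
    return [None] * SLOTS


def sections_of(key: int) -> list[int]:
    """16-bit sections of a 64-bit key, most significant first (identity hash)."""
    mask = SLOTS - 1
    return [(key >> (WIDTH * (SECTIONS - 1 - i))) & mask for i in range(SECTIONS)]


class HashTableTree:
    def __init__(self) -> None:
        self.root = new_table()
        self.table_count = 1

    def register(self, key: int) -> None:
        table = self.root
        parts = sections_of(key)
        for part in parts[:-1]:
            if table[part] is None:          # NULL: create next table, store NP
                table[part] = new_table()
                self.table_count += 1
            table = table[part]              # follow NP to the next level
        table[parts[-1]] = True              # leaf tables hold a bool

    def search(self, key: int) -> bool:
        table = self.root
        parts = sections_of(key)
        for part in parts[:-1]:              # at most SECTIONS - 1 NULL checks
            table = table[part]
            if table is None:
                return False
        return table[parts[-1]] is True


def make_key(parts: list[int]) -> int:
    """Pack 16-bit sections (most significant first) into a 64-bit key."""
    key = 0
    for part in parts:
        key = (key << WIDTH) | part
    return key


tree = HashTableTree()
for parts in ([200, 300, 600, 900], [200, 300, 700, 900], [300, 600, 100, 5]):
    tree.register(make_key(parts))

print(tree.search(make_key([200, 300, 600, 900])))  # True
print(tree.search(make_key([200, 300, 600, 901])))  # False
print(tree.table_count)                              # 8
```

Tables are allocated only when a NULL slot is first needed, which keeps the tree far smaller than a single flat table over the whole 64-bit space.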

In this way, the search device divides the hash space to be searched and builds a tree of hash tables that represent the divided hash spaces. It also creates hash tables only when data is registered. As a result, even when the hash space to be searched is relatively large, the memory used by the hash tables can be greatly reduced. The search device therefore either does not need to scale down the hash values used in the hash tables or, if it does, can use a smaller reduction scale, so the occurrence of hash collisions is reduced. Further, even when the total memory available to the hash tables (that is, the memory available for rule registration) is limited to a predetermined value (for example, 16 GByte), the memory used can be kept within that value.

[Configuration]
Next, a configuration example of the search device 10 according to the first embodiment is described with reference to FIG. 2. The search device 10 includes an input unit 11, a control unit 12, a storage unit 13, and an output unit 14.

The input unit 11 and the output unit 14 are implemented by, for example, an input/output interface of the search device 10. The control unit 12 is implemented by, for example, a CPU of the search device 10 executing a program. The storage unit 13 is implemented by, for example, the main memory and a hard disk drive of the search device 10.

The input unit 11 receives input of various data, for example a data group to be registered in the hash table. The control unit 12 controls the entire search device 10 and includes, for example, a dividing unit 121, a table creation unit 122, and a search unit 123. The division control unit 124 and the instruction unit 125, drawn with broken lines, are optional; the cases in which they are provided are described later.

The dividing unit 121 divides each piece of input data into a plurality of sections to generate divided data of a predetermined number of bits.

The table creation unit 122 creates the hash tables in which the data to be registered will be stored and registers the data in them. Specifically, the table creation unit 122 creates a hash table that associates each hash value of the predetermined number of bits with a pointer, and registers the divided data generated by the dividing unit 121 in that table. For example, in the created hash table, the table creation unit 122 updates the pointer associated with the hash value of a piece of divided data to a pointer (the NP described above) to the hash table for the divided data that follows it in the same source data. The table creation unit 122 repeats this process up to the divided data of the last section of the data being registered. This builds a hash table tree in which the hash tables to be searched are linked by pointers (NPs).

The search unit 123 searches for the target data using the hash table tree created by the table creation unit 122 and outputs the search result via the output unit 14.

The storage unit 13 stores the various data the control unit 12 needs to operate, such as the number of divisions of the hash space. This number can be changed as needed, for example based on an instruction entered through the input unit 11. The storage unit 13 also stores the hash tables created by the table creation unit 122.

The output unit 14 outputs the processing results of the control unit 12, for example the search results of the search unit 123.

With the search device 10, even when the 64-bit hash space is divided into four and 10,000 rules are registered, memory usage is at most about 15.7 GByte. In this case the number of hash tables is at most 1 + 10000 + 10000 + 10000 = 30001, so memory usage stays within 30001 (tables) × 65536 (indexes per table) × 8 bytes (per index) ≈ 15.7 GByte.
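The worst-case bound above can be reproduced with a small helper. It assumes that the number of tables at each level is limited both by the number of registered rules and by the number of possible prefixes at that level, and that "GByte" here means 10^9 bytes, which matches the 15.7 figure. The helper names are hypothetical.

```python
def max_tables(n_rules: int, sections: int = 4, bits: int = 16) -> int:
    """Worst-case number of hash tables in the tree."""
    total = 1  # the root table always exists
    for level in range(1, sections):
        # Tables at this level <= distinct prefixes of `level` sections,
        # bounded by both the prefix space and the number of rules.
        total += min(n_rules, 2 ** (bits * level))
    return total


def max_tree_bytes(n_rules: int, sections: int = 4, bits: int = 16,
                   bytes_per_slot: int = 8) -> int:
    """Worst-case memory: tables x slots per table x bytes per slot."""
    return max_tables(n_rules, sections, bits) * (2 ** bits) * bytes_per_slot


tables = max_tables(10_000)
gbytes = max_tree_bytes(10_000) / 1e9
print(tables, round(gbytes, 1))  # 30001 15.7
```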

Because the search device 10 creates only the hash tables needed for the rules being registered, the memory area required for the hash tables is reduced, which makes it practical to use the identity function (y = x) as the hash function. By contrast, registering the same 10,000 rules without dividing the hash space, as in the prior art, requires a higher reduction scale for the hash space; up to 10,000 hash collisions may then occur when a rule is searched, which can lengthen the search time. With the search device 10, a rule search involves (number of hash-space divisions - 1) NULL checks, but there is no need to raise the reduction scale of the hash space. Hash collisions are therefore reduced, and the search time is kept from growing.

[Second Embodiment]
Next, the search device 10 according to the second embodiment is described. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

In the second embodiment, after the dividing unit 121 splits each rule to be registered into sections of a predetermined number of bits, the table creation unit 122 counts, for each section, the number of distinct divided-data values belonging to that section. When building the hash table tree, the table creation unit 122 places the hash tables for sections with fewer distinct values closer to the root of the tree and those for sections with more distinct values closer to the leaves. This reduces the memory used by the hash table tree and thus increases the number of rules that can be registered in it. For example, even when the total memory available for rule registration in the search device 10 is limited to a predetermined value (for example, 16 GByte), more rules can be registered.

For example, when the rules (1) to (4) in FIG. 1 are divided into four sections (sections 111, 112, 113, and 114), section 111 has the fewest distinct divided-data values (two), sections 112 and 114 have the next fewest (three each), and section 113 has the most (four).

Accordingly, the table creation unit 122 places the hash table for the divided data of section 111 at the root of the tree, places the hash table for the divided data of section 113 at the leaf end, and places the hash tables for sections 112 and 114 between the root and the leaves.

Next, with reference to FIG. 3, consider the memory used when the search device 10 registers 100,000 rules whose 64-bit data, divided into four sections, has 1, 10, 1000, and 50000 distinct values per section, respectively.

Suppose first that the hash table for the section with 50000 distinct values is placed at the root, followed toward the leaves, in descending order of the number of distinct values, by the hash tables for the sections with 1000, 10, and 1 distinct values. In this case the maximum number of tables at each level is 1, 50000, 100000, and 100000, as shown in FIG. 3(a), so the total number of hash tables is at most 1 + 50000 + 100000 + 100000 = 250001. The hash table tree then uses up to about 131.1 GByte of memory, which may not fit within 16 GByte.

In contrast, the table creation unit 122 of the search device 10 places the hash table for the section with 1 distinct value at the root, followed toward the leaves, in ascending order of the number of distinct values, by the hash tables for the sections with 10, 1000, and 50000 distinct values. In this case the maximum number of tables at each level is 1, 1, 10, and 10000, as shown in FIG. 3(b), so the total number of hash tables is at most 1 + 1 + 10 + 10000 = 10012. The hash table tree then uses at most about 5.2 GByte of memory, which fits within 16 GByte.
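The effect of the ordering can be checked numerically. The sketch below assumes that the number of tables at a level is bounded by the product of the distinct-value counts of the sections above it and by the rule count; `distinct_per_section`, `root_first_order`, and `max_tables_for_order` are hypothetical names. A lookup would have to visit the key's sections in the same permuted order.

```python
from math import prod


def distinct_per_section(rules: list[list[int]]) -> list[int]:
    """Number of distinct divided-data values in each section."""
    return [len(set(column)) for column in zip(*rules)]


def root_first_order(counts: list[int]) -> list[int]:
    """Section indices sorted so the section with fewest distinct values is the root."""
    return sorted(range(len(counts)), key=lambda i: counts[i])


def max_tables_for_order(counts: list[int], order: list[int], n_rules: int) -> int:
    """Worst-case table count when sections are placed root-to-leaf in `order`."""
    total = 1  # root table
    for level in range(1, len(order)):
        # Tables at this level <= distinct prefixes of the sections above it.
        total += min(n_rules, prod(counts[i] for i in order[:level]))
    return total


counts = [1, 10, 1000, 50_000]                      # per-section counts from FIG. 3
worst = max_tables_for_order(counts, [3, 2, 1, 0], 100_000)
best = max_tables_for_order(counts, root_first_order(counts), 100_000)
print(worst, best)                                   # 250001 10012
```

With the per-section counts 2, 3, 4, 3 from the FIG. 1 example, `root_first_order([2, 3, 4, 3])` returns `[0, 1, 3, 2]`: section 111 at the root, sections 112 and 114 in the middle, and section 113 at the leaves.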

[Third Embodiment]
Next, the search device 10 according to the third embodiment is described. Components identical to those of the preceding embodiments are given the same reference numerals, and their description is omitted.

As noted in the first embodiment, increasing the number of divisions of the hash space (four in the first embodiment) saves memory used by the hash table tree. Therefore, a maximum memory area (threshold) for the hash table tree may be specified to the search device 10 in advance, for example in a configuration file, and the search device 10 may increase the number of divisions when the total memory used by the hash table tree exceeds that threshold.

In the third embodiment, for example, the search device 10 further includes the division control unit 124 drawn with broken lines. When the total memory used by the hash table tree exceeds a predetermined threshold, the division control unit 124 increases the number of divisions of the hash space beyond the predetermined number.

For example, when the division control unit 124 determines that the total memory used by the hash table tree exceeds the threshold, it increases the number of divisions used by the dividing unit 121 and the table creation unit 122 from the initial 4 to 8. That is, it instructs the dividing unit 121 to split each rule to be registered into 8-bit divided data, and instructs the table creation unit 122 to create hash tables that associate 8-bit hash values with pointers.

As a result, even when many rules are to be registered, the search device 10 can keep the total memory used by the hash table tree within the threshold. For example, when the total memory available for rule registration in the search device 10 is limited to a predetermined threshold (for example, 16 GByte), the device can still accommodate an increase in the number of rules to be registered.

For example, suppose the total memory available for rule registration is limited to 16 GByte and the search device 10 registers 100,000 rules in the hash table tree. If the number of divisions of the hash space is 4, the number of hash tables is at most 1 + 65536 + 100000 + 100000 = 265537, as shown in FIG. 4(a). The hash table tree then uses up to about 139.2 GByte of memory, which does not fit within 16 GByte, so the search device 10 may be unable to register the rules.

On the other hand, if the number of divisions of the hash space is 8 (that is, the data is split into 8-bit sections), as in the search device 10 of the third embodiment, the number of hash tables is at most 1 + 256 + 65536 + 100000 + … + 100000 = 565793, as shown in FIG. 4(b). Because each table now has only 2^8 = 256 entries, the hash table tree uses at most about 1.2 GByte of memory, which fits within 16 GByte, and the search device 10 can register the rules.
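A division controller in the spirit of the division control unit 124 might look like the sketch below. Doubling the count each time (4 to 8, as in the example) and reading the 16 GByte threshold as 16 × 10^9 bytes are assumptions; `max_tree_bytes` and `choose_sections` are hypothetical names.

```python
def max_tree_bytes(n_rules: int, sections: int, key_bits: int = 64,
                   bytes_per_slot: int = 8) -> int:
    """Worst-case memory of the tree for a given number of sections."""
    bits = key_bits // sections
    tables = 1 + sum(min(n_rules, 2 ** (bits * level)) for level in range(1, sections))
    return tables * (2 ** bits) * bytes_per_slot


def choose_sections(n_rules: int, threshold: float, start: int = 4,
                    key_bits: int = 64) -> int:
    """Double the number of sections until the worst case fits the threshold."""
    sections = start
    while max_tree_bytes(n_rules, sections, key_bits) > threshold:
        sections *= 2
        if sections > key_bits:
            raise ValueError("threshold cannot be met by dividing further")
    return sections


print(choose_sections(10_000, 16e9))    # 4
print(choose_sections(100_000, 16e9))   # 8
```

More sections mean smaller tables but one extra NULL check per level during a lookup, so the controller stops at the first count that fits.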

[Fourth Embodiment]
Next, the search device 10 according to the fourth embodiment is described. Components identical to those of the preceding embodiments are given the same reference numerals, and their description is omitted.

As described above, the search device 10 can also save memory used by the hash table tree by increasing the reduction scale of the hash function. Therefore, a maximum memory area (threshold) for the hash table tree may be specified to the search device 10 in advance, for example in a configuration file, and the search device 10 may increase the reduction scale of the hash function when the total memory used by the hash table tree exceeds that threshold.

Here, increasing the reduction scale of the hash function means, for example, making N smaller than the value used so far when the search device 10 uses mod N (N is a natural number) as its hash function, or switching to mod N when the search device 10 originally used the identity function. This case is described as the fourth embodiment.

For example, as described for the third embodiment, when the search device 10 registers 100,000 rules using the identity function as the hash function, the total memory used by the hash table tree may exceed 16 GByte. However, if the search device 10 uses, for example, y = x mod 8192 (scale 1/8) as the hash function, the memory used by the hash table tree can be kept to at most about 13.6 GByte even when 100,000 rules are registered. This is explained with reference to FIG. 5.

For example, as shown in FIG. 5, when the search device 10 divides the 64-bit hash space into four and registers 100,000 rules, applying mod 8192 (scale 1/8) as the hash function of each hash table gives at most 1 + 8192 + 100000 + 100000 = 208193 hash tables. Since each table then has 8192 entries, the hash table tree uses at most about 13.6 GByte of memory, which fits within 16 GByte.
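The FIG. 5 numbers can be reproduced by treating the higher reduction scale as a smaller table size. This sketch only counts tables: it assumes the same worst-case bound as in the earlier embodiments with 8192 slots per table and 10^9-byte GBytes. It does not model how two sections that land in the same mod-8192 slot are told apart, which the description does not specify. The names are hypothetical.

```python
def bucket(chunk: int, n: int) -> int:
    """Hash function y = x mod N applied to one section."""
    return chunk % n


def max_tables_mod(n_rules: int, sections: int, table_size: int) -> int:
    """Worst-case table count when every table has `table_size` slots."""
    return 1 + sum(min(n_rules, table_size ** level) for level in range(1, sections))


def max_tree_bytes_mod(n_rules: int, sections: int, table_size: int,
                       bytes_per_slot: int = 8) -> int:
    """Worst-case memory: tables x slots per table x bytes per slot."""
    return max_tables_mod(n_rules, sections, table_size) * table_size * bytes_per_slot


for size in (65_536, 8_192):   # identity on 16 bits vs. mod 8192 (scale 1/8)
    tables = max_tables_mod(100_000, 4, size)
    print(size, tables, round(max_tree_bytes_mod(100_000, 4, size) / 1e9, 1))
```

With 65,536 slots the bound matches the 265537-table case of the third embodiment; with 8192 slots it gives 208193 tables and about 13.6 GByte.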

[Other Embodiments]
The user of the search device 10 may set whether, when the total memory used by the hash table tree exceeds a predetermined threshold, the search device 10 increases the number of divisions of the hash space (see the third embodiment) or increases the reduction scale of the hash function used for the hash tables (see the fourth embodiment).

In this case, when the search device 10 receives, via the input unit 11, setting information indicating which of these two measures to apply when the total memory used by the hash table tree exceeds the threshold, it stores the setting information in the storage unit 13. Then, when the total memory used by the hash table tree exceeds the predetermined value, the instruction unit 125 of the search device 10 (see FIG. 2), following the stored setting information, either instructs the division control unit 124 to increase the number of divisions beyond the predetermined value or instructs the table creation unit 122 to increase the reduction scale of the hash function used for the hash tables beyond the predetermined value.

In this way, the user of the search device 10 decides what happens when the total memory used by the tree of hash tables exceeds a predetermined value. The search device 10 either increases the number of divisions of the hash space (see the third embodiment) or increases the reduction scale of the hash function used for each hash table (see the fourth embodiment).

The search device 10 may also set the number of divisions of the hash space and the reduction scale of the hash function used for each hash table according to the total memory used by the tree of hash tables. For example, the larger the total memory used by the tree (that is, the more rules are registered), the more the search device 10 may increase the number of divisions of the hash space. Alternatively, it may raise the reduction scale of the hash function used for each hash table.
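One possible way to tie the reduction scale to the number of registered rules is sketched in Python below. The sketch tries power-of-two reduction scales in increasing order and keeps the first one whose worst-case memory fits a budget. It assumes 8-byte pointers and at most one table per registered rule at each level below the root. It adjusts only the reduction scale, although the description equally allows increasing the number of divisions instead. `choose_reduction` and `worst_case_bytes` are hypothetical names.

```python
def worst_case_bytes(sections: int, slots: int, num_rules: int,
                     pointer_bytes: int = 8) -> int:
    """Worst-case tree memory: level k holds at most min(slots**k, num_rules) tables."""
    tables = sum(min(slots ** level, num_rules) for level in range(sections))
    return tables * slots * pointer_bytes


def choose_reduction(num_rules: int, budget_bytes: int,
                     key_bits: int = 64, sections: int = 4) -> int:
    """Smallest power-of-two reduction scale whose worst case fits the budget."""
    full_slots = 1 << (key_bits // sections)  # 65536 for 16-bit sections
    reduction = 1
    while reduction <= full_slots:
        slots = full_slots // reduction       # hash = value mod slots
        if worst_case_bytes(sections, slots, num_rules) <= budget_bytes:
            return reduction
        reduction *= 2
    raise ValueError("no reduction scale fits the budget; consider more sections")
```

With 100,000 rules and a 16 GByte budget the sketch returns 8 (mod 8192), consistent with the FIG. 5 example. With only 1,000 rules it keeps the full 65536-slot tables (reduction 1). More rules push it toward a coarser hash.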

[Program]
The functions of the search device 10 described in the above embodiments can also be implemented by installing a program that realizes them on a desired information processing device (computer). For example, an information processing device can be made to function as the search device 10 by having it execute this program, provided as packaged software or online software. Information processing devices here include desktop and notebook personal computers, rack-mounted server computers, and the like. They also include mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as PDAs (Personal Digital Assistants). The search device 10 may also be implemented on a cloud server.

An example of a computer that executes the above program is described with reference to FIG. 6. As shown in FIG. 6, the computer 1000 includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. A mouse 1110 and a keyboard 1120, for example, are connected to the serial port interface 1050. A display 1130, for example, is connected to the video adapter 1060.

As shown in FIG. 6, the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. The various data and information described in the above embodiments are stored in, for example, the hard disk drive 1090 or the memory 1010.

The CPU 1020 reads the program module 1093 and the program data 1094 stored in the hard disk drive 1090 into the RAM 1012 as needed and executes the procedures described above.

The program module 1093 and the program data 1094 of the above program need not be stored in the hard disk drive 1090. For example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, they may be stored in another computer connected via a network such as a LAN or a WAN (Wide Area Network) and read by the CPU 1020 via the network interface 1070.

10 Search device
11 Input unit
12 Control unit
13 Storage unit
121 Division unit
122 Table creation unit
123 Search unit
124 Division control unit
125 Instruction unit

Claims

A search device comprising:
an input unit that receives input of a data group to be registered in a table;
a division unit that generates divided data of a predetermined number of bits by dividing each item of the input data group into a plurality of sections; and
a table creation unit that, when the data group is registered, creates a hash table associating each hash value of the predetermined number of bits with NULL as a pointer; updates the pointer associated with the hash value of the divided data in the created hash table from NULL to a pointer to the hash table of the divided data that follows that divided data in the data from which it was divided; and thereby constructs a tree having each of the hash tables as a node, reducing the total amount of memory used by the hash tables and reducing hash collisions in the hash tables.

The search device according to claim 1, wherein the table creation unit counts, for each section, the distinct types of divided data belonging to that section. It increases the number of data items that can be registered by constructing the tree so that hash tables for divided data of sections with fewer distinct types are placed toward the root of the tree, and hash tables for divided data of sections with more distinct types are placed toward its leaves.

The search device according to claim 1, wherein the table creation unit prevents hash collisions in the hash table by using an identity function as the hash function of the hash table.

The search device according to claim 1, further comprising a division control unit that, when the total amount of memory used by the tree exceeds a predetermined value, increases the number of divided sections above a predetermined value.

The search device according to claim 1, wherein, when the total amount of memory used by the tree exceeds a predetermined value, the table creation unit increases the reduction scale of the hash function used in the hash table above a predetermined value.

The search device according to claim 4, further comprising an instruction unit that, when the total amount of memory used by the tree exceeds a predetermined value, acts in accordance with setting information stored in a storage unit. It either instructs the division control unit to increase the number of divided sections above a predetermined value, or instructs the table creation unit to increase the reduction scale of the hash function used in the hash table above a predetermined value.

A method for creating a hash table, executed by a search device, the method comprising:
a step of receiving input of a data group to be registered in a table;
a step of generating divided data of a predetermined number of bits by dividing each item of the input data group into a plurality of sections; and
a step of, when the data group is registered, creating a hash table associating each hash value of the predetermined number of bits with NULL as a pointer; updating the pointer associated with the hash value of the divided data in the created hash table from NULL to a pointer to the hash table of the divided data that follows that divided data in the data from which it was divided; and thereby constructing a tree having each of the hash tables as a node, reducing the total amount of memory used by the hash tables and reducing hash collisions in the hash tables.
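The data structure in the claims can be sketched in Python as follows. This is an illustrative sketch, not the applicant's implementation. The names `TableNode`, `HashTree`, `split_key`, and `section_order` are hypothetical. Keys are split most-significant section first, and the entry reached by the last section holds the rule itself. Values that share a slot under a reduced hash (value mod `num_slots`) are chained in a small bucket, a detail the claims do not specify. With the default `num_slots` of 2^section_bits the hash is the identity function, as in the identity-function claim, so each bucket holds at most one value. `section_order` follows the claim that places sections with fewer distinct divided values nearer the root.

```python
from typing import Any, Optional


class TableNode:
    """One hash table (one node of the tree). Every slot starts as NULL (None)."""

    def __init__(self, num_slots: int) -> None:
        # A slot is None or a bucket of (section_value, child) pairs;
        # chaining within a slot is an assumption of this sketch.
        self.slots: list[Optional[list[tuple[int, Any]]]] = [None] * num_slots


def split_key(key: int, sections: int, section_bits: int) -> list[int]:
    """Divide a key into `sections` values of `section_bits` bits, MSB first."""
    mask = (1 << section_bits) - 1
    return [(key >> (section_bits * (sections - 1 - i))) & mask for i in range(sections)]


def section_order(keys: list[int], sections: int, section_bits: int) -> list[int]:
    """Order sections so those with fewer distinct values sit nearer the root."""
    distinct = [len({split_key(k, sections, section_bits)[s] for k in keys})
                for s in range(sections)]
    return sorted(range(sections), key=lambda s: distinct[s])


class HashTree:
    def __init__(self, key_bits: int = 64, sections: int = 4,
                 num_slots: Optional[int] = None,
                 order: Optional[list[int]] = None) -> None:
        if key_bits % sections:
            raise ValueError("key_bits must be divisible by sections")
        self.sections = sections
        self.section_bits = key_bits // sections
        # Default: one slot per possible value, i.e. an identity hash (no collisions).
        # A smaller num_slots gives hash(v) = v mod num_slots.
        self.num_slots = num_slots or (1 << self.section_bits)
        self.order = order or list(range(sections))
        self.root = TableNode(self.num_slots)
        self.table_count = 1

    def _path(self, key: int) -> list[int]:
        parts = split_key(key, self.sections, self.section_bits)
        return [parts[s] for s in self.order]

    def insert(self, key: int, rule: Any) -> None:
        node = self.root
        path = self._path(key)
        for depth, part in enumerate(path):
            slot = part % self.num_slots
            if node.slots[slot] is None:      # NULL pointer: slot used for the first time
                node.slots[slot] = []
            bucket = node.slots[slot]
            idx = next((i for i, (v, _) in enumerate(bucket) if v == part), None)
            if depth == len(path) - 1:        # last section: store the rule itself
                if idx is None:
                    bucket.append((part, rule))
                else:
                    bucket[idx] = (part, rule)
                return
            if idx is None:                   # create the next-level hash table
                child = TableNode(self.num_slots)
                self.table_count += 1
                bucket.append((part, child))
            else:
                child = bucket[idx][1]
            node = child

    def lookup(self, key: int) -> Any:
        node: Any = self.root
        for part in self._path(key):
            bucket = node.slots[part % self.num_slots]
            if bucket is None:                # NULL pointer: key not registered
                return None
            node = next((c for v, c in bucket if v == part), None)
            if node is None:
                return None
        return node
```

For example, `HashTree(num_slots=8192)` applies mod 8192 to each 16-bit section. Two keys whose last sections differ by exactly 8192 therefore share a slot but remain distinguishable, and tables for shared key prefixes are created only once.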