JP2012230493A

JP2012230493A - Retrieval device, retrieval method, and program

Info

Publication number: JP2012230493A
Application number: JP2011097366A
Authority: JP
Inventors: Kentaro Kamado; 健太郎釜洞; Masakazu Hattori; 雅一服部
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2011-04-25
Filing date: 2011-04-25
Publication date: 2012-11-22
Anticipated expiration: 2031-04-25
Also published as: JP5462215B2

Abstract

PROBLEM TO BE SOLVED: To provide a retrieval device, a retrieval method, and a program capable of reducing the amount of data held by an index storage unit.
SOLUTION: A retrieval device comprises a database, an index storage unit, a reception unit, a calculation unit, a first determination unit, a second determination unit, and an acquisition unit. The database stores a plurality of pieces of data information, each having a plurality of fields. The index storage unit stores index information that associates a partial data string (a part of a data string registered in a specific field), the hash value of that data string, and position information indicating the position of the data information in whose specific field the data string is registered. When the first determination unit determines that a partial data string matching a part of a retrieval data string exists in the index storage unit, and the second determination unit determines that the hash value corresponding to that partial data string matches the hash value of the retrieval data string, the acquisition unit acquires the position information corresponding to that partial data string.

Description

The present invention relates to a search device, a search method, and a program.

Conventionally, there has been known a search device that includes an index storage unit storing an index with a data structure such as a B-tree, and that uses the index read from the index storage unit to search a database for desired data. The less data the index storage unit holds, the less must be read during a search, so the search speed can be increased. It is therefore desirable that the index storage unit hold as little data as possible.

For example, a technique is known that reduces the key storage area in the leaf nodes of a B-tree that stores the data (keys) registered in each row of a specific field of a table in a database, together with information (values) indicating where each key is registered. In this technique, each leaf node is divided into two layers: the first-layer data structure stores a key and information (a pointer) indicating the position of the second-layer data structure corresponding to that key, and the second-layer data structure stores all values corresponding to that key. For example, when the key is registered in both the second and fifth rows of the specific field of the table, a value indicating the second row and a value indicating the fifth row are stored in the second-layer structure corresponding to the key. Because the first-layer data structure then needs to store the key only once, the key storage area in the leaf node is reduced.
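As an illustration only, the two-layer leaf can be modeled in Python with a dictionary: each dictionary key plays the role of a first-layer key, and its list of row numbers plays the role of the second-layer structure. The function name `build_two_layer_leaf` is hypothetical, and a real B-tree leaf would hold a pointer to the second layer rather than an in-memory list. The sketch also makes the weakness of the technique visible: when every key is distinct, the number of stored keys equals the number of rows.

```python
from collections import defaultdict


def build_two_layer_leaf(column):
    """Map each distinct key (first layer) to the list of row numbers where
    it is registered (second layer). Rows are numbered from 1."""
    second_layer = defaultdict(list)
    for row, key in enumerate(column, start=1):
        second_layer[key].append(row)
    return dict(second_layer)


# The key "B" is registered in rows 2 and 5, so it is stored only once.
leaf = build_two_layer_leaf(["A", "B", "C", "D", "B"])
print(leaf["B"])  # [2, 5]
print(len(leaf))  # 4 keys stored for 5 rows

# With all-distinct keys nothing is saved: 3 keys for 3 rows.
print(len(build_two_layer_leaf(["X", "Y", "Z"])))  # 3
```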

JP 2010-72823 A

However, with the technique described above, when there are almost no keys with the same value, the index storage unit must still hold every key, so the amount of data it holds cannot be reduced. The problem to be solved by the present invention is to provide a search device, a search method, and a program capable of reducing the amount of data held by an index storage unit.

The search device according to the embodiment includes a database, an index storage unit, a reception unit, a calculation unit, a first determination unit, a second determination unit, and an acquisition unit. The database stores a plurality of pieces of data information, each having a plurality of fields that each contain a data string. The index storage unit stores index information in which a partial data string (a part of a data string registered in a specific field among the plurality of fields), the hash value of that data string, and position information indicating the position in the database of the data information whose specific field holds that data string are associated with one another. The reception unit receives a search data string. The calculation unit calculates the hash value of the search data string. When the first determination unit determines that a partial data string matching a part of the search data string exists in the index storage unit, and the second determination unit determines that the hash value corresponding to that partial data string matches the hash value of the search data string, the acquisition unit acquires the position information corresponding to that partial data string.

The search method according to the embodiment comprises: a first step of receiving a search data string indicating a data string used for the search; a second step of calculating the hash value of the search data string; a third step of determining whether a partial data string matching a part of the search data string exists in an index storage unit that stores index information associating a partial data string, the hash value of a data string, and position information, where the data string is registered in a specific field of any one of a plurality of pieces of data information stored in a database (each piece having a plurality of fields that each contain a data string), the partial data string is a part of that data string, and the position information indicates the position of that data information in the database; a fourth step of, when the third step determines that such a partial data string exists in the index storage unit, determining whether the hash value corresponding to that partial data string matches the hash value of the search data string; and a fifth step of, when the fourth step determines that the two hash values match, acquiring the position information corresponding to that partial data string.

Furthermore, the program according to the embodiment causes a computer to execute: a first step of receiving a search data string indicating a data string used for the search; a second step of calculating the hash value of the search data string; a third step of determining whether a partial data string matching a part of the search data string exists in an index storage unit that stores index information associating a partial data string, the hash value of a data string, and position information, where the data string is registered in a specific field of any one of a plurality of pieces of data information stored in a database (each piece having a plurality of fields that each contain a data string), the partial data string is a part of that data string, and the position information indicates the position of that data information in the database; a fourth step of, when the third step determines that such a partial data string exists in the index storage unit, determining whether the hash value corresponding to that partial data string matches the hash value of the search data string; and a fifth step of, when the fourth step determines that the two hash values match, acquiring the position information corresponding to that partial data string.

FIG. 1: Block diagram showing an example of the search device according to the first embodiment.
FIG. 2: Diagram showing an example of the database unit.
FIG. 3: Diagram showing an example of data information.
FIG. 4: Diagram showing an example of the index storage unit.
FIG. 5: Flowchart showing an example of the search process.
FIG. 6: Flowchart showing an example of the registration process.
FIG. 7: Flowchart showing an example of the deletion process.
FIG. 8: Diagram showing an example of the index storage unit according to the second embodiment.
FIG. 9: Block diagram showing an example of the search device according to the second embodiment.
FIG. 10: Diagram showing an example of the registration process.
FIG. 11: Diagram for explaining a modification.

(First embodiment)
FIG. 1 is a block diagram showing an example of the schematic configuration of a search device 100 according to the first embodiment. As shown in FIG. 1, the search device 100 includes an operation display unit 10, a storage unit 20, and a control unit 30. The operation display unit 10 displays various screens and information about the search device 100 (for example, search results) and lets the user enter various operation inputs. Although not shown in detail, the operation display unit 10 includes a display panel, which displays the various screens and information about the search device 100 and accepts touch input from the user, and operation devices such as buttons and a mouse.

The storage unit 20 includes a database unit 22 and an index storage unit 24. As shown in FIG. 2, the database unit 22 stores a plurality of pieces of data information DQ, each having a plurality of fields. Here, each piece of data information DQ represents a document; as an example, semi-structured data written in XML (Extensible Markup Language) is used as the data information DQ.

FIG. 3 is a diagram showing an example of the data information DQ. As shown in FIG. 3, the data information DQ has a plurality of fields F, in each of which a data string is registered. In the present embodiment, an index is created only for a specific field Fx among the fields F of each piece of data information DQ; details are described later. Here, the specific field Fx of each piece of data information DQ is the field in which the character string (data string) constituting the application name (application-title) is registered.

Returning to FIG. 1, the description continues. The index storage unit 24 stores an index, which is a data structure such as a B-tree. The present embodiment uses a B-tree as the index, but the type of index is not limited to this and may be chosen freely; for example, a bitmap index or a function index may be used instead. As shown in FIG. 4, the index storage unit 24 includes a root node 25 in the top layer, branch nodes 26 in the intermediate layer, and leaf nodes 27 in the bottom layer. Each leaf node 27 stores, row by row, index information that associates three items: a partial character string that is a part of the character string constituting an application name; the hash value of the whole character string constituting that application name; and an ID (position information) indicating the position in the database unit 22 of the data information DQ whose specific field Fx holds that application name. The ID (position information) in the present embodiment identifies not only the position of the data information DQ in the database unit 22 but also the position of the specific field Fx within that data information DQ. In the present embodiment, the partial character string has a fixed length; as an example it is set to 10 characters, but it is not limited to this.
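A minimal Python sketch of one index row follows, assuming (as in the FIG. 4 example) that the partial character string is the leading 10 characters of the application name. The patent does not name a hash function, so the 8-byte BLAKE2b digest is an assumption, and `IndexEntry`, `make_entry`, `full_hash`, and the IDs `"id6"`/`"id7"` are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass

PARTIAL_LEN = 10  # fixed partial-string length used in the embodiment


def full_hash(text: str) -> bytes:
    # Assumption: the patent names no hash function; an 8-byte BLAKE2b
    # digest is used here only for illustration.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()


@dataclass(frozen=True)
class IndexEntry:
    partial: str       # leading PARTIAL_LEN characters of the field value
    hash_value: bytes  # hash of the complete field value
    doc_id: str        # position information (record and field Fx)


def make_entry(title: str, doc_id: str) -> IndexEntry:
    return IndexEntry(title[:PARTIAL_LEN], full_hash(title), doc_id)


row6 = make_entry("データベース管理システム", "id6")
row7 = make_entry("データベース管理システムの実行情報を取得する手段を有する記憶装置", "id7")
print(row6.partial)                          # データベース管理シス
print(row6.partial == row7.partial)          # True: same partial string
print(row6.hash_value == row7.hash_value)    # False: different full strings
```

In this sketch the key part of every row is bounded by ten characters plus an eight-byte digest, however long the title is; that bound, rather than sharing of duplicate keys, is what keeps the leaf small.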

The control unit 30 shown in FIG. 1 controls each part of the search device 100 and is configured as a control device including a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The functions of the control unit 30 are a reception unit 31, a leaf node identification unit 32, a calculation unit 33, a first determination unit 34, a second determination unit 35, an acquisition unit 36, an identification unit 37, a third determination unit 38, a fourth determination unit 39, a registration unit 40, a decision unit 41, a deletion target identification unit 42, and a deletion unit 43. These functions (31 to 43) are implemented by the CPU loading a control program stored in the ROM into the RAM and executing it. Alternatively, at least some of these functions (31 to 43) may be implemented by hardware circuits.

The reception unit 31 receives various inputs from the operation display unit 10. For example, the reception unit 31 can receive input of a character string constituting an application name to be used for a search (called a search character string), input of data information DQ whose index information is to be registered, and input of data information DQ to be deleted. The leaf node identification unit 32 identifies the leaf node 27 corresponding to an input character string. The calculation unit 33 calculates the hash value of an input character string.

The first determination unit 34 determines whether a partial character string matching a part of the search character string exists in the index storage unit 24 (leaf node 27). When the first determination unit 34 determines that such a partial character string exists, the second determination unit 35 determines whether the hash value corresponding to that partial character string matches the hash value of the search character string. When the second determination unit 35 determines that the two hash values match, the acquisition unit 36 acquires the ID corresponding to that partial character string from the index storage unit 24 (leaf node 27). The acquisition unit 36 can also acquire the data information DQ identified by that ID from the database unit 22.

The reception unit 31, leaf node identification unit 32, calculation unit 33, first determination unit 34, second determination unit 35, and acquisition unit 36 described above can also be regarded as a search processing unit that executes the search process described later. Details of the search process are given later.

When the reception unit 31 receives input of data information DQ whose index information is to be registered, the identification unit 37 identifies the character string constituting the application name registered in the specific field Fx of that data information DQ (called the registered character string) and the ID (position information) of that data information DQ. The third determination unit 38 determines whether a partial character string matching a part of the registered character string exists in the index storage unit 24 (leaf node 27). When the third determination unit 38 determines that such a partial character string exists, the fourth determination unit 39 determines whether the hash value corresponding to that partial character string matches the hash value of the registered character string.

The registration unit 40 registers, in the index storage unit 24 (leaf node 27), index information that associates a partial character string of the registered character string, the hash value of the registered character string, and the ID identified by the identification unit 37.

When the third determination unit 38 determines that no partial character string matching a part of the registered character string exists in the index storage unit 24, or when the fourth determination unit 39 determines that the hash value corresponding to the partial character string does not match the hash value of the registered character string, the decision unit 41 decides where to register the index information according to a predetermined rule.

The reception unit 31, leaf node identification unit 32, calculation unit 33, acquisition unit 36, identification unit 37, third determination unit 38, fourth determination unit 39, registration unit 40, and decision unit 41 described above can also be regarded as a registration processing unit that executes the registration process described later. Details of the registration process are given later.

The deletion target identification unit 42 identifies, among the index information stored in the index storage unit 24, the index information to be deleted. The deletion unit 43 deletes the index information identified by the deletion target identification unit 42 from the index storage unit 24.

The reception unit 31, leaf node identification unit 32, calculation unit 33, deletion target identification unit 42, and deletion unit 43 described above can also be regarded as a deletion processing unit that executes the deletion process described later. Details of the deletion process are given later.

Next, the search process executed by the control unit 30 will be described. FIG. 5 is a flowchart showing an example of the search process. As shown in FIG. 5, when the reception unit 31 first receives input of a search character string (step S511), the leaf node identification unit 32 identifies the leaf node 27 corresponding to the search character string (step S512). In the present embodiment, the leaf node identification unit 32 traverses the B-tree to identify the corresponding leaf node 27, specifically as follows. Suppose, as shown in FIG. 4, that the character string "データベース管理システム" ("database management system") is received as the search character string. First, at the root node 25, the branch node 26 corresponding to "データベース管理システム" is identified. Here, one branch node 26 is assigned to each row of the gojūon syllabary (the a-row, ka-row, ..., wa-row), and the corresponding branch node 26 is identified according to the row to which the first character of the search character string belongs. The first character of "データベース管理システム" is "デ", which is regarded as belonging to the ta-row, so the branch node 26 for the ta-row is identified as the branch node 26 corresponding to "データベース管理システム".

As shown in FIG. 4, each branch node 26 stores, for each of the character strings assigned to it (here, as an example, two-character strings), a pointer indicating the position of the leaf node 27 corresponding to that character string. The first two characters of "データベース管理システム" are "デー", so the leaf node 27 indicated by the pointer "eee" associated with "デー" is the leaf node 27 corresponding to "データベース管理システム". In this way, the leaf node 27 corresponding to "データベース管理システム" is identified. The method of identifying the leaf node 27 corresponding to the search character string is not limited to this example and may be chosen freely.
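The two-level lookup above can be sketched in Python with nested dictionaries standing in for the root node 25 and a branch node 26; a real B-tree would instead compare against ordered separator keys. The gojūon table, the folding of voiced kana onto their base kana via Unicode NFD decomposition (so that "デ" maps to the ta-row), and the names `GOJUON_ROWS`, `gojuon_row`, and `find_leaf` are all assumptions of this sketch; kanji first characters, which would need a reading, are not handled and raise an error.

```python
import unicodedata

# Gojuon rows, keyed by the first hiragana of each row.
GOJUON_ROWS = {
    "あ": "あいうえお", "か": "かきくけこ", "さ": "さしすせそ",
    "た": "たちつてと", "な": "なにぬねの", "は": "はひふへほ",
    "ま": "まみむめも", "や": "やゆよ", "ら": "らりるれろ", "わ": "わをん",
}


def gojuon_row(ch: str) -> str:
    """Return the gojuon row of a kana character, e.g. "デ" -> "た"."""
    base = unicodedata.normalize("NFD", ch)[0]  # drop dakuten/handakuten
    if "ァ" <= base <= "ヶ":                     # katakana -> hiragana
        base = chr(ord(base) - 0x60)
    for row, members in GOJUON_ROWS.items():
        if base in members:
            return row
    raise ValueError(f"{ch!r} is not a kana in the gojuon table")


def find_leaf(root: dict, key: str):
    """Root level: pick the branch by the first character's row.
    Branch level: pick the leaf pointer by the two-character prefix."""
    branch = root[gojuon_row(key[0])]
    return branch[key[:2]]


# Hypothetical fragment of the FIG. 4 tree: ta-row branch, "デー" -> "eee".
root = {"た": {"デー": "eee"}}
print(find_leaf(root, "データベース管理システム"))  # eee
```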

Next, the first determination unit 34 determines whether a partial character string matching a part of the search character string exists in the leaf node 27 (step S513). For example, when the search character string is "データベース管理システム", the partial character string ("データベース管理シス") of both the sixth-row and seventh-row index information in the leaf node 27 of FIG. 4 matches a part of "データベース管理システム", so the result of step S513 is affirmative. When it is determined that a matching partial character string exists in the leaf node 27, the calculation unit 33 calculates the hash value of the search character string (step S514). When it is determined that no such partial character string exists, the search process ends.

After step S514, the second determination unit 35 compares the hash value of the search character string with the hash values corresponding to the partial character strings that match a part of the search character string, and determines whether there is index information containing a hash value equal to that of the search character string (step S515). For example, when the search character string is "データベース管理システム", the hash value in the sixth-row index information of the leaf node 27 in FIG. 4, H(データベース管理システム), matches the hash value of the search character string, whereas the hash value in the seventh-row index information, H(データベース管理システムの実行情報を取得する手段を有する記憶装置), the hash of "storage device having means for acquiring execution information of the database management system", does not.

When step S515 determines that there is index information containing a hash value equal to that of the search character string, the acquisition unit 36 acquires the ID of that index information (step S516). The acquisition unit 36 then acquires the data information DQ identified by the acquired ID from the database unit 22 and displays it on the operation display unit 10 (step S517). When step S515 determines that no index information has a hash value equal to that of the search character string, the search process ends.
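Steps S513 to S517 can be sketched together in Python for a leaf that has already been identified. The sketch assumes that "a partial character string matching a part of the search character string" means equality with the search string's leading 10 characters, and it uses an 8-byte BLAKE2b digest as a stand-in for the unspecified hash function; the names `search` and `full_hash`, the tuple layout of a leaf row, and the dictionary records keyed by `"application-title"` are hypothetical.

```python
import hashlib

PARTIAL_LEN = 10


def full_hash(text: str) -> bytes:
    # Assumption: the patent names no hash function; BLAKE2b is illustrative.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()


def search(leaf, database, query):
    """Steps S513-S517 on an already-identified leaf.

    leaf: list of (partial, hash_value, doc_id) rows.
    database: dict mapping doc_id -> record (stand-in for the database unit).
    """
    prefix = query[:PARTIAL_LEN]
    candidates = [row for row in leaf if row[0] == prefix]        # S513
    if not candidates:
        return []                                                 # search ends
    target = full_hash(query)                                     # S514
    ids = [doc_id for _, h, doc_id in candidates if h == target]  # S515, S516
    return [database[doc_id] for doc_id in ids]                   # S517


t6 = "データベース管理システム"
t7 = "データベース管理システムの実行情報を取得する手段を有する記憶装置"
leaf = [(t[:PARTIAL_LEN], full_hash(t), i) for t, i in ((t6, "id6"), (t7, "id7"))]
database = {"id6": {"application-title": t6}, "id7": {"application-title": t7}}
print(search(leaf, database, t6))  # [{'application-title': 'データベース管理システム'}]
```

The sketch treats equal hashes as a match. Two different strings can in principle share a hash value, so a caller that cannot tolerate that rare case could compare the fetched record's application name with the query, at the cost of reading each hit from the database.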

Next, the registration process executed by the control unit 30 will be described. FIG. 6 is a flowchart showing an example of the registration process. As shown in FIG. 6, when the reception unit 31 first receives input of data information DQ whose index information is to be registered (step S611), the identification unit 37 identifies the character string constituting the application name registered in the specific field Fx of the received data information DQ (called the registered character string) and the ID (position information) of that data information DQ (step S612). Next, the leaf node identification unit 32 identifies the leaf node 27 corresponding to the registered character string (step S613). This is done in the same way as for the search character string, so a detailed description is omitted.

Next, the third determination unit 38 determines whether a partial character string matching a part of the registered character string exists in the leaf node 27 (step S614). When step S614 determines that no such partial character string exists in the leaf node 27, the decision unit 41 decides the insertion position of the index information for the registered character string according to a predetermined rule (step S615), and the process proceeds to step S620, described later. Suppose, for example, that the registered character string is "データ抽出装置、抽出方法およびプログラム" ("data extraction device, extraction method, and program") and that insertion positions are decided in gojūon order of the registered character strings. In the example of FIG. 4, the decision is to insert "データ抽出装置、抽出方法およびプログラム" between the third-row and fourth-row index information. The predetermined rule is not limited to this and can be set freely.
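Because the predetermined rule can be set freely, the following Python sketch uses a simple stand-in: the leaf keeps its rows sorted by partial string, and `bisect` finds the position that preserves that order. Plain Unicode code-point order is assumed in place of gojūon order. It roughly follows gojūon order for kana (voiced and small kana interleave with their base kana) but not for kanji, whose order depends on their readings, so the FIG. 4 example, which depends on the reading of 抽, is not reproduced. The function name and sample titles are hypothetical.

```python
import bisect

PARTIAL_LEN = 10


def insertion_index(leaf_partials, registered):
    """Step S615 sketch: the position that keeps leaf rows ordered by
    partial string. Assumption: Unicode code-point order stands in for
    the gojuon order of the embodiment."""
    return bisect.bisect_left(leaf_partials, registered[:PARTIAL_LEN])


leaf_partials = ["アプリケーション管理", "データベース管理シス", "ネットワーク制御装置"]
print(insertion_index(leaf_partials, "テキスト検索装置"))  # 1
```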

When step S614 determines that a partial character string matching a part of the registered character string exists in the leaf node 27, the calculation unit 33 calculates the hash value of the registered character string (step S616). Next, the fourth determination unit 39 compares the hash value of the registered character string with the hash values corresponding to the matching partial character strings, and determines whether there is index information containing a hash value equal to that of the registered character string (step S617).

If it is determined in step S617 that no index information contains a hash value equal to that of the registered character string, the acquisition unit 36 acquires, from the database unit 22, the data information DQ identified by the ID associated with each matching partial character string, and acquires the character string forming the application name registered in the specific field Fx of that data information DQ (step S618). Suppose, in the example of FIG. 4, that the registered character string is 「データベース管理システムおよびプログラム」 ("database management system and program"). The partial character strings of the index information in the sixth and seventh rows (both 「データベース管理シス」, "database management sys") match part of the registered character string, but their hash values do not. The acquisition unit 36 therefore acquires the data information DQ identified by the IDs in the sixth-row and seventh-row index information, and acquires the character string forming the application name registered in the specific field Fx of each.

Next, the decision unit 41 decides the insertion location of the index information for the registered character string according to a predetermined rule (step S619), and the process proceeds to step S620, described later. Suppose again that the registered character string is 「データベース管理システムおよびプログラム」 and that insertion locations are decided in gojūon order of the registered character strings. In the example of FIG. 4, this registered character string is to be inserted between the index information in the sixth row and that in the seventh row. The predetermined rule is not limited to this example and can be set arbitrarily.
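When prefixes match but hash values differ (steps S618 and S619), the stored prefixes alone cannot order the new entry, which is why the full strings are fetched from the database. The sketch below assumes the leaf node is sorted by full field value, models the database unit 22 as a hypothetical dict from ID to record, and again uses code-point order in place of gojūon order. `decide_pos_with_db` is a hypothetical name.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    prefix: str   # partial data string
    hval: int     # hash value of the full data string
    doc_id: int   # ID of the data information DQ


def decide_pos_with_db(leaf: list[IndexEntry], db: dict[int, dict[str, str]],
                       field: str, value: str) -> int:
    """Steps S618-S619 sketch: insertion slot for `value` in a leaf sorted by full value.

    For entries whose prefix matches `value`, the full string is fetched
    from `db` (step S618). For the other entries, comparing against the
    prefix gives the same result as comparing against the full string.
    """
    for i, e in enumerate(leaf):
        key = db[e.doc_id][field] if value.startswith(e.prefix) else e.prefix
        if value < key:
            return i
    return len(leaf)
```

Only the matching entries cost a database read, so the extra I/O is confined to the rows whose prefixes collide.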

In step S620, the registration unit 40 registers, at the insertion location decided by the decision unit 41, index information that associates the partial character string of the registered character string, the hash value of the registered character string, and the ID specified by the specifying unit 37. For example, when the registered character string is 「データベース管理システムおよびプログラム」, index information associating 「データベース管理シス」 (its first 10 characters), the hash value of 「データベース管理システムおよびプログラム」, and the ID of the data information DQ whose specific field Fx holds 「データベース管理システムおよびプログラム」 (the ID specified by the specifying unit 37) is registered at the insertion location decided by the decision unit 41.
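The entry registered in step S620 with the first embodiment's fixed 10-character prefix can be sketched as below; `make_entry`, `PREFIX_LEN`, and the SHA-1-based `hash32` are hypothetical names and choices. Python slicing counts Unicode code points, which matches the character count in the example as long as voiced kana such as デ are stored in precomposed form.

```python
import hashlib
from typing import NamedTuple

PREFIX_LEN = 10  # first embodiment: fixed-length partial string


class IndexEntry(NamedTuple):
    prefix: str   # partial data string
    hval: int     # hash value of the full data string
    doc_id: int   # ID of the data information DQ


def hash32(s: str) -> int:
    """Hypothetical hash: first 4 bytes of SHA-1 over the UTF-8 encoding."""
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:4], "big")


def make_entry(value: str, doc_id: int) -> IndexEntry:
    """Associate the leading PREFIX_LEN characters, the full-string hash, and the ID."""
    return IndexEntry(value[:PREFIX_LEN], hash32(value), doc_id)
```

For example, `make_entry("データベース管理システムおよびプログラム", 42).prefix` is `"データベース管理シス"`, matching the example in the text.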

On the other hand, if it is determined in step S617 that index information containing a hash value equal to that of the registered character string exists, the registration unit 40 registers index information associating the partial character string of the registered character string, the hash value of the registered character string, and the ID specified by the specifying unit 37 in the row of the leaf node 27 (index storage unit 24) immediately after or immediately before the row storing that index information (step S620).
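The same-hash branch can be sketched as follows, assuming a fixed-length prefix (so matching prefixes are equal) and a list-based leaf node. Of the two positions the text allows, this hypothetical `insert_next_to_duplicate` picks the row immediately after.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    prefix: str   # partial data string
    hval: int     # hash value of the full data string
    doc_id: int   # ID of the data information DQ


def insert_next_to_duplicate(leaf: list[IndexEntry], entry: IndexEntry) -> bool:
    """Step S620 (same-hash branch) sketch.

    If an entry with the same prefix and hash already exists, insert `entry`
    immediately after it and return True; otherwise leave `leaf` unchanged
    and return False so the caller can take the ordered-insertion path.
    """
    for i, e in enumerate(leaf):
        if e.prefix == entry.prefix and e.hval == entry.hval:
            leaf.insert(i + 1, entry)
            return True
    return False
```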

Next, the deletion process executed by the control unit 30 will be described. FIG. 7 is a flowchart illustrating an example of the deletion process. As shown in FIG. 7, when the receiving unit 31 receives input of the data information DQ to be deleted (step S711), the specifying unit 37 specifies the character string forming the application name registered in the specific field Fx of the received data information DQ (referred to as the deletion character string) and the ID (position information) of that data information DQ (step S712). Next, the leaf node specifying unit 32 specifies the leaf node 27 corresponding to the deletion character string (step S713).

Next, the deletion target specifying unit 42 specifies, among the index information stored in the leaf node 27, the index information to be deleted (step S714). Specifically, the deletion target specifying unit 42 specifies index information containing a hash value equal to the hash value of the deletion character string as the index information to be deleted. The deletion unit 43 then deletes the specified index information from the leaf node 27 (step S715).
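Steps S714 and S715 can be sketched as below. The text selects deletion targets by hash value; this sketch additionally checks the ID, which is our own assumption, so that deleting one of several records with the same field value leaves the other records' entries in place. `delete_entries` and the SHA-1-based `hash32` are hypothetical.

```python
import hashlib
from typing import NamedTuple


class IndexEntry(NamedTuple):
    prefix: str   # partial data string
    hval: int     # hash value of the full data string
    doc_id: int   # ID of the data information DQ


def hash32(s: str) -> int:
    """Hypothetical hash: first 4 bytes of SHA-1 over the UTF-8 encoding."""
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:4], "big")


def delete_entries(leaf: list[IndexEntry], value: str, doc_id: int) -> int:
    """Steps S714-S715 sketch: remove entries for (value, doc_id); return the count removed.

    The hash test follows the text; the doc_id test is an added assumption.
    """
    h = hash32(value)
    kept = [e for e in leaf if not (e.hval == h and e.doc_id == doc_id)]
    removed = len(leaf) - len(kept)
    leaf[:] = kept  # update the leaf node in place
    return removed
```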

As described above, in the present embodiment the character string (data string) forming the application name registered in the specific field Fx of the data information DQ is not registered in the index storage unit 24 as it is. Instead, the index storage unit 24 stores a partial character string of that character string, the hash value of the full character string, and the ID (position information) in association with one another, which reduces the amount of data held by the index storage unit 24. Because less data must be read during a search, the present embodiment has the advantageous effect of speeding up searches.
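The search that this compact index supports (claim 1) can be sketched as follows, assuming prefix matching and the same hypothetical SHA-1-based `hash32`; `search` is also a hypothetical name. Because distinct strings can share a hash value, the returned IDs are candidates, and a caller that needs exact results could re-check the fetched records, a step this sketch omits.

```python
import hashlib
from typing import NamedTuple


class IndexEntry(NamedTuple):
    prefix: str   # partial data string
    hval: int     # hash value of the full data string
    doc_id: int   # ID (position information) of the data information DQ


def hash32(s: str) -> int:
    """Hypothetical hash: first 4 bytes of SHA-1 over the UTF-8 encoding."""
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:4], "big")


def search(leaf: list[IndexEntry], query: str) -> list[int]:
    """Return the IDs whose prefix matches the query and whose hash equals the query's hash."""
    h = hash32(query)  # calculation unit
    return [
        e.doc_id
        for e in leaf
        if query.startswith(e.prefix)  # first determination: partial string matches
        and e.hval == h                # second determination: hash values match
    ]
```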

(Second Embodiment)
Next, a second embodiment will be described. As shown in FIG. 8, the second embodiment differs from the first embodiment in that the data length of the partial character string stored in the index storage unit 24 (leaf node 27) is variable. The rest is the same as the first embodiment, so description of the overlapping parts is omitted.

FIG. 9 is a block diagram illustrating an example of the schematic configuration of the search device 100 according to the second embodiment. As shown in FIG. 9, the second embodiment differs from the first in that the functions of the control unit 30 include a setting unit 44. The setting unit 44 variably sets the data length of the partial character string to be registered in the index storage unit 24 (leaf node 27) and sets the partial character string to be registered.

FIG. 10 is a flowchart illustrating an example of the registration process in the second embodiment. The search process and the deletion process are the same as in the first embodiment, so their description is omitted here. Steps S911 to S914 in FIG. 10 are the same as steps S611 to S614 in FIG. 6, so a detailed description is omitted.

If it is determined in step S914 of FIG. 10 that no partial character string matching part of the registered character string exists in the leaf node 27, the decision unit 41 decides the insertion location of the index information for the registered character string according to a predetermined rule (step S915). Step S915 is the same as step S615 of FIG. 6, so a detailed description is omitted. After step S915, the setting unit 44 sets the data length of the partial character string to be registered to one character and sets the first character of the registered character string as the partial character string (step S916). For example, when the registered character string is 「データ抽出装置、抽出方法およびプログラム」, its first character 「デ」 is set as the partial character string. The method of setting the partial character string is not limited to this; for example, the first two characters may be set instead. After step S916, the process proceeds to step S922, which is the same as step S620 of FIG. 6, so a detailed description is omitted.

If it is determined in step S914 that a partial character string matching part of the registered character string exists in the leaf node 27, the calculation unit 33 calculates the hash value of the registered character string (step S917). Next, the fourth determination unit 39 compares the hash value of the registered character string with the hash value associated with each matching partial character string, and determines whether there is index information containing a hash value equal to that of the registered character string (step S918).

If it is determined in step S918 that no index information contains a hash value equal to that of the registered character string, the setting unit 44 extends the data length and sets the partial character string to be registered (step S919). Specifically, the setting unit 44 sets the data length of the partial character string to be registered to a value greater than the data length of the partial character string that matches part of the registered character string.

Suppose, in the example of FIG. 8, that the registered character string is 「データベース管理システムおよびプログラム」. The partial character strings of the index information in the third, fifth, sixth, and seventh rows (「データ」, 「デ」, 「データベ」, and 「デー」) each match part of the registered character string, but none of their hash values match. The setting unit 44 therefore sets the data length of the partial character string to be registered to one character longer than the longest of the matching partial character strings. Here the longest match, 「データベ」, is four characters long, so the new data length is five characters, and the setting unit 44 sets 「データベー」, the first five characters, as the partial character string to be registered. This is only one example, and the method of setting the partial character string to be registered is arbitrary; it suffices that its data length is greater than that of the partial character string matching part of the registered character string.
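Both length rules of the second embodiment (one character in step S916, one character longer than the longest match in step S919) can be combined in a small sketch. It assumes prefix matching and that the caller has already established that no stored entry shares the registered string's hash; `new_prefix` is a hypothetical name.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    prefix: str   # partial data string (variable length in this embodiment)
    hval: int     # hash value of the full data string
    doc_id: int   # ID of the data information DQ


def new_prefix(leaf: list[IndexEntry], value: str) -> str:
    """Second-embodiment sketch of steps S916 and S919.

    Call only when no stored entry has the same hash as `value`.
    No prefix match: one character (S916).
    Otherwise: one character longer than the longest matching prefix (S919).
    """
    lengths = [len(e.prefix) for e in leaf if value.startswith(e.prefix)]
    if not lengths:
        return value[:1]
    return value[: max(lengths) + 1]
```

With the FIG. 8 prefixes 「データ」, 「デ」, 「データベ」, and 「デー」, the function returns 「データベー」 for 「データベース管理システムおよびプログラム」, matching the worked example.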

After step S919, the process proceeds to step S920. Steps S920 to S922 are the same as steps S618 to S620 of FIG. 6, so a detailed description is omitted.

If, on the other hand, it is determined in step S918 that index information containing a hash value equal to that of the registered character string exists, the process proceeds to step S922. Step S922 in FIG. 10 is the same as step S620 of FIG. 6, so a detailed description is omitted.

As described above, in the present embodiment, when a partial character string matching part of the registered character string exists in the leaf node 27 but no index information in the leaf node 27 contains a hash value equal to that of the registered character string, the data length of the partial character string to be registered is extended. When no partial character string matching part of the registered character string exists in the leaf node 27, the data length is kept short (one character in this embodiment, as an example). Lengthening the partial character string only where necessary and keeping it short elsewhere has the advantage of reducing the capacity required by the index storage unit 24 (leaf node 27).

(Modification)
Although embodiments of the present invention have been described above, each embodiment is presented as an example and is not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

For example, in each of the embodiments above, information indicating a document (document information) was described as an example of the data information stored in the database unit 22, but the type of data information is not limited to this and is arbitrary. As shown in FIG. 11, for example, the data information may be a data group G constituting each line of table data 200. Each data group G (data information) has a plurality of fields F, in each of which a data string is registered, and an index as described above can be created for a specific field Fx among those fields.
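For table data such as that in FIG. 11, building the index for one field of each row might look like the sketch below. Rows are modeled as hypothetical dicts keyed by field name, the row number stands in for the position information, and the fixed-length prefix and SHA-1-based `hash32` follow the earlier illustrative assumptions; `build_index` is a hypothetical name.

```python
import hashlib
from typing import NamedTuple


class IndexEntry(NamedTuple):
    prefix: str   # partial data string
    hval: int     # hash value of the full data string
    doc_id: int   # row number used as position information


def hash32(s: str) -> int:
    """Hypothetical hash: first 4 bytes of SHA-1 over the UTF-8 encoding."""
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:4], "big")


def build_index(rows: list[dict[str, str]], field: str, prefix_len: int = 10) -> list[IndexEntry]:
    """Index one field Fx of every row, ordered by the field's full value."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][field])
    return [IndexEntry(rows[i][field][:prefix_len], hash32(rows[i][field]), i) for i in order]
```

Calling `build_index` once per field would give the per-field indexes mentioned for other fields as well.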

Further, in each of the embodiments above, an index is created only for the specific field Fx among the plurality of fields F of the data information DQ, but this is not limiting; indexes similar to those of the embodiments can also be created for other fields.

DESCRIPTION OF SYMBOLS
10 Operation display unit
20 Storage unit
22 Database unit
24 Index storage unit
25 Root node
26 Branch node
27 Leaf node
30 Control unit
31 Receiving unit
32 Leaf node specifying unit
33 Calculation unit
34 First determination unit
35 Second determination unit
36 Acquisition unit
37 Specifying unit
38 Third determination unit
39 Fourth determination unit
40 Registration unit
41 Decision unit
42 Deletion target specifying unit
43 Deletion unit
44 Setting unit
100 Search device
200 Table data

Claims

A search device comprising:
a database that stores a plurality of pieces of data information, each having a plurality of fields in each of which a data string is registered;
an index storage unit that stores index information associating a partial data string that is a part of the data string registered in a specific field of the plurality of fields, a hash value of that data string, and position information indicating the position in the database of the data information in whose specific field that data string is registered;
a receiving unit that receives a search data string indicating the data string used for a search;
a calculation unit that calculates a hash value of the search data string;
a first determination unit that determines whether the partial data string matching a part of the search data string exists in the index storage unit;
a second determination unit that, when the first determination unit determines that the partial data string matching a part of the search data string exists in the index storage unit, determines whether the hash value corresponding to that partial data string matches the hash value of the search data string; and
an acquisition unit that, when the second determination unit determines that the hash value corresponding to the partial data string matches the hash value of the search data string, acquires the position information corresponding to that partial data string.

The search device according to claim 1, wherein the receiving unit receives the data information for which the index information is to be registered, the search device further comprising:
a specifying unit that, when the receiving unit receives the data information, specifies a registration data string indicating the data string registered in the specific field of that data information, and the position information of that data information;
a third determination unit that determines whether the partial data string matching a part of the registration data string exists in the index storage unit;
a fourth determination unit that, when the third determination unit determines that the partial data string matching a part of the registration data string exists in the index storage unit, determines whether the hash value corresponding to that partial data string matches the hash value of the registration data string;
a decision unit that decides a registration location of the index information according to a predetermined rule when the third determination unit determines that no partial data string matching a part of the registration data string exists in the index storage unit, or when the fourth determination unit determines that the hash value corresponding to the partial data string does not match the hash value of the registration data string; and
a registration unit that registers the index information at the registration location decided by the decision unit.

The search device according to claim 2, further comprising a setting unit that, when the fourth determination unit determines that the hash value corresponding to the partial data string does not match the hash value of the registration data string, sets the data length of the partial data string of the registration data string to a value greater than the data length of the partial data string matching a part of the registration data string.

The search device according to claim 1, wherein each of the plurality of pieces of data information is information indicating a document.

The search device according to claim 1, wherein the database stores table data in which the plurality of pieces of data information are arranged in parallel.

The search device according to claim 1, wherein the position information also specifies the position of the specific field within the data information.

A search method comprising:
a first step of receiving a search data string indicating the data string used for a search;
a second step of calculating a hash value of the search data string;
a third step of determining whether a partial data string matching a part of the search data string exists in an index storage unit that stores index information associating a partial data string that is a part of the data string registered in a specific field of any of a plurality of pieces of data information stored in a database, each piece having a plurality of fields in each of which a data string is registered, a hash value of that data string, and position information indicating the position of that data information in the database;
a fourth step of determining, when it is determined in the third step that the partial data string matching a part of the search data string exists in the index storage unit, whether the hash value corresponding to that partial data string matches the hash value of the search data string; and
a fifth step of acquiring, when it is determined in the fourth step that the hash value corresponding to the partial data string matches the hash value of the search data string, the position information corresponding to that partial data string.

A program for causing a computer to execute:
a first step of receiving a search data string indicating the data string used for a search;
a second step of calculating a hash value of the search data string;
a third step of determining whether a partial data string matching a part of the search data string exists in an index storage unit that stores index information associating a partial data string that is a part of the data string registered in a specific field of any of a plurality of pieces of data information stored in a database, each piece having a plurality of fields in each of which a data string is registered, a hash value of that data string, and position information indicating the position of that data information in the database;
a fourth step of determining, when it is determined in the third step that the partial data string matching a part of the search data string exists in the index storage unit, whether the hash value corresponding to that partial data string matches the hash value of the search data string; and
a fifth step of acquiring, when it is determined in the fourth step that the hash value corresponding to the partial data string matches the hash value of the search data string, the position information corresponding to that partial data string.