JP2020042722A

JP2020042722A - Data search system and data searching program

Info

Publication number: JP2020042722A
Application number: JP2018171603A
Authority: JP
Inventors: 恵哉生田; Shigeya Ikuta
Original assignee: Toshiba Information Systems Japan Corp
Current assignee: Toshiba Information Systems Japan Corp
Priority date: 2018-09-13
Filing date: 2018-09-13
Publication date: 2020-03-19
Anticipated expiration: 2038-09-13
Also published as: JP6949449B2

Abstract

To provide a data search system allowing for higher-speed searching, and a data searching program. SOLUTION: A data search system includes: first crawler collection means 503 which searches a database 300 in which data to be managed are accumulated, to generate a first index file 501 having a first index table formed by attributing information to be searched in content data of a corresponding table to main key information, as attribute information; main key information acquisition means 205 which searches the first index file 501 in response to a keyword for search, to obtain main key information; database search means 201 which searches the database 300 on the basis of the main key information obtained by the main key information acquisition means 205; and display control means 206 which performs display based on the data acquired by the database search means 201 on display means. SELECTED DRAWING: Figure 1

Description

The present invention relates to a data search system and a data search program.

Conventionally, performing a fuzzy search in a database system that holds a large amount of data, such as a relational database, has taken an extremely long time.

Patent Document 1 discloses a document search device that uses a full-text search engine and an RDB (relational database) and shortens search time by using state information on the preprocessing for high-speed search.

Specifically, the device provides a pattern matching unit and a full-text search unit for a plurality of documents and folders, determines the type and state of the search target, and performs the search with one of the two units. The pattern matching unit can search immediately, but each search takes a long time; the full-text search unit takes a long time to register documents, but searches quickly. The invention of Patent Document 1 switches between the two so that each is used where it is advantageous.

Patent Document 2 discloses a full-text search engine capable of communicating with a plurality of clients. When a document is registered, the engine stores, together with the document, a character string formed by adding a control character (for example, a delimiter) to the identification code (user ID) of each client that has access rights to the document. When access to a document is requested, the engine adds to the search terms a character string formed by adding the control character to the identification code of the requesting client, and then executes the full-text search.

Further, in the invention of Patent Document 2, the full-text search engine has a column that stores attribute values of a document separately from the document body. At document registration, the engine stores the character string consisting of the client's identification code and the control character in that column as an attribute value of the document. When access to a document is requested, it performs a full-text search using, as the search term for that column, the character string formed by adding the control character to the identification code of the requesting client. Access control is thereby performed simultaneously with the full-text search.

Patent Document 3 discloses, as a financial information search system, a search engine equipped with a crawler that crawls, at a predetermined timing, a document DB holding document data written about each stock issue, and creates a document index for full-text search. The search engine further has a search processing unit that, in response to a search request received from a sales terminal, returns as the search result a predetermined number of top-ranked records from among the records of document data matched in the document index. It also has a DB search unit that, when no keyword is specified in the search request received from the sales terminal, searches the document DB directly instead of searching through the search engine.

According to the invention of Patent Document 3, even when no keyword is specified for a search by the search engine, data that could fall within the predetermined top number under the sort condition for the search results is displayed without omission.

Further, Patent Document 4 discloses a search target column determination device comprising: a data type analysis unit that analyzes the data-structure characteristics (data type and the like) of a user-specified search keyword entered through an input unit; and a search target column detection unit that detects, as search target columns, those columns of the search target table stored in a relational database that match the analyzed data-structure characteristics of the search keyword.

The invention of Patent Document 4 improves response performance in full-text search by dynamically narrowing down the columns to be searched based on the data-structure characteristics of the search keyword.

Patent Document 1: JP 2006-79423 A
Patent Document 2: JP 2009-169736 A
Patent Document 3: JP 2015-185013 A
Patent Document 4: JP 2010-67213 A

An object of the present invention is to provide a data search system and a data search program that enable faster searching than the search systems described above.

A data search system according to the present invention comprises: a database in which data to be managed is accumulated; first crawler collection means that searches the database, identifies tables each of which is one unit of the data to be managed, creates, for every identified table, a first index table in which a unique value of the table serves as primary key information and the search target information in the content data of that table is attributed to the primary key information as attribute information, and generates a first index file collecting the first index tables; primary key information acquisition means that, when given a keyword to be searched, searches the first index file, detects a first index table containing data corresponding to the keyword, and obtains the primary key information of that first index table; database search means that searches the database based on the primary key information obtained by the primary key information acquisition means and extracts data corresponding to the keyword from the resulting table; display means that displays information; and display control means that causes the display means to perform display based on the data extracted by the database search means.

FIG. 1 is a block diagram showing the configuration of an embodiment of the data search system according to the present invention.
FIG. 2 is a diagram showing an example of the contents of the database used in the embodiment of the data search system according to the present invention.
FIG. 3 is a diagram showing an example of the process of creating a first index table in the first index file from a table of the database in the embodiment of the data search system according to the present invention.
FIG. 4 is a diagram showing an example of the contents of an attached file stored in the file device used in the embodiment of the data search system according to the present invention.
FIG. 5 is a diagram showing an example of the process of creating a second index table in the second index file from an attached file of the file device in the embodiment of the data search system according to the present invention.
FIG. 6 is a flowchart showing the operation of the embodiment of the data search system according to the present invention.

Embodiments of the data search system and the data search program according to the present invention are described below with reference to the accompanying drawings. In the drawings, the same components are denoted by the same reference numerals, and redundant description is omitted. FIG. 1 shows the configuration of an embodiment of the data search system according to the present invention. The data search system according to the embodiment includes a database 300 in which various data are accumulated, and a full-text search engine 500 that performs full-text search over the database 300. A relational database, for example, can be adopted as the database 300.

Here, the data to be managed that is accumulated in the database 300 is product catalog data. For example, as shown in FIG. 2, tables D11, D12, D13, ..., D1n, each being one unit of the data to be managed, are accumulated. Each table has a structure in which a plurality of necessary items are arranged for a unique value. In this embodiment, the unique value is the product number placed at the head of the table, and each of the tables D11, D12, D13, ..., D1n holds, as item data, "product name", "product name kana", "packaging", "handling start date", "handling end date", and so on. The order of the data within the items is merely an example.

The full-text search engine 500 includes first crawler collection means 503. The first crawler collection means 503 searches the database 300 and identifies the tables, each of which is one unit of the data to be managed. For every identified table, it creates a first index table in which the unique value of the table serves as primary key information and the search target information in the content data of that table is attributed to the primary key information as attribute information. It then generates a first index file 501 that collects these first index tables.

As already described, table D11 of the database 300 stores the product number and, as item data, "product name", "product name kana", "packaging", "handling start date", "handling end date", and so on. From this data, the first crawler collection means 503 retrieves the information of the character-string items (columns) that correspond to the search target information specified in SQL (Structured Query Language), and attributes it as attributed information to create the first index table.

FIG. 3 shows the process of creating the first index table D41 from the table D11. In this example, "product number", "product name", and "product kana" correspond to the search target information specified in SQL. The tables D12 to D1n are searched in the same way, and an index table is created for each table whose items (columns) contain information corresponding to the search target information specified in SQL. Index tables are therefore not necessarily created for all of the tables D11 to D1n. For example, the table D12 does not contain the "product number", "product name", and "product kana" of the search target information specified in SQL, so no index table is created for it. The index tables created in this way are all collected into one to form the first index file 501.
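The index-building step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: each database table is modeled as a plain dictionary, and the field names (`product_number`, `product_name`, `product_name_kana`) and the function `build_first_index_file` are hypothetical stand-ins for the SQL-specified search target information.

```python
from typing import Dict, Iterable, List

# Hypothetical column names standing in for the SQL-specified search targets.
SEARCH_TARGET_COLUMNS = ["product_name", "product_name_kana"]


def build_first_index_file(
    tables: Iterable[Dict[str, str]],
    key_column: str = "product_number",
    target_columns: List[str] = SEARCH_TARGET_COLUMNS,
) -> Dict[str, Dict[str, str]]:
    """Map each primary key to the search-target attributes of its table."""
    index_file: Dict[str, Dict[str, str]] = {}
    for table in tables:
        # A table lacking the key or the search-target columns gets no
        # index table (the D12 case in the description).
        if key_column not in table:
            continue
        if not all(column in table for column in target_columns):
            continue
        index_file[table[key_column]] = {c: table[c] for c in target_columns}
    return index_file


catalog = [
    {
        "product_number": "A001",
        "product_name": "Blue Widget",
        "product_name_kana": "ブルーウィジェット",
        "packaging": "box",
    },
    {"packaging": "bag"},  # no search-target columns, so it is skipped
]
first_index_file = build_first_index_file(catalog)
```

Only the searchable attributes are copied into the index; fields such as `packaging` stay in the database and are fetched later by primary key.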

In this embodiment, a file device 400 is provided that stores attached files placed under the directories of the data in the database 300. For example, one attached file D21 in the file device 400 is as shown in FIG. 4; it is an attached file under the directory of the table D11 shown in FIG. 2. The unique value of the attached file D21 consists of the same "product number" as the table D11, which shows that the file belongs under the directory of the table D11, followed by a "serial number" unique to the attached file D21. Besides this unique value, "product number_serial number", the attached file D21 stores data such as pamphlets and instruction manuals associated with the product number. The file device 400 stores a plurality of attached files of the same kind as the attached file D21, each holding its own unique "product number_serial number" together with the pamphlets, instruction manuals, and similar data associated with the product number. Each attached file is stored under the directory of one of the tables D11, D12, D13, ..., D1n accumulated in the database 300.
Attached files are not necessarily associated with all of the tables D11, D12, D13, ..., D1n; some tables have no attached file. As the "serial number" in the identifier "product number_serial number" suggests, a single table in the database 300 may have a plurality of attached files, in which case the "serial number" part is "01", "02", and so on.

The full-text search engine 500 also includes second crawler collection means 504. The second crawler collection means 504 searches the file device 400, creates second index tables in each of which the unique value serves as identification information and the required data of the corresponding attached file is attributed to it, and generates a second index file 502 that collects these second index tables.

As already described, the file device 400 stores attached files such as D21, each holding its unique "product number_serial number" together with the pamphlets, instruction manuals, and similar data associated with the product number. From the pamphlet, instruction manual, and similar data associated with the "product number_serial number", the second crawler collection means 504 takes the character-string data that corresponds to the search target information specified in SQL (Structured Query Language) and attributes it as attributed information to create the second index table. The search target information specified in SQL here may differ from the search target information used when creating the first index table. The first crawler collection means 503 and the second crawler collection means 504 can be configured to perform the search by either morphological analysis or N-gram.
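As a rough illustration of the N-gram option mentioned above, the sketch below splits text into character bigrams and builds an inverted index from each bigram to the identifiers of the documents containing it. The function names and the sample documents are assumptions made for this example; the source does not specify how the crawlers tokenize text.

```python
from collections import defaultdict
from typing import Dict, List, Set


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Split text into overlapping character n-grams."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_ngram_index(documents: Dict[str, str], n: int = 2) -> Dict[str, Set[str]]:
    """Build an inverted index: n-gram -> identifiers of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return dict(index)


def candidates(index: Dict[str, Set[str]], keyword: str, n: int = 2) -> Set[str]:
    """Return documents that contain every n-gram of the keyword."""
    grams = char_ngrams(keyword, n)
    if not grams:
        return set()
    return set.intersection(*(index.get(gram, set()) for gram in grams))


# Hypothetical attached-file texts keyed by "product number_serial number".
documents = {"A001_01": "取扱説明書", "A002_01": "商品説明"}
ngram_index = build_ngram_index(documents)
hits = candidates(ngram_index, "説明書")
```

Containing every n-gram of the keyword does not guarantee that the keyword occurs as a contiguous string, so a real engine would typically verify the candidates (for example with position information). Keywords shorter than `n` would also need separate handling.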

FIG. 5 shows the process of creating the second index table D42 from the attached file D21 of the file device 400. In this example, data such as pamphlets and instruction manuals corresponds to the search target information specified in SQL. The second index table D42 uses "product number_serial number", the unique value of the attached file D21, as identification information, and attributes to it the pamphlet, instruction manual, and similar data of that attached file that corresponds to the search target information specified in SQL. The other attached files in the file device 400 (not shown) are searched in the same way. When data corresponding to the search target information specified in SQL is hit, an index table is created for the identification information "product number_serial number" of that attached file; when no such data is hit, no index table is created. Index tables are therefore not necessarily created for all attached files in the file device 400. The index tables created in this way are all collected into one to form the second index file 502.

In this embodiment, a main search device 200 is provided. The word "main" reflects the fact that the full-text search engine 500 also performs searching: the search carried out after an actual search request arrives is performed in the main search device 200. The main search device 200 receives a search request and a keyword from a search terminal 101. The search terminal 101 may be a personal computer or workstation connected via a network or the like, or a mobile terminal such as a mobile phone or smartphone.

The main search device 200 includes primary key information acquisition means 205 and database search means 201. When given a keyword to be searched, the primary key information acquisition means 205 searches the first index file 501, detects the first index tables containing data corresponding to the keyword, and obtains the primary key information of those first index tables. Specifically, the primary key information acquisition means 205 issues a search request to the full-text search engine 500, has it search the first index file 501, and obtains the primary key information. The first index file 501 stores a plurality of first index tables, each having "product number" as primary key information with attribute information attributed to it. All of these first index tables are searched with the keyword, the first index tables containing the keyword are found, and their primary key information, "product number", is obtained. When the full-text search engine 500 has finished searching the first index file 501, either several "product number" values have been obtained as primary key information, or none has been obtained because no attribute information hit the keyword. This result is sent to the primary key information acquisition means 205.

The database search means 201 searches the database 300 based on the primary key information obtained by the primary key information acquisition means 205, and extracts the data corresponding to the keyword from the resulting tables. Because the database search means 201 searches the database 300 by primary key information, it reaches the tables of the database 300 that hold the relevant data quickly and reliably, and the desired data corresponding to the keyword can be extracted from those tables.
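The primary-key lookup described above can be sketched with Python's standard `sqlite3` module. This is an illustrative assumption only: the catalog is modeled as rows of a single `products` table rather than the per-product tables of the embodiment, and the table name, column names, and `fetch_by_primary_keys` are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple


def fetch_by_primary_keys(
    conn: sqlite3.Connection, keys: Sequence[str]
) -> List[Tuple[str, str]]:
    """Fetch catalog rows for the given product numbers via the primary key."""
    if not keys:
        return []  # the index search found nothing, so skip the database
    placeholders = ",".join("?" for _ in keys)
    sql = (
        "SELECT product_number, product_name FROM products "
        f"WHERE product_number IN ({placeholders}) "
        "ORDER BY product_number"
    )
    return conn.execute(sql, list(keys)).fetchall()


# Hypothetical in-memory catalog for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_number TEXT PRIMARY KEY, product_name TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("A001", "Blue Widget"), ("A002", "Red Gadget"), ("A003", "Green Widget")],
)
rows = fetch_by_primary_keys(conn, ["A003", "A001"])
```

Because the `WHERE` clause names only the primary key, the database can resolve each value through its primary key index instead of scanning text columns.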

The main search device 200 includes display control means 206. The data obtained by the database search means 201 is sent to the display control means 206. The display control means 206 sends a display based on the data extracted by the database search means 201 to the search terminal 101 so that it is shown on the display means of the terminal.

The display control means 206 includes display data processing means 203 and display processing means 204. The display data processing means 203 processes the data obtained by the database search means 201 (the data hit for the keyword) into data for list display on the search terminal 101. The display processing means 204 sends the processed data as display data that can be shown on the display means of the search terminal 101 (a display device such as an LED display).

The main search device 200 further includes identification information acquisition means 207 and attached file search means 202. When given a keyword to be searched, the identification information acquisition means 207 searches the second index file 502, detects the second index tables containing data corresponding to the keyword, and obtains the identification information of those second index tables. Specifically, the identification information acquisition means 207 issues a search request to the full-text search engine 500, has it search the second index file 502, and obtains the identification information. The second index file 502 stores a plurality of second index tables, each having "product number_serial number" as identification information with pamphlet, instruction manual, and similar data attributed to it. All of these second index tables are searched with the keyword, the second index tables containing the keyword are found, and their identification information, "product number_serial number", is obtained. When the full-text search engine 500 has finished searching the second index file 502, either several "product number_serial number" values have been obtained as identification information, or none has been obtained because no second index table held data that hit the keyword. This result is sent to the identification information acquisition means 207.

The attached file search means 202 searches the file device 400 based on the identification information obtained by the identification information acquisition means 207, and extracts the data corresponding to the keyword from the resulting attached files. Because the file device 400 is searched by the obtained identification information, the search reaches the table of the file device 400 that holds the relevant data quickly and reliably, and the desired data corresponding to the keyword can be extracted from it.

The data extracted in this way is sent to the display control means 206, and the display control means 206 sends a display based on the data extracted by the database search means 201 to the search terminal 101 so that it is shown on the display means of the terminal.

The display data processing means 203 processes the data extracted by the attached file search means 202 together with the data obtained by the database search means 201. For example, it processes the data so that the data to be listed on the search terminal 101 individually includes, at a minimum, the pamphlet, instruction manual, and similar data extracted by the attached file search means 202. The display processing means 204 sends the processed data as display data that can be shown on the display means of the search terminal 101 (a display device such as an LED display).

In the above configuration, the first crawler collection means 503 can run at any time. For example, it can run at one-minute intervals during the six hours starting at midnight. The second crawler collection means 504 can likewise run at any time.

As described above, the first crawler collection means 503 has collected the first index tables into the first index file 501, and the second crawler collection means 504 has collected the second index tables into the second index file 502. When a keyword search request is then issued from the search terminal 101, the processing shown in the flowchart of FIG. 6 is performed.

When a keyword search request is issued from the search terminal 101, the search processing starts. The first index file 501 is searched based on the keyword given from the search terminal 101, and the "product number" values that are the primary key information of the matching first index tables are obtained and stored in internal memory table A (FIG. 1) (S11).

Next, the second index file 502 is searched based on the keyword given from the search terminal 101, the "product number_serial number" values that are the identification information of the matching second index tables are obtained, and only the "product number" part of each is stored in internal memory table B (FIG. 1) (S12).

Next, the logical sum (union) of the primary key information in internal memory table A and the contents of internal memory table B is created and stored in internal memory table C (FIG. 1) (S13). That is, identical "product number" values are merged into one. In this way, a fuzzy search covering the database 300 and the file device 400 is performed, and the matching primary key information ("product number") is obtained at high speed. Next, the database 300 is accessed using only the primary key information in internal memory table C, and the data corresponding to the keyword is extracted from the resulting tables (S14: database search means 201).
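Steps S11 to S13 can be sketched as follows. This is a simplified model under stated assumptions: each index file is a dictionary from identifier to attribute strings, a plain substring test stands in for the full-text engine's matching, an ASCII underscore separates the product number from the serial number, and the function names are hypothetical.

```python
from typing import Dict, List


def search_index(index_file: Dict[str, Dict[str, str]], keyword: str) -> List[str]:
    """Return identifiers of index tables whose attribute text contains the keyword."""
    return [
        identifier
        for identifier, attributes in index_file.items()
        if any(keyword in value for value in attributes.values())
    ]


def collect_primary_keys(
    first_index_file: Dict[str, Dict[str, str]],
    second_index_file: Dict[str, Dict[str, str]],
    keyword: str,
) -> List[str]:
    # S11: product numbers from the first index file (internal memory table A).
    table_a = search_index(first_index_file, keyword)
    # S12: keep only the product-number part of "product number_serial number"
    # (internal memory table B).
    table_b = [
        identifier.rsplit("_", 1)[0]
        for identifier in search_index(second_index_file, keyword)
    ]
    # S13: union of A and B with duplicates merged (internal memory table C).
    return list(dict.fromkeys(table_a + table_b))


first_index = {
    "A001": {"product_name": "Blue Widget"},
    "A002": {"product_name": "Red Gadget"},
}
second_index = {
    "A001_01": {"text": "Widget instruction manual"},
    "A003_01": {"text": "Widget pamphlet"},
    "A003_02": {"text": "Widget leaflet"},
}
table_c = collect_primary_keys(first_index, second_index, "Widget")
```

Here "A003" reaches table C only through its attached files, and "A001" appears once even though it matched in both index files.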

The data retrieved above is processed (S15: display data processing means 203), sent to the search terminal 101, and displayed on the display means (S16: display processing means 204).

When there is no data retrieved by the attached file search means 202, the display control means 206 can perform a display on the display means based only on the data retrieved by the database search means 201. When there is neither data retrieved by the database search means 201 nor data retrieved by the attached file search means 202, the display control means 206 can display on the display means that no search result was obtained.
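The two fallback cases of the display control means 206 can be expressed as a small decision function. This is a sketch only: the function name `build_display`, the returned dictionary shape, and the message text are hypothetical, and the case where only attached-file data exists is not described in the text, so merging both lists there is an assumption.

```python
def build_display(db_rows, attachment_rows):
    """Choose what to show, following the two fallback cases described above."""
    if not db_rows and not attachment_rows:
        # Neither search produced data: report that no result was obtained.
        return {"message": "No search results were found.", "items": []}
    if not attachment_rows:
        # No attached-file data: display based on the database data only.
        return {"message": None, "items": list(db_rows)}
    # Assumption: otherwise show database data followed by attached-file data.
    return {"message": None, "items": list(db_rows) + list(attachment_rows)}


db_only = build_display([("P001", "Desk Lamp")], [])
nothing = build_display([], [])
```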

In this embodiment, the file device 400 is not searched using the identification information "product number_serial number" to obtain data. However, the attached file search means 202 may search the file device 400 using the information in internal memory table B, and the display data processing means 203 may process the obtained data and compile it into information to be displayed as a list. This makes it possible to use that information to access the file device 400 and to download and display the corresponding data, such as brochures and instruction manuals.

With respect to the above configuration, this embodiment does not adopt a configuration in which the database search means 201, instead of using the primary key information obtained as the search result of the first index file 501, searches the database 300 directly with SQL without performing the processing of the primary key information acquisition means 205. With such a direct-search configuration, the column items "product number", "product name", and "product name (kana)" would each have to be searched with the keyword from the search terminal 101 under prefix-match, suffix-match, and partial-match conditions, which places a heavy load on the database system and results in poor response. In addition, when a keyword search spans multiple tables in the database 300, the SQL needed to search the database 300 directly becomes very complex, which is a further reason why this embodiment does not adopt such a configuration.
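For comparison, the direct-search configuration that the embodiment avoids would need a query along the following lines. The sketch uses Python's standard `sqlite3` module; the table name, column names, sample rows, and the function name `direct_like_search` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "product_number TEXT PRIMARY KEY, product_name TEXT, product_name_kana TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("P001", "Desk Lamp", "デスクランプ"),
        ("P002", "Office Chair", "オフィスチェア"),
    ],
)


def direct_like_search(conn, keyword):
    """Search every candidate column with a pattern match (the avoided approach)."""
    # '%keyword%' covers prefix, suffix, and partial matches in one pattern.
    # Note: '%' or '_' inside the keyword is not escaped in this sketch.
    pattern = f"%{keyword}%"
    sql = (
        "SELECT product_number FROM products "
        "WHERE product_number LIKE ? "
        "OR product_name LIKE ? "
        "OR product_name_kana LIKE ? "
        "ORDER BY product_number"
    )
    return [row[0] for row in conn.execute(sql, (pattern, pattern, pattern))]


kana_hits = direct_like_search(conn, "ランプ")
```

A pattern with a leading wildcard typically cannot use an ordinary column index, so every row is examined for every column, and each additional table adds joins or further subqueries. Looking rows up by primary key after an index-file search avoids both costs.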

As described above, according to this embodiment, the target data can be searched efficiently and at high speed by fuzzy search processing that takes items (columns) distributed across the tables of the database as search target items, without applying any special modification or processing to the database or the full-text search engine (this is referred to as the effect of this embodiment).

The above embodiment is a configuration for the case where attached files exist, but a configuration that has no attached files and does not include the file device 400 can also be adopted. In this case, in addition to the file device 400, the second crawler collection means 504, the second index file 502, the identification information acquisition means 207, and the attached file search means 202 are unnecessary. In this case as well, the same effect as that of the above embodiment can be obtained.

101 Search terminal
200 Main search device
201 Database search means
202 Attached file search means
203 Display data processing means
204 Display processing means
205 Primary key information acquisition means
206 Display control means
207 Identification information acquisition means
300 Database
400 File device
500 Full-text search engine
501 First index file
502 Second index file
503 First crawler collection means
504 Second crawler collection means

Claims

A data search system comprising:
a database in which managed data is stored;
first crawler collection means for searching the database, identifying tables each of which is one unit of the managed data, setting a value that is unique across all identified tables as primary key information, creating first index tables in which the information to be searched in the content data of the corresponding table is attributed to the primary key information as attribute information, and generating a first index file that compiles the first index tables;
primary key information acquisition means for, when a keyword to be searched is given, searching the first index file, detecting a first index table containing data corresponding to the keyword, and obtaining the primary key information of that first index table;
database search means for searching the database based on the primary key information obtained by the primary key information acquisition means and retrieving data corresponding to the keyword from the resulting table;
display means for displaying information; and
display control means for performing a display on the display means based on the data retrieved by the database search means.

The data search system according to claim 1, further comprising:
a file device in which attached files under a directory of the data of the database tables are stored;
second crawler collection means for searching the file device, crawling and collecting required data of the attached files, creating second index tables in which the required data of each attached file is attributed to a unique value serving as identification information, and generating a second index file that compiles the second index tables;
identification information acquisition means for, when a keyword to be searched is given, searching the second index file, detecting a second index table containing data corresponding to the keyword, and obtaining the identification information that identifies that second index table; and
attached file search means for searching the file device based on the identification information obtained by the identification information acquisition means and extracting data corresponding to the keyword from the obtained attached file,
wherein the display control means performs a display on the display means based on the data extracted by the attached file search means.

The data search system according to claim 1, wherein, when there is no data extracted by the attached file search means, the display control means performs a display on the display means based only on the data retrieved by the database search means.

The data search system according to claim 2, wherein, when there is neither data retrieved by the database search means nor data extracted by the attached file search means, the display control means displays on the display means that no search result was obtained.

The data search system according to claim 2, wherein the first crawler collection means and the second crawler collection means perform the search by either morphological analysis or N-gram.

The data search system according to claim 1, wherein the first crawler collection means performs its processing at an arbitrary time.

The data search system according to claim 2, wherein the first crawler collection means and the second crawler collection means perform their processing at an arbitrary time.

A data search program for causing a computer of a data search system, which searches for data in a database in which managed data is stored, to function as:
first crawler collection means for searching the database, identifying tables each of which is one unit of the managed data, setting a value that is unique across all identified tables as primary key information, creating first index tables in which the information to be searched in the content data of the corresponding table is attributed to the primary key information as attribute information, and generating a first index file that compiles the first index tables;
primary key information acquisition means for, when a keyword to be searched is given, searching the first index file, detecting a first index table containing data corresponding to the keyword, and obtaining the primary key information of that first index table;
database search means for searching the database based on the primary key information obtained by the primary key information acquisition means and retrieving data corresponding to the keyword from the resulting table; and
display control means for causing display means to perform a display based on the data retrieved by the database search means.

The data search program according to claim 8, further causing the computer to function as:
second crawler collection means for searching a file device in which attached files under a directory of the data of the database tables are stored, crawling and collecting required data of the attached files, creating second index tables in which the required data of each attached file is attributed to a unique value serving as identification information, and generating a second index file that compiles the second index tables;
identification information acquisition means for, when a keyword to be searched is given, searching the second index file, detecting a second index table containing data corresponding to the keyword, and obtaining the identification information that identifies that second index table; and
attached file search means for searching the file device based on the identification information obtained by the identification information acquisition means and extracting data corresponding to the keyword from the obtained attached file,
wherein the computer is caused to function as the display control means so as to perform a display on the display means based on the data extracted by the attached file search means.

The data search program according to claim 8, wherein the computer is caused to function as the display control means so as to perform a display on the display means based only on the data retrieved by the database search means when there is no data extracted by the attached file search means.

The data search program according to claim 9, wherein the computer is caused to function as the display control means so as to display on the display means that no search result was obtained when there is neither data retrieved by the database search means nor data extracted by the attached file search means.

The data search program according to claim 9, wherein the computer is caused to function as the first crawler collection means and the second crawler collection means so as to perform the search by either morphological analysis or N-gram.

The data search program according to claim 8, wherein the computer is caused to function as the first crawler collection means so as to perform its processing at an arbitrary time.

The data search program according to claim 9, wherein the computer is caused to function as the first crawler collection means and the second crawler collection means so as to perform their processing at an arbitrary time.