JPH0785072A

JPH0785072A - Data base selector

Info

Publication number: JPH0785072A
Application number: JP5231930A
Authority: JP
Inventors: Tomohiro Tanaka; 智博田中; Hiroshi Matsuo; 比呂志松尾
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1993-09-17
Filing date: 1993-09-17
Publication date: 1995-03-31

Abstract

PURPOSE: To provide a database selection device that can automatically select a desired database and that is highly versatile with respect to the addition of new databases. CONSTITUTION: For each semantic attribute assigned to each word obtained by the character string analysis means 2, a score for each database is assigned from the database determination table stored in the database determination table storage means 5. The scores of all semantic attributes obtained by the character string analysis means 2 are totalled for each database, and the names of the databases whose totals are at or above a prescribed score are determined.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a database selection device, and more particularly to a database selection device that automatically selects, from among a plurality of search target databases, a database that matches a user's search request.

[0002]

[Prior Art] As a technique for selecting a database that matches a search request from among a plurality of search target databases, a conventional database selection device defines in advance, as a structure, the relationships between knowledge of the fields to be searched and each database. It analyzes the field on the basis of this knowledge, settles the field by querying the user, and selects the database associated with that field (Japanese Unexamined Patent Publication No. 5-101101).

[0003] FIG. 13 shows the configuration of a conventional database selection device. The device shown in the figure comprises search field knowledge 10, database knowledge 11, and a selection processing unit 12.

[0004] The search field knowledge 10 is knowledge that, for each predetermined field classification, represents as a tree structure a hierarchy of search fields set on the basis of predetermined relationships, with a search field positioned at each node of the tree.

[0005] The database knowledge 11 is information that associates each of a plurality of databases with the search fields held in the search field knowledge 10. The information indicating the search fields is expressed, for example, by combining the required search fields in a logical expression consisting of a logical sum (OR) of logical product (AND) terms.

[0006] The selection processing unit 12 searches for the desired database by performing the following operations using the search field knowledge 10 and the database knowledge 11.

[0007] First, as the initial state, all field classifications are taken as candidate search fields and all databases are taken as candidate databases.

[0008] When, according to the database knowledge 11, a search field corresponding to a candidate database is itself a candidate search field, or is a search field connected above or below a candidate search field in the tree structure, that candidate database and that candidate search field are regarded as corresponding. The following processing is then repeated until it no longer changes the candidate search fields.

[0009] If there is a candidate search field that corresponds to no candidate database, that candidate search field is deleted. If there are candidate search fields that correspond to all candidate databases, or whose deletion as a selected field would leave the candidate databases unchanged, each such candidate search field is deleted. If a search field is located at a lower node in the tree structure, that lower-node search field is added as a candidate search field.

[0010] The selection processing unit 12 displays the candidate search fields to the searcher, who selects and enters the desired one from those displayed. Candidate databases that do not correspond to the selected candidate search field are then deleted from the candidate databases.

[0011] The candidate databases remaining when no further deletion of candidate databases occurs, or when a predetermined stop instruction is received, are taken as the result of the selection process.

[0012]

[Problems to Be Solved by the Invention] However, with the conventional technique described above, when a database in a different field is newly added, the knowledge system (the candidate search fields) must be reconstructed by hand. Moreover, if the relationships between field knowledge and databases are assigned from a viewpoint different from that of the existing relationships, consistency with the existing relationships is lost and correct processing becomes impossible. The conventional technique therefore lacks versatility with respect to new databases.

[0013] The present invention has been made in view of the above points. Its object is to solve the above conventional problems and to provide a database selection device that can automatically select the database for a search request and that is highly versatile with respect to the addition of new databases.

[0014]

[Means for Solving the Problems] FIG. 1 is a diagram showing the principle configuration of the present invention.

[0015] The present invention is a database selection device that automatically selects, from among a plurality of search target databases, a database that matches a user's search request. It comprises: character string input means 1, through which the searcher enters a character string concerning the database to be searched; character string analysis means 2, which extracts words from the character string entered through the character string input means 1 and assigns semantic attributes to each extracted word; document analysis means 3, which extracts words from the document group in each database, assigns semantic attributes to each extracted word, and counts the frequency of occurrence of each semantic attribute for each database; database determination table creation means 4, which calculates, from the per-database frequencies counted by the document analysis means 3, a score for each semantic attribute with respect to each database, and creates a database determination table describing the semantic attributes, their scores for each database, and the database names; database determination table storage means 5, which stores the database determination table created by the database determination table creation means 4; database determination means 6, which refers to the database determination table stored in the database determination table storage means 5 to assign a score for each database to each semantic attribute assigned to each word obtained by the character string analysis means 2, totals the scores of all semantic attributes obtained by the character string analysis means 2 for each database, and determines the names of the databases whose totals are at or above a predetermined score; and output means 7, which outputs the database names determined by the database determination means 6.

[0016]

[Operation] The present invention extracts semantic attributes directly from the document group in each database, creates a table in advance from scores based on the frequency of occurrence of those semantic attributes, and selects a database using that table. Consequently, even when a new database is added, it suffices to calculate the semantic attribute scores for that database and add them to the table, so database selection remains highly versatile with respect to the addition of new databases.

[0017]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

[0018] FIG. 2 shows the configuration of a database selection device according to one embodiment of the present invention. The device shown in the figure comprises a character string input unit 1, a character string analysis unit 2, a document analysis unit 3, a database (hereinafter, DB) determination table creation unit 4, a DB determination table storage unit 5, a DB determination unit 6, an output unit 7, a word dictionary 8, and a DB determination table 9.

[0019] The character string input unit 1 receives the search request character string entered by the user.

[0020] The character string analysis unit 2 divides the search request character string received from the character string input unit 1 into words and extracts them. It then assigns semantic attributes from the word dictionary to each word extracted by morphological analysis and outputs the words together with their semantic attributes. Many morphological analysis methods have been proposed, and any of them may be used here.

[0021] The document analysis unit 3 takes a plurality of documents drawn at random from each DB, extracts words from this document group by morphological analysis in the same way as the character string analysis unit 2, assigns semantic attributes from the word dictionary 8 to each extracted word, and counts the frequency of occurrence of each semantic attribute for each DB. Alternatively, instead of extracting words by morphological analysis, the document analysis unit 3 may be configured to use automatic keyword extraction, which extracts keywords on the basis of information such as word context and position of occurrence. In that configuration it extracts keywords from the documents in each DB, assigns semantic attributes from the word dictionary 8 to the extracted keywords, and counts the frequency of occurrence of each semantic attribute for each DB. Many automatic keyword extraction methods have been proposed, and the present invention does not depend on the details of the method.

[0022] FIG. 3 shows an example of the word dictionary according to one embodiment of the present invention. The word dictionary 8 is used by the character string analysis unit 2 and the document analysis unit 3 to assign semantic attributes. As shown in FIG. 3, the word dictionary 8 describes, for each word, its part of speech and its semantic attributes (more than one may be listed per word). For example, the word "電話" (telephone) has the part of speech "体言" (nominal) and the semantic attributes "telephone" and "device".

[0023] The DB determination table creation unit 4 calculates, from the per-DB frequency of occurrence of each semantic attribute counted by the document analysis unit 3, a score for each semantic attribute with respect to each DB, and creates the DB determination table 9, which describes the semantic attributes, their scores for each DB, and the name of each DB. The scores are calculated by statistical processing such that, relative to the total number of semantic attribute occurrences overall, a semantic attribute that occurs frequently in a particular DB receives a high score for that DB.

[0024] Many statistical methods have been proposed in the field of statistics, and any of them may be used. Here, as one such method, the explanation uses the scores of each semantic attribute for each DB calculated by equations (1) and (2) below.

[0025]

[Equation 1] (equations (1) and (2); not reproduced in this text)

[0026] In the above equations, Y_jk is the score of semantic attribute j in DB_k, F_jk is the frequency of semantic attribute j in DB_k, M_jk is the frequency in DB_k that would be expected if semantic attribute j occurred at random, independently of the DB (the theoretical frequency), m is the number of DBs, and n is the number of distinct semantic attributes.

[0027] With the above equations, the score is calculated to be high when F_jk - M_jk > 0 (the attribute occurs more often than it would at random) and low when F_jk - M_jk ≤ 0 (it does not).

[0028] FIG. 4 shows an example of the DB determination table created by the DB determination table creation unit according to one embodiment of the present invention. As shown in the figure, the DB determination table 9 describes the score of each DB for each semantic attribute. For example, for the semantic attribute "mobile", the score of DB1 based on the frequency of occurrence of that attribute is 2.3, indicating that the attribute occurs there more often than it would at random. The score of DB2 is -0.1, indicating that the attribute does not occur there more often than it would at random.

[0029] In this way, the DB determination table 9 is generated from scores based on the frequency of occurrence of the semantic attributes in each database.

[0030] The DB determination table storage unit 5 stores the DB determination table 9 created by the DB determination table creation unit 4. When the target DBs change, for example through an addition or a deletion, the DB determination table creation unit 4 creates a new DB determination table 9, which replaces the one stored in the DB determination table storage unit 5. When the target DBs have not changed, the DB determination table creation unit 4 does not create a DB determination table 9, and the one stored in the DB determination table storage unit 5 is used.

[0031] The DB determination unit 6 uses the DB determination table 9 stored in the DB determination table storage unit 5 to assign a score for each DB to each semantic attribute assigned to each word obtained by the character string analysis unit 2. It totals the scores of all semantic attributes obtained by the character string analysis unit 2 for each DB and outputs to the output unit 7 the names of the DBs whose totals are at or above a value specified in advance.

[0032] The output unit 7 outputs the DB names sent from the DB determination unit 6.

[0033] The database selection process is described concretely below. The target DBs are DB1, DB2, and DB3. Each is assumed to store a plurality of documents such as those shown in FIG. 5, and the user is assumed to enter "携帯用の電話" ("portable telephone") as the search request character string.

[0034] FIG. 6 is a flowchart showing the operation of the database selection process according to one embodiment of the present invention.

[0035] Step 1) The character string input unit 1 receives the search request character string from the user and transfers it to the character string analysis unit 2.

[0036] Step 2) Using the word dictionary 8 shown in FIG. 3, the character string analysis unit 2 divides the character string into words, assigns semantic attributes to each word, and outputs the words together with their semantic attributes. FIG. 7 shows the analysis result of the character string analysis unit according to one embodiment of the present invention. As shown in the figure, morphological analysis divides the input character string "携帯用の電話" into "携帯" (mobile), "用" (for), "の" (of), "電話" (telephone), and ".", which are assigned the semantic attributes "mobile", none, none, "telephone" and "device", and none, respectively. The analysis result is therefore the semantic attribute "mobile" for the extracted word "携帯" and the semantic attributes "telephone" and "device" for "電話". These results are sent to the DB determination unit 6.
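Step 2 can be sketched as a dictionary lookup over an already tokenized query. This is an illustrative sketch only: the morphological analyzer is out of scope, so the token list is supplied by hand, and the names `SEMANTIC_ATTRIBUTES` and `analyze` are hypothetical.

```python
from typing import Dict, List, Tuple

# Semantic attributes from the worked example; words without an entry
# ("用", "の", ".") have no semantic attribute.
SEMANTIC_ATTRIBUTES: Dict[str, List[str]] = {
    "携帯": ["mobile"],
    "電話": ["telephone", "device"],
}


def analyze(tokens: List[str]) -> List[Tuple[str, List[str]]]:
    """Keep only the tokens that carry semantic attributes, paired with those attributes."""
    result = []
    for token in tokens:
        attributes = SEMANTIC_ATTRIBUTES.get(token, [])
        if attributes:
            result.append((token, attributes))
    return result


# Tokenization of "携帯用の電話" as given in the text (assumed to come
# from a morphological analyzer that is not modelled here).
tokens = ["携帯", "用", "の", "電話", "."]
analysis = analyze(tokens)

# Flattened list of attributes passed on to the DB determination step.
query_attributes = [attr for _, attrs in analysis for attr in attrs]
```

The flattened list holds "mobile", "telephone", and "device", the three attributes used in the determination step of the worked example.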

[0037] Step 3) The document analysis unit 3 extracts words from the document group, consisting of a plurality of documents, in each DB by the same morphological analysis as the character string analysis unit 2, assigns semantic attributes to each extracted word, and counts the frequency of occurrence of each semantic attribute for each DB. FIG. 8 shows the analysis result (part 1) of the document analysis unit according to one embodiment of the present invention, namely the result when words are extracted from the target DBs by morphological analysis. For example, as shown in FIG. 8, suppose that the documents drawn at random from the document group of DB1 shown in FIG. 5 include "Trends in mobile telephones: the trend in portable telephones is (omitted)" and "A telephone that can be carried (omitted)" (each quoted string is one document). Extracting the words that carry semantic attributes by morphological analysis (leaving out the omitted portions) yields "携帯" (mobile; 2 occurrences), "電話" (telephone; 3), "傾向" (trend; 2), and "持ち運ぶ" (carry; 1), and from these the semantic attributes "mobile" (3), "telephone" (3), "device" (3), "formation" (2), and "transport" (1). The semantic attributes obtained in this way are counted over all the documents drawn at random from DB1, and the same processing is applied to the other DBs, giving the frequency of occurrence of each semantic attribute for each DB.
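The counting in Step 3 can be reproduced with a frequency counter. In this sketch the word-to-attribute mapping for "傾向" and "持ち運ぶ" is an assumption inferred from the counts quoted above ("mobile" reaches 3 only if "持ち運ぶ" carries both "mobile" and "transport"); the text itself does not list those dictionary entries, and the function name `count_attributes` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

SEMANTIC_ATTRIBUTES: Dict[str, List[str]] = {
    "携帯": ["mobile"],
    "電話": ["telephone", "device"],
    "傾向": ["formation"],              # assumed mapping
    "持ち運ぶ": ["mobile", "transport"],  # assumed mapping
}


def count_attributes(words: Iterable[str]) -> Counter:
    """Count how often each semantic attribute occurs among the extracted words."""
    counts: Counter = Counter()
    for word in words:
        counts.update(SEMANTIC_ATTRIBUTES.get(word, []))
    return counts


# Words extracted from the two sample DB1 documents, with the
# occurrence counts given in the text.
extracted_words = ["携帯"] * 2 + ["電話"] * 3 + ["傾向"] * 2 + ["持ち運ぶ"]
db1_counts = count_attributes(extracted_words)
```

Running the same function over the sampled documents of every DB would give one column of the frequency table per DB.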

[0038] In this processing, as shown in the figure, each DB stores a document group such as that shown in part (a) of the figure. Words are extracted from this document group, and semantic attributes are assigned to each of the words separated by morphological analysis, as shown in part (b). The frequency of occurrence of the assigned semantic attributes is then counted. For example, "transport" occurs 110 times in DB1, and "formation" occurs 80 times in DB3.

[0039] Alternatively, using automatic keyword extraction, the document analysis unit 3 may extract words from the document group in each DB by keyword extraction, assign semantic attributes to each extracted word, and count the frequency of occurrence of each semantic attribute for each DB. FIG. 9 shows the analysis result (part 2) of the document analysis unit according to one embodiment of the present invention, namely the result when words are extracted by automatic keyword extraction.

[0040] As shown in the figure, suppose again that the documents drawn at random from the document group of DB1 shown in FIG. 4 include "Trends in mobile telephones: the trend in portable telephones is (omitted)" and "A telephone that can be carried (omitted)" (each quoted string is one document). Keyword extraction (leaving out the omitted portions) does not extract words that are not registered as keywords (here, "傾向" (trend) and "持ち運ぶ" (carry)). It yields "携帯" (mobile; 2 occurrences) and "電話" (telephone; 3), and from these the semantic attributes "mobile" (2), "telephone" (3), and "device" (3). The semantic attributes obtained in this way are counted over all the documents drawn at random from DB1, and the same processing is applied to the other DBs, giving the frequency of occurrence of each semantic attribute for each DB. The result of the document analysis is transferred to the DB determination table creation unit 4.

[0041] Step 4) From the result of the document analysis unit 3, the DB determination table creation unit 4 calculates by statistical processing the score of each semantic attribute for each DB and creates the DB determination table 9. FIG. 4 shows an example of the DB determination table output by the DB determination table creation unit 4 for the target DBs. For the purposes of explanation, the result of the document analysis unit 3 used here is the one obtained when words are extracted by morphological analysis, that is, the result in FIG. 8. The result obtained by automatic keyword extraction, that is, the result in FIG. 9, can be handled in the same way. The calculation here covers only the five semantic attributes obtained from the DBs as shown in FIG. 8 ("mobile", "transport", "telephone", "device", and "formation").

[0042] For example, for the semantic attribute "mobile" shown in FIG. 8, the frequency M_jk of each semantic attribute for each DB is first obtained from the overall frequencies of occurrence of each semantic attribute for each DB and equation (2). Applying M_jk and the frequencies of occurrence of "mobile" in each DB (150 in DB1, 105 in DB2, 38 in DB3) to equation (1) gives the score of "mobile" in each DB (2.3 in DB1, -0.1 in DB2, -3.6 in DB3).

[0043] Step 5) The DB determination table storage unit 5 stores the DB determination table 9 created by the DB determination table creation unit 4.

[0044] Step 6) The DB determination unit 6 receives the result of the character string analysis unit 2 and, for the semantic attributes of each input word, assigns a score for each DB using the DB determination table 9 stored in the DB determination table storage unit 5. It totals the scores of all input semantic attributes for each DB, divides the total of each DB that has a positive value by the largest positive total, and outputs the names of the DBs whose resulting values are at or above a predetermined value. FIG. 10 shows the result of the DB determination unit for the example input character string according to one embodiment of the present invention. For the example input, the semantic attributes "mobile", "telephone", and "device" are input, and the DB determination table 9 shown in FIG. 4 gives the following scores for (DB1, DB2, DB3): "mobile": (2.3, -0.1, -3.6); "telephone": (3.8, 2.2, -27.7); "device": (-1.7, 0.0, 4.6). Totalling these for each DB gives 4.4 for DB1, 2.1 for DB2, and -26.7 for DB3. The total of each DB is then divided by the largest positive total, 4.4 (DB1), which gives 1.00 for DB1 and 0.48 for DB2. If the value specified in advance is 0.5, DB1, whose value exceeds 0.5, is output as the result.
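The arithmetic of Step 6 can be checked with a short sketch. The scores are the FIG. 4 values quoted above; the function name `select_databases` and the choice to treat the threshold as inclusive ("at or above") are assumptions, and the normalized values are rounded to two decimals only for display.

```python
from typing import Dict, List


def select_databases(
    table: Dict[str, List[float]],
    attributes: List[str],
    db_names: List[str],
    threshold: float,
) -> Dict[str, float]:
    """Total the scores per DB, normalize by the largest positive total, and filter."""
    totals = [0.0] * len(db_names)
    for attribute in attributes:
        for k, score in enumerate(table[attribute]):
            totals[k] += score

    best = max(totals)
    if best <= 0:
        return {}  # no DB has a positive total

    return {
        name: round(total / best, 2)
        for name, total in zip(db_names, totals)
        if total > 0 and total / best >= threshold
    }


# Scores for (DB1, DB2, DB3) as quoted from the DB determination table.
determination_table = {
    "mobile": [2.3, -0.1, -3.6],
    "telephone": [3.8, 2.2, -27.7],
    "device": [-1.7, 0.0, 4.6],
}

selected = select_databases(
    determination_table,
    ["mobile", "telephone", "device"],
    ["DB1", "DB2", "DB3"],
    threshold=0.5,
)
```

Here DB2 normalizes to about 0.48 and falls below the threshold of 0.5, and DB3 has a negative total, so only DB1 remains.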

[0045] Step 7) The output unit 7 outputs the name of DB1 determined by the DB determination unit 6.

[0046] Next, the processing performed when DB4 is added to the targets as a new DB is described.

[0047] The document analysis unit 3 extracts words from the document group in the new DB4 by morphological analysis and assigns semantic attributes to each extracted word. The frequencies of occurrence of the semantic attributes are added to the per-DB frequencies shown in FIG. 8, and the DB determination table creation unit 4 calculates the scores and creates a new DB determination table 9.

[0048] FIG. 11 shows an example of the DB determination table after a new DB is added, according to an embodiment of the present invention. For example, the appearance frequencies of the semantic attributes for DB4, obtained by the document analysis unit 3 from the document group in the new DB4 ("mobile": 120, "transport": 90, "telephone": 110, "device": 60, "situation": 20), are added to the appearance frequencies shown in FIG. 9. In the DB determination table creation unit 4, the frequency M_jk of each semantic attribute for each DB is obtained from the overall appearance frequencies of the semantic attributes for each DB and Equation (2). Applying this frequency M_jk and the appearance frequency of each semantic attribute in each DB to Equation (1) yields the score of each semantic attribute in each DB. For the semantic attribute "mobile", the appearance frequencies for the DBs (150 for DB1, 105 for DB2, 38 for DB3, 120 for DB4) and the frequency M_jk give the scores for the DBs (0.5 for DB1, -0.10 for DB2, -5.4 for DB3, 3.7 for DB4).
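The frequency update described for the new DB can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `add_database` and the nested-dictionary layout are assumptions, and only the "mobile" row of the existing table is filled in because it is the only row whose DB1 to DB3 frequencies are stated in the text. The score recalculation by Equations (1) and (2) is deliberately left out, since those equations are not reproduced in this part of the description.

```python
from typing import Dict

# Hypothetical layout: semantic attribute -> {DB name -> appearance frequency}
FrequencyTable = Dict[str, Dict[str, int]]


def add_database(table: FrequencyTable, db_name: str,
                 new_counts: Dict[str, int]) -> FrequencyTable:
    """Return a copy of `table` with the new DB's frequencies merged in."""
    # Copy each row so the existing table is left unmodified.
    updated = {attr: dict(per_db) for attr, per_db in table.items()}
    for attr, count in new_counts.items():
        # Attributes not seen before get a fresh row.
        updated.setdefault(attr, {})[db_name] = count
    return updated


# Only the "mobile" row is given in the text for DB1 to DB3.
existing: FrequencyTable = {"mobile": {"DB1": 150, "DB2": 105, "DB3": 38}}

# Frequencies obtained from the DB4 document group, as listed in the text.
db4_counts = {"mobile": 120, "transport": 90, "telephone": 110,
              "device": 60, "situation": 20}

updated = add_database(existing, "DB4", db4_counts)
print(updated["mobile"])
```

After this merge, the scores would be recomputed from the updated frequencies, which is the only work the description requires when a DB is added.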

[0049] FIG. 12 shows the result of processing the example input character string in the DB determination unit 6 using the DB determination table after the addition of the new DB shown in FIG. 11. In this case, "mobile", "telephone", and "device" are input as the semantic attributes, and the following scores are obtained for the DBs (DB1, DB2, DB3, DB4) using the DB determination table shown in FIG. 11: "mobile": (0.5, -1.0, -5.4, 3.7), "telephone": (2.4, 1.3, -29.3, 0.8), "device": (-0.6, 0.3, 6.7, -2.6). Summing these for each DB gives 2.3 for DB1, 0.6 for DB2, -28.0 for DB3, and 1.9 for DB4. The scores of the DBs with positive values are then divided by the largest positive value, 2.3 (DB1), giving 1.00 for DB1, 0.26 for DB2, and 0.83 for DB4. If the pre-specified value is 0.5, DB1 and DB4, whose values are greater than 0.5, are output as the result. The output unit 7 outputs the DB names of DB1 and DB4.
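The arithmetic in this example can be checked with a short sketch. It assumes a dictionary layout for the DB determination table and a hypothetical function name `select_databases`; the score values are the ones quoted above for FIG. 11, and the strict "greater than" comparison follows the wording of this paragraph.

```python
from typing import Dict, List

# Scores quoted in the text for the three input semantic attributes.
DECISION_TABLE: Dict[str, Dict[str, float]] = {
    "mobile":    {"DB1": 0.5,  "DB2": -1.0, "DB3": -5.4,  "DB4": 3.7},
    "telephone": {"DB1": 2.4,  "DB2": 1.3,  "DB3": -29.3, "DB4": 0.8},
    "device":    {"DB1": -0.6, "DB2": 0.3,  "DB3": 6.7,   "DB4": -2.6},
}


def select_databases(table: Dict[str, Dict[str, float]],
                     attributes: List[str],
                     threshold: float) -> List[str]:
    """Sum scores per DB, normalize by the largest positive total, and threshold."""
    totals: Dict[str, float] = {}
    for attr in attributes:
        for db, score in table[attr].items():
            totals[db] = totals.get(db, 0.0) + score

    # Only DBs with a positive total take part in the normalization.
    positive = {db: total for db, total in totals.items() if total > 0}
    if not positive:
        return []

    best = max(positive.values())
    # Keep DBs whose normalized value exceeds the pre-specified value.
    return [db for db, total in positive.items() if total / best > threshold]


selected = select_databases(DECISION_TABLE, ["mobile", "telephone", "device"], 0.5)
print(selected)
```

With the quoted scores, the normalized values are 1.00 for DB1, about 0.26 for DB2, and about 0.83 for DB4, so the sketch selects DB1 and DB4, matching the result described for FIG. 12.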

[0050] As described above, in the present invention, when a user's search request character string is input, a matching DB is selected from the plurality of search target DBs based on that character string, using a frequency table (the DB determination table) created from the information in each DB as the knowledge for DB determination. That is, the semantic attributes assigned to the words in each DB's document group are scored according to their appearance frequencies, and the DBs with high scores for the semantic attributes obtained from the search request character string are output as the result. Consequently, even when a new DB is added, the DB targeted by a search request can be selected simply by adding the frequency information of the new DB to the existing knowledge (the DB determination table) and recalculating the numerical data, so the device can be used generally when new DBs are added.

[0051]

[Effects of the Invention] As described above, according to the present invention, semantic attributes are extracted directly from the data in the DBs, and a table is created from scores based on the appearance frequencies of those attributes. Therefore, even when a new DB in a different field is added, the semantic attribute scores for that DB can be calculated and added to the table, which makes DB selection highly versatile with respect to the addition of new DBs.

[Brief description of drawings]

FIG. 1 is a diagram showing the principle configuration of the present invention.

FIG. 2 is a configuration diagram of a database selection device according to an embodiment of the present invention.

FIG. 3 is a diagram showing an example of a word dictionary according to an embodiment of the present invention.

FIG. 4 is a diagram showing an example of a DB determination table created by the DB determination table creation unit according to an embodiment of the present invention.

FIG. 5 is a diagram showing example documents in a database according to an embodiment of the present invention.

FIG. 6 is a flowchart showing the operation of the database selection process according to an embodiment of the present invention.

FIG. 7 is a diagram showing an analysis result of the character string analysis unit according to an embodiment of the present invention.

FIG. 8 is a diagram (part 1) showing an analysis result of the document analysis unit according to an embodiment of the present invention.

FIG. 9 is a diagram (part 2) showing an analysis result of the document analysis unit according to an embodiment of the present invention.

FIG. 10 is a diagram showing the result of the DB determination unit for an input character string according to an embodiment of the present invention.

FIG. 11 is a diagram showing an example of the DB determination table after a new DB is added, according to an embodiment of the present invention.

FIG. 12 is a diagram showing the result of the DB determination unit after a new DB is added, according to an embodiment of the present invention.

FIG. 13 is a configuration diagram of a conventional database selection device.

[Explanation of symbols]

1 Character string input means, character string input unit
2 Character string analysis means, character string analysis unit
3 Document analysis means, document analysis unit
4 Database determination table creation means, database determination table creation unit
5 Database determination table storage means, database determination table storage unit
6 Database determination means, database determination unit
7 Output means, output unit
8 Word dictionary
9 Database determination table
10 Search field knowledge
11 Selection processing unit
12 Database knowledge

Claims

[Claims]

1. A database selection device for automatically selecting, from a plurality of search target databases, a database that matches a user's search request, the device comprising: character string input means with which a searcher inputs a character string relating to the database to be searched; character string analysis means for extracting words from the character string input through the character string input means and assigning a semantic attribute to each extracted word; document analysis means for extracting words from the document group in each database, assigning a semantic attribute to each extracted word, and counting the appearance frequency of each semantic attribute for each database; database determination table creation means for calculating a score of each semantic attribute for each database from the appearance frequencies counted by the document analysis means, and for creating a database determination table that describes the semantic attributes, the score of each semantic attribute for each database, and the database names; database determination table storage means for storing the database determination table created by the database determination table creation means; database determination means for assigning, to each semantic attribute given to each word by the character string analysis means, a score for each database by referring to the database determination table stored in the database determination table storage means, and for determining the names of the databases for which the sum of the scores for that database, taken over all semantic attributes obtained by the character string analysis means, is equal to or greater than a predetermined score; and output means for outputting the names of the databases determined by the database determination means.