JP2004234582A

JP2004234582A - Dictionary construction method, system, and screen

Info

Publication number: JP2004234582A
Application number: JP2003025359A
Authority: JP
Inventors: Ichiro Harashima; 一郎原島; Norito Watanabe; 範人渡辺; Hiroyuki Yuji; 弘幸湯地
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-02-03
Filing date: 2003-02-03
Publication date: 2004-08-19

Abstract

PROBLEM TO BE SOLVED: To easily construct a field-specific term dictionary, which in the past required enormous man-hours, by comparing terms extracted from the history data of a retrieval function with the terms of an existing dictionary, and to build a dictionary of higher utility value by using the retrieval history data so that terms actually in use at present become candidates for registration in the dictionary.

SOLUTION: This system comprises: a means for extracting and storing retrieval keywords or other retrieval attribute information from the use history data of the retrieval function; a means for comparing an existing dictionary, or the term classification data of a dictionary, with the extracted retrieval keywords and extracting and storing only the terms that do not overlap; a means for displaying the existing dictionary or the term classification data; a means for narrowing down and displaying dictionary registration candidates from the term data from which duplicates have been removed; an editing means for associating a narrowed-down registration candidate term with a term in the existing dictionary or the term classification data; and a means for storing the editing results.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a dictionary construction method for constructing a dictionary, a dictionary construction system, and a screen device.
[0002]
[Prior art]
On the Internet and in document management systems, as the volume of content such as documents and data files serving as information sources becomes enormous, the effort users need to obtain the information they want generally increases. Therefore, to obtain the necessary information with little effort, the technical terms of each field of information that the user wants to use, together with the relationships between terms such as superordinate and subordinate concepts, aliases, and synonyms, are compiled in advance into a dictionary database in a form a computer can process, and this dictionary database is used for searching, extracting, and classifying information.
[0003]
However, constructing such a dictionary database of technical terms has conventionally been done by hand by experts in the field, and the man-hours required grow considerably with the number of words. Specifically, in a typical dictionary construction method, documents in the specialized field are taken as input and automatically segmented into terms (morphological analysis), and unnecessary terms are then removed from the resulting term set and the remaining terms classified by hand.
[0004]
To reduce this manual work, Japanese Patent Application Laid-Open No. H11-296549 describes a user interface for editing a dictionary of concept information and, in particular, a method of listing candidates for related concept information using a degree of association.
[0005]
[Patent Document 1]
JP-A-11-296549
[0006]
[Problems to be solved by the invention]
In the related art, a morphological analysis is performed by using a document in a specific field or a document group as an input, and the obtained term set is classified using the degree of association between terms or the like, or a classification candidate is presented.
[0007]
However, when the input is a document in a specific field, there are the following problems.
[0008]
First, since a document is a collection of terms, morphological analysis must be used to extract technical terms. This generally creates the work of removing noise (unnecessary terms), and that work tends to grow with the size of the document. Parameters such as the frequency with which a term appears in the document are sometimes used for this noise removal, but no general relationship can be stated between an extremely high or low frequency and the likelihood that a term is a technical term.
[0009]
Further, although term frequency and similar measures can reveal overall trends, they cannot distinguish old terms from the latest terms, so the conventional technique cannot be used to keep the terms in a dictionary fresh.
[0010]
Therefore, an object of the present invention is to provide a dictionary construction method, system, and screen that reduce the man-hours required for dictionary construction.
[0011]
[Means for Solving the Problems]
One feature of the present invention is that in a method for constructing a dictionary, the dictionary is constructed from search keywords or search attribute information extracted from search history information.
[0012]
The other features of the present invention are as described in the claims of the present application.
[0013]
[Embodiments of the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0014]
The first embodiment is an example in which construction of a term dictionary is supported using the search history data produced when a searcher 10 (user) uses a search function, and FIG. 1 shows an example thereof.
[0015]
The search function in the present embodiment means a function, in a file system, a document management system, a mail system, an Internet search engine, or the like, by which a user searches for needed information in files by entering keywords.
[0016]
Here, a "file" means one unit of data, such as document data created with a word processor, an editor, or the like; standard Internet document data such as HTML (HyperText Markup Language) or XML (eXtensible Markup Language); program data describing software; shape data; analysis data; image data; or moving image data.
[0017]
A search keyword means anything that can ultimately be digitized as a word, whether it is entered by keyboard, by voice, or by other means.
[0018]
First, the searcher 10 (user) searches freely using the search function 11; specifically, the user uses, for example, an Internet search engine. Search history data 12 is obtained as a result. The more often the searcher 10 searches with the search function 11, the larger the search history data 12 becomes and the more complete the dictionary that can be constructed.
[0019]
From this search history data 12, the search keyword extraction / save processing unit 13 extracts the search keywords and stores the result in the database as first registration candidate term data 14. Next, if a technical term dictionary already exists, the term comparison processing unit 16 treats it as pre-edit term data 15, performs character string pattern matching between its terms and the first registration candidate term data 14, extracts the registration candidate terms that do not exist in the pre-edit term data 15, and stores the result in the database as second registration candidate term data 17.
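The extraction and comparison steps of this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the history format (one record per search with a space-separated `keywords` field) and the function names `extract_keywords` and `new_candidates` are hypothetical, and exact string equality stands in for the character string pattern matching.

```python
def extract_keywords(history):
    """Collect distinct search keywords in first-seen order
    (the first registration candidate term data)."""
    seen = []
    for record in history:
        # Assumed format: keywords of one search separated by spaces.
        for word in record["keywords"].split():
            if word not in seen:
                seen.append(word)
    return seen


def new_candidates(first_candidates, pre_edit_terms):
    """Keep only candidates that the existing dictionary lacks
    (the second registration candidate term data)."""
    existing = set(pre_edit_terms)
    return [term for term in first_candidates if term not in existing]


history = [
    {"keywords": "turbine blade"},
    {"keywords": "blade crack"},
]
first = extract_keywords(history)
second = new_candidates(first, pre_edit_terms=["turbine", "pump"])
print(second)  # ['blade', 'crack']
```

Only terms missing from the existing dictionary reach the editor, which is what keeps the candidate list short compared with the output of morphological analysis over whole documents.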
[0020]
If the pre-edit term data 15 does not exist in the first use, basic term classification data may be created and used as the pre-edit term data 15.
[0021]
This makes it possible to distinguish between the old term and the latest term, and maintain the freshness of the term.
[0022]
Next, the term configuration display / edit processing unit 18 reads the pre-edit term data 15 first, and then reads the second registration candidate term data 17 according to the narrow-down condition specified by the dictionary editor 19.
[0023]
Here, one example of a method for narrowing down the registration candidate terms is character string matching using regular expressions such as the following:
[Notation] [Meaning]
X*: all terms that begin with X
X???: X followed by any three characters
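The two notations above behave like shell-style wildcards rather than regular expressions in the strict sense. A minimal sketch of this narrowing, assuming Python's standard `fnmatch` module as the matcher; the helper name `narrow` and the sample terms are hypothetical:

```python
from fnmatch import fnmatchcase


def narrow(candidates, pattern):
    """Return the candidates that match a wildcard pattern.

    '*' matches any run of characters (including none),
    '?' matches exactly one character.
    """
    return [term for term in candidates if fnmatchcase(term, pattern)]


candidates = ["X", "XABC", "XAB", "YXABC", "Xray"]
print(narrow(candidates, "X*"))    # ['X', 'XABC', 'XAB', 'Xray']
print(narrow(candidates, "X???"))  # ['XABC', 'Xray']
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.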
[0024]
Term configuration display means displaying the terms on the screen in a tree format, since character-string term data generally has a classification hierarchy of broader terms, narrower terms, and so on.
[0025]
Thereafter, through the dictionary editor's operations on the screen, a term selected from the second registration candidate term data is added under the most suitable node (term) in the pre-edit term data 15 displayed in tree format, whereby term editing is carried out.
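The tree editing described here amounts to attaching a candidate term as a child of an existing node. The following is a minimal sketch; the class `TermNode`, its methods, and the sample terms are hypothetical, and the patent does not specify how the term hierarchy is stored.

```python
class TermNode:
    """One dictionary term together with its narrower (child) terms."""

    def __init__(self, term):
        self.term = term
        self.children = []

    def find(self, term):
        """Depth-first search for a node by term; None if absent."""
        if self.term == term:
            return self
        for child in self.children:
            found = child.find(term)
            if found is not None:
                return found
        return None

    def add_under(self, parent_term, new_term):
        """Attach new_term one level below parent_term."""
        parent = self.find(parent_term)
        if parent is None:
            raise KeyError(parent_term)
        node = TermNode(new_term)
        parent.children.append(node)
        return node

    def lines(self, depth=0):
        """Render the tree as indented text lines."""
        out = ["  " * depth + self.term]
        for child in self.children:
            out.extend(child.lines(depth + 1))
        return out


root = TermNode("device")
root.add_under("device", "X device")
root.add_under("X device", "XXZ device")
print("\n".join(root.lines()))
```

In a graphical editor, `add_under` would be called with the drop target as `parent_term` and the dragged candidate as `new_term`.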
[0026]
Finally, after the editing is completed, the edited result is stored in the database as the edited term data 100.
[0027]
In the above description, the pre-edit term data 15 and the edited term data 100 are treated as separate, but they may be a single set of data that is overwritten after editing.
[0028]
According to the embodiment described above, a dictionary whose terms are fresh can be constructed from search history information with fewer man-hours.
[0029]
The second embodiment is an example in which the search attribute data is further used to improve the dictionary editing efficiency in the first embodiment, and FIG. 2 shows an example thereof.
[0030]
From the search history data 12 output by the search function 11, the search keyword and search attribute data extraction / save processing unit 20 extracts the search keywords and the search attribute data, and stores the search keywords in the database as first registration candidate term data 21 and the search attribute data as search attribute data 23. At this time, the data 21 and the data 23 are associated with each other by a term ID or the like.
[0031]
Here, the search attribute data is information on one search operation, such as the date and time and the number of hits when a searcher performs a search using the search function. The search attribute data also includes data that can identify the searcher, for example, the ID (IP address or the like) of the machine used and user information obtained from login information to the system.
[0032]
The term configuration display / edit processing unit 24 first reads the pre-edit term data 15, and then reads the second registration candidate term data 22 according to the narrowing conditions specified by the dictionary editor 19.
[0033]
Here, to narrow down the registration candidate terms, in addition to the method of the first embodiment that uses a regular expression as the narrowing condition, search attribute data 23 such as the search date and time, the searcher, and the search hit rate is used as a narrowing condition. Examples of such conditions are as follows.
Search date and time: January 1, 2000 to December 31, 2001
Searcher: Taro Yamada
Search hits: fewer than 10 (or 10 or more)
When the user's organization information can be obtained from the system's user management information, organization information such as company, department, or section may be specified in place of the searcher. This makes it easy to construct specialized dictionaries aimed at, for example, "Company A" or "Design Department B". In addition, using the number of search hits as a narrowing condition makes it possible to narrow the candidates, to some extent, to terms that are not yet in general use or, conversely, to terms that are already in general use. When the number of search hits is 0, it may be judged that the search keyword is likely to be incorrect.
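Narrowing by search attributes can be sketched as a filter over per-search records. The record layout `SearchRecord`, the function `narrow_by_attributes`, and all sample data other than the conditions quoted in the text are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SearchRecord:
    term: str          # search keyword (registration candidate term)
    searched_on: date  # search date
    searcher: str      # who searched
    hits: int          # number of search hits


def narrow_by_attributes(records, start, end, searcher=None, max_hits=None):
    """Return distinct terms whose records satisfy every given condition."""
    terms = []
    for r in records:
        if not (start <= r.searched_on <= end):
            continue
        if searcher is not None and r.searcher != searcher:
            continue
        # "fewer than max_hits" keeps terms that are not yet in general use.
        if max_hits is not None and r.hits >= max_hits:
            continue
        if r.term not in terms:
            terms.append(r.term)
    return terms


records = [
    SearchRecord("XXZ device", date(2000, 5, 1), "Taro Yamada", 3),
    SearchRecord("pump", date(2001, 3, 9), "Taro Yamada", 250),
    SearchRecord("XY valve", date(2002, 1, 15), "Taro Yamada", 2),
    SearchRecord("Z sensor", date(2000, 7, 7), "Hanako Suzuki", 1),
]
result = narrow_by_attributes(
    records, date(2000, 1, 1), date(2001, 12, 31),
    searcher="Taro Yamada", max_hits=10,
)
print(result)  # ['XXZ device']
```

Replacing the `searcher` comparison with a lookup of the searcher's company or department would give the organization-level narrowing mentioned in the text.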
[0034]
The third embodiment is an example of the data structure of the first registration candidate term data 21, the second registration candidate term data 22, and the search attribute data 23 in the second embodiment, and FIG. 3 is an example of this.
[0035]
The table 30 is a data structure of the first registration candidate term data 21 and the second registration candidate term data 22 in the second embodiment, and associates the registration candidate term ID 31 and the registration candidate term 32 as one row.
[0036]
Further, the table 33 is the data structure of the search attribute data 23 in the second embodiment, and associates the registration candidate term ID 34 with search attribute data such as the search date and time 35 and the searcher 36 as one row. Here, the IP address of the machine is used to identify the searcher 36.
[0037]
With such a data structure, it is easy to add a new search attribute item.
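The two-table layout can be sketched with an in-memory SQLite database. The table and column names, the sample rows, and the added `hits` column are assumptions for illustration; the patent specifies only which items each row associates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate_term (      -- table 30
    term_id INTEGER PRIMARY KEY,   -- registration candidate term ID 31
    term    TEXT NOT NULL          -- registration candidate term 32
);
CREATE TABLE search_attribute (    -- table 33
    term_id     INTEGER NOT NULL REFERENCES candidate_term(term_id),  -- ID 34
    searched_at TEXT NOT NULL,     -- search date and time 35
    searcher    TEXT NOT NULL      -- searcher 36 (machine IP address)
);
""")
conn.execute("INSERT INTO candidate_term VALUES (1, 'XXZ device')")
conn.executemany(
    "INSERT INTO search_attribute VALUES (?, ?, ?)",
    [(1, "2000-05-01 10:00:00", "192.0.2.10"),
     (1, "2001-02-03 15:30:00", "192.0.2.11")],
)
# Adding a new attribute item changes only the attribute table.
conn.execute("ALTER TABLE search_attribute ADD COLUMN hits INTEGER")

rows = conn.execute(
    "SELECT t.term, a.searched_at, a.searcher, a.hits "
    "FROM candidate_term t JOIN search_attribute a ON a.term_id = t.term_id "
    "ORDER BY a.searched_at"
).fetchall()
print(rows)
```

Because the term table is untouched by the `ALTER TABLE`, existing term rows and their IDs stay valid when a new search attribute is introduced.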
[0038]
The fourth embodiment shows the flow of processing when using the search function and constructing a dictionary in the second embodiment, and FIG. 4 shows an example thereof.
[0039]
When the search function is used, the history in which the searcher 10 has searched for information using the search function is stored as search history data 12 as in process 40.
[0040]
When constructing a dictionary, on the other hand, first, as in process 41, the search keywords and the search attribute data are extracted from the search history data 12, and the search keywords are stored as first registration candidate term data 21 and the search attribute data as search attribute data 23.
[0041]
Next, as in process 42, the pre-edit term data 15 and the first registration candidate term data 21 are compared, the registration candidate terms that do not exist in the pre-edit term data 15 are extracted, and they are saved as second registration candidate term data 22.
[0042]
Further, at the time of editing the dictionary, the second registration candidate term data 22 and the pre-edit term data 15 are read and displayed as in process 43, and the dictionary editor 19 views the display and gives instructions, whereby the terms are edited. After the editing, the editing result is saved as the edited term data 100 as in process 44.
[0043]
The fifth embodiment shows an example of the dictionary editing screen in the first and second embodiments, and FIG. 5 shows an example thereof.
[0044]
The screen 50 is divided into an area 57 (left side of the screen) for displaying and editing the pre-editing term data 15 in a tree and an area (right side of the screen) for selecting a registration candidate term.
[0045]
In the registration candidate term narrowing condition designation area 51, the narrowing condition by the regular expression of the term shown in the first embodiment and the narrowing condition by the search attribute data shown in the second embodiment are input. After inputting the narrowing-down conditions, a registration candidate narrowing-down process is executed by clicking the registration candidate term narrowing down button 52 with a pointing device such as a mouse.
[0046]
Next, in the display condition specification area 53, display conditions such as the display order of the terms narrowed down above are specified. The display conditions include a simple term list display in descending order and ascending order, as well as a method of displaying in a hierarchical manner by character string pattern matching. After inputting the display conditions, the display reflection process is executed by clicking the display condition reflection button 54 with a pointing device such as a mouse, and the result is displayed in the registration candidate term selection area 55.
[0047]
In the registration candidate term selection area 55, the dictionary editor selects a term 56 to be registered in the dictionary with a pointing device such as a mouse, and drags and drops it onto the term 58 in the dictionary term editing area 57 that the editor considers appropriate. As a result, the term 56 is added one level below the drop-destination term 58 in the dictionary term editing area 57.
[0048]
Within the dictionary term editing area 57 as well, terms can be moved freely by drag and drop, and an unnecessary term can be deleted by selecting it and then clicking the term delete button 59 with a pointing device such as a mouse.
[0049]
When the dictionary editing work is finally complete, clicking the dictionary registration button 500 with a pointing device such as a mouse saves the result as edited term data.
[0050]
Thereby, when registering the dictionary registration terms, it becomes possible to register the dictionary registration terms in association with the terms in the pre-editing term data 15 (existing dictionary).
[0051]
The sixth embodiment shows another example of the dictionary term editing area screen in the fifth embodiment, and FIG. 6 shows an example thereof.
[0052]
In the registration candidate term selection area 55, when the dictionary editor selects a term 56 to be registered in the dictionary with a pointing device such as a mouse, the term 56 is displayed in the center of the dictionary term editing area 60.
[0053]
At the same time, the character string matching degree is calculated by character string pattern matching between the term 56 and each term in the pre-edit term data. For example, when the registration candidate term is "XXZ device" (ＸＸＺ装置 in the original Japanese, where "device" is the two characters 装置), the character string matching degree with "X device" (Ｘ装置) is obtained, for example, as follows:
Matching character detection direction: backward (from the end of the string)
Number of characters matching in position: a = 2
Number of other matching characters: b = 1
(character string matching degree) = w1 × a + w2 × b
Here, w1 and w2 are weights (numerical values of 0 or more), and generally w1 > w2. The distance from the center of the dictionary term editing area 60 can be given, for example, by the reciprocal of the character string matching degree:
(distance) = 1 / (character string matching degree)
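The matching degree and distance can be sketched as below. The patent gives only the formula and one example (a = 2, b = 1 for ＸＸＺ装置 against Ｘ装置), so the counting rule used here is one interpretation: strings are aligned from the end, a counts aligned positions with equal characters, and b counts characters shared among the remaining ones. The function names and the default weights w1 = 2, w2 = 1 are likewise assumptions.

```python
import math
from collections import Counter


def match_degree(candidate, term, w1=2.0, w2=1.0):
    """w1 * a + w2 * b, comparing the two strings from the end."""
    rc, rt = candidate[::-1], term[::-1]
    n = min(len(rc), len(rt))
    a = 0
    rest_c, rest_t = Counter(), Counter()
    for i in range(n):
        if rc[i] == rt[i]:
            a += 1                 # same character at the same position
        else:
            rest_c[rc[i]] += 1
            rest_t[rt[i]] += 1
    rest_c.update(rc[n:])          # characters beyond the shorter string
    rest_t.update(rt[n:])
    # b: characters that match somewhere else (multiset intersection)
    b = sum((rest_c & rest_t).values())
    return w1 * a + w2 * b


def distance(candidate, term, w1=2.0, w2=1.0):
    """Reciprocal of the matching degree; infinite when nothing matches."""
    degree = match_degree(candidate, term, w1, w2)
    return math.inf if degree == 0 else 1.0 / degree


print(match_degree("XXZ装置", "X装置"))  # 5.0  (a = 2, b = 1)
print(distance("XXZ装置", "X装置"))      # 0.2
```

Returning an infinite distance for a zero matching degree is a choice made here to avoid division by zero; a display routine would need to clamp such terms to the edge of the editing area.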
[0054]
When the arrangement position of each term is determined in the dictionary term editing area 60 by the above method, terms similar to the term 61 are displayed near the center and terms not similar are displayed far away.
[0055]
The dictionary editor drags the term 61 and drops it onto the term 62, chosen from among the terms displayed near the term 61 as the one considered appropriate. As a result, the term 61 is added one level below the drop-destination term 62 in the dictionary term editing area.
[0056]
Thereby, when registering the dictionary registration terms, it becomes possible to register the dictionary registration terms in association with the terms having a high degree of matching included in the pre-editing term data 15 (existing dictionary).
[0057]
The seventh embodiment is an example of a system configuration when the present invention is applied to a dictionary construction service, and FIG. 7 is an example of the system.
[0058]
The search engine server 72 for performing the search service generates a search index from the information source 73 via a network 74 such as the Internet. The information searcher 76 accesses the search engine server 72 via the network 74 through the search client 75. Here, the system administrator 70 manages the search engine server 72 through the management client 71.
[0059]
The person who receives the dictionary construction service permits the search history data 78 of the information searcher 76 belonging to the same organization to be provided to the dictionary construction service provider 79.
[0060]
The dictionary construction service provider 79 takes in the search history data 78 and the user data 77 into the dictionary construction support system 700 already described in the present invention, and constructs a technical term dictionary.
[0061]
The service fee may be set by ranking according to the number of words in the dictionary as the final product, the number of information searchers 76, and the like.
[0062]
This makes it possible to provide a smooth dictionary construction service to the customer.
[0063]
Various embodiments have been described above. The device that realizes them may be configured as a dedicated device, but, as illustrated in FIG. 8, it can also be realized by a general-purpose computer system composed of a keyboard 81, a computer main body 82 equipped with input means for inputting the data and processing programs described above, a storage unit that stores the input data and programs as a database, an arithmetic unit, and the like, and a display 83, together with a processing program running on that system.
[0064]
When the invention is realized by adding a processing program to such a general-purpose computer system, the processing program is recorded on a medium such as the magnetic disk 91 illustrated in FIG. 9 or the CD-ROM 101 illustrated in FIG. 10, is delivered, stored, and installed in that form, and is read by a magnetic disk reader or a CD-ROM reader provided in the computer main body 82 and loaded into the computer main body 82. When a processing program delivered over a communication network is taken in through the input means, the program is saved on a medium such as a magnetic disk so that it can be used repeatedly.
[0065]
[Effects of the Invention]
According to the present invention, a dictionary construction method, system, and screen that reduce dictionary construction man-hours can be provided.
[Brief description of the drawings]
FIG. 1 is an example of a functional block diagram for implementing a term dictionary construction support using search history data when a user uses a search function in an embodiment of the present invention.
FIG. 2 is an example of a functional block diagram that uses search attribute data, in addition to the configuration of FIG. 1, to improve dictionary editing efficiency.
FIG. 3 is an example of a data structure diagram of the databases in FIGS. 1 and 2.
FIG. 4 is an example of a flowchart showing the processing flow in FIG. 2 when the search function is used and when a dictionary is constructed.
FIG. 5 is an example of a dictionary editing screen in the first and second embodiments.
FIG. 6 is an example of another screen of the dictionary term editing area in FIG. 5.
FIG. 7 is a system configuration example when the present invention is applied to a dictionary construction service.
FIG. 8 is an example of a computer system.
FIG. 9 is an example of a magnetic disk.
FIG. 10 shows an example of a CD-ROM.
[Explanation of symbols]
10, 36: searcher; 11: search function; 12, 78: search history data; 13: search keyword extraction / save processing unit; 14, 21: first registration candidate term data; 15: pre-edit term data; 16: term comparison processing unit; 17, 22: second registration candidate term data; 18, 24: term configuration display / edit processing unit; 19: dictionary editor; 20: search keyword and search attribute data extraction / save processing unit; 23: search attribute data; 30, 33: table; 31, 34: registration candidate term ID; 32, 56, 61: registration candidate term; 35: search date and time; 40: processing step at search time in FIGS. 1 and 2; 41, 42, 43, 44: processing steps at dictionary construction time; 50: screen; 51: registration candidate term narrowing condition designation area; 52: registration candidate term narrowing button; 53: display condition designation area; 54: display condition reflection button; 55: registration candidate term selection area; 57, 60: dictionary term editing area; 58, 62: registration destination term (parent); 59: term delete button; 70: system administrator; 71: management client; 72: search engine server; 73: information source; 74: network; 75: search client; 76: information searcher; 77: user data; 79: dictionary construction service provider; 81: keyboard; 82: computer main body; 83: display; 91: magnetic disk; 100: edited term data; 101: CD-ROM; 500: dictionary registration button; 700: dictionary construction support system.

Claims

In the method of building a dictionary,
A dictionary construction method for constructing a dictionary from search keywords or search attribute information extracted from search history information.

In claim 1,
Inputting the search history information, extracting a first search term, and storing the first search term in the first registration candidate term data;
Comparing the pre-edited term data with the first registered candidate term data, extracting a second search term that is not included in the pre-edited term data from the first registered candidate term data, and storing it as second registered candidate term data;
Associating the second search term with a term included in the pre-edited term data;
Adding the associated second search term as the edited term data.

In claim 2,
A dictionary construction method comprising a process of displaying the pre-edit term data hierarchically.

In claim 2,
A dictionary construction method, wherein the pre-edit term data and the post-edit term data are the same data.

The process of associating the second search term with a term included in the pre-edited term data according to claim 2,
A process of selecting a registration candidate term from the second search terms;
Searching the pre-edited term data for the registered candidate term and presenting the terms found in descending order of similarity.

6. The dictionary construction method according to claim 5, wherein searching the registered candidate term from the pre-edit term data includes searching using a regular expression or search attribute data.

The process of associating the second search term with a term included in the pre-edited term data according to claim 2,
A dictionary construction method in which a registration candidate term selected from the second search terms is displayed at the center of the display area, a degree of matching between the selected registration candidate term and each term included in the pre-edit term data is calculated, and each term included in the pre-edit term data is displayed closer to the center the higher its degree of matching and farther from the center the lower its degree of matching.

The process of associating the second search term with a term included in the pre-edited term data according to claim 2,
A process of inputting a narrowing condition in order to narrow down the second search term;
Searching for and presenting a third search term that satisfies the narrowing condition;
Selecting a registration candidate term from the third search term.

The processing for narrowing down the second search term according to claim 8 is:
A process of inputting a condition for narrowing registration candidate terms using search attribute data or a regular expression;
Searching for and presenting a third search term that satisfies the narrowing condition;
Selecting a registration candidate term from the third search term.

A program for causing a computer to execute the dictionary construction process according to claim 2.

A computer-readable storage medium storing a program for causing a computer to execute the dictionary construction process according to claim 2.

In order to narrow the registration candidate terms from the search history data, a part for inputting conditions for narrowing the registration candidate terms,
A screen displaying the narrowed registration candidate terms.

In claim 12,
A screen for inputting display conditions for displaying the registration candidate terms.

An apparatus that receives search history information as input, extracts a first search term, and stores the first search term in first registered candidate term data;
A device for comparing the pre-edited term data with the first registered candidate term data, extracting a second search term that is not included in the pre-edited term data from the first registered candidate term data, and storing it in second registered candidate term data;
An apparatus for associating the second search term with a term included in the pre-edited term data,
A device for adding the associated second term to the edited term data.