JP2004078540A

JP2004078540A - Dictionary information processor, dictionary information processing method, its program, and recording medium

Info

Publication number: JP2004078540A
Application number: JP2002237687A
Authority: JP
Inventors: Naoyuki Horai; 蓬莱　尚幸; Kiyoshi Nitta; 新田　清
Original assignee: Celestar Lexico Sciences Inc
Current assignee: Celestar Lexico Sciences Inc
Priority date: 2002-08-16
Filing date: 2002-08-16
Publication date: 2004-03-11

Abstract

PROBLEM TO BE SOLVED: To provide a dictionary information processor and the like capable of automating or semi-automating the preparation of the various notation dictionaries and category dictionaries used in a bibliographic database retrieval service, as well as the checking of the prepared dictionaries.

SOLUTION: The dictionary information processor automatically prepares notation dictionary information, which defines the correspondence between the normal form of each term and its other notation forms, and category dictionary information, which defines the categories to which each normal form belongs, from existing structured data, sets, databases, processing results of analysis programs, and the like. The processor also checks the information stored in the notation dictionary information and/or the category dictionary information using various check methods.

COPYRIGHT: (C)2004,JPO

Description

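[0001]
[Technical Field]
The present invention relates to a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium, and more particularly to a dictionary information processing apparatus, dictionary information processing method, program, and recording medium capable of automating or semi-automating the creation of the various notation dictionaries and category dictionaries used in a bibliographic database retrieval service and the checking of the dictionaries so created.
[0002]
[Description of the Related Art]
In recent years, bibliographic databases that accumulate technical literature such as research papers have been built and are widely used over the Internet and other networks. One example is PubMed, through which the U.S. National Center for Biotechnology Information (NCBI) provides bibliographic data from the U.S. National Library of Medicine (NLM) and other sources (PubMed URL: http://www.ncbi.nlm.gov/entrez/).
[0003]
To improve retrieval efficiency, conventional bibliographic database retrieval services use, among other things, a "notation dictionary" that maps the normal form of each term to its notation forms and a "category dictionary" that classifies each term into categories.
[0004]
For example, TAKMI (product name) from IBM (company name) is a text mining system that uses existing notation and category dictionaries (text mining technology page of the IBM Tokyo Research Laboratory: http://www.trl.ibm.com/projects/s7710/tm/index.htm; TAKMI page: http://www.trl.ibm.com/projects/s7710/tm/takmi/takmi.htm).
[0005]
MeSH (Medical Subject Headings) is an example of a thesaurus service for medical terminology (NLM MeSH home page: http://www.nlm.nih.gov/mesh/meshhome.html; page of a paper outlining MeSH: http://www.nlm.nih.gov/mesh/patterns.html; MeSH Browser service page: http://www.ncbi.nih.gov/entrez/meshbrowser.cgi).
[0006]
[Problems to Be Solved by the Invention]
However, the various notation dictionaries and category dictionaries used in conventional bibliographic database retrieval services are often created, and checked once created, by hand. This is a fundamental structural problem: building comprehensive, highly accurate dictionaries that cover up-to-date terms across a wide range takes a great deal of time and effort.
This problem is described more concretely below.
[0007]
First, in a conventional bibliographic database retrieval service or the like, when a term appearing in a document under analysis is an alternative notation form (hereinafter "alternative form") registered in advance in the notation dictionary, the term is converted into its corresponding normal form before the search is performed. Unifying alternative forms into normal forms in this way improves retrieval accuracy, and it also improves the accuracy of text mining based on term counts and the like.
[0008]
However, conventional notation dictionaries are often created manually by operators, and building a comprehensive, highly accurate notation dictionary covering up-to-date terms across a wide range has taken a great deal of time and effort.
[0009]
A conventional bibliographic database retrieval service also needs a category dictionary that defines the categories to which each normal form belongs. The category structure of the normal forms is a tree that is both wide and deep, and normal forms and categories stand in a many-to-many relationship, so the structure is highly complex. Conventional category dictionaries, too, are often created manually by operators, and building a comprehensive, highly accurate category dictionary covering up-to-date relationships between a wide range of normal forms and categories has taken a great deal of time and effort.
[0010]
In addition, bugs and errors frequently creep into the notation and category dictionaries that are created, and advances in science and technology sometimes force existing category classifications and definitions to be corrected or changed. In such cases the created dictionary information is often checked by hand, and checking large volumes of dictionary information comprehensively, across up-to-date terms in a wide range of fields, has taken a great deal of time and effort.
[0011]
Conventional systems thus have a number of problems and, as a result, have been inconvenient and inefficient for both the users and the administrators of bibliographic database retrieval services.
[0012]
The prior art and the problems described so far are not limited to bibliographic database retrieval systems for literature in the natural sciences such as biology, medicine, and other sciences; the same considerations apply to any system that retrieves bibliographic information in any field.
[0013]
The present invention has been made in view of the above problems, and its object is to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium capable of automating the creation of the various notation dictionaries and category dictionaries used in a bibliographic database retrieval service and the checking of the dictionaries so created.
[0014]
[Means for Solving the Problems]
To achieve this object, the dictionary information processing apparatus according to claim 1 comprises: notation dictionary creation means for creating notation dictionary information that defines the correspondence between the normal form of each term and its alternative forms; category dictionary creation means for creating category dictionary information that defines the categories to which each normal form belongs; and dictionary information check means for checking the information stored in the notation dictionary information and/or the category dictionary information.
[0015]
With this apparatus, notation dictionary information defining the correspondence between the normal form of each term and its alternative forms is created, category dictionary information defining the categories to which each normal form belongs is created, and the information stored in the notation dictionary information and/or category dictionary information is checked. The creation of the various notation and category dictionaries used in a bibliographic database retrieval service, and the checking of the created dictionaries, can therefore be automated, and dictionary creation can be made more efficient and more accurate.
[0016]
The dictionary information processing apparatus according to claim 2 is the apparatus according to claim 1, wherein the notation dictionary creation means further comprises field attribute judgment means for judging, on the basis of attribute information of each field constituting an existing database, whether each field is to be used as a normal form, used as an alternative form, or not used, and creates the notation dictionary information from the fields of the existing database on the basis of the judgment result of the field attribute judgment means.
[0017]
This is a more specific example of the notation dictionary creation means. With this apparatus, whether each field of an existing database is to be used as a normal form, used as an alternative form, or not used is judged from the attribute information of the field, and notation dictionary information is created from the fields of the existing database according to the judgment result, so a notation dictionary can be created efficiently from an existing database.
[0018]
The dictionary information processing apparatus according to claim 3 is the apparatus according to claim 1 or 2, wherein the notation dictionary creation means further comprises lexicon term judgment means for judging, on the basis of terms described in existing lexicon information, whether each such term is to be used as a normal form, used as an alternative form, or not used, and creates the notation dictionary information from the terms of the lexicon information on the basis of the judgment result of the lexicon term judgment means.
[0019]
This is a more specific example of the notation dictionary creation means. With this apparatus, whether a term described in existing lexicon information (for example, a headword of the lexicon, or a term listed in its abbreviation, synonym, or related-word fields) is to be used as a normal form, used as an alternative form, or not used is judged, and notation dictionary information is created from the terms of the lexicon information according to the judgment result, so a notation dictionary can be created efficiently from existing lexicon information.
[0020]
The dictionary information processing apparatus according to claim 4 is the apparatus according to any one of claims 1 to 3, wherein the notation dictionary creation means further comprises Web term judgment means for judging, on the basis of terms described in existing Web information, whether each such term is to be used as a normal form, used as an alternative form, or not used, and creates the notation dictionary information from the terms of the Web information on the basis of the judgment result of the Web term judgment means.
[0021]
This is a more specific example of the notation dictionary creation means. With this apparatus, whether a term described in existing Web information (including, for example, information published on existing Web sites and information written to a participant-writable Web site set up to collect terms for registration in the dictionary) is to be used as a normal form, used as an alternative form, or not used is judged, and notation dictionary information is created from the terms of the Web information according to the judgment result, so a notation dictionary can be created efficiently from existing Web information.
[0022]
This also makes it possible to surface and share the dictionary knowledge that individuals hold.
[0023]
The dictionary information processing apparatus according to claim 5 is the apparatus according to any one of claims 1 to 4, wherein the category dictionary creation means further comprises structured-data category structure information creation means for creating category structure information on the basis of existing structured data, and creates the category dictionary information on the basis of the category structure information created by the structured-data category structure information creation means.
[0024]
This is a more specific example of the category dictionary creation means. With this apparatus, category structure information is created from existing structured data and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the classifications and the like defined by existing structured data.
[0025]
The dictionary information processing apparatus according to claim 6 is the apparatus according to claim 5, wherein, when the existing structured data has a plurality of root nodes, the structured-data category structure information creation means adds a virtual root node above them to create the category structure information.
[0026]
This is a more specific example of the structured-data category structure information creation means. With this apparatus, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information, so a category dictionary can be created even more efficiently on the basis of the classifications and the like defined by existing structured data.
[0027]
The dictionary information processing apparatus according to claim 7 is the apparatus according to claim 5 or 6, wherein, when the existing structured data contains a confluence (a node reached from more than one parent), the structured-data category structure information creation means duplicates the corresponding substructure at the confluence to create category structure information in the form of a simple tree with no confluences.
[0028]
This is a more specific example of the structured-data category structure information creation means. With this apparatus, when the existing structured data contains a confluence, the corresponding substructure is duplicated at the confluence to create category structure information in the form of a simple tree with no confluences, so a category dictionary can be created even more efficiently on the basis of the classifications and the like defined by existing structured data.
[0029]
The dictionary information processing apparatus according to claim 8 is the apparatus according to any one of claims 1 to 7, wherein the category dictionary creation means further comprises set category structure information creation means for creating, on the basis of existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names, and creates the category dictionary information on the basis of the category structure information created by the set category structure information creation means.
[0030]
This is a more specific example of the category dictionary creation means. With this apparatus, category structure information whose root node is the set data name and whose leaf nodes are the set element names is created from existing set data, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the information defined by existing set data.
[0031]
The dictionary information processing apparatus according to claim 9 is the apparatus according to any one of claims 1 to 8, wherein the category dictionary creation means further comprises MeSH term category structure information creation means for creating category structure information on the basis of MeSH term data, and creates the category dictionary information on the basis of the category structure information created by the MeSH term category structure information creation means.
[0032]
This is a more specific example of the category dictionary creation means. With this apparatus, category structure information is created from MeSH term data and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the medical and pharmaceutical terms and the like defined by existing MeSH term data.
[0033]
The dictionary information processing apparatus according to claim 10 is the apparatus according to any one of claims 1 to 9, wherein the category dictionary creation means further comprises database category structure information creation means for creating, on the basis of an existing database, category structure information whose root node is the name of the existing database or the field name of a specific field stored in it and whose leaf nodes are the individual data items stored in that database or field, and creates the category dictionary information on the basis of the category structure information created by the database category structure information creation means.
[0034]
This is a more specific example of the category dictionary creation means. With this apparatus, category structure information whose root node is the name of an existing database or the field name of a specific field stored in it and whose leaf nodes are the data items stored in that database or field is created from the existing database, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the fields, stored data, and the like defined by existing databases.
[0035]
The dictionary information processing apparatus according to claim 11 is the apparatus according to any one of claims 1 to 10, wherein the category dictionary creation means further comprises analysis program category structure information creation means for creating, on the basis of the processing result data of an existing analysis program, category structure information whose root node is the name of that existing processing program and whose leaf nodes are the processing result data, and creates the category dictionary information on the basis of the category structure information created by the analysis program category structure information creation means.
[0036]
This is a more specific example of the category dictionary creation means. With this apparatus, category structure information whose root node is the name of an existing processing program and whose leaf nodes are its processing result data is created from the processing result data of the existing analysis program, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the processing result data of existing analysis programs.
[0037]
The dictionary information processing apparatus according to claim 12 is the apparatus according to any one of claims 1 to 11, wherein the dictionary information check means further comprises entry-by-entry check means for checking the notation dictionary information and/or the category dictionary information entry by entry on the basis of at least one of a check term list, a check program, and a check pattern.
[0038]
This is a more specific example of the dictionary information check means. With this apparatus, the notation dictionary information and/or category dictionary information is checked entry by entry on the basis of at least one of a check term list, a check program, and a check pattern, so by defining the check items in advance the quality of the dictionary information can be improved automatically.
[0039]
This also makes it easy to find inappropriate entries introduced during dictionary creation through program bugs (defects), missing exception handling, and the like.
[0040]
It also makes it easy to find inappropriate entries caused by errors in the existing data used.
[0041]
It also makes it easy to find entries that are inappropriate as entries of a text mining dictionary.
[0042]
The dictionary information processing apparatus according to claim 13 is the apparatus according to any one of claims 1 to 12, wherein the dictionary information check means further comprises normal-form inconsistency check means for checking whether an alternative form registered in the notation dictionary information is registered as another normal form.
[0043]
This is a more specific example of the dictionary information check means. With this apparatus, whether an alternative form registered in the notation dictionary information is registered as another normal form is checked, so the quality of the dictionary information can be improved automatically by eliminating normal-form inconsistencies.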
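A minimal Python sketch of the normal-form inconsistency check, assuming the notation dictionary is held in memory as a mapping from each normal form to a list of its alternative forms. The data layout, the function name `find_normal_form_conflicts`, and the example terms are hypothetical illustrations, not taken from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory notation dictionary: normal form -> alternative forms.
NotationDict = Dict[str, List[str]]


def find_normal_form_conflicts(notation: NotationDict) -> List[Tuple[str, str]]:
    """Return (normal form, alternative form) pairs in which the alternative
    form is itself registered as a different normal form."""
    normal_forms = set(notation)
    conflicts: List[Tuple[str, str]] = []
    for normal, alternatives in notation.items():
        for alt in alternatives:
            if alt != normal and alt in normal_forms:
                conflicts.append((normal, alt))
    return conflicts


example = {
    "tumor necrosis factor": ["TNF", "tumour necrosis factor"],
    "TNF": ["TNF-alpha"],  # "TNF" is also listed above as an alternative form
    "interleukin 6": ["IL-6", "IL6"],
}
print(find_normal_form_conflicts(example))  # [('tumor necrosis factor', 'TNF')]
```

The sketch only detects the conflict; deciding whether to merge the two entries or drop one registration is left to a later step.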
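[0044]
The dictionary information processing apparatus according to claim 14 is the apparatus according to any one of claims 1 to 13, wherein the dictionary information check means further comprises statistical check means for performing statistical processing on the registration status and usage status of the normal forms, alternative forms, or categories registered in the notation dictionary information and/or the category dictionary information, and checking whether the result of the statistical processing falls within a predetermined normal range.
[0045]
This is a more specific example of the dictionary information check means. With this apparatus, statistical processing is performed on the registration status and usage status of the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information, and whether the result falls within a predetermined normal range is checked, so the quality of the dictionary information can be improved automatically by statistical methods.
[0046]
In particular, even when the number of registered entries becomes enormous, statistical methods make it easy to find entries with poor registration status (for example, entries with zero actual member entries) and entries with poor usage status (for example, entries with zero accesses or zero extractions).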
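The statistical check can be sketched in Python as a range test over per-entry counts, assuming the registration status is summarized as a member count per category and the usage status as an extraction count per normal form. The source only says the normal range is predetermined; the bounds, names, and counts below are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def flag_out_of_range(counts: Dict[str, int], low: int, high: int) -> List[str]:
    """Return the keys whose count lies outside the closed range [low, high]."""
    return sorted(key for key, n in counts.items() if not low <= n <= high)


def summarize(counts: Dict[str, int]) -> Tuple[float, int]:
    """Return (mean, max) of the counts as a quick overview of the distribution."""
    values = list(counts.values())
    return float(statistics.mean(values)), max(values)


# Hypothetical registration status: number of members per category.
members_per_category = {"genome-sequenced organisms": 3, "kinases": 0, "misc": 5000}
# Hypothetical usage status: number of extractions per normal form.
extractions = {"TNF": 120, "interleukin 6": 45, "obsolete term": 0}

print(flag_out_of_range(members_per_category, 1, 1000))  # ['kinases', 'misc']
print(flag_out_of_range(extractions, 1, 10**6))          # ['obsolete term']
print(summarize(extractions))                            # (55.0, 120)
```

A fixed range is the simplest reading of "predetermined normal range"; a system could instead derive the range from the distribution (for example, mean plus or minus a few standard deviations), which is an assumption beyond the source.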
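[0047]
The dictionary information processing apparatus according to claim 15 is the apparatus according to any one of claims 1 to 14, wherein the dictionary information check means further comprises co-occurrence check means for calculating a similarity on the basis of co-occurrence relations concerning the normal forms, alternative forms, or categories registered in the notation dictionary information and/or the category dictionary information.
[0048]
This is a more specific example of the dictionary information check means. With this apparatus, a similarity is calculated from co-occurrence relations concerning the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information, so the similarity between entries can easily be used to check registered content and to decide whether entries should be merged or discarded. For example, when the similarity between entries is higher than a predetermined similarity, those entries may be merged automatically.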
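The source does not name a similarity measure. The Python sketch below assumes one common choice, the Jaccard coefficient over the sets of documents in which each entry was extracted, and reports pairs above a threshold as merge candidates. The document IDs, entries, and threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_candidates(
    occurrences: Dict[str, Set[str]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return entry pairs whose co-occurrence similarity is higher than threshold."""
    pairs = []
    for x, y in combinations(sorted(occurrences), 2):
        score = jaccard(occurrences[x], occurrences[y])
        if score > threshold:
            pairs.append((x, y, score))
    return pairs


# Hypothetical data: entry -> IDs of the documents in which it was extracted.
occurrences = {
    "TNF": {"d1", "d2", "d3", "d4"},
    "tumor necrosis factor": {"d1", "d2", "d3"},
    "insulin": {"d5"},
}
print(merge_candidates(occurrences, 0.7))  # [('TNF', 'tumor necrosis factor', 0.75)]
```

Whether a candidate pair is merged automatically or shown to an editor is a policy choice; the source allows automatic merging above a predetermined similarity.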
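[0049]
The present invention also relates to a dictionary information processing method. The dictionary information processing method according to claim 16 includes: a notation dictionary creation step of creating notation dictionary information that defines the correspondence between the normal form of each term and its alternative forms; a category dictionary creation step of creating category dictionary information that defines the categories to which each normal form belongs; and a dictionary information check step of checking the information stored in the notation dictionary information and/or the category dictionary information.
[0050]
With this method, notation dictionary information defining the correspondence between the normal form of each term and its alternative forms is created, category dictionary information defining the categories to which each normal form belongs is created, and the information stored in the notation dictionary information and/or category dictionary information is checked. The creation of the various notation and category dictionaries used in a bibliographic database retrieval service, and the checking of the created dictionaries, can therefore be automated, and dictionary creation can be made more efficient and more accurate.
[0051]
The dictionary information processing method according to claim 17 is the method according to claim 16, wherein the notation dictionary creation step further includes a field attribute judgment step of judging, on the basis of attribute information of each field constituting an existing database, whether each field is to be used as a normal form, used as an alternative form, or not used, and the notation dictionary information is created from the fields of the existing database on the basis of the judgment result of the field attribute judgment step.
[0052]
This is a more specific example of the notation dictionary creation step. With this method, whether each field of an existing database is to be used as a normal form, used as an alternative form, or not used is judged from the attribute information of the field, and notation dictionary information is created from the fields of the existing database according to the judgment result, so a notation dictionary can be created efficiently from an existing database.
[0053]
The dictionary information processing method according to claim 18 is the method according to claim 16 or 17, wherein the notation dictionary creation step further includes a lexicon term judgment step of judging, on the basis of terms described in existing lexicon information, whether each such term is to be used as a normal form, used as an alternative form, or not used, and the notation dictionary information is created from the terms of the lexicon information on the basis of the judgment result of the lexicon term judgment step.
[0054]
This is a more specific example of the notation dictionary creation step. With this method, whether a term described in existing lexicon information (for example, a headword of the lexicon, or a term listed in its abbreviation, synonym, or related-word fields) is to be used as a normal form, used as an alternative form, or not used is judged, and notation dictionary information is created from the terms of the lexicon information according to the judgment result, so a notation dictionary can be created efficiently from existing lexicon information.
[0055]
The dictionary information processing method according to claim 19 is the method according to any one of claims 16 to 18, wherein the notation dictionary creation step further includes a Web term judgment step of judging, on the basis of terms described in existing Web information, whether each such term is to be used as a normal form, used as an alternative form, or not used, and the notation dictionary information is created from the terms of the Web information on the basis of the judgment result of the Web term judgment step.
[0056]
This is a more specific example of the notation dictionary creation step. With this method, whether a term described in existing Web information (including, for example, information published on existing Web sites and information written to a participant-writable Web site set up to collect terms for registration in the dictionary) is to be used as a normal form, used as an alternative form, or not used is judged, and notation dictionary information is created from the terms of the Web information according to the judgment result, so a notation dictionary can be created efficiently from existing Web information.
[0057]
This also makes it possible to surface and share the dictionary knowledge that individuals hold.
[0058]
The dictionary information processing method according to claim 20 is the method according to any one of claims 16 to 19, wherein the category dictionary creation step further includes a structured-data category structure information creation step of creating category structure information on the basis of existing structured data, and the category dictionary information is created on the basis of the category structure information created in the structured-data category structure information creation step.
[0059]
This is a more specific example of the category dictionary creation step. With this method, category structure information is created from existing structured data and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the classifications and the like defined by existing structured data.
[0060]
The dictionary information processing method according to claim 21 is the method according to claim 20, wherein, in the structured-data category structure information creation step, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information.
[0061]
This is a more specific example of the structured-data category structure information creation step. With this method, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information, so a category dictionary can be created even more efficiently on the basis of the classifications and the like defined by existing structured data.
[0062]
The dictionary information processing method according to claim 22 is the method according to claim 20 or 21, wherein, in the structured-data category structure information creation step, when the existing structured data contains a confluence, the corresponding substructure is duplicated at the confluence to create category structure information in the form of a simple tree with no confluences.
[0063]
This is a more specific example of the structured-data category structure information creation step. With this method, when the existing structured data contains a confluence, the corresponding substructure is duplicated at the confluence to create category structure information in the form of a simple tree with no confluences, so a category dictionary can be created even more efficiently on the basis of the classifications and the like defined by existing structured data.
[0064]
The dictionary information processing method according to claim 23 is the method according to any one of claims 16 to 22, wherein the category dictionary creation step further includes a set category structure information creation step of creating, on the basis of existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names, and the category dictionary information is created on the basis of the category structure information created in the set category structure information creation step.
[0065]
This is a more specific example of the category dictionary creation step. With this method, category structure information whose root node is the set data name and whose leaf nodes are the set element names is created from existing set data, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the information defined by existing set data.
[0066]
The dictionary information processing method according to claim 24 is the method according to any one of claims 16 to 23, wherein the category dictionary creation step further includes a MeSH term category structure information creation step of creating category structure information on the basis of MeSH term data, and the category dictionary information is created on the basis of the category structure information created in the MeSH term category structure information creation step.
[0067]
This is a more specific example of the category dictionary creation step. With this method, category structure information is created from MeSH term data and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the medical and pharmaceutical terms and the like defined by existing MeSH term data.
[0068]
The dictionary information processing method according to claim 25 is the method according to any one of claims 16 to 24, wherein the category dictionary creation step further includes a database category structure information creation step of creating, on the basis of an existing database, category structure information whose root node is the name of the existing database or the field name of a specific field stored in it and whose leaf nodes are the individual data items stored in that database or field, and the category dictionary information is created on the basis of the category structure information created in the database category structure information creation step.
[0069]
This is a more specific example of the category dictionary creation step. With this method, category structure information whose root node is the name of an existing database or the field name of a specific field stored in it and whose leaf nodes are the data items stored in that database or field is created from the existing database, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the fields, stored data, and the like defined by existing databases.
[0070]
The dictionary information processing method according to claim 26 is the method according to any one of claims 16 to 25, wherein the category dictionary creation step further includes an analysis program category structure information creation step of creating, on the basis of the processing result data of an existing analysis program, category structure information whose root node is the name of that existing processing program and whose leaf nodes are the processing result data, and the category dictionary information is created on the basis of the category structure information created in the analysis program category structure information creation step.
[0071]
This is a more specific example of the category dictionary creation step. With this method, category structure information whose root node is the name of an existing processing program and whose leaf nodes are its processing result data is created from the processing result data of the existing analysis program, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the processing result data of existing analysis programs.
[0072]
The dictionary information processing method according to claim 27 is the method according to any one of claims 16 to 26, wherein the dictionary information check step further includes an entry-by-entry check step of checking the notation dictionary information and/or the category dictionary information entry by entry on the basis of at least one of a check term list, a check program, and a check pattern.
[0073]
This is a more specific example of the dictionary information check step. With this method, the notation dictionary information and/or category dictionary information is checked entry by entry on the basis of at least one of a check term list, a check program, and a check pattern, so by defining the check items in advance the quality of the dictionary information can be improved automatically.
[0074]
This also makes it easy to find inappropriate entries introduced during dictionary creation through program bugs (defects), missing exception handling, and the like.
[0075]
It also makes it easy to find inappropriate entries caused by errors in the existing data used.
[0076]
It also makes it easy to find entries that are inappropriate as entries of a text mining dictionary.
[0077]
The dictionary information processing method according to claim 28 is the method according to any one of claims 16 to 27, wherein the dictionary information check step further includes a normal-form inconsistency check step of checking whether an alternative form registered in the notation dictionary information is registered as another normal form.
[0078]
This is a more specific example of the dictionary information check step. With this method, whether an alternative form registered in the notation dictionary information is registered as another normal form is checked, so the quality of the dictionary information can be improved automatically by eliminating normal-form inconsistencies.
[0079]
The dictionary information processing method according to claim 29 is the method according to any one of claims 16 to 28, wherein the dictionary information check step further includes a statistical check step of performing statistical processing on the registration status and usage status of the normal forms, alternative forms, or categories registered in the notation dictionary information and/or the category dictionary information, and checking whether the result of the statistical processing falls within a predetermined normal range.
[0080]
This is a more specific example of the dictionary information check step. With this method, statistical processing is performed on the registration status and usage status of the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information, and whether the result falls within a predetermined normal range is checked, so the quality of the dictionary information can be improved automatically by statistical methods.
[0081]
In particular, even when the number of registered entries becomes enormous, statistical methods make it easy to find entries with poor registration status (for example, entries with zero actual member entries) and entries with poor usage status (for example, entries with zero accesses or zero extractions).
[0082]
The dictionary information processing method according to claim 30 is the method according to any one of claims 16 to 29, wherein the dictionary information check step further includes a co-occurrence check step of calculating a similarity on the basis of co-occurrence relations concerning the normal forms, alternative forms, or categories registered in the notation dictionary information and/or the category dictionary information.
[0083]
This is a more specific example of the dictionary information check step. With this method, a similarity is calculated from co-occurrence relations concerning the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information, so the similarity between entries can easily be used to check registered content and to decide whether entries should be merged or discarded. For example, when the similarity between entries is higher than a predetermined similarity, those entries may be merged automatically.
[0084]
The present invention also relates to a program. The program according to claim 31 causes a computer to execute a dictionary information processing method including: a notation dictionary creation step of creating notation dictionary information that defines the correspondence between the normal form of each term and its alternative forms; a category dictionary creation step of creating category dictionary information that defines the categories to which each normal form belongs; and a dictionary information check step of checking the information stored in the notation dictionary information and/or the category dictionary information.
[0085]
With this program, notation dictionary information defining the correspondence between the normal form of each term and its alternative forms is created, category dictionary information defining the categories to which each normal form belongs is created, and the information stored in the notation dictionary information and/or category dictionary information is checked. The creation of the various notation and category dictionaries used in a bibliographic database retrieval service, and the checking of the created dictionaries, can therefore be automated, and dictionary creation can be made more efficient and more accurate.
[0086]
The program according to claim 32 is the program according to claim 31, wherein the notation dictionary creation step further includes a field attribute judgment step of judging, on the basis of attribute information of each field constituting an existing database, whether each field is to be used as a normal form, used as an alternative form, or not used, and the notation dictionary information is created from the fields of the existing database on the basis of the judgment result of the field attribute judgment step.
[0087]
This is a more specific example of the notation dictionary creation step. With this program, whether each field of an existing database is to be used as a normal form, used as an alternative form, or not used is judged from the attribute information of the field, and notation dictionary information is created from the fields of the existing database according to the judgment result, so a notation dictionary can be created efficiently from an existing database.
[0088]
The program according to claim 33 is the program according to claim 31 or 32, wherein the notation dictionary creation step further includes a lexicon term judgment step of judging, on the basis of terms described in existing lexicon information, whether each such term is to be used as a normal form, used as an alternative form, or not used, and the notation dictionary information is created from the terms of the lexicon information on the basis of the judgment result of the lexicon term judgment step.
[0089]
This is a more specific example of the notation dictionary creation step. With this program, whether a term described in existing lexicon information (for example, a headword of the lexicon, or a term listed in its abbreviation, synonym, or related-word fields) is to be used as a normal form, used as an alternative form, or not used is judged, and notation dictionary information is created from the terms of the lexicon information according to the judgment result, so a notation dictionary can be created efficiently from existing lexicon information.
[0090]
The program according to claim 34 is the program according to any one of claims 31 to 33, wherein the notation dictionary creation step further includes a Web term judgment step of judging, on the basis of terms described in existing Web information, whether each such term is to be used as a normal form, used as an alternative form, or not used, and the notation dictionary information is created from the terms of the Web information on the basis of the judgment result of the Web term judgment step.
[0091]
This is a more specific example of the notation dictionary creation step. With this program, whether a term described in existing Web information (including, for example, information published on existing Web sites and information written to a participant-writable Web site set up to collect terms for registration in the dictionary) is to be used as a normal form, used as an alternative form, or not used is judged, and notation dictionary information is created from the terms of the Web information according to the judgment result, so a notation dictionary can be created efficiently from existing Web information.
[0092]
This also makes it possible to surface and share the dictionary knowledge that individuals hold.
[0093]
The program according to claim 35 is the program according to any one of claims 31 to 34, wherein the category dictionary creation step further includes a structured-data category structure information creation step of creating category structure information on the basis of existing structured data, and the category dictionary information is created on the basis of the category structure information created in the structured-data category structure information creation step.
[0094]
This is a more specific example of the category dictionary creation step. With this program, category structure information is created from existing structured data and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the classifications and the like defined by existing structured data.
[0095]
The program according to claim 36 is the program according to claim 35, wherein, in the structured-data category structure information creation step, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information.
[0096]
This is a more specific example of the structured-data category structure information creation step. With this program, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information, so a category dictionary can be created even more efficiently on the basis of the classifications and the like defined by existing structured data.
[0097]
The program according to claim 37 is the program according to claim 35 or 36, wherein, in the structured-data category structure information creation step, when the existing structured data contains a confluence, the corresponding substructure is duplicated at the confluence to create category structure information in the form of a simple tree with no confluences.
[0098]
This is a more specific example of the structured-data category structure information creation step. With this program, when the existing structured data contains a confluence, the corresponding substructure is duplicated at the confluence to create category structure information in the form of a simple tree with no confluences, so a category dictionary can be created even more efficiently on the basis of the classifications and the like defined by existing structured data.
[0099]
The program according to claim 38 is the program according to any one of claims 31 to 37, wherein the category dictionary creation step further includes a set category structure information creation step of creating, on the basis of existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names, and the category dictionary information is created on the basis of the category structure information created in the set category structure information creation step.
[0100]
This is a more specific example of the category dictionary creation step. With this program, category structure information whose root node is the set data name and whose leaf nodes are the set element names is created from existing set data, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the information defined by existing set data.
[0101]
The program according to claim 39 is the program according to any one of claims 31 to 38, wherein the category dictionary creation step further includes a MeSH term category structure information creation step of creating category structure information on the basis of MeSH term data, and the category dictionary information is created on the basis of the category structure information created in the MeSH term category structure information creation step.
[0102]
This is a more specific example of the category dictionary creation step. With this program, category structure information is created from MeSH term data and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the medical and pharmaceutical terms and the like defined by existing MeSH term data.
[0103]
The program according to claim 40 is the program according to any one of claims 31 to 39, wherein the category dictionary creation step further includes a database category structure information creation step of creating, on the basis of an existing database, category structure information whose root node is the name of the existing database or the field name of a specific field stored in it and whose leaf nodes are the individual data items stored in that database or field, and the category dictionary information is created on the basis of the category structure information created in the database category structure information creation step.
[0104]
This is a more specific example of the category dictionary creation step. With this program, category structure information whose root node is the name of an existing database or the field name of a specific field stored in it and whose leaf nodes are the data items stored in that database or field is created from the existing database, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the fields, stored data, and the like defined by existing databases.
[0105]
The program according to claim 41 is the program according to any one of claims 31 to 40, wherein the category dictionary creation step further includes an analysis program category structure information creation step of creating, on the basis of the processing result data of an existing analysis program, category structure information whose root node is the name of that existing processing program and whose leaf nodes are the processing result data, and the category dictionary information is created on the basis of the category structure information created in the analysis program category structure information creation step.
[0106]
This is a more specific example of the category dictionary creation step. With this program, category structure information whose root node is the name of an existing processing program and whose leaf nodes are its processing result data is created from the processing result data of the existing analysis program, and category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the processing result data of existing analysis programs.
[0107]
The program according to claim 42 is the program according to any one of claims 31 to 41, wherein the dictionary information check step further includes an entry-by-entry check step of checking the notation dictionary information and/or the category dictionary information entry by entry on the basis of at least one of a check term list, a check program, and a check pattern.
[0108]
This is a more specific example of the dictionary information check step. With this program, the notation dictionary information and/or category dictionary information is checked entry by entry on the basis of at least one of a check term list, a check program, and a check pattern, so by defining the check items in advance the quality of the dictionary information can be improved automatically.
[0109]
This also makes it easy to find inappropriate entries introduced during dictionary creation through program bugs (defects), missing exception handling, and the like.
[0110]
It also makes it easy to find inappropriate entries caused by errors in the existing data used.
[0111]
It also makes it easy to find entries that are inappropriate as entries of a text mining dictionary.
[0112]
The program according to claim 43 is the program according to any one of claims 31 to 42, wherein the dictionary information check step further includes a normal-form inconsistency check step of checking whether an alternative form registered in the notation dictionary information is registered as another normal form.
[0113]
This is a more specific example of the dictionary information check step. With this program, whether an alternative form registered in the notation dictionary information is registered as another normal form is checked, so the quality of the dictionary information can be improved automatically by eliminating normal-form inconsistencies.
[0114]
The program according to claim 44 is the program according to any one of claims 31 to 43, wherein the dictionary information check step further includes a statistical check step of performing statistical processing on the registration status and usage status of the normal forms, alternative forms, or categories registered in the notation dictionary information and/or the category dictionary information, and checking whether the result of the statistical processing falls within a predetermined normal range.
[0115]
This is a more specific example of the dictionary information check step. With this program, statistical processing is performed on the registration status and usage status of the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information, and whether the result falls within a predetermined normal range is checked, so the quality of the dictionary information can be improved automatically by statistical methods.
[0116]
In particular, even when the number of registered entries becomes enormous, statistical methods make it easy to find entries with poor registration status (for example, entries with zero actual member entries) and entries with poor usage status (for example, entries with zero accesses or zero extractions).
[0117]
The program according to claim 45 is the program according to any one of claims 31 to 44, wherein the dictionary information check step further includes a co-occurrence check step of calculating a similarity on the basis of co-occurrence relations concerning the normal forms, alternative forms, or categories registered in the notation dictionary information and/or the category dictionary information.
[0118]
This is a more specific example of the dictionary information check step. With this program, a similarity is calculated from co-occurrence relations concerning the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information, so the similarity between entries can easily be used to check registered content and to decide whether entries should be merged or discarded. For example, when the similarity between entries is higher than a predetermined similarity, those entries may be merged automatically.
[0119]
The present invention also relates to a recording medium. The recording medium according to claim 46 records the program according to any one of claims 31 to 45.
[0120]
With this recording medium, the program recorded on it is read and executed by a computer, so the program according to any one of claims 31 to 45 can be implemented with the computer and the same effects as those of the corresponding methods can be obtained.
[0121]
[Embodiments of the Invention]
Embodiments of the dictionary information processing apparatus, dictionary information processing method, program, and recording medium according to the present invention are described in detail below with reference to the drawings. The invention is not limited by these embodiments.
In particular, the following embodiments describe an example in which the present invention is applied to a bibliographic database retrieval system for literature in the natural sciences such as biology, medicine, and other sciences; however, the invention is not limited to this case and can be applied in the same way to any system that retrieves bibliographic information in any field.
[0122]
[Overview of the Invention]
An overview of the present invention is given first, followed by a detailed description of its configuration, processing, and so on. FIG. 1 is a principle diagram showing the basic principle of the present invention.
[0123]
In outline, the present invention has the following basic features. From existing structured data, sets, databases, processing results of analysis programs, and the like, it automatically creates notation dictionary information, which defines the correspondence between the normal form of each term and its alternative forms, and category dictionary information, which defines the categories to which each normal form belongs.
[0124]
The invention then checks the information stored in the notation dictionary information and/or category dictionary information automatically or semi-automatically, using various check methods. For example, each entry of the notation dictionary information and category dictionary information may be checked against a check term list, a check program, a check pattern, or the like, and a normal-form inconsistency check, a statistical check, a co-occurrence check, and the like may be applied to the dictionary information as a whole.
The dictionary creation methods and check methods are described in detail later.
[0125]
[System Configuration]
First, the configuration of the system is described. FIG. 2 is a block diagram showing an example of the configuration of the system to which the present invention is applied, and conceptually shows only the parts of the configuration that relate to the present invention. In outline, the system consists of a dictionary information processing apparatus 100 and an external system 200, connected so that they can communicate via a network 300. The external system 200 provides external databases of bibliographic information, sequence information, three-dimensional structure information, and the like, as well as external programs such as various search services.
[0126]
In FIG. 2, the network 300 interconnects the dictionary information processing apparatus 100 and the external system 200; it is, for example, the Internet.
[0127]
In FIG. 2, the external system 200 is connected to the dictionary information processing apparatus 100 via the network 300 and provides users with Web sites that offer external databases of sequence information and the like and that run external programs such as homology searches and motif searches.
[0128]
The external system 200 may be configured as a Web server, an ASP server, or the like, and its hardware may consist of a commercially available information processing device such as a workstation or personal computer together with its peripheral devices. Each function of the external system 200 is implemented by the CPU, disk device, memory device, input device, output device, communication control device, and so on in its hardware configuration, together with the programs that control them.
[0129]
In FIG. 2, the dictionary information processing apparatus 100 broadly comprises: a control unit 102, such as a CPU, that centrally controls the entire apparatus 100; a communication control interface unit 104 connected to a communication device (not shown), such as a router, that is connected to a communication line or the like; an input/output control interface unit 108 connected to an input device 112 and an output device 114; and a storage unit 106 that stores various databases, tables, and the like. These units are connected so that they can communicate with one another via arbitrary communication paths. The dictionary information processing apparatus 100 is further connected so that it can communicate with the network 300 via a communication device such as a router and a wired or wireless communication line such as a leased line.
[0130]
The various databases and tables stored in the storage unit 106 (the notation dictionary information file 106a through the check pattern file 106f) are storage means such as fixed disk devices, and store the various programs, tables, files, databases, Web page files, and so on used in the various processes.
[0131]
Among these components of the storage unit 106, the notation dictionary information file 106a is notation dictionary information storage means that stores the notation dictionary information defining the correspondence between the normal form of each term and its alternative forms.
[0132]
The category dictionary information file 106b is category dictionary information storage means that stores the category dictionary information defining the categories to which each normal form belongs.
[0133]
The document information file 106c is document information storage means that stores information such as the document information to be analyzed.
[0134]
The existing information storage file 106d is existing information storage means that stores information on existing structured data, sets, databases, processing results of analysis programs, lexicons, and the like.
[0135]
The check term list file 106e is check term list storage means that stores a check term list.
[0136]
The check pattern file 106f is check pattern storage means that stores check patterns.
[0137]
In FIG. 2, the communication control interface unit 104 controls communication between the dictionary information processing apparatus 100 and the network 300 (or a communication device such as a router). That is, the communication control interface unit 104 has the function of exchanging data with other terminals over communication lines.
[0138]
In FIG. 2, the input/output control interface unit 108 controls the input device 112 and the output device 114. A monitor (including a home television set) or a speaker can be used as the output device 114 (the output device 114 is sometimes referred to below as the monitor). A keyboard, a mouse, a microphone, or the like can be used as the input device 112. The monitor also works together with the mouse to provide a pointing-device function.
[0139]
In FIG. 2, the control unit 102 has an internal memory for storing control programs such as an OS (Operating System), programs that define various processing procedures, and required data, and uses these programs to perform the information processing needed to execute various processes. Functionally and conceptually, the control unit 102 comprises a notation dictionary creation unit 102a, a category dictionary creation unit 102b, a dictionary information check unit 102c, a processing result output unit 102d, an analysis program unit 102e, and a name merging unit 102f.
[0140]
Of these units, the notation dictionary creation unit 102a is notation dictionary creation means for creating notation dictionary information that defines the correspondence between the normal form of each term and its alternative forms. As shown in FIG. 3, the notation dictionary creation unit 102a comprises a field attribute judgment unit 102g, a lexicon term judgment unit 102h, and a Web term judgment unit 102i. The field attribute judgment unit 102g is field attribute judgment means for judging, on the basis of the attribute information of each field constituting an existing database, whether each field is to be used as a normal form, used as an alternative form, or not used. The lexicon term judgment unit 102h is lexicon term judgment means for judging, on the basis of terms described in existing lexicon information (for example, headwords, or terms listed in abbreviation, synonym, or related-word fields), whether each such term is to be used as a normal form, used as an alternative form, or not used. The Web term judgment unit 102i is Web term judgment means for judging, on the basis of terms described in existing Web information, whether each such term is to be used as a normal form, used as an alternative form, or not used.
[0141]
The category dictionary creation unit 102b is category dictionary creation means for creating category dictionary information that defines the categories to which each normal form belongs. As shown in FIG. 4, the category dictionary creation unit 102b comprises a structured-data category structure information creation unit 102j, a set category structure information creation unit 102k, a MeSH term category structure information creation unit 102m, a database category structure information creation unit 102n, and an analysis program category structure information creation unit 102p. The structured-data category structure information creation unit 102j is structured-data category structure information creation means for creating category structure information on the basis of existing structured data. The set category structure information creation unit 102k is set category structure information creation means for creating, on the basis of existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names. The MeSH term category structure information creation unit 102m is MeSH term category structure information creation means for creating category structure information on the basis of MeSH term data. The database category structure information creation unit 102n is database category structure information creation means for creating, on the basis of an existing database, category structure information whose root node is the name of the existing database or the field name of a specific field stored in it and whose leaf nodes are the individual data items stored in that database or field. The analysis program category structure information creation unit 102p is analysis program category structure information creation means for creating, on the basis of the processing result data of an existing analysis program, category structure information whose root node is the name of that existing processing program and whose leaf nodes are the processing result data.
[0142]
The dictionary information check unit 102c is dictionary information check means for checking the information stored in the notation dictionary information and/or the category dictionary information. As shown in FIG. 5, the dictionary information check unit 102c comprises a normal-form inconsistency check unit 102r, a statistical check unit 102s, a co-occurrence check unit 102t, and an entry-by-entry check unit 102u. The normal-form inconsistency check unit 102r is normal-form inconsistency check means for checking whether an alternative form registered in the notation dictionary information is registered as another normal form. The statistical check unit 102s is statistical check means for performing statistical processing on the registration status and usage status of the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information, and checking whether the result falls within a predetermined normal range. The co-occurrence check unit 102t is co-occurrence check means for calculating a similarity on the basis of co-occurrence relations concerning the normal forms, alternative forms, or categories registered in the notation dictionary information and/or category dictionary information. The entry-by-entry check unit 102u is entry-by-entry check means for checking the notation dictionary information and/or category dictionary information entry by entry on the basis of at least one of a check term list, a check program, and a check pattern.
[0143]
The processing result output unit 102d is processing result output means that outputs processing results to the output device 114.
[0144]
The analysis program unit 102e is analysis program execution means that executes various analysis programs.
[0145]
The name merging unit 102f is name merging means that merges terms registered in the normal-form dictionary that become identical when lowercased or converted to the singular.
The processing performed by each of these units is described in detail later.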
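The name merging performed by unit 102f can be sketched in Python as grouping terms by a normalized key. Lowercasing follows the source; the singularization rule here is a deliberately crude English heuristic chosen for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def merge_key(term: str) -> str:
    """Lowercase a term and apply a crude English singularization rule."""
    t = term.lower()
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"          # "antibodies" -> "antibody"
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        return t[:-1]                # "kinases" -> "kinase"
    return t                         # "glass" stays "glass"


def merge_groups(terms: Iterable[str]) -> Dict[str, List[str]]:
    """Group terms that collapse to the same key; singleton groups are dropped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for term in terms:
        groups[merge_key(term)].append(term)
    return {key: members for key, members in groups.items() if len(members) > 1}


print(merge_groups(["Kinase", "kinases", "Antibodies", "antibody", "insulin"]))
# {'kinase': ['Kinase', 'kinases'], 'antibody': ['Antibodies', 'antibody']}
```

The rule mangles words such as "analysis" ("analysi"), so a production system would need an exception list or a proper lemmatizer; each group is a merge candidate rather than a final decision.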
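[0146]
[System Processing]
Next, an example of the processing performed by the system of this embodiment, configured as described above, is explained in detail with reference to FIGS. 6 to 29 and other figures.
[0147]
[Automatic Creation of Notation Dictionary Information Using an Existing Database]
First, the automatic creation of notation dictionary information using an existing database is described in detail with reference to FIGS. 6 and 7, which are conceptual diagrams showing an example of this processing in the system of this embodiment.
[0148]
First, as shown in FIG. 6, through the processing of the field attribute judgment unit 102g, the dictionary information processing apparatus 100 judges whether each field is to be used as a normal form, used as an alternative form, or not used, on the basis of the attribute information of each field stored in an existing database held in the existing information storage file 106d or elsewhere, in an external database of the external system 200, or the like.
[0149]
Then, through the processing of the notation dictionary creation unit 102a, the dictionary information processing apparatus 100 creates notation dictionary information from the fields of the existing database according to the judgment result and stores it in the notation dictionary information file 106a. When an existing database such as a genome information database is used, fields uniquely associated with a record, gene, or the like, such as the record ID or accession number, may be used to create notation dictionary information as alternative forms of the normal form that represents that record, gene, or the like.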
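The field attribute judgment and the subsequent dictionary creation can be sketched in Python as follows. The attribute vocabulary, the rule table `ROLE_BY_ATTRIBUTE`, the field names, and the accession value `ACC-0001` are all hypothetical; the source does not specify how field attributes are encoded. Following the example above, an accession-type field is treated as an alternative form.

```python
from enum import Enum
from typing import Any, Dict, List, Mapping, Sequence


class Role(Enum):
    NORMAL = "normal form"
    ALTERNATIVE = "alternative form"
    UNUSED = "not used"


# Hypothetical judgment rules: field attribute -> role of the field.
ROLE_BY_ATTRIBUTE = {
    "name": Role.NORMAL,
    "synonym": Role.ALTERNATIVE,
    "accession": Role.ALTERNATIVE,  # uniquely tied to a record, so usable as an alternative form
    "free_text": Role.UNUSED,
}


def judge_fields(field_attributes: Mapping[str, str]) -> Dict[str, Role]:
    """Assign a role to each field; unknown attributes default to UNUSED."""
    return {field: ROLE_BY_ATTRIBUTE.get(attr, Role.UNUSED)
            for field, attr in field_attributes.items()}


def build_notation_dict(records: Sequence[Mapping[str, Any]],
                        field_attributes: Mapping[str, str]) -> Dict[str, List[str]]:
    """Create normal form -> alternative forms entries from database records."""
    roles = judge_fields(field_attributes)
    normal_fields = [f for f, r in roles.items() if r is Role.NORMAL]
    alt_fields = [f for f, r in roles.items() if r is Role.ALTERNATIVE]
    notation: Dict[str, List[str]] = {}
    for record in records:
        for nf in normal_fields:
            normal = record.get(nf)
            if not normal:
                continue
            alts = notation.setdefault(normal, [])
            for af in alt_fields:
                value = record.get(af)
                # A field may hold a single value or a list of values.
                for v in (value if isinstance(value, list) else [value]):
                    if v and v != normal and v not in alts:
                        alts.append(v)
    return notation


field_attributes = {"GeneName": "name", "Aliases": "synonym",
                    "AccNo": "accession", "Description": "free_text"}
records = [{"GeneName": "TP53", "Aliases": ["p53"], "AccNo": "ACC-0001",
            "Description": "tumor suppressor"}]
print(build_notation_dict(records, field_attributes))  # {'TP53': ['p53', 'ACC-0001']}
```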
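[0150]
As shown in FIG. 7, when a record stored in the existing database refers to a record (record X in the example of FIG. 7) in another database (database 1 in the example of FIG. 7), the notation dictionary information created from the referenced record (record X of database 1 in FIG. 7) can be referred to, so that notation dictionary information that is already registered is used effectively.
This completes the automatic creation of notation dictionary information using an existing database.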
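A small Python sketch of reusing already registered entries through a cross-database reference, assuming entries are indexed by a hypothetical (database name, record ID) key and references are given as a key-to-key mapping. A guard stops on reference cycles.

```python
from typing import Dict, List, Optional, Tuple

RecordKey = Tuple[str, str]          # (database name, record ID)
Notation = Dict[str, List[str]]      # normal form -> alternative forms


def notation_for_record(key: RecordKey,
                        by_record: Dict[RecordKey, Notation],
                        references: Dict[RecordKey, RecordKey]) -> Optional[Notation]:
    """Return the notation entries built for a record, following cross-database
    references to an already registered record when the record has none."""
    seen = set()
    while key not in by_record:
        if key in seen or key not in references:
            return None  # nothing registered, or a reference cycle
        seen.add(key)
        key = references[key]
    return by_record[key]


by_record = {("database1", "X"): {"TP53": ["p53"]}}
references = {("database2", "R7"): ("database1", "X")}
print(notation_for_record(("database2", "R7"), by_record, references))  # {'TP53': ['p53']}
```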
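[0151]
[Automatic Creation of Notation Dictionary Information Using Existing Lexicon Information]
Next, the automatic creation of notation dictionary information using existing lexicon information is described in detail with reference to FIG. 8, a conceptual diagram showing an example of this processing in the system of this embodiment.
[0152]
As shown in FIG. 8, through the processing of the lexicon term judgment unit 102h, the dictionary information processing apparatus 100 judges, on the basis of terms described in existing lexicon information stored in the existing information storage file 106d or elsewhere (for example, headwords, or terms listed in abbreviation, synonym, or related-word fields), whether each term is to be used as a normal form, used as an alternative form, or not used. For example, the headword of each lexicon entry is judged to be a "normal form", synonyms and the like to be "alternative forms", and meanings, example sentences, and the like to be "not used".
[0153]
Then, through the processing of the notation dictionary creation unit 102a, the dictionary information processing apparatus 100 creates notation dictionary information from the terms of the existing lexicon information according to the judgment result and stores it in the notation dictionary information file 106a. The lexicon may be an electronic dictionary, or a paper dictionary that has been read in with an input device 112 such as a scanner and digitized with a known text conversion tool (OCR) or the like.
This completes the automatic creation of notation dictionary information using existing lexicon information.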
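Following the example judgment above (headword as normal form, synonyms and similar fields as alternative forms, definitions and examples not used), a Python sketch might look like this. The entry layout and field names are assumptions; whether related-word fields should also yield alternative forms is left open here, as the source leaves it to the judgment step.

```python
from typing import Any, Dict, Iterable, List, Mapping

# Hypothetical judgment: lexicon fields whose terms become alternative forms.
ALTERNATIVE_FIELDS = ("abbreviations", "synonyms")
# Fields such as "definition" and "examples" are judged "not used" and ignored.


def notation_from_lexicon(entries: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    notation: Dict[str, List[str]] = {}
    for entry in entries:
        headword = entry["headword"]               # headword -> normal form
        alts = notation.setdefault(headword, [])
        for field in ALTERNATIVE_FIELDS:
            for term in entry.get(field, []):      # listed terms -> alternative forms
                if term != headword and term not in alts:
                    alts.append(term)
    return notation


lexicon = [{
    "headword": "deoxyribonucleic acid",
    "abbreviations": ["DNA"],
    "synonyms": ["desoxyribonucleic acid"],
    "definition": "the molecule that carries genetic information",
}]
print(notation_from_lexicon(lexicon))
# {'deoxyribonucleic acid': ['DNA', 'desoxyribonucleic acid']}
```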
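[0154]
[Automatic Creation of Notation Dictionary Information Using Existing Web Information]
Next, the automatic creation of notation dictionary information using existing Web information is described in detail with reference to FIG. 9, a conceptual diagram showing an example of this processing in the system of this embodiment.
[0155]
As shown in FIG. 9, through the processing of the Web term judgment unit 102i, the dictionary information processing apparatus 100 judges, on the basis of terms described in existing Web information stored in the existing information storage file 106d or elsewhere (including, for example, information published on existing Web sites and information written to a participant-writable Web site set up to collect terms for registration in the dictionary), whether each term is to be used as a normal form, used as an alternative form, or not used. The Web term judgment unit 102i also provides a function for displaying the participant-writable Web site on participants' terminals, an editing function that lets participants write to the Web site, a function for collecting the written information, and so on; these functions can be implemented with existing Web site operation technology.
[0156]
Then, through the processing of the notation dictionary creation unit 102a, the dictionary information processing apparatus 100 creates notation dictionary information from the terms of the existing Web information according to the judgment result and stores it in the notation dictionary information file 106a. For example, the notation dictionary may be created by accumulating the participant-specific dictionaries created by Web page authors who take part in this service. That is, for the terms registered in each participant-specific dictionary, it is judged whether each term is to be used as a normal form, used as an alternative form, or not used, and notation dictionary information is created from the selected normal forms and alternative forms. This makes it possible to surface and share the dictionary knowledge that individuals hold.
This completes the automatic creation of notation dictionary information using existing Web information.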
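The source does not say how the judgment over participant-specific dictionaries is made. The Python sketch below assumes one simple policy, a minimum number of participants proposing the same (normal form, alternative form) pair; the threshold, participant names, and terms are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

Notation = Dict[str, List[str]]  # normal form -> alternative forms


def aggregate_participant_dicts(dicts: Iterable[Notation], min_support: int = 2) -> Notation:
    """Keep (normal form, alternative form) pairs proposed by at least
    min_support participants; every other pair is judged "not used"."""
    votes: Counter = Counter()
    for d in dicts:
        # A set, so one participant counts at most once per pair.
        votes.update({(normal, alt) for normal, alts in d.items() for alt in alts})
    merged: Notation = {}
    for (normal, alt), n in sorted(votes.items()):
        if n >= min_support:
            merged.setdefault(normal, []).append(alt)
    return merged


alice = {"interleukin 6": ["IL-6", "IL6"]}
bob = {"interleukin 6": ["IL-6"], "TNF": ["cachectin"]}
carol = {"interleukin 6": ["IL6"]}
print(aggregate_participant_dicts([alice, bob, carol]))  # {'interleukin 6': ['IL-6', 'IL6']}
```

Voting is only one plausible policy; editor review or trust weights per participant would fit the same structure.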
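[0157]
[Automatic Creation of Category Dictionary Information Using Existing Structured Data]
Next, the automatic creation of category dictionary information using existing structured data is described in detail with reference to FIGS. 10 to 12, conceptual diagrams showing an example of this processing in the system of this embodiment.
[0158]
First, as shown in FIG. 10, through the processing of the structured-data category structure information creation unit 102j, the dictionary information processing apparatus 100 creates category structure information from existing structured data stored in the existing information storage file 106d or elsewhere. The category dictionary creation unit 102b then creates category dictionary information from that category structure information and stores it in the category dictionary information file 106b. As FIG. 10 shows, in terms of the work procedure the category structure is created first and the category dictionary afterwards, but in terms of data dependency (what is created from what), both the category structure and the category dictionary are derived from the existing structured data.
[0159]
As shown in FIG. 11, when the existing structured data has multiple root nodes (such a structure is sometimes called a forest), the structured-data category structure information creation unit 102j adds a virtual root node above them to create the category structure information. The category structure can then always be handled as a simple tree, which makes it possible to simplify the search algorithm.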
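Adding a virtual root node to a forest can be sketched in Python as follows, assuming the structured data is given as a parent-to-children mapping. The label `<virtual root>` and the example categories are hypothetical.

```python
from typing import Dict, List

Children = Dict[str, List[str]]  # parent node -> child nodes

VIRTUAL_ROOT = "<virtual root>"  # hypothetical label


def find_roots(children: Children) -> List[str]:
    """Nodes that never appear as anyone's child."""
    child_nodes = set()
    for kids in children.values():
        child_nodes.update(kids)
    nodes = set(children) | child_nodes
    return sorted(nodes - child_nodes)


def add_virtual_root(children: Children) -> Children:
    """Turn a forest into a single tree by adding one root above all roots."""
    roots = find_roots(children)
    tree = {parent: list(kids) for parent, kids in children.items()}
    if len(roots) > 1:
        tree[VIRTUAL_ROOT] = roots
    return tree


forest = {"Organisms": ["Animals", "Plants"], "Chemicals": ["Enzymes"]}
print(add_virtual_root(forest)[VIRTUAL_ROOT])    # ['Chemicals', 'Organisms']
print(find_roots(add_virtual_root(forest)))      # ['<virtual root>']
```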
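[0160]
As shown in FIG. 12, when the existing structured data contains confluences (such a structure is sometimes called a DAG (directed acyclic graph)), the structured-data category structure information creation unit 102j duplicates the corresponding substructure at each confluence to create category structure information in the form of a simple tree with no confluences. The category structure can then always be handled as a simple tree, which makes it possible to simplify the search algorithm.
This completes the automatic creation of category dictionary information using existing structured data.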
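Duplicating substructures at confluences amounts to expanding the DAG into a tree by depth-first traversal: every path from the root produces its own copy of the shared subtree. A Python sketch, assuming a parent-to-children mapping and a nested `(label, subtrees)` tuple as the hypothetical tree representation:

```python
from typing import Dict, FrozenSet, List, Tuple

Children = Dict[str, List[str]]
Tree = Tuple[str, list]  # (label, list of subtrees)


def dag_to_tree(children: Children, node: str,
                _path: FrozenSet[str] = frozenset()) -> Tree:
    """Expand the DAG below node into a tree. A node reached from several
    parents (a confluence) gets its whole substructure copied under each one."""
    if node in _path:
        raise ValueError(f"cycle through {node!r}: input is not a DAG")
    path = _path | {node}
    return (node, [dag_to_tree(children, child, path)
                   for child in children.get(node, [])])


def count_nodes(tree: Tree) -> int:
    _, subtrees = tree
    return 1 + sum(count_nodes(t) for t in subtrees)


dag = {
    "Diseases": ["Neoplasms", "Digestive System Diseases"],
    "Neoplasms": ["Liver Neoplasms"],
    "Digestive System Diseases": ["Liver Neoplasms"],  # confluence
}
tree = dag_to_tree(dag, "Diseases")
print(count_nodes(tree))  # 5: "Liver Neoplasms" now appears twice
```

The copy can grow exponentially for heavily shared DAGs, which is the price paid for the simpler tree-only search described above.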
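[0161]
[Automatic Creation of Category Dictionary Information Using Existing Set Data]
Next, the automatic creation of category dictionary information using existing set data is described in detail with reference to FIG. 13, a conceptual diagram showing an example of this processing in the system of this embodiment.
[0162]
First, through the processing of the set category structure information creation unit 102k, the dictionary information processing apparatus 100 creates, from existing set data stored in the existing information storage file 106d or elsewhere, category structure information whose root node is the set data name and whose leaf nodes are the set element names. The category dictionary creation unit 102b then creates category dictionary information from that category structure information and stores it in the category dictionary information file 106b. As FIG. 13 shows, in terms of the work procedure the category structure is created first and the category dictionary afterwards, but in terms of data dependency (what is created from what), both are derived from the existing set data.
[0163]
For example, if an existing set named "genome-sequenced organisms" contains the elements {nematode, human, E. coli}, category structure information with "genome-sequenced organisms" as the root node and "nematode", "human", and "E. coli" as leaf nodes is created, and category dictionary information is created from it.
This completes the automatic creation of category dictionary information using existing set data.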
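The set example above can be sketched in Python: each set becomes a root with its elements as leaves, and inverting that structure yields a category dictionary from each normal form to its categories. The second set, "model organisms", is a hypothetical addition to show the many-to-many case.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def structure_from_sets(sets: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Category structure: set name as root node, element names as leaf nodes."""
    return {name: list(elements) for name, elements in sets.items()}


def category_dictionary(structure: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert root -> leaves into normal form -> categories (many-to-many)."""
    categories: Dict[str, List[str]] = defaultdict(list)
    for root, leaves in structure.items():
        for leaf in leaves:
            categories[leaf].append(root)
    return dict(categories)


sets = {
    "genome-sequenced organisms": ["nematode", "human", "E. coli"],
    "model organisms": ["nematode", "mouse"],  # hypothetical second set
}
print(category_dictionary(structure_from_sets(sets))["nematode"])
# ['genome-sequenced organisms', 'model organisms']
```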
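[0164]
[Automatic Creation of Category Dictionary Information Using Existing MeSH Term Data]
Next, the automatic creation of category dictionary information using existing MeSH term data is described in detail with reference to FIGS. 14 to 16. FIGS. 14 to 16 are conceptual diagrams showing an example of the automatic creation of category dictionary information using existing MeSH term data in the present system according to this embodiment.
[0165]
First, as shown in FIG. 14, the dictionary information processing apparatus 100 uses the MeSH term category structure information creating unit 102m to create category structure information based on data with a complex data structure, such as existing MeSH term data stored in the existing information storage file 106d or the like.
[0166]
MeSH expresses the principal structure of its MeSH terms as a DAG of Dterms. As shown in FIG. 15, this Dterm DAG can serve as a category structure once it is converted into a simple tree with no merges by the method described above. In addition, the Qterms that may be attached to each Dterm are specified, so a correspondence between Dterms and Qterms is defined. One approach is simply to ignore the relationship between Cterms and Qterms when creating the category dictionary information and storing it in the category dictionary information file 106b. As shown in FIG. 16, the correspondence between Dterms and Qterms can likewise serve as a category structure. A Cterm is a phrase that corresponds to one or more Dterm-Qterm pairs and can be a candidate for a normal form. In this way, the MeSH term category structure information creating unit 102m creates category structure information from Dterms, Qterms, and Cterms. The category dictionary creating unit 102b then creates category dictionary information based on that category structure information and stores it in the category dictionary information file 106b.
This completes the automatic creation of category dictionary information using existing MeSH term data.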
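[0167]
[Automatic Creation of Category Dictionary Information Using an Existing Database]
Next, the automatic creation of category dictionary information using an existing database is described in detail with reference to FIGS. 17 and 18. FIGS. 17 and 18 are conceptual diagrams showing an example of the automatic creation of category dictionary information using an existing database in the present system according to this embodiment.
[0168]
First, as shown in FIG. 17, the dictionary information processing apparatus 100 uses the database category structure information creating unit 102n to create, based on an existing database stored in the existing information storage file 106d or an external database stored in the external system 200, category structure information whose root node is the name of the existing database or the field name of a specific stored field, and whose leaf nodes are the individual data items stored in that database or field. As FIG. 17 shows, the work procedure creates the category structure first and the category dictionary second. In terms of data dependency (what is derived from what), however, both the category structure and the category dictionary are derived from the existing database.
[0169]
The existing database may be, for example, a database storing feature structures, such as Prosite, Pfam, or SMART.
[0170]
As shown in FIG. 18, when there is a field storing values from a finite controlled vocabulary, such as document names or expression site names, category structure information may be created with the controlled-vocabulary field name as the root node, the controlled terms as the leaf nodes, and the values of the name field as the normal forms. The category dictionary creating unit 102b then creates category dictionary information based on that category structure information and stores it in the category dictionary information file 106b.
This completes the automatic creation of category dictionary information using an existing database.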
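The controlled-vocabulary variant of FIG. 18 can be sketched with Python's built-in `sqlite3` module. The table `genes` and its columns `name` and `tissue` are hypothetical stand-ins for a name field and an expression-site field; a real source database will have its own schema.

```python
import sqlite3


def field_category_structure(conn, table, vocab_field, name_field):
    """Root = controlled-vocabulary field name, leaves = the distinct values of
    that field, and under each leaf the normal forms taken from the name field.

    Table and column names are interpolated into the SQL because identifiers
    cannot be bound as parameters; they must come from trusted code, not users.
    """
    query = (
        f"SELECT {vocab_field}, {name_field} FROM {table} "
        f"ORDER BY {vocab_field}, {name_field}"
    )
    leaves = {}
    for term, normal_form in conn.execute(query):
        leaves.setdefault(term, []).append(normal_form)
    return {vocab_field: leaves}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (name TEXT, tissue TEXT)")
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?)",
        [("GeneA", "liver"), ("GeneB", "brain"), ("GeneC", "liver")],
    )
    print(field_category_structure(conn, "genes", "tissue", "name"))
```

The same function covers the simpler FIG. 17 case if the caller ignores the normal forms and keeps only the distinct leaf values.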
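[0171]
[Automatic Creation of Category Dictionary Information Using Processing Result Data of an Existing Analysis Program]
Next, the automatic creation of category dictionary information using processing result data of an existing analysis program is described in detail with reference to FIG. 19. FIG. 19 is a conceptual diagram showing an example of the automatic creation of category dictionary information using processing result data of an existing analysis program in the present system according to this embodiment.
[0172]
First, as shown in FIG. 19, the dictionary information processing apparatus 100 uses the analysis program category structure information creating unit 102p to create, based on processing result data of an existing analysis program executed by the analysis program unit 102e, category structure information whose root node is the name of that existing program and whose leaf nodes are the processing result data. The category dictionary creating unit 102b then creates category dictionary information based on that category structure information and stores it in the category dictionary information file 106b.
This completes the automatic creation of category dictionary information using processing result data of an existing analysis program.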
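[0173]
[Per-Entry Dictionary Information Check Processing]
Next, the per-entry dictionary information check processing is described in detail with reference to FIGS. 20 to 22. FIGS. 20 to 22 are conceptual diagrams showing an example of the per-entry dictionary information check processing in the present system according to this embodiment.
[0174]
First, as shown in FIG. 20, the dictionary information processing apparatus 100 uses the per-entry checking unit 102u to check, entry by entry, the notation dictionary information stored in the notation dictionary information file 106a and/or the category dictionary information stored in the category dictionary information file 106b against the check term list stored in the check term list file 106e. The check term list is a stored list of terms, such as prepositions, articles, and pronouns, that must not be registered as normal forms or different notation forms.
[0175]
As shown in FIG. 21, the dictionary information processing apparatus 100 also uses the per-entry checking unit 102u to check, entry by entry, the notation dictionary information stored in the notation dictionary information file 106a and/or the category dictionary information stored in the category dictionary information file 106b based on check patterns stored in the check pattern file 106f or on check programs. A check pattern registers a pattern (written, for example, as a regular expression) for numeric expressions, symbol-string expressions, and the like that must not be used. A check program is a program that checks, for example, whether the plural form of a normal form has been registered as a separate normal form. As shown in FIG. 22, a check program may also be a measurement program that measures the string length, word count, number of characters of each character type, and so on of each normal form and different notation form, checks whether each measured value falls within a predetermined normal range for that measurement item, and outputs any abnormal results.
This completes the per-entry dictionary information check processing.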
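The three kinds of per-entry check described above (check term list, check pattern, and check program with range measurement) can be sketched as one function. The stop-word list, regular expressions, normal ranges, and the one-rule plural test are all hypothetical placeholders, not the contents of the check term list file 106e or the check pattern file 106f, and only two measurement items are shown.

```python
import re

# Hypothetical check data. In the apparatus these would come from the check
# term list file 106e and the check pattern file 106f.
CHECK_TERMS = {"a", "an", "the", "of", "in", "it", "they"}
CHECK_PATTERNS = [
    re.compile(r"^\d+$"),     # bare numbers
    re.compile(r"^[\W_]+$"),  # strings made only of symbols
]
NORMAL_RANGES = {"length": (2, 60), "words": (1, 6)}  # assumed normal ranges


def check_entry(term, normal_forms):
    """Return the problems found for one normal form or notation form."""
    problems = []
    if term.lower() in CHECK_TERMS:
        problems.append("forbidden term")
    if any(p.search(term) for p in CHECK_PATTERNS):
        problems.append("forbidden pattern")
    # Check program: a naive plural test ("kinases" vs. an existing "kinase").
    if term.endswith("s") and term[:-1] in normal_forms:
        problems.append("plural of existing normal form")
    # Measurement program: compare measured values with their normal ranges.
    measured = {"length": len(term), "words": len(term.split())}
    for item, value in measured.items():
        low, high = NORMAL_RANGES[item]
        if not low <= value <= high:
            problems.append(f"{item}={value} outside {low}-{high}")
    return problems
```

An empty list means the entry passed every check; any non-empty result is what would be reported as an abnormal check result.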
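[0176]
[Normal Form Inconsistency Check Processing]
Next, the normal form inconsistency check processing is described in detail with reference to FIG. 23. FIG. 23 is a conceptual diagram showing an example of the normal form inconsistency check processing in the present system according to this embodiment.
First, as shown in FIG. 23, the dictionary information processing apparatus 100 uses the normal form inconsistency checking unit 102r to check whether a different notation form registered in the notation dictionary information stored in the notation dictionary information file 106a is also registered as another normal form. This makes it possible to detect cases where a normal form has been made a different notation form of another normal form and is therefore registered twice in the notation dictionary.
This completes the normal form inconsistency check processing.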
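A sketch of the normal form inconsistency check, assuming the notation dictionary is held as a dict from each normal form to its list of different notation forms (a representation chosen here for illustration):

```python
def normal_form_conflicts(notation_dict):
    """Find different notation forms that are also normal forms in their own right.

    `notation_dict` maps each normal form to a list of different notation forms.
    Returns (notation form, normal form that lists it) pairs in dict order.
    """
    conflicts = []
    for normal_form, variants in notation_dict.items():
        for variant in variants:
            if variant != normal_form and variant in notation_dict:
                conflicts.append((variant, normal_form))
    return conflicts


if __name__ == "__main__":
    dictionary = {
        "tumor necrosis factor": ["TNF", "TNF-alpha"],
        "TNF": ["tumour necrosis factor"],  # duplicate registration
        "p53": ["TP53"],
    }
    print(normal_form_conflicts(dictionary))
```

Each reported pair is a candidate for merging the two entries into one normal form.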
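[0177]
[Statistical Check Processing]
Next, the statistical check processing is described in detail with reference to FIGS. 24 and 25. FIGS. 24 and 25 are conceptual diagrams showing an example of the statistical check processing in the present system according to this embodiment.
[0178]
First, as shown in FIG. 24, the dictionary information processing apparatus 100 uses the statistical checking unit 102s to compute statistics on the registration status and usage status of the normal forms, different notation forms, or categories registered in the notation dictionary information stored in the notation dictionary information file 106a and/or the category dictionary information stored in the category dictionary information file 106b, and checks whether the results of this statistical processing fall within predetermined normal value ranges.
[0179]
As statistical processing on registration status, for example as shown in FIG. 24, statistics may be computed on the number of normal forms per identical different notation form, the number of categories per identical normal form, the number of normal forms per identical category, and so on.
[0180]
As statistical processing on usage status, the statistical checking unit 102s may, for example as shown in FIG. 25, count the dictionary lookup hits for each piece of original data of the document information stored in the document information file 106c and for each dictionary entry, build a matrix of these counts, and examine totals and distributions along the rows or columns. When totaling along rows or columns, the statistical checking unit 102s may simply sum the counts, or may count the number of nonzero cells. It may also compute simple sums or nonzero-cell counts for each type of information (for example, normal forms, notation dictionary names, information extracted by a parser, or n-ary relation information). When computing statistics, the statistical checking unit 102s may compute the maximum, minimum, mean, distribution, and so on for each row or column, and may also compute them for each type of information or over the whole table.
[0181]
As statistical processing on the category dictionary, the statistical checking unit 102s may count the number of extractions for each piece of original data of the document information stored in the document information file 106c and for each node of the category dictionary. It may also build a matrix and examine totals and distributions along the rows or columns. When totaling along rows or columns, it may simply sum the counts or count the nonzero cells, and it may likewise sum the counts or count the nonzero cells for each subtree. When computing statistics, it may compute the maximum, minimum, mean, distribution, and so on for each row or column, and may also compute them for each type of information or over the whole table.
[0182]
The statistical checking unit 102s may also count, for each piece of original data or for each m-tuple of information, the number of times the items were extracted from contiguous positions in the text. This makes it possible to check whether collocations, and combinations of terms whose order of appearance is meaningful, are registered correctly.
[0183]
The statistical checking unit 102s may also, for each piece of original data, count and statistically process the number of words in portions that produced no dictionary lookup hit or from which no information was extracted. It may likewise count and statistically process the number of normal forms that were assigned no category, or that did not become an element of any n-ary relation.
This completes the statistical check processing.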
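The usage-status matrix described above can be sketched with plain dicts: rows are original documents, columns are dictionary entries, and cells are lookup hit counts. The sketch aggregates the columns only (totals, nonzero-cell counts, and max/min/mean) and then flags entries outside an assumed normal range; the range values and the dict layout are illustrative assumptions, and row-wise or per-type aggregation would follow the same pattern.

```python
from statistics import mean


def usage_statistics(hits, entries):
    """Aggregate the columns of a document x entry hit-count matrix.

    `hits` maps each original document to {entry: dictionary lookup hits};
    cells that are absent count as 0. `entries` lists every dictionary entry,
    so entries that never hit anything still get a column.
    """
    docs = list(hits)
    totals = {e: sum(hits[d].get(e, 0) for d in docs) for e in entries}
    nonzero = {e: sum(1 for d in docs if hits[d].get(e, 0) > 0) for e in entries}
    column_totals = list(totals.values())
    summary = {
        "max": max(column_totals),
        "min": min(column_totals),
        "mean": mean(column_totals),
    }
    return totals, nonzero, summary


def outside_range(values, low, high):
    """Names whose statistic falls outside the normal range [low, high]."""
    return sorted(name for name, v in values.items() if not low <= v <= high)


if __name__ == "__main__":
    hits = {"doc1": {"p53": 3, "kinase": 1}, "doc2": {"p53": 2}}
    totals, nonzero, summary = usage_statistics(hits, ["p53", "kinase", "zzz"])
    print(totals, nonzero, summary, outside_range(totals, 1, 100))
```

An entry such as "zzz" with zero hits across all documents is exactly the kind of poorly used entry the statistical check is meant to surface.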
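[0184]
[Co-occurrence Check Processing]
Next, the co-occurrence check processing is described in detail with reference to FIGS. 26 and 27. FIGS. 26 and 27 are conceptual diagrams showing an example of the co-occurrence check processing in the present system according to this embodiment.
[0185]
First, as shown in FIG. 26, the dictionary information processing apparatus 100 uses the co-occurrence checking unit 102t to compute similarities based on co-occurrence relations, such as notation dictionary entries that share a different notation form or categories that share a normal form. For example, applying the example of FIG. 26 to the notation dictionary (XXX being normal forms and the YYY group being different notation forms), normal form A and normal form B have a co-occurrence relation because both have the same different notation form W. If normal forms A and B have exactly the same different notation forms, they are identical; if some of their different notation forms differ, they are similar. Likewise, applying the example of FIG. 26 to the category dictionary (XXX being categories and the YYY group being normal forms), category A and category B have a co-occurrence relation because both have the same normal form W. If categories A and B have exactly the same normal forms, they are identical; if some of their normal forms differ, they are similar.
[0186]
As shown in FIG. 27, the similarity may be expressed as the number of matches (in Example 1 of FIG. 27, X and W match, so the similarity is 2) or as the match ratio (in Example 2 of FIG. 27, there are 2 matches among 13 elements in total, so the similarity is 2/13).
This completes the co-occurrence check processing.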
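The two similarity measures can be sketched on sets. The document does not define "total number of elements" precisely; this sketch assumes it means the size of the union of the two sets, which makes the match ratio the Jaccard coefficient. `Fraction` keeps the ratio in the same a/b form as the example, and the "unrelated" label for entries with no shared items is an addition for completeness.

```python
from fractions import Fraction


def cooccurrence_similarity(a, b):
    """Similarity of two entries from the sets of items they share.

    Returns (match count, match ratio). The ratio divides by the size of the
    union of both sets, an assumed reading of "total number of elements".
    """
    matches = len(a & b)
    total = len(a | b)
    return matches, (Fraction(matches, total) if total else Fraction(0))


def relation(a, b):
    """Identical if all items coincide, similar if only some are shared."""
    if a == b:
        return "identical"
    return "similar" if a & b else "unrelated"


if __name__ == "__main__":
    # Two normal forms and their different notation forms (hypothetical data).
    a = {"X", "W", "Y1"}
    b = {"X", "W", "Z1", "Z2"}
    print(cooccurrence_similarity(a, b), relation(a, b))
```

The same functions apply unchanged to the category dictionary by passing the sets of normal forms under two categories.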
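[0187]
[Name Consolidation Processing Using Logic]
Next, the name consolidation processing using logic is described in detail with reference to FIG. 28. FIG. 28 is a conceptual diagram showing an example of the name consolidation processing using logic in the present system according to this embodiment.
First, as shown in FIG. 28, the dictionary information processing apparatus 100 uses the name consolidation processing unit 102f to improve check accuracy by applying lowercasing, singularization, and similar normalization when judging whether words are identical in each dictionary check item.
This completes the name consolidation processing using logic.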
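Name consolidation before comparison can be sketched as lowercasing plus singularization. The singularization rules below are a deliberately naive, hypothetical English heuristic applied only to the last word of a phrase (usually the head noun); a production system would more likely use a proper lemmatizer, and these rules will still mishandle some words.

```python
def singularize(word):
    """Naive English singularization (illustrative rules only)."""
    if word.endswith(("ss", "us", "is")):  # class, virus, analysis
        return word
    if word.endswith("ies") and len(word) > 4:  # families -> family
        return word[:-3] + "y"
    if word.endswith(("sses", "xes", "ches", "shes")):  # processes -> process
        return word[:-2]
    if word.endswith("s"):  # kinases -> kinase
        return word[:-1]
    return word


def normalize(term):
    """Lowercase the term and singularize its last word."""
    words = term.lower().split()
    if words:
        words[-1] = singularize(words[-1])
    return " ".join(words)


def same_term(a, b):
    """Judge two dictionary strings identical after normalization."""
    return normalize(a) == normalize(b)
```

A check such as the normal form inconsistency check could compare `normalize(x)` values instead of raw strings, so that "Protein Kinases" and "protein kinase" are treated as the same word.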
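[0188]
[Check Result Output Processing]
Next, the check result output processing is described in detail with reference to FIG. 29. FIG. 29 is a conceptual diagram showing an example of the check result output processing in the present system according to this embodiment.
As shown in FIG. 29, the dictionary information processing apparatus 100 uses the processing result output unit 102d to output check results, such as those produced by the dictionary information checking unit 102c, to the output device 114 when they fall outside a predetermined normal value range.
This completes the check result output processing.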
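[0189]
[Other Embodiments]
Although embodiments of the present invention have been described above, the present invention may be implemented in various other embodiments, in addition to those described above, within the scope of the technical idea set forth in the claims.
[0190]
For example, although the dictionary information processing apparatus 100 has been described as operating stand-alone, it may instead be configured to perform processing in response to requests from a client terminal housed separately from the dictionary information processing apparatus 100 and to return the processing results to that client terminal.
[0191]
Of the processes described in the embodiment, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods.
In addition, the processing procedures, control procedures, specific names, information including registration data and parameters such as search conditions, screen examples, and database configurations shown in the above description and the drawings may be changed arbitrarily unless otherwise specified.
[0192]
With respect to the dictionary information processing apparatus 100, the illustrated components are functional and conceptual, and need not be physically configured as illustrated.
For example, all or any part of the processing functions of each unit or device of the dictionary information processing apparatus 100, in particular the processing functions performed by the control unit 102, may be implemented by a CPU (Central Processing Unit) and a program interpreted and executed by that CPU, or as hardware using wired logic. The program is recorded on a recording medium described later and is mechanically read by the dictionary information processing apparatus 100 as needed.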
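[0193]
That is, the storage unit 106, such as a ROM or HD, records a computer program that gives instructions to the CPU in cooperation with an OS (Operating System) to perform various processes. This computer program is executed by being loaded into RAM or the like, and constitutes the control unit 102 in cooperation with the CPU. The computer program may also be recorded on an application program server connected to the dictionary information processing apparatus 100 via any network 300, and all or part of it may be downloaded as needed.
[0194]
The program according to the present invention may also be stored on a computer-readable recording medium. Here, the "recording medium" includes any "portable physical medium" such as a flexible disk, magneto-optical disk, ROM, EPROM, EEPROM, CD-ROM, MO, or DVD; any "fixed physical medium" such as a ROM, RAM, or HD built into various computer systems; and any "communication medium" that holds the program for a short period, such as a communication line or carrier wave used when the program is transmitted over a network typified by a LAN, a WAN, or the Internet.
[0195]
A "program" is a data processing method described in any language or description method, regardless of whether it takes the form of source code, binary code, or another format. A "program" is not necessarily limited to a single unit; it also includes programs distributed as multiple modules or libraries, and programs that achieve their functions in cooperation with separate programs typified by an OS (Operating System). Well-known configurations and procedures may be used for the specific configuration for reading the recording medium in each device described in the embodiment, the reading procedure, the installation procedure after reading, and so on.
[0196]
The various databases and the like stored in the storage unit 106 (the notation dictionary information file 106a through the check pattern file 106f) are storage means such as memory devices (RAM, ROM, etc.), fixed disk devices such as hard disks, flexible disks, and optical disks, and store the various programs, tables, files, databases, web page files, and the like used for the various processes and for providing websites.
[0197]
The dictionary information processing apparatus 100 may also be realized by connecting peripheral devices such as a printer, monitor, or image scanner to an information processing apparatus such as a known personal computer or workstation, and installing on that information processing apparatus software (including programs, data, etc.) that implements the method of the present invention.
[0198]
Furthermore, the specific form of distribution and integration of the dictionary information processing apparatus 100 is not limited to the illustrated one; all or part of it may be functionally or physically distributed or integrated in any unit according to various loads and the like. For example, each database may be configured as an independent database device, and part of the processing may be implemented using CGI (Common Gateway Interface).
[0199]
The network 300 interconnects the dictionary information processing apparatus 100 and the external system 200, and may include, for example, any of the Internet, an intranet, a LAN (wired or wireless), a VAN, a personal computer communication network, a public telephone network (analog or digital), a leased line network (analog or digital), a CATV network, a mobile circuit-switched or mobile packet-switched network such as IMT2000, GSM, or PDC/PDC-P, a radio paging network, a local wireless network such as Bluetooth, a PHS network, or a satellite communication network such as CS, BS, or ISDB. That is, the present system can transmit and receive various data over any network, wired or wireless.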
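[0200]
[Effects of the Invention]
As described in detail above, according to the present invention, notation dictionary information defining the correspondence between the normal form and the different notation forms of each term is created, category dictionary information defining the categories to which each normal form belongs is created, and the information stored in the notation dictionary information and/or the category dictionary information is checked. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can automate the creation of the various notation dictionaries and category dictionaries used in a document database search service and the checking of the created dictionaries.
[0201]
This also makes it possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that make dictionary creation more efficient and more accurate.
[0202]
According to the present invention, whether each field constituting an existing database is to be used as a normal form, as a different notation form, or not at all is determined based on the attribute information of that field, and notation dictionary information is created from the fields of the existing database based on the determination result. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a notation dictionary from an existing database.
[0203]
According to the present invention, whether a term described in existing reference dictionary information (such as a headword, or a term listed in a field for abbreviations, synonyms, related words, and the like) is to be used as a normal form, as a different notation form, or not at all is determined based on that term, and notation dictionary information is created from the terms of the reference dictionary information based on the determination result. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a notation dictionary from existing reference dictionary information.
[0204]
According to the present invention, whether a term described in existing Web information (including, for example, information posted on existing websites and information written to a website that participants can write to for the purpose of collecting terms to register in the dictionary) is to be used as a normal form, as a different notation form, or not at all is determined based on that term, and notation dictionary information is created from the terms of the Web information based on the determination result. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a notation dictionary from existing Web information.
[0205]
According to the present invention, it is also possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that make the dictionary knowledge held by individuals explicit and shareable.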
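[0206]
According to the present invention, category structure information is created based on existing structured data, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a category dictionary based on the classifications and the like defined by existing structured data.
[0207]
According to the present invention, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can create a category dictionary even more efficiently based on the classifications and the like defined by existing structured data.
[0208]
According to the present invention, when the existing structured data contains merges, the corresponding substructure is duplicated at each merge point to create category structure information that is a simple tree with no merges. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can create a category dictionary even more efficiently based on the classifications and the like defined by existing structured data.
[0209]
According to the present invention, category structure information whose root node is the set data name and whose leaf nodes are the set element names is created based on existing set data, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a category dictionary based on information defined by existing set data.
[0210]
According to the present invention, category structure information is created based on MeSH term data, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a category dictionary based on the medical terms and the like defined by existing MeSH term data.
[0211]
According to the present invention, category structure information whose root node is the name of an existing database or the field name of a specific stored field, and whose leaf nodes are the individual data items stored in that database or field, is created based on the existing database, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a category dictionary based on the fields, stored data, and the like defined by an existing database.
[0212]
According to the present invention, category structure information whose root node is the name of an existing program and whose leaf nodes are its processing result data is created based on the processing result data of the existing analysis program, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can efficiently create a category dictionary based on the processing result data of an existing analysis program.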
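[0213]
According to the present invention, the notation dictionary information and/or the category dictionary information is checked entry by entry based on at least one of a check term list, a check program, and a check pattern. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can automatically improve the quality of dictionary information once the check items have been defined in advance.
[0214]
According to the present invention, it is also possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can easily find inappropriate entries introduced during dictionary creation by program bugs (defects), omitted exception handling, and the like.
[0215]
According to the present invention, it is also possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can easily find inappropriate entries caused by errors in the existing data used.
[0216]
According to the present invention, it is also possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can easily find entries that are inappropriate for a text-mining dictionary.
[0217]
According to the present invention, whether a different notation form registered in the notation dictionary information is also registered as another normal form is checked. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can automatically improve the quality of dictionary information by eliminating normal form inconsistencies.
[0218]
According to the present invention, statistical processing is performed on the registration status and usage status of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and whether the results fall within predetermined normal value ranges is checked. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can automatically improve the quality of dictionary information by using statistical methods.
[0219]
According to the present invention, it is also possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that, even when the number of registered entries becomes enormous, can use statistical methods to easily find entries with poor registration status (for example, entries with zero actual entries) and entries with poor usage status (for example, entries with zero accesses or zero extractions).
[0220]
Furthermore, according to the present invention, similarities are calculated based on co-occurrence relations among the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information. It is therefore possible to provide a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that make it easy to check registered content, and to decide whether entries should be merged or eliminated, using the similarity between entries.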
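[Brief Description of the Drawings]
[FIG. 1] A principle configuration diagram showing the basic principle of the present invention.
[FIG. 2] A block diagram showing an example of the configuration of the present system to which the present invention is applied.
[FIG. 3] A block diagram showing an example of the configuration of the notation dictionary creating unit 102a to which the present invention is applied.
[FIG. 4] A block diagram showing an example of the configuration of the category dictionary creating unit 102b to which the present invention is applied.
[FIG. 5] A block diagram showing an example of the configuration of the dictionary information checking unit 102c to which the present invention is applied.
[FIG. 6] A conceptual diagram showing an example of the automatic creation of notation dictionary information using an existing database in the present system according to this embodiment.
[FIG. 7] A conceptual diagram showing an example of the automatic creation of notation dictionary information using an existing database in the present system according to this embodiment.
[FIG. 8] A conceptual diagram showing an example of the automatic creation of notation dictionary information using existing reference dictionary information in the present system according to this embodiment.
[FIG. 9] A conceptual diagram showing an example of the automatic creation of notation dictionary information using existing Web information in the present system according to this embodiment.
[FIG. 10] A conceptual diagram showing an example of the automatic creation of category dictionary information using existing structured data in the present system according to this embodiment.
[FIG. 11] A conceptual diagram showing an example of the automatic creation of category dictionary information using existing structured data in the present system according to this embodiment.
[FIG. 12] A conceptual diagram showing an example of the automatic creation of category dictionary information using existing structured data in the present system according to this embodiment.
[FIG. 13] A conceptual diagram showing an example of the automatic creation of category dictionary information using existing set data in the present system according to this embodiment.
[FIG. 14] A conceptual diagram showing an example of the automatic creation of category dictionary information using existing MeSH term data in the present system according to this embodiment.
[FIG. 15] A conceptual diagram showing an example of the automatic creation of category dictionary information using existing MeSH term data in the present system according to this embodiment.
[FIG. 16] A conceptual diagram showing an example of the automatic creation of category dictionary information using existing MeSH term data in the present system according to this embodiment.
[FIG. 17] A conceptual diagram showing an example of the automatic creation of category dictionary information using an existing database in the present system according to this embodiment.
[FIG. 18] A conceptual diagram showing an example of the automatic creation of category dictionary information using an existing database in the present system according to this embodiment.
[FIG. 19] A conceptual diagram showing an example of the automatic creation of category dictionary information using processing result data of an existing analysis program in the present system according to this embodiment.
[FIG. 20] A conceptual diagram showing an example of the per-entry dictionary information check processing in the present system according to this embodiment.
[FIG. 21] A conceptual diagram showing an example of the per-entry dictionary information check processing in the present system according to this embodiment.
[FIG. 22] A conceptual diagram showing an example of the per-entry dictionary information check processing in the present system according to this embodiment.
[FIG. 23] A conceptual diagram showing an example of the normal form inconsistency check processing in the present system according to this embodiment.
[FIG. 24] A conceptual diagram showing an example of the statistical check processing in the present system according to this embodiment.
[FIG. 25] A conceptual diagram showing an example of the statistical check processing in the present system according to this embodiment.
[FIG. 26] A conceptual diagram showing an example of the co-occurrence check processing in the present system according to this embodiment.
[FIG. 27] A conceptual diagram showing an example of the co-occurrence check processing in the present system according to this embodiment.
[FIG. 28] A conceptual diagram showing an example of the name consolidation processing using logic in the present system according to this embodiment.
[FIG. 29] A conceptual diagram showing an example of the check result output processing in the present system according to this embodiment.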
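[Description of Reference Numerals]
100 Dictionary information processing apparatus
102 Control unit
102a Notation dictionary creating unit
102b Category dictionary creating unit
102c Dictionary information checking unit
102d Processing result output unit
102e Analysis program unit
102f Name consolidation processing unit
102g Field attribute determining unit
102h Reference dictionary term determining unit
102i Web term determining unit
102j Structured data category structure information creating unit
102k Set category structure information creating unit
102m MeSH term category structure information creating unit
102n Database category structure information creating unit
102p Analysis program category structure information creating unit
102r Normal form inconsistency checking unit
102s Statistical checking unit
102t Co-occurrence checking unit
102u Per-entry checking unit
104 Communication control interface unit
106 Storage unit
106a Notation dictionary information file
106b Category dictionary information file
106c Document information file
106d Existing information storage file
106e Check term list file
106f Check pattern file
108 Input/output control interface unit
112 Input device
114 Output device
200 External system
300 Network
[0001]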
[Technical Field of the Invention]
The present invention relates to a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium, and more particularly to a dictionary information processing apparatus, dictionary information processing method, program, and recording medium that can automate or semi-automate the creation of the various notation dictionaries and category dictionaries used in a document database search service and the checking of the created dictionaries.
[0002]
[Prior Art]
In recent years, document databases storing various technical documents such as papers have been built and are widely used via the Internet and other networks. One example is PubMed, through which the U.S. National Center for Biotechnology Information (NCBI) provides literature data from the U.S. National Library of Medicine (NLM) and other sources (URL of PubMed on the Internet: http://www.ncbi.nlm.gov/entrez/).
[0003]
In conventional document database search services, a "notation dictionary" that associates the normal form of each term with its notation forms, and a "category dictionary" that classifies each term into categories, are used to improve search efficiency and the like.
[0004]
For example, TAKMI (product name) from IBM (company name) is a text mining system that uses existing notation dictionaries and category dictionaries (URL of the IBM Tokyo Research Laboratory page introducing its text mining technology: http://www.trl.ibm.com/projects/s7710/tm/index.htm; URL of the TAKMI introduction page: http://www.trl.ibm.com/projects/s7710/tm/takmi/takmi.htm).
[0005]
In addition, MeSH (Medical Subject Headings) and the like exist as thesaurus search services for medical terms (URL of the NLM MeSH home page: http://www.nlm.nih.gov/mesh/meshhome.html; URL of a page describing MeSH: http://www.nlm.nih.gov/mesh/patterns.html; URL of the MeSH Browser service: http://www.ncbi.nih.gov/entrez.cgi).
[0006]
[Problems to Be Solved by the Invention]
However, the various notation dictionaries and category dictionaries used in conventional document database search services are often created, and checked after creation, manually by the persons in charge. The system structure therefore has a fundamental problem: creating comprehensive and accurate dictionaries that cover the latest and broadest range of terms takes a great deal of time and effort.
Hereinafter, the content of this problem will be described more specifically.
[0007]
First, in a conventional document database search service or the like, when a term in a document to be analyzed appears in a different notation form registered in advance in a notation dictionary, the term is converted to its corresponding normal form before searching. Unifying different notation forms into the normal form in this way improves search accuracy, and also improves the accuracy of text mining based on term counts.
[0008]
However, conventional notation dictionaries are often created manually by workers, and creating comprehensive and accurate notation dictionaries for the latest and broadest range of terms takes a great deal of time and effort.
[0009]
A conventional document database search service or the like also requires a category dictionary that defines the categories to which each normal form belongs. The category structure of normal forms is a wide and deep tree, and normal forms and categories stand in a many-to-many relationship, so the structure is very complex. Category dictionaries are nevertheless also often created manually, and creating a comprehensive, high-precision category dictionary covering the relationships between the latest and broadest range of normal forms and their categories takes a great deal of time and effort.
[0010]
Furthermore, the notation dictionaries and category dictionaries that are created often contain bugs and errors. In addition, as science and technology advance, existing category classifications and definitions may be corrected or changed. In such cases, the created dictionary information is also often checked manually, and checking a large and comprehensive body of dictionary information for the latest and broadest range of terms takes a great deal of time and effort.
[0011]
As described above, conventional systems have many problems, and as a result they are inconvenient and inefficient for both the users and the administrators of document database search services.
[0012]
These problems of the conventional technology, which the invention is intended to solve, are not limited to document information database search systems for literature in the natural sciences such as biology, medicine, and science; the same considerations apply to any system that searches document information in any field.
[0013]
The present invention has been made in view of the above problems, and its object is to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can automate the creation of the various notation dictionaries and category dictionaries used in a document database search service and the checking of the created dictionaries.
[0014]
[Means for Solving the Problems]
To achieve this object, the dictionary information processing apparatus according to claim 1 is characterized by comprising: notation dictionary creating means for creating notation dictionary information that defines the correspondence between the normal form and the different notation forms of each term; category dictionary creating means for creating category dictionary information that defines the categories to which each normal form belongs; and dictionary information checking means for checking the information stored in the notation dictionary information and/or the category dictionary information.
[0015]
According to this apparatus, notation dictionary information defining the correspondence between the normal form and the different notation forms of each term is created, category dictionary information defining the categories to which each normal form belongs is created, and the information stored in the notation dictionary information and/or the category dictionary information is checked. This makes it possible to automate the creation of the various notation dictionaries and category dictionaries used in a document database search service and the checking of the created dictionaries, and to make dictionary creation more efficient and more accurate.
[0016]
The dictionary information processing apparatus according to claim 2 is the dictionary information processing apparatus according to claim 1, wherein the notation dictionary creating means further comprises field attribute determining means for determining, based on attribute information of each field constituting an existing database, whether that field is to be used as a normal form, as a different notation form, or not at all, and creates the notation dictionary information from the fields of the existing database based on the determination result of the field attribute determining means.
[0017]
This shows an example of the notation dictionary creating means more specifically. According to this apparatus, whether each field constituting an existing database is to be used as a normal form, as a different notation form, or not at all is determined based on the attribute information of that field, and the notation dictionary information is created from the fields of the existing database based on the determination result, so a notation dictionary can be created efficiently from an existing database.
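The field attribute determination can be sketched as a lookup from each field's attribute to a role. The attribute names (`official_name`, `synonym`, `abbreviation`), the field names, and the record layout are hypothetical; the document does not specify what attribute information an existing database provides.

```python
# Hypothetical mapping from a field's attribute to its role in the notation
# dictionary. Attributes not listed here mean "do not use this field".
ROLE_BY_ATTRIBUTE = {
    "official_name": "normal",
    "synonym": "variant",
    "abbreviation": "variant",
}


def build_notation_dictionary(records, field_attributes):
    """Create {normal form: set of different notation forms} from records.

    `records` is a list of dicts, one per database record; `field_attributes`
    maps each field name to its attribute string.
    """
    roles = {f: ROLE_BY_ATTRIBUTE.get(a, "unused") for f, a in field_attributes.items()}
    normal_fields = [f for f, r in roles.items() if r == "normal"]
    variant_fields = [f for f, r in roles.items() if r == "variant"]
    dictionary = {}
    for record in records:
        for nf_field in normal_fields:
            normal_form = record.get(nf_field)
            if not normal_form:
                continue
            variants = dictionary.setdefault(normal_form, set())
            for vf in variant_fields:
                value = record.get(vf)
                # A field may hold a single value or a list of values.
                values = value if isinstance(value, list) else [value]
                variants.update(v for v in values if v and v != normal_form)
    return dictionary


if __name__ == "__main__":
    records = [{"symbol": "TP53", "aliases": ["p53", "LFS1"], "abbr": None, "seq": "ATG"}]
    attributes = {"symbol": "official_name", "aliases": "synonym",
                  "abbr": "abbreviation", "seq": "sequence"}
    print(build_notation_dictionary(records, attributes))
```

Here the sequence field is classified as unused, so it never reaches the dictionary.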
[0018]
The dictionary information processing apparatus according to claim 3 is the dictionary information processing apparatus according to claim 1 or 2, wherein the notation dictionary creating means further comprises reference dictionary term determining means for determining, based on a term described in existing reference dictionary information, whether that term is to be used as a normal form, as a different notation form, or not at all, and creates the notation dictionary information from the terms of the reference dictionary information based on the determination result of the reference dictionary term determining means.
[0019]
This shows an example of the notation dictionary creating means more specifically. According to this apparatus, whether a term described in existing reference dictionary information (such as a headword, or a term listed in a field for abbreviations, synonyms, related words, and the like) is to be used as a normal form, as a different notation form, or not at all is determined based on that term, and the notation dictionary information is created from the terms of the reference dictionary information based on the determination result, so a notation dictionary can be created efficiently from existing reference dictionary information.
[0020]
The dictionary information processing apparatus according to claim 4 is the dictionary information processing apparatus according to any one of claims 1 to 3, wherein the notation dictionary creating means further comprises Web term determining means for determining, based on a term described in existing Web information, whether that term is to be used as a normal form, as a different notation form, or not at all, and creates the notation dictionary information from the terms of the Web information based on the determination result of the Web term determining means.
[0021]
This shows an example of the notation dictionary creating means more specifically. According to this apparatus, whether a term described in existing Web information (including, for example, information posted on existing websites and information written to a website that participants can write to for the purpose of collecting terms to register in the dictionary) is to be used as a normal form, as a different notation form, or not at all is determined based on that term, and the notation dictionary information is created from the terms of the Web information based on the determination result, so a notation dictionary can be created efficiently from existing Web information.
[0022]
This also makes it possible to make the dictionary knowledge held by individuals explicit and shareable.
[0023]
The dictionary information processing apparatus according to claim 5 is the dictionary information processing apparatus according to any one of claims 1 to 4, wherein the category dictionary creating means further comprises structured data category structure information creating means for creating category structure information based on existing structured data, and creates the category dictionary information based on the category structure information created by the structured data category structure information creating means.
[0024]
This shows an example of the category dictionary creating means more specifically. According to this apparatus, category structure information is created based on existing structured data, and the category dictionary information is created based on the created category structure information, so a category dictionary can be created efficiently based on the classifications and the like defined by the existing structured data.
[0025]
The dictionary information processing apparatus according to claim 6 is the dictionary information processing apparatus according to claim 5, wherein, when the existing structured data has a plurality of root nodes, the structured data category structure information creating means adds a virtual root node above them to create the category structure information.
[0026]
This shows an example of the structured data category structure information creating means more specifically. According to this apparatus, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information, so a category dictionary can be created even more efficiently based on the classifications and the like defined by the existing structured data.
[0027]
The dictionary information processing apparatus according to claim 7 is the dictionary information processing apparatus according to claim 5 or 6, wherein, when the existing structured data contains a merge (a node reached from more than one parent), the structured data category structure information creating means duplicates the corresponding substructure at the merge point to create category structure information that is a simple tree with no merges.
[0028]
This shows an example of the structured data category structure information creating means more specifically. According to this apparatus, when the existing structured data contains a merge, the corresponding substructure is duplicated at the merge point to create category structure information that is a simple tree with no merges, so a category dictionary can be created even more efficiently based on the classifications and the like defined by the existing structured data.
[0029]
The dictionary information processing apparatus according to claim 8 is the dictionary information processing apparatus according to any one of claims 1 to 7, wherein the category dictionary creating means further comprises set category structure information creating means for creating, based on existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names, and creates the category dictionary information based on the category structure information created by the set category structure information creating means.
[0030]
This shows an example of the category dictionary creating means more specifically. According to this apparatus, category structure information whose root node is the set data name and whose leaf nodes are the set element names is created based on existing set data, and the category dictionary information is created based on the created category structure information, so a category dictionary can be created efficiently based on information defined by existing set data.
[0031]
The dictionary information processing apparatus according to claim 9 is the dictionary information processing apparatus according to any one of claims 1 to 8, wherein the category dictionary creating means further comprises MeSH term category structure information creating means for creating category structure information based on MeSH term data, and creates the category dictionary information based on the category structure information created by the MeSH term category structure information creating means.
[0032]
This shows an example of the category dictionary creating means more specifically. According to this apparatus, category structure information is created based on MeSH term data, and the category dictionary information is created based on the created category structure information, so a category dictionary can be created efficiently based on the medical terms and the like defined by existing MeSH term data.
[0033]
The dictionary information processing apparatus according to claim 10 is the dictionary information processing apparatus according to any one of claims 1 to 9, wherein the category dictionary creating means further comprises database category structure information creating means for creating, based on an existing database, category structure information whose root node is the name of the existing database or the field name of a specific stored field and whose leaf nodes are the individual data items stored in that database or field, and creates the category dictionary information based on the category structure information created by the database category structure information creating means.
[0034]
This shows an example of the category dictionary creating means more specifically. According to this apparatus, category structure information whose root node is the name of the existing database or the field name of a specific stored field, and whose leaf nodes are the individual data items stored in that database or field, is created based on the existing database, and the category dictionary information is created based on the created category structure information, so a category dictionary can be created efficiently based on the fields, stored data, and the like defined by the existing database.
[0035]
The dictionary information processing apparatus according to claim 11 is the dictionary information processing apparatus according to any one of claims 1 to 10, wherein the category dictionary creating means further comprises analysis program category structure information creating means for creating, based on processing result data of an existing analysis program, category structure information whose root node is the name of that existing program and whose leaf nodes are the processing result data, and creates the category dictionary information based on the category structure information created by the analysis program category structure information creating means.
[0036]
This shows an example of the category dictionary creating means more specifically. According to this apparatus, category structure information whose root node is the name of the existing program and whose leaf nodes are the processing result data is created based on the processing result data of the existing analysis program, and the category dictionary information is created based on the created category structure information, so a category dictionary can be created efficiently based on the processing result data of the existing analysis program.
[0037]
The dictionary information processing apparatus according to claim 12 is the dictionary information processing apparatus according to any one of claims 1 to 11, wherein the dictionary information checking means further comprises per-entry checking means for checking the notation dictionary information and/or the category dictionary information entry by entry based on at least one of a check term list, a check program, and a check pattern.
[0038]
This shows an example of the dictionary information checking means more specifically. According to this apparatus, the notation dictionary information and/or the category dictionary information is checked entry by entry based on at least one of the check term list, the check program, and the check pattern, so once the check items have been defined in advance, the quality of the dictionary information can be improved automatically.
[0039]
This also makes it easy to find inappropriate entries that slipped in because of a program bug or a missing exception-handling case during dictionary creation.
[0040]
It likewise makes it easy to find inappropriate entries caused by errors in the existing data that was used.
[0041]
Further, entries that are unsuitable as text mining dictionary entries can be easily found.
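The entry-unit check can be sketched in Python as three independent filters applied to each entry. The contents of `CHECK_TERMS`, `CHECK_PATTERNS`, and the `too_short` check program are hypothetical examples; the description does not specify what the actual lists, programs, or patterns contain.

```python
import re
from typing import Callable, Iterable

# Hypothetical check resources; the actual lists, programs, and patterns are not specified.
CHECK_TERMS = {"the", "and", "unknown", "null"}  # check term/phrase list
CHECK_PATTERNS = [
    re.compile(r"^\d+$"),    # purely numeric entry
    re.compile(r"[<>{}]"),   # markup or template debris
    re.compile(r"^\s|\s$"),  # stray leading/trailing whitespace
]


def too_short(term: str) -> bool:
    """Example check program: flag entries shorter than two characters."""
    return len(term) < 2


CHECK_PROGRAMS: list[Callable[[str], bool]] = [too_short]


def check_entries(entries: Iterable[str]) -> dict[str, list[str]]:
    """Return {entry: reasons} for every entry that fails at least one check."""
    problems: dict[str, list[str]] = {}
    for entry in entries:
        reasons = []
        if entry.strip().lower() in CHECK_TERMS:
            reasons.append("listed term")
        reasons += [f"pattern {p.pattern}" for p in CHECK_PATTERNS if p.search(entry)]
        reasons += [f"program {f.__name__}" for f in CHECK_PROGRAMS if f(entry)]
        if reasons:
            problems[entry] = reasons
    return problems


print(check_entries(["p53", "unknown", "123", "<b>kinase", "A"]))
```

Returning the reasons, rather than a plain pass/fail flag, helps a reviewer tell a creation-program bug (markup debris) from a bad source record or a generic word.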
[0042]
The dictionary information processing apparatus according to claim 13 is the dictionary information processing apparatus according to any one of claims 1 to 12, wherein the dictionary information checking means further comprises normal form inconsistency checking means for checking whether another notation form registered in the notation dictionary information is also registered as a different normal form.
[0043]
This shows a more specific example of the dictionary information checking means. With this apparatus, it is checked whether another notation form registered in the notation dictionary information is also registered as a different normal form, so the quality of the dictionary information can be improved automatically by eliminating normal form inconsistencies.
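A minimal Python sketch of the normal form inconsistency check, assuming the notation dictionary information is held as a mapping from each normal form to the set of its other notation forms (a hypothetical representation chosen for illustration).

```python
def normal_form_conflicts(notation_dict: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (notation form, owning normal form) pairs where the notation form
    is itself registered as a different normal form."""
    conflicts = []
    for normal, variants in notation_dict.items():
        for variant in sorted(variants):
            if variant != normal and variant in notation_dict:
                conflicts.append((variant, normal))
    return conflicts


notation = {
    "tumor protein p53": {"p53", "TP53"},
    "TP53": {"TRP53"},  # "TP53" is also a notation form above -> inconsistent
    "insulin": {"INS"},
}
print(normal_form_conflicts(notation))  # [('TP53', 'tumor protein p53')]
```

Each reported pair is a candidate for either merging the two normal forms or removing the notation form from one of them.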
[0044]
The dictionary information processing apparatus according to claim 14 is the dictionary information processing apparatus according to any one of claims 1 to 13, wherein the dictionary information checking means further comprises statistical checking means for performing statistical processing on the registration status and usage status of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and for checking whether the result of the statistical processing is within a predetermined normal value range.
[0045]
This shows a more specific example of the dictionary information checking means. With this apparatus, statistical processing is performed on the registration status and usage status of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and it is checked whether the result falls within a predetermined normal value range, so the quality of the dictionary information can be improved automatically by statistical means.
[0046]
Even when the number of registered entries becomes enormous, entries with a poor registration status (for example, an entry with zero entity entries) or a poor usage status (for example, zero accesses or zero extractions) can be easily found.
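A minimal Python sketch of the statistical check. It assumes, for illustration only, that "registration status" is measured as the number of other notation forms per normal form and "usage status" as an access count per entry; the default ranges `(1, 50)` and `min_accesses=1` are hypothetical thresholds.

```python
from statistics import fmean


def statistical_check(
    notation: dict[str, set[str]],
    access_counts: dict[str, int],
    variant_range: tuple[int, int] = (1, 50),
    min_accesses: int = 1,
) -> dict[str, object]:
    """Flag entries whose registration/usage statistics fall outside a normal range."""
    variant_counts = {nf: len(vs) for nf, vs in notation.items()}
    low, high = variant_range
    return {
        # registration status: number of other notation forms per normal form
        "bad_variant_count": sorted(nf for nf, c in variant_counts.items() if not low <= c <= high),
        # usage status: entries that were never (or too rarely) accessed
        "unused": sorted(nf for nf in notation if access_counts.get(nf, 0) < min_accesses),
        # a dictionary-level statistic that can itself be compared to a normal range
        "mean_variants": fmean(variant_counts.values()) if variant_counts else 0.0,
    }


notation = {"insulin": {"INS", "insulin hormone"}, "p53": set(), "actin": {"ACTB"}}
report = statistical_check(notation, {"insulin": 120, "actin": 0})
print(report)
# {'bad_variant_count': ['p53'], 'unused': ['actin', 'p53'], 'mean_variants': 1.0}
```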
[0047]
The dictionary information processing apparatus according to claim 15 is the dictionary information processing apparatus according to any one of claims 1 to 14, wherein the dictionary information checking means further comprises co-occurrence checking means for calculating a similarity based on co-occurrence relations of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information.
[0048]
This shows a more specific example of the dictionary information checking means. With this apparatus, a similarity is calculated from the co-occurrence relations of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, which makes it easy to review the registered contents and decide whether entries should be merged or removed. For example, entries whose mutual similarity exceeds a predetermined threshold may be merged automatically.
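One possible realization in Python, assuming each document is reduced to the set of dictionary terms it contains. The choice of cosine similarity over co-occurrence count vectors, and the threshold of 0.8, are assumptions made for this sketch; the description only states that a co-occurrence-based similarity is used.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(docs: list[set[str]]) -> dict[str, Counter]:
    """For each term, count how often every other term appears in the same document."""
    vectors: dict[str, Counter] = {}
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse co-occurrence vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def merge_candidates(docs: list[set[str]], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Pairs of terms whose co-occurrence similarity reaches the threshold."""
    vectors = cooccurrence_vectors(docs)
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


docs = [{"p53", "apoptosis", "tumor"}, {"TP53", "apoptosis", "tumor"}, {"insulin", "glucose"}]
print(merge_candidates(docs))  # [('TP53', 'p53', 1.0)]
```

Terms that never appear together but share the same contexts (here "p53" and "TP53") score highest, which is the pattern expected of two notations for one concept.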
[0049]
The present invention also relates to a dictionary information processing method. The dictionary information processing method according to claim 16 includes a notation dictionary creating step of creating notation dictionary information that defines the correspondence between the normal form and other notation forms of each term, a category dictionary creating step of creating category dictionary information that defines the category to which each normal form belongs, and a dictionary information checking step of checking the information stored in the notation dictionary information and/or the category dictionary information.
[0050]
With this method, notation dictionary information defining the correspondence between the normal form and other notation forms of each term is created, category dictionary information defining the category to which each normal form belongs is created, and the information stored in the notation dictionary information and/or the category dictionary information is checked. The creation of the various notation dictionaries and category dictionaries used in a document database search service, and the checking of the created dictionaries, can therefore be automated, improving both the efficiency and the accuracy of dictionary creation.
[0051]
The dictionary information processing method according to claim 17 is the dictionary information processing method according to claim 16, wherein the notation dictionary creating step further includes a field attribute determining step of determining, based on attribute information of each field constituting an existing database, whether the field is to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the fields of the existing database based on the determination result of the field attribute determining step.
[0052]
This shows a more specific example of the notation dictionary creating step. With this method, each field of an existing database is judged, from its attribute information, to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the fields according to that judgment, so a notation dictionary can be created efficiently from an existing database.
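A minimal Python sketch of the field attribute determining step followed by notation dictionary creation. The attribute names in `ATTRIBUTE_RULES` and the `;` separator for multi-valued fields are hypothetical; a real database would supply its own schema metadata.

```python
from enum import Enum


class Use(Enum):
    NORMAL = "normal form"
    VARIANT = "another notation form"
    UNUSED = "not used"


# Hypothetical rule table from field attribute to intended use.
ATTRIBUTE_RULES = {
    "official_name": Use.NORMAL,
    "synonym": Use.VARIANT,
    "abbreviation": Use.VARIANT,
}


def judge_field(attribute: str) -> Use:
    """Field attribute determination: unknown attributes are not used."""
    return ATTRIBUTE_RULES.get(attribute, Use.UNUSED)


def notation_from_records(
    field_attributes: dict[str, str], records: list[dict[str, str]]
) -> dict[str, set[str]]:
    """Build notation dictionary information {normal form: other notation forms}."""
    roles = {name: judge_field(attr) for name, attr in field_attributes.items()}
    normal_fields = [f for f, r in roles.items() if r is Use.NORMAL]
    variant_fields = [f for f, r in roles.items() if r is Use.VARIANT]
    notation: dict[str, set[str]] = {}
    for record in records:
        for nf in normal_fields:
            normal = record.get(nf, "").strip()
            if not normal:
                continue
            variants = notation.setdefault(normal, set())
            for vf in variant_fields:
                # Assumes multi-valued fields are separated by ';'.
                for value in record.get(vf, "").split(";"):
                    value = value.strip()
                    if value and value != normal:
                        variants.add(value)
    return notation


fields = {"symbol": "official_name", "aliases": "synonym", "chromosome": "location"}
rows = [{"symbol": "TP53", "aliases": "p53; LFS1", "chromosome": "17"}]
print(notation_from_records(fields, rows))
```

The `chromosome` field is judged "not used" because its attribute is not in the rule table, so its values never enter the notation dictionary.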
[0053]
The dictionary information processing method according to claim 18 is the dictionary information processing method according to claim 16 or 17, wherein the notation dictionary creating step further includes a dictionary term determining step of determining, based on a term described in existing dictionary information, whether the term is to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the terms of the dictionary information based on the determination result of the dictionary term determining step.
[0054]
This shows a more specific example of the notation dictionary creating step. With this method, each term described in existing dictionary information (for example, headwords and terms listed in columns such as abbreviations and synonyms) is judged to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from those terms according to the judgment, so a notation dictionary can be created efficiently from existing dictionary information.
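A minimal Python sketch of the dictionary term determining step. The entry layout (`headword`, `abbreviations`, `synonyms`) and the rule that an abbreviation or synonym shared by several headwords is "not used" are assumptions for illustration; the description does not state the judgment criteria.

```python
from collections import defaultdict


def notation_from_dictionary(entries: list[dict]) -> dict[str, set[str]]:
    """Headwords become normal forms; abbreviations and synonyms become other
    notation forms unless they are ambiguous (listed under several headwords),
    in which case they are not used."""
    owners: defaultdict[str, set[str]] = defaultdict(set)
    for entry in entries:
        for term in entry.get("abbreviations", []) + entry.get("synonyms", []):
            owners[term].add(entry["headword"])

    notation: dict[str, set[str]] = {}
    for entry in entries:
        head = entry["headword"]
        candidates = entry.get("abbreviations", []) + entry.get("synonyms", [])
        notation[head] = {t for t in candidates if t != head and len(owners[t]) == 1}
    return notation


entries = [
    {"headword": "myocardial infarction", "abbreviations": ["MI"], "synonyms": ["heart attack"]},
    {"headword": "mitral insufficiency", "abbreviations": ["MI"]},
]
print(notation_from_dictionary(entries))
# {'myocardial infarction': {'heart attack'}, 'mitral insufficiency': set()}
```

Dropping an ambiguous abbreviation such as "MI" avoids mapping one notation to two normal forms, which would otherwise be reported later by the normal form inconsistency check.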
[0055]
The dictionary information processing method according to claim 19 is the dictionary information processing method according to any one of claims 16 to 18, wherein the notation dictionary creating step further includes a Web term determining step of determining, based on a term described in existing Web information, whether the term is to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the terms of the Web information based on the determination result of the Web term determining step.
[0056]
This shows a more specific example of the notation dictionary creating step. With this method, each term described in existing Web information (for example, information published on existing Web sites, or information posted to a Web site where participants can write terms for the purpose of collecting dictionary entries) is judged to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from those terms according to the judgment, so a notation dictionary can be created efficiently from existing Web information.
[0057]
This also makes it possible to make explicit and share the dictionary knowledge that individuals hold.
[0058]
The dictionary information processing method according to claim 20 is the dictionary information processing method according to any one of claims 16 to 19, wherein the category dictionary creating step further includes a structured data category structure information creating step of creating category structure information based on existing structured data, and the category dictionary information is created based on the category structure information created in the structured data category structure information creating step.
[0059]
This shows a more specific example of the category dictionary creating step. With this method, category structure information is created from existing structured data, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the classification defined by the existing structured data.
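A minimal Python sketch of turning category structure information into category dictionary information, assuming the structured data is a tree given as parent-to-children links (a hypothetical representation). Each term is mapped to the chain of categories above it; structures with several roots or with merges are handled by the refinements described next.

```python
def category_dictionary(children: dict[str, list[str]], root: str) -> dict[str, list[str]]:
    """Walk a tree of (parent -> children) links from the root and record,
    for every term, the chain of ancestor categories it belongs to."""
    result: dict[str, list[str]] = {}
    stack: list[tuple[str, list[str]]] = [(root, [])]
    while stack:
        node, ancestors = stack.pop()
        result[node] = ancestors
        for child in children.get(node, []):
            stack.append((child, ancestors + [node]))
    return result


tree = {"enzyme": ["kinase", "protease"], "kinase": ["tyrosine kinase"]}
print(category_dictionary(tree, "enzyme")["tyrosine kinase"])  # ['enzyme', 'kinase']
```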
[0060]
The dictionary information processing method according to claim 21 is the dictionary information processing method according to claim 20, wherein, when the existing structured data has a plurality of root nodes, the structured data category structure information creating step creates the category structure information by adding a virtual root node above them.
[0061]
This shows a more specific example of the structured data category structure information creating step. With this method, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information, so a category dictionary can be created even more efficiently on the basis of the classification defined by the existing structured data.
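A minimal Python sketch of adding a virtual root, assuming the structured data is given as `(parent, child)` edges. The label `"(virtual root)"` is a hypothetical placeholder name.

```python
def add_virtual_root(
    edges: list[tuple[str, str]], virtual_root: str = "(virtual root)"
) -> list[tuple[str, str]]:
    """Given (parent, child) edges, attach every top-level node under one virtual root
    when there is more than one."""
    parents = {p for p, _ in edges}
    children = {c for _, c in edges}
    roots = sorted(parents - children)
    if len(roots) <= 1:
        return list(edges)
    return [(virtual_root, r) for r in roots] + list(edges)


edges = [("anatomy", "organ"), ("disease", "neoplasm")]
print(add_virtual_root(edges)[:2])
# [('(virtual root)', 'anatomy'), ('(virtual root)', 'disease')]
```

Roots are detected as nodes that appear as a parent but never as a child; isolated terms with no edges at all would need to be attached separately.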
[0062]
The dictionary information processing method according to claim 22 is the dictionary information processing method according to claim 20 or 21, wherein, when the existing structured data contains a merge (a node reached from more than one parent), the structured data category structure information creating step creates category structure information having a simple tree structure without merges by copying the corresponding partial structure to each merging portion.
[0063]
This shows a more specific example of the structured data category structure information creating step. With this method, when the existing structured data contains a merge, the corresponding partial structure is copied to each merging portion to create category structure information with a simple tree structure and no merges, so a category dictionary can be created even more efficiently on the basis of the classification defined by the existing structured data.
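A minimal Python sketch of copying shared partial structures, assuming the structured data is an acyclic graph given as parent-to-children links. Expanding each child into a fresh nested dict means a node with two parents is duplicated under both. The category labels are illustrative only.

```python
def expand(node: str, children: dict[str, list[str]], path: tuple[str, ...] = ()) -> dict:
    """Recursively expand the sub-structure below `node` into fresh nested dicts,
    so a node reached from several parents is copied under each of them."""
    if node in path:
        raise ValueError(f"cycle through {node!r}; only merges (DAGs) are supported")
    return {child: expand(child, children, path + (node,)) for child in children.get(node, [])}


def dag_to_tree(children: dict[str, list[str]], root: str) -> dict:
    """Return a merge-free tree {root: {...}} built from the DAG."""
    return {root: expand(root, children)}


dag = {
    "disease": ["virus diseases", "liver diseases"],
    "virus diseases": ["hepatitis B"],
    "liver diseases": ["hepatitis B"],  # merge: two parents
    "hepatitis B": ["chronic hepatitis B"],
}
tree = dag_to_tree(dag, "disease")
print(tree)
# {'disease': {'virus diseases': {'hepatitis B': {'chronic hepatitis B': {}}}, 'liver diseases': {'hepatitis B': {'chronic hepatitis B': {}}}}}
```

Because every copy is a separate object, later edits to one branch do not leak into the other; the trade-off is that deeply shared sub-structures can grow the tree considerably.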
[0064]
The dictionary information processing method according to claim 23 is the dictionary information processing method according to any one of claims 16 to 22, wherein the category dictionary creating step further includes a set category structure information creating step of creating, based on existing set data, category structure information in which the root node is the set data name and the leaf nodes are the set element names, and the category dictionary information is created based on the category structure information created in the set category structure information creating step.
[0065]
This shows a more specific example of the category dictionary creating step. With this method, category structure information whose root node is the set data name and whose leaf nodes are the set element names is created from existing set data, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the information defined by the existing set data.
[0066]
The dictionary information processing method according to claim 24 is the dictionary information processing method according to any one of claims 16 to 23, wherein the category dictionary creating step further includes a MeSH term category structure information creating step of creating category structure information based on MeSH term data, and the category dictionary information is created based on the category structure information created in the MeSH term category structure information creating step.
[0067]
This shows a more specific example of the category dictionary creating step. With this method, category structure information is created from MeSH term data, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the medical terms and other vocabulary defined by the existing MeSH term data.
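MeSH places each descriptor in its hierarchy through dot-separated tree numbers, and a descriptor may carry several tree numbers, which is exactly the merge case handled above. A minimal Python sketch of deriving parent categories from tree numbers follows; the descriptor-to-tree-number mapping is assumed to be already parsed, and the tree numbers below are illustrative values, not taken from the MeSH files.

```python
def mesh_parents(tree_numbers: dict[str, list[str]]) -> dict[str, set[str]]:
    """Derive parent descriptors from dotted tree numbers: the parent of
    'T01.010.020' is the descriptor carrying 'T01.010'."""
    by_number = {num: name for name, nums in tree_numbers.items() for num in nums}
    parents: dict[str, set[str]] = {}
    for name, nums in tree_numbers.items():
        parents[name] = set()
        for num in nums:
            if "." in num:
                parent_num = num.rsplit(".", 1)[0]
                if parent_num in by_number:
                    parents[name].add(by_number[parent_num])
    return parents


# Illustrative descriptors and tree numbers (not copied from the MeSH files).
data = {
    "Neoplasms": ["T01"],
    "Carcinoma": ["T01.010"],
    "Adenocarcinoma": ["T01.010.020"],
    "Liver Diseases": ["T02"],
    "Liver Neoplasms": ["T01.030", "T02.040"],  # two tree numbers -> two parents
}
print(sorted(mesh_parents(data)["Liver Neoplasms"]))  # ['Liver Diseases', 'Neoplasms']
```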
[0068]
The dictionary information processing method according to claim 25 is the dictionary information processing method according to any one of claims 16 to 24, wherein the category dictionary creating step further includes a database category structure information creating step of creating, based on an existing database, category structure information in which the root node is the name of the existing database or the field name of a specific field stored in it, and the leaf nodes are the individual data items stored in the database or the field, and the category dictionary information is created based on the category structure information created in the database category structure information creating step.
[0069]
This shows a more specific example of the category dictionary creating step. With this method, category structure information whose root node is the name of an existing database or of a specific field in it, and whose leaf nodes are the data items stored in the database or field, is created, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the fields, stored data, and other content defined by the existing database.
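A minimal Python sketch of the field-name-as-root variant, using an in-memory SQLite database purely for illustration. The table `genes` and its columns are hypothetical; using the database name as the root instead would only change the dictionary key.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """SQL identifiers cannot be bound as parameters, so quote them explicitly."""
    return '"' + name.replace('"', '""') + '"'


def category_from_field(conn: sqlite3.Connection, table: str, column: str) -> dict[str, list[str]]:
    """Category structure with root = field name and leaves = distinct stored values."""
    col, tbl = quote_identifier(column), quote_identifier(table)
    sql = f"SELECT DISTINCT {col} FROM {tbl} WHERE {col} IS NOT NULL AND {col} <> '' ORDER BY {col}"
    return {column: [row[0] for row in conn.execute(sql)]}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (symbol TEXT, organism TEXT)")
conn.executemany(
    "INSERT INTO genes VALUES (?, ?)",
    [("TP53", "human"), ("Trp53", "mouse"), ("BRCA1", "human")],
)
print(category_from_field(conn, "genes", "organism"))  # {'organism': ['human', 'mouse']}
```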
[0070]
The dictionary information processing method according to claim 26 is the dictionary information processing method according to any one of claims 16 to 25, wherein the category dictionary creating step further includes an analysis program category structure information creating step of creating, based on processing result data of an existing analysis program, category structure information in which the root node is the name of the existing analysis program and the leaf nodes are the processing result data, and the category dictionary information is created based on the category structure information created in the analysis program category structure information creating step.
[0071]
This shows a more specific example of the category dictionary creating step. With this method, category structure information whose root node is the name of the existing analysis program and whose leaf nodes are its processing result data is created from that data, and the category dictionary information is created from this category structure information. A category dictionary can therefore be created efficiently from the processing result data of an existing analysis program.
[0072]
The dictionary information processing method according to claim 27 is the dictionary information processing method according to any one of claims 16 to 26, wherein the dictionary information checking step further includes an entry-unit checking step of checking the notation dictionary information and/or the category dictionary information entry by entry, based on at least one of a check term/phrase list, a check program, and a check pattern.
[0073]
This shows a more specific example of the dictionary information checking step. With this method, the notation dictionary information and/or the category dictionary information is checked entry by entry using at least one of the check term/phrase list, the check program, and the check pattern, so the quality of the dictionary information can be improved automatically.
[0074]
This also makes it easy to find inappropriate entries that slipped in because of a program bug or a missing exception-handling case during dictionary creation.
[0075]
It likewise makes it easy to find inappropriate entries caused by errors in the existing data that was used.
[0076]
Further, entries that are unsuitable as text mining dictionary entries can be easily found.
[0077]
The dictionary information processing method according to claim 28 is the dictionary information processing method according to any one of claims 16 to 27, wherein the dictionary information checking step further includes a normal form inconsistency checking step of checking whether another notation form registered in the notation dictionary information is also registered as a different normal form.
[0078]
This shows a more specific example of the dictionary information checking step. With this method, it is checked whether another notation form registered in the notation dictionary information is also registered as a different normal form, so the quality of the dictionary information can be improved automatically by eliminating normal form inconsistencies.
[0079]
The dictionary information processing method according to claim 29 is the dictionary information processing method according to any one of claims 16 to 28, wherein the dictionary information checking step further includes a statistical checking step of performing statistical processing on the registration status and usage status of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and of checking whether the result of the statistical processing is within a predetermined normal value range.
[0080]
This shows a more specific example of the dictionary information checking step. With this method, statistical processing is performed on the registration status and usage status of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and it is checked whether the result falls within a predetermined normal value range, so the quality of the dictionary information can be improved automatically by statistical means.
[0081]
Even when the number of registered entries becomes enormous, entries with a poor registration status (for example, an entry with zero entity entries) or a poor usage status (for example, zero accesses or zero extractions) can be easily found.
[0082]
The dictionary information processing method according to claim 30 is the dictionary information processing method according to any one of claims 16 to 29, wherein the dictionary information checking step further includes a co-occurrence checking step of calculating a similarity based on co-occurrence relations of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information.
[0083]
This shows a more specific example of the dictionary information checking step. With this method, a similarity is calculated from the co-occurrence relations of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, which makes it easy to review the registered contents and decide whether entries should be merged or removed. For example, entries whose mutual similarity exceeds a predetermined threshold may be merged automatically.
[0084]
The present invention also relates to a program. The program according to claim 31 causes a computer to execute a dictionary information processing method including a notation dictionary creating step of creating notation dictionary information that defines the correspondence between the normal form and other notation forms of each term, a category dictionary creating step of creating category dictionary information that defines the category to which each normal form belongs, and a dictionary information checking step of checking the information stored in the notation dictionary information and/or the category dictionary information.
[0085]
With this program, notation dictionary information defining the correspondence between the normal form and other notation forms of each term is created, category dictionary information defining the category to which each normal form belongs is created, and the information stored in the notation dictionary information and/or the category dictionary information is checked. The creation of the various notation dictionaries and category dictionaries used in a document database search service, and the checking of the created dictionaries, can therefore be automated, improving both the efficiency and the accuracy of dictionary creation.
[0086]
The program according to claim 32 is the program according to claim 31, wherein the notation dictionary creating step further includes a field attribute determining step of determining, based on attribute information of each field constituting an existing database, whether the field is to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the fields of the existing database based on the determination result of the field attribute determining step.
[0087]
This shows a more specific example of the notation dictionary creating step. With this program, each field of an existing database is judged, from its attribute information, to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the fields according to that judgment, so a notation dictionary can be created efficiently from an existing database.
[0088]
The program according to claim 33 is the program according to claim 31 or 32, wherein the notation dictionary creating step further includes a dictionary term determining step of determining, based on a term described in existing dictionary information, whether the term is to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the terms of the dictionary information based on the determination result of the dictionary term determining step.
[0089]
This shows a more specific example of the notation dictionary creating step. With this program, each term described in existing dictionary information (for example, headwords and terms listed in columns such as abbreviations and synonyms) is judged to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from those terms according to the judgment, so a notation dictionary can be created efficiently from existing dictionary information.
[0090]
The program according to claim 34 is the program according to any one of claims 31 to 33, wherein the notation dictionary creating step further includes a Web term determining step of determining, based on a term described in existing Web information, whether the term is to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from the terms of the Web information based on the determination result of the Web term determining step.
[0091]
This shows a more specific example of the notation dictionary creating step. With this program, each term described in existing Web information (for example, information published on existing Web sites, or information posted to a Web site where participants can write terms for the purpose of collecting dictionary entries) is judged to be used as a normal form, as another notation form, or not at all, and the notation dictionary information is created from those terms according to the judgment, so a notation dictionary can be created efficiently from existing Web information.
[0092]
This also makes it possible to make explicit and share the dictionary knowledge that individuals hold.
[0093]
The program according to claim 35 is the program according to any one of claims 31 to 34, wherein the category dictionary creating step further includes a structured data category structure information creating step of creating category structure information based on existing structured data, and the category dictionary information is created based on the category structure information created in the structured data category structure information creating step.
[0094]
This shows a more specific example of the category dictionary creating step. With this program, category structure information is created from existing structured data, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the classification defined by the existing structured data.
[0095]
The program according to claim 36 is the program according to claim 35, wherein, when the existing structured data has a plurality of root nodes, the structured data category structure information creating step creates the category structure information by adding a virtual root node above them.
[0096]
This shows a more specific example of the structured data category structure information creating step. With this program, when the existing structured data has a plurality of root nodes, a virtual root node is added above them to create the category structure information, so a category dictionary can be created even more efficiently on the basis of the classification defined by the existing structured data.
[0097]
The program according to claim 37 is the program according to claim 35 or 36, wherein, when the existing structured data contains a merge (a node reached from more than one parent), the structured data category structure information creating step creates category structure information having a simple tree structure without merges by copying the corresponding partial structure to each merging portion.
[0098]
This shows a more specific example of the structured data category structure information creating step. With this program, when the existing structured data contains a merge, the corresponding partial structure is copied to each merging portion to create category structure information with a simple tree structure and no merges, so a category dictionary can be created even more efficiently on the basis of the classification defined by the existing structured data.
[0099]
The program according to claim 38 is the program according to any one of claims 31 to 37, wherein the category dictionary creating step further includes a set category structure information creating step of creating, based on existing set data, category structure information in which the root node is the set data name and the leaf nodes are the set element names, and the category dictionary information is created based on the category structure information created in the set category structure information creating step.
[0100]
This shows a more specific example of the category dictionary creating step. With this program, category structure information whose root node is the set data name and whose leaf nodes are the set element names is created from existing set data, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the information defined by the existing set data.
[0101]
The program according to claim 39 is the program according to any one of claims 31 to 38, wherein the category dictionary creating step further includes a MeSH term category structure information creating step of creating category structure information based on MeSH term data, and the category dictionary information is created based on the category structure information created in the MeSH term category structure information creating step.
[0102]
This shows a more specific example of the category dictionary creating step. With this program, category structure information is created from MeSH term data, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the medical terms and other vocabulary defined by the existing MeSH term data.
[0103]
The program according to claim 40 is the program according to any one of claims 31 to 39, wherein the category dictionary creating step further includes a database category structure information creating step of creating, based on an existing database, category structure information in which the root node is the name of the existing database or the field name of a specific field stored in it, and the leaf nodes are the individual data items stored in the database or the field, and the category dictionary information is created based on the category structure information created in the database category structure information creating step.
[0104]
This shows a more specific example of the category dictionary creating step. With this program, category structure information whose root node is the name of an existing database or of a specific field in it, and whose leaf nodes are the data items stored in the database or field, is created, and the category dictionary information is created from that category structure information, so a category dictionary can be created efficiently on the basis of the fields, stored data, and other content defined by the existing database.
[0105]
The program according to claim 41 is the program according to any one of claims 31 to 40, wherein the category dictionary creating step further includes an analysis program category structure information creating step of creating, based on processing result data of an existing analysis program, category structure information in which the root node is the name of the existing analysis program and the leaf nodes are the processing result data, and the category dictionary information is created based on the category structure information created in the analysis program category structure information creating step.
[0106]
This shows one example of the category dictionary creation step more specifically. According to this program, category structure information is created from the processing result data of an existing analysis program with the root node set to the name of that program and the leaf nodes set to the processing result data, and the category dictionary information is created from that category structure information. The category dictionary can therefore be created efficiently from the processing result data of existing analysis programs.
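As a rough illustration of the analysis program category structure, the Python sketch below takes the output of a hypothetical analysis program (the program name `motif_scan` and the `(input_id, result_value)` output format are assumptions, not taken from the source), builds a category structure whose root is the program name and whose leaves are the distinct result values, and then derives a category dictionary that maps each leaf term to the categories it belongs to.

```python
from collections import defaultdict


def analysis_program_category_structure(program_name, results):
    """Root node: the analysis program name; leaf nodes: distinct result values.

    `results` is assumed to be an iterable of (input_id, result_value) pairs.
    """
    leaves = list(dict.fromkeys(value for _, value in results))  # dedupe, keep first-seen order
    return {program_name: leaves}


def category_dictionary(structure):
    """Map each leaf term (a normal-form candidate) to the set of root categories it falls under."""
    categories = defaultdict(set)
    for root, leaves in structure.items():
        for leaf in leaves:
            categories[leaf].add(root)
    return dict(categories)


# Hypothetical output of a motif-search style program.
hits = [("seq1", "zinc finger"), ("seq2", "leucine zipper"), ("seq3", "zinc finger")]
structure = analysis_program_category_structure("motif_scan", hits)
print(structure)  # {'motif_scan': ['zinc finger', 'leucine zipper']}
print(category_dictionary(structure))
```

A set of categories per term is used because the same result value could be produced by more than one program.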
[0107]
A program according to claim 42 is the program according to any one of claims 31 to 41, wherein the dictionary information check step further includes an entry-unit check step of checking the notation dictionary information and/or the category dictionary information entry by entry, based on at least one of a check term/phrase list, a check program, and a check pattern.
[0108]
This shows one example of the dictionary information check step more specifically. According to this program, the notation dictionary information and/or the category dictionary information is checked entry by entry based on at least one of the check term/phrase list, the check program, and the check pattern, so the quality of the dictionary information can be improved automatically.
[0109]
In addition, inappropriate entries that slip in because of a program bug or missing exception handling during dictionary creation can be found easily.
[0110]
Inappropriate entries caused by errors in the existing data used as the source can also be found easily.
[0111]
Further, entries that are inappropriate as text mining dictionary entries can be found easily.
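A minimal sketch of the entry-unit check, assuming a notation dictionary entry is a normal form plus a list of other notation forms. The contents of `CHECK_TERMS`, the two regular expressions in `CHECK_PATTERNS`, and the `too_short` stand-in for a "check program" are all hypothetical; the source does not say what the check resources contain.

```python
import re

# Hypothetical check resources; the source does not specify their contents.
CHECK_TERMS = {"unknown", "hypothetical protein", "n/a"}  # check term/phrase list
CHECK_PATTERNS = [
    re.compile(r"^\d+$"),   # purely numeric strings
    re.compile(r"[<>{}]"),  # leftover markup characters
]


def too_short(term):
    """Stand-in for a 'check program': flags terms shorter than two characters."""
    return len(term) < 2


def check_entry(normal_form, other_forms):
    """Return (term, reason) pairs for every suspicious term in one dictionary entry."""
    problems = []
    for term in [normal_form, *other_forms]:
        if term.lower() in CHECK_TERMS:
            problems.append((term, "check term list"))
        if any(p.search(term) for p in CHECK_PATTERNS):
            problems.append((term, "check pattern"))
        if too_short(term):
            problems.append((term, "check program"))
    return problems


print(check_entry("p53", ["TP53", "12345", "<b>"]))
# [('12345', 'check pattern'), ('<b>', 'check pattern')]
```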
[0112]
A program according to claim 43 is the program according to any one of claims 31 to 42, wherein the dictionary information check step further includes a normal form mismatch check step of checking whether another notation form registered in the notation dictionary information is also registered as a different normal form.
[0113]
This shows one example of the dictionary information check step more specifically. According to this program, it is checked whether another notation form registered in the notation dictionary information is also registered as a different normal form, so inconsistencies among normal forms can be eliminated and the quality of the dictionary information improved automatically.
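The normal form mismatch check can be sketched as follows, assuming the notation dictionary is held as a mapping from each normal form to the list of its other notation forms (the data layout and the sample terms are illustrative assumptions). Each reported pair names an other notation form that is simultaneously registered as a normal form in its own right.

```python
def normal_form_mismatches(notation_dict):
    """Find other notation forms that are themselves registered as a different normal form.

    `notation_dict` maps each normal form to the list of its other notation forms.
    Returns (other_form, owning_normal_form) pairs.
    """
    normal_forms = set(notation_dict)
    mismatches = []
    for normal_form, other_forms in notation_dict.items():
        for form in other_forms:
            if form != normal_form and form in normal_forms:
                mismatches.append((form, normal_form))
    return mismatches


notation = {
    "tumor protein p53": ["p53", "TP53"],
    "p53": ["P53 protein"],  # "p53" is also listed above as another notation form
}
print(normal_form_mismatches(notation))  # [('p53', 'tumor protein p53')]
```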
[0114]
A program according to claim 44 is the program according to any one of claims 31 to 43, wherein the dictionary information check step further includes a statistic check step of performing statistical processing on the registration status and usage status of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and checking whether the result of the statistical processing falls within a predetermined normal value range.
[0115]
This shows one example of the dictionary information check step more specifically. According to this program, statistical processing is performed on the registration status and usage status of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and it is checked whether the result falls within a predetermined normal value range. The quality of the dictionary information can therefore be improved automatically by statistical means.
[0116]
Even when the number of registered entries becomes enormous, entries with a poor registration status (for example, zero entity entries) or a poor usage status (for example, zero accesses or zero extractions) can be found easily.
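A minimal sketch of the statistic check, assuming two simple statistics: the number of other notation forms registered per normal form (registration status) and a per-term access count (usage status). The access log and both normal value ranges are hypothetical placeholders for the "predetermined normal value range" in the text.

```python
def variant_counts(notation_dict):
    """Registration statistic: number of other notation forms registered per normal form."""
    return {nf: len(forms) for nf, forms in notation_dict.items()}


def outside_normal_range(stats, low, high):
    """Return the keys whose statistic lies outside the closed interval [low, high], sorted."""
    return sorted(key for key, value in stats.items() if not low <= value <= high)


notation = {"TP53": ["p53", "tumor protein p53"], "EGFR": [], "BRCA1": ["breast cancer 1"]}
access_counts = {"TP53": 120, "EGFR": 0, "BRCA1": 45}  # hypothetical usage log

print(outside_normal_range(variant_counts(notation), 1, 50))  # ['EGFR']
print(outside_normal_range(access_counts, 1, 10**6))          # ['EGFR']
```

Fixed bounds keep the sketch simple; a range derived from the distribution itself (for example, a percentile cut-off) would be an equally plausible reading of the text.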
[0117]
A program according to claim 45 is the program according to any one of claims 31 to 44, wherein the dictionary information check step further includes a co-occurrence check step of calculating a similarity based on co-occurrence relationships of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information.
[0118]
This shows one example of the dictionary information check step more specifically. According to this program, a similarity is calculated based on co-occurrence relationships of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, which makes it easy to review the registered contents and decide whether entries should be merged or removed. For example, entries whose mutual similarity exceeds a predetermined threshold may be merged automatically.
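One possible co-occurrence check is sketched below. The source does not name a similarity measure. This sketch represents each document as the set of dictionary terms found in it, describes each term by the other terms it co-occurs with, and compares two terms by the Jaccard similarity of those context sets. The threshold of 0.8 is a hypothetical stand-in for the "predetermined similarity".

```python
from itertools import combinations


def context_terms(documents, term):
    """All other dictionary terms that co-occur with `term` in at least one document."""
    context = set()
    for doc in documents:
        if term in doc:
            context |= doc - {term}
    return context


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_candidates(documents, terms, threshold):
    """Pairs of terms whose co-occurrence contexts are at least `threshold` similar."""
    contexts = {t: context_terms(documents, t) for t in terms}
    pairs = []
    for s, t in combinations(terms, 2):
        # Ignore s and t themselves so their mutual co-occurrence does not skew the comparison.
        if jaccard(contexts[s] - {t}, contexts[t] - {s}) >= threshold:
            pairs.append((s, t))
    return pairs


docs = [
    {"p53", "apoptosis", "DNA damage"},
    {"TP53", "apoptosis", "DNA damage"},
    {"insulin", "glucose"},
]
print(merge_candidates(docs, ["p53", "TP53", "insulin"], threshold=0.8))  # [('p53', 'TP53')]
```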
[0119]
The present invention also relates to a recording medium: the recording medium according to claim 46 records the program according to any one of claims 31 to 45.
[0120]
With this recording medium, a computer reads and executes the recorded program. This realizes the program according to any one of claims 31 to 45 and obtains the same effects as each of those programs.
[0121]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium according to the present invention will be described in detail with reference to the drawings. It should be noted that the present invention is not limited by the embodiment.
In particular, the following embodiments describe an example in which the present invention is applied to a literature information database search system for documents in the natural sciences, such as biology, medicine, and science. However, the invention can be applied in the same way to any system for searching document information.
[0122]
[Summary of the present invention]
Hereinafter, the outline of the present invention will be described, and then the configuration, processing, and the like of the present invention will be described in detail. FIG. 1 is a principle configuration diagram showing the basic principle of the present invention.
[0123]
The present invention has the following basic features. Based on existing structured data, sets, databases, analysis program processing results, and the like, the present invention automatically creates notation dictionary information, which defines the correspondence between the normal form and the other notation forms of each term, and category dictionary information, which defines the category to which each normal form belongs.
[0124]
The present invention then checks the information stored in the notation dictionary information and/or the category dictionary information, automatically or semi-automatically, using various check methods. For example, each entry of the notation dictionary information and the category dictionary information may be checked against a check term list, a check program, a check pattern, or the like, and a normal form mismatch check, a statistic check, a co-occurrence check, and the like may also be performed.
Details of the dictionary information creation methods and the check methods are described later.
[0125]
[System configuration]
First, the configuration of the present system will be described. FIG. 2 is a block diagram showing an example of the configuration of the present system to which the present invention is applied, and conceptually shows only the parts of the configuration related to the present invention. The system is schematically configured such that a dictionary information processing apparatus 100 and an external system 200, which provides external databases of literature information, sequence information, three-dimensional structure information, and the like, related programs, and various search services, are communicably connected via a network 300.
[0126]
In FIG. 2, a network 300 has a function of interconnecting the dictionary information processing apparatus 100 and the external system 200, and is, for example, the Internet.
[0127]
In FIG. 2, the external system 200 is interconnected with the dictionary information processing apparatus 100 via the network 300 and has the function of providing users with a website for using external databases of sequence information and the like and external programs such as homology search and motif search.
[0128]
Here, the external system 200 may be configured as a WEB server, an ASP server, or the like, and its hardware may consist of an information processing device such as a commercially available workstation or personal computer together with its peripheral devices. Each function of the external system 200 is realized by the CPU, disk device, memory device, input device, output device, communication control device, and the like in the hardware configuration of the external system 200, together with the programs that control them.
[0129]
In FIG. 2, the dictionary information processing apparatus 100 schematically includes a control unit 102, such as a CPU, that comprehensively controls the entire dictionary information processing apparatus 100; a communication control interface unit 104 connected to a communication device (not shown), such as a router, that is connected to a communication line or the like; an input/output control interface unit 108 connected to the input device 112 and the output device 114; and a storage unit 106 that stores various databases and tables. These units are communicably connected via arbitrary communication paths. Further, the dictionary information processing apparatus 100 is communicably connected to the network 300 via a communication device such as a router and a wired or wireless communication line such as a dedicated line.
[0130]
The storage unit 106 is a storage means such as a fixed disk device. It stores the various databases and tables (the notation dictionary information file 106a to the check pattern file 106f), as well as the programs, tables, files, and databases used for the various processes and the files for Web pages.
[0131]
Among the constituent elements of the storage unit 106, the notation dictionary information file 106a is a notation dictionary information storage unit that stores notation dictionary information that defines the correspondence between the normal form of each term and another notation form.
[0132]
The category dictionary information file 106b is a category dictionary information storage unit that stores category dictionary information that defines the category to which the normal form belongs.
[0133]
The document information file 106c is a document information storage unit that stores information such as document information to be analyzed.
[0134]
The existing information storage file 106d is an existing information storage unit that stores information on existing structured data, sets, databases, analysis target program processing results, dictionaries and the like.
[0135]
The check term / phrase list file 106e is a check term / phrase list storage unit that stores the check term / phrase list.
[0136]
The check pattern file 106f is a check pattern storage unit that stores a check pattern.
[0137]
In FIG. 2, the communication control interface unit 104 controls communication between the dictionary information processing apparatus 100 and the network 300 (or a communication device such as a router). That is, the communication control interface unit 104 has a function of exchanging data with other terminals via a communication line.
[0138]
In FIG. 2, the input/output control interface unit 108 controls the input device 112 and the output device 114. As the output device 114, a speaker can be used in addition to a monitor (including a home television); the output device 114 may hereinafter be described as a monitor. As the input device 112, a keyboard, a mouse, a microphone, and the like can be used. The monitor also realizes a pointing device function in cooperation with the mouse.
[0139]
In FIG. 2, the control unit 102 has a control program such as an OS (Operating System), programs that define various processing procedures and the like, and an internal memory for storing required data, and it uses these programs to perform information processing for executing various processes. Functionally, the control unit 102 comprises a notation dictionary creation unit 102a, a category dictionary creation unit 102b, a dictionary information check unit 102c, a processing result output unit 102d, an analysis program unit 102e, and a name identification processing unit 102f.
[0140]
The notation dictionary creation unit 102a is a notation dictionary creation means that creates notation dictionary information defining the correspondence between the normal form of each term and its other notation forms. The notation dictionary creation unit 102a includes a field attribute determining unit 102g, a dictionary term determination unit 102h, and a Web term determination unit 102i, as shown in FIG. The field attribute determining unit 102g is a field attribute determination means that determines, based on the attribute information of each field constituting an existing database, whether each field is to be used as a normal form, as another notation form, or not at all. The dictionary term determination unit 102h is a dictionary term determination means that determines, based on the terms described in existing dictionary information (headwords and terms listed in the abbreviation, synonym, and similar columns), whether each term is to be used as a normal form, as another notation form, or not at all. The Web term determination unit 102i is a Web term determination means that determines, based on the terms described in existing Web information, whether each term is to be used as a normal form, as another notation form, or not at all.
[0141]
The category dictionary creating unit 102b is a category dictionary creation means that creates category dictionary information defining the category to which each normal form belongs. As shown in FIG. 4, the category dictionary creating unit 102b includes a structured data category structure information creating unit 102j, a set category structure information creating unit 102k, a MeSH term category structure information creating unit 102m, a database category structure information creating unit 102n, and an analysis program category structure information creating unit 102p. The structured data category structure information creating unit 102j is a structured data category structure information creation means that creates category structure information based on existing structured data. The set category structure information creating unit 102k is a set category structure information creation means that creates, based on existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names. The MeSH term category structure information creating unit 102m is a MeSH term category structure information creation means that creates category structure information based on MeSH term data. The database category structure information creating unit 102n is a database category structure information creation means that creates, based on an existing database, category structure information whose root node is the existing database name or the field name of a specific field stored in it and whose leaf nodes are the individual data items stored in that database or field. The analysis program category structure information creating unit 102p is an analysis program category structure information creation means that creates, based on the processing result data of an existing analysis program, category structure information whose root node is the name of the existing analysis program and whose leaf nodes are the processing result data.
[0142]
The dictionary information check unit 102c is a dictionary information check means that checks the information stored in the notation dictionary information and/or the category dictionary information. As shown in FIG. 5, the dictionary information check unit 102c includes a normal form mismatch check unit 102r, a statistics check unit 102s, a co-occurrence check unit 102t, and an entry unit check unit 102u. The normal form mismatch check unit 102r is a normal form mismatch check means that checks whether another notation form registered in the notation dictionary information is also registered as a different normal form. The statistics check unit 102s is a statistic check means that performs statistical processing on the registration status and usage status of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and checks whether the result falls within a predetermined normal value range. The co-occurrence check unit 102t is a co-occurrence check means that calculates a similarity based on co-occurrence relationships of the normal forms, other notation forms, or categories registered in the notation dictionary information and/or the category dictionary information. The entry unit check unit 102u is an entry-unit check means that checks the notation dictionary information and/or the category dictionary information entry by entry, based on at least one of a check term list, a check program, and a check pattern.
[0143]
The processing result output unit 102d is a processing result output unit that outputs a processing result to the output device 114.
[0144]
The analysis program unit 102e is an analysis program execution unit that executes various analysis programs.
[0145]
Further, the name identification processing unit 102f is a name identification processing means that converts each term registered in the normal form dictionary to lowercase or singular form and identifies the terms that thereby become the same term.
The details of the processing performed by these units will be described later.
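The name identification performed by unit 102f can be illustrated with a short sketch. It assumes English terms and uses a deliberately naive suffix rule for singularization; the source only says terms are converted to lowercase or singular form, so the specific rules here are assumptions.

```python
from collections import defaultdict


def name_key(term):
    """Lowercase the term and apply a deliberately naive English singularization."""
    key = term.lower()
    if key.endswith("ies") and len(key) > 4:
        key = key[:-3] + "y"  # antibodies -> antibody
    elif key.endswith("s") and not key.endswith("ss") and len(key) > 3:
        key = key[:-1]        # kinases -> kinase
    return key


def identify_names(normal_forms):
    """Group normal forms that collapse to the same key; only groups of two or more are returned."""
    groups = defaultdict(list)
    for form in normal_forms:
        groups[name_key(form)].append(form)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


print(identify_names(["Kinase", "kinases", "Antibodies", "antibody", "Glucose"]))
# {'kinase': ['Kinase', 'kinases'], 'antibody': ['Antibodies', 'antibody']}
```

Such suffix rules mis-handle words like "virus" (which becomes "viru"). A practical system would use a proper lemmatizer or an exception list.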
[0146]
[System processing]
Next, an example of the processing of the present system configured as described above according to the present embodiment will be described in detail below with reference to FIGS.
[0147]
[Automatic notation dictionary information creation process using existing database]
First, the automatic creation of notation dictionary information using an existing database will be described in detail with reference to FIGS. 6 and 7. These are conceptual diagrams illustrating an example of this process in the present system of the embodiment.
[0148]
First, as shown in FIG. 6, the dictionary information processing apparatus 100 uses the field attribute determining unit 102g to determine whether each field is to be used as a normal form, as another notation form, or not at all. This determination is based on the attribute information of each field of the existing databases stored in the existing information storage file 106d and the like and of the external databases of the external system 200.
[0149]
Then, through the processing of the notation dictionary creation unit 102a, the dictionary information processing apparatus 100 creates notation dictionary information from each field of the existing database based on the determination results and stores it in the notation dictionary information file 106a. When an existing database such as a genome information database is used and a record, gene, or the like is taken as the normal form, notation dictionary information may be created using fields uniquely associated with that record or gene, such as the record ID or the Accession number, as other notation forms.
[0150]
As shown in FIG. 7, a record stored in the existing database may refer to a record of another database (record X of database 1 in the example of FIG. 7). In that case, the registered notation dictionary information can be used effectively by referring to the notation dictionary information created from the referenced record (record X of database 1 in the example of FIG. 7).
This completes the automatic creation processing of the notation dictionary information using the existing database.
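The two steps above (field attribute determination, then notation dictionary creation from each field) can be sketched as follows. The attribute types in `ROLE_BY_ATTRIBUTE`, the field names, and the sample record are hypothetical, and the sketch assumes exactly one field is chosen as the normal form. The cross-database reference of FIG. 7 is not modeled.

```python
NORMAL, OTHER, UNUSED = "normal form", "another notation form", "not used"

# Hypothetical mapping from field attribute types to roles.
ROLE_BY_ATTRIBUTE = {"name": NORMAL, "identifier": OTHER, "synonym": OTHER}


def determine_roles(schema):
    """schema maps field name -> attribute type; unknown attribute types are not used."""
    return {field: ROLE_BY_ATTRIBUTE.get(attr, UNUSED) for field, attr in schema.items()}


def build_notation_dictionary(records, roles):
    """Map each record's normal-form value to the values of its other-notation fields."""
    normal_fields = [f for f, r in roles.items() if r == NORMAL]
    if len(normal_fields) != 1:
        raise ValueError("this sketch expects exactly one normal-form field")
    normal_field = normal_fields[0]
    other_fields = [f for f, r in roles.items() if r == OTHER]
    notation = {}
    for record in records:
        normal_form = record.get(normal_field)
        if not normal_form:
            continue
        forms = notation.setdefault(normal_form, [])
        for field in other_fields:
            value = record.get(field)
            if value and value != normal_form and value not in forms:
                forms.append(value)
    return notation


schema = {"gene_symbol": "name", "record_id": "identifier",
          "accession": "identifier", "alias": "synonym", "comment": "free_text"}
records = [{"gene_symbol": "TP53", "record_id": "rec001", "accession": "ACC0001",
            "alias": "p53", "comment": "tumor suppressor"}]
print(build_notation_dictionary(records, determine_roles(schema)))
# {'TP53': ['rec001', 'ACC0001', 'p53']}
```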
[0151]
[Automatic creation of notation dictionary information using existing dictionary information]
Next, the automatic creation of notation dictionary information using existing dictionary information will be described in detail with reference to FIG. 8, which is a conceptual diagram illustrating an example of this process in the present system of the embodiment.
[0152]
As shown in FIG. 8, the dictionary information processing apparatus 100 uses the dictionary term determination unit 102h to examine each term in the existing dictionary information stored in the existing information storage file 106d and the like (headwords, abbreviations, synonyms, and the like). For each term, it determines whether the term is to be used as a normal form, as another notation form, or not at all. For example, the headword of each dictionary entry is determined to be a “normal form”, synonyms and the like to be “another notation form”, and definitions and example sentences to be “not used”.
[0153]
Then, through the processing of the notation dictionary creation unit 102a, the dictionary information processing apparatus 100 creates notation dictionary information from the terms of the existing dictionary information based on the determination results and stores it in the notation dictionary information file 106a. The dictionary information may be an electronic dictionary, or a paper dictionary that has been read with an input device 112 such as a scanner and digitized with a known text conversion (OCR) tool.
This completes the process of automatically creating the notation dictionary information using the existing dictionary information.
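A minimal sketch of this dictionary-based creation, following the example role assignment in the text (headword as normal form, abbreviations and synonyms as other notation forms, meanings and example sentences not used). The entry format, a dict from part name to a string or a list of strings, is an assumption.

```python
# Hypothetical role assignment per dictionary part, following the example in the text.
ROLE_BY_PART = {
    "headword": "normal form",
    "abbreviation": "another notation form",
    "synonym": "another notation form",
    "meaning": "not used",
    "example": "not used",
}


def from_existing_dictionary(entries):
    """entries: list of dicts mapping part name -> string or list of strings."""
    notation = {}
    for entry in entries:
        normal_form = entry["headword"]
        forms = notation.setdefault(normal_form, [])
        for part, value in entry.items():
            if ROLE_BY_PART.get(part, "not used") != "another notation form":
                continue
            for term in ([value] if isinstance(value, str) else value):
                if term != normal_form and term not in forms:
                    forms.append(term)
    return notation


entries = [{
    "headword": "deoxyribonucleic acid",
    "abbreviation": "DNA",
    "synonym": ["desoxyribonucleic acid"],
    "meaning": "the molecule that carries genetic information",
}]
print(from_existing_dictionary(entries))
# {'deoxyribonucleic acid': ['DNA', 'desoxyribonucleic acid']}
```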
[0154]
[Automatic Creation of Notation Dictionary Information Using Existing Web Information]
Next, the automatic creation of notation dictionary information using existing Web information will be described in detail with reference to FIG. 9, which is a conceptual diagram illustrating an example of this process in the present system of the embodiment.
[0155]
As illustrated in FIG. 9, the dictionary information processing apparatus 100 uses the Web term determination unit 102i to examine each term in the existing Web information stored in the existing information storage file 106d or the like. This Web information includes, for example, information posted on existing Web sites and information that participants have written on a Web site set up to collect terms for registration in the dictionary. For each term, the unit determines whether it is to be used as a normal form, as another notation form, or not at all. The Web term determination unit 102i also provides a function of displaying the participant-writable Web site on each participant's terminal, an editing function that lets participants write to the Web site, a function of collecting the written information, and the like. These functions can be realized with existing Web site operation technology.
[0156]
Then, through the processing of the notation dictionary creation unit 102a, the dictionary information processing apparatus 100 creates notation dictionary information from the terms of the existing Web information based on the determination results and stores it in the notation dictionary information file 106a. For example, a notation dictionary may be created by pooling the participant-specific dictionaries created by the individual Web page creators who take part in the service. That is, for each term registered in a participant's own dictionary, it is determined whether the term is to be used as a normal form, as another notation form, or not at all, and notation dictionary information is created from the selected normal forms and other notation forms. This makes it possible to consolidate and share the dictionary knowledge held by individual participants.
This completes the automatic creation processing of the notation dictionary information using the existing Web information.
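Pooling participant dictionaries could look like the sketch below. It assumes each participant's dictionary already maps normal forms to other notation forms. The `min_support` acceptance rule, which keeps an other notation form only when enough participants agree on it, is a hypothetical policy not described in the source.

```python
from collections import Counter, defaultdict


def pool_participant_dictionaries(dictionaries, min_support=1):
    """Merge per-participant notation dictionaries (normal form -> other notation forms).

    An other notation form is kept only if at least `min_support` participants
    registered it for the same normal form (a hypothetical acceptance rule).
    """
    support = defaultdict(Counter)
    for dictionary in dictionaries:
        for normal_form, forms in dictionary.items():
            counter = support[normal_form]  # registers the normal form even with no forms
            for form in set(forms):         # count each participant at most once
                counter[form] += 1
    return {nf: sorted(f for f, n in counts.items() if n >= min_support)
            for nf, counts in support.items()}


alice = {"p53": ["TP53", "tumor protein p53"]}
bob = {"p53": ["TP53"], "EGFR": ["ErbB1"]}
print(pool_participant_dictionaries([alice, bob], min_support=2))
# {'p53': ['TP53'], 'EGFR': []}
```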
[0157]
[Automatic creation of category dictionary information using existing structured data]
Next, the automatic creation of category dictionary information using existing structured data will be described in detail with reference to FIGS. 10 to 12. These are conceptual diagrams illustrating an example of this process in the present system of the embodiment.
[0158]
First, as shown in FIG. 10, through the processing of the structured data category structure information creating unit 102j, the dictionary information processing apparatus 100 creates category structure information based on the existing structured data stored in the existing information storage file 106d or the like. The category dictionary creating unit 102b then creates category dictionary information based on the category structure information and stores it in the category dictionary information file 106b. As FIG. 10 shows, the work procedure creates the category dictionary after the category structure. In terms of data dependency (that is, what each is created from), however, both the category structure and the category dictionary are based on the existing structured data.
[0159]
Further, as shown in FIG. 11, when the existing structured data has a plurality of root nodes (such a structure is sometimes called a forest structure), the structured data category structure information creating unit 102j creates the category structure information by adding a virtual root node above them. The category structure can thus always be treated as a simple tree structure, which simplifies the search algorithm.
[0160]
Further, as shown in FIG. 12, the existing structured data may contain merges, that is, nodes with more than one parent (such a structure is sometimes called a DAG (directed acyclic graph) structure). In that case, the structured data category structure information creating unit 102j creates category structure information with a simple tree structure free of merges by copying the shared partial structure at each merge point. The category structure can thus always be treated as a simple tree structure, which simplifies the search algorithm.
This completes the process of automatically creating category dictionary information using the existing structured data.
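The two conversions described above (adding a virtual root over a forest, and copying shared partial structures to remove merges in a DAG) can be sketched together. The sketch assumes the structured data is given as an acyclic parent-to-children mapping. The nested `(node, subtrees)` tuple output and the virtual root label are illustrative choices.

```python
def to_tree(children, virtual_root="(virtual root)"):
    """Convert an acyclic parent -> children mapping into one nested (node, subtrees) tree.

    - Forest: if there are several root nodes, a virtual root is added above them.
    - DAG: a node reachable from several parents is expanded once per parent,
      so each shared partial structure is copied and no merges remain.
    """
    child_nodes = {c for kids in children.values() for c in kids}
    roots = [node for node in children if node not in child_nodes]

    def expand(node):
        return (node, [expand(child) for child in children.get(node, [])])

    if len(roots) == 1:
        return expand(roots[0])
    return (virtual_root, [expand(root) for root in roots])


# Two roots (A, B) that both lead to C: a forest with a merge.
dag = {"A": ["C"], "B": ["C"], "C": ["D"]}
print(to_tree(dag))
# ('(virtual root)', [('A', [('C', [('D', [])])]), ('B', [('C', [('D', [])])])])
```

Copying keeps the search algorithm simple at the cost of size. A DAG with many stacked merges can grow considerably when expanded this way.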
[0161]
[Automatic creation of category dictionary information using existing set data]
Next, the automatic creation of category dictionary information using existing set data will be described in detail with reference to FIG. 13, which is a conceptual diagram illustrating an example of this process in the present system of the embodiment.
[0162]
First, the dictionary information processing apparatus 100 uses the set category structure information creating unit 102k to create category structure information from the existing set data stored in the existing information storage file 106d or the like. The root node of this structure is the set data name, and the leaf nodes are the set element names. The category dictionary creating unit 102b then creates category dictionary information based on the category structure information and stores it in the category dictionary information file 106b. As FIG. 13 shows, the work procedure creates the category dictionary after the category structure. In terms of data dependency, however, both the category structure and the category dictionary are based on the existing set data.
[0163]
For example, if an existing set named "Genome-Decoded Organism" has the set elements {Nematode, Human, Escherichia coli}, category structure information is created with "Genome-Decoded Organism" as the root node and "Nematode", "Human", and "Escherichia coli" as the leaf nodes, and category dictionary information is created based on that category structure information.
This completes the process of automatically creating category dictionary information using the existing set data.
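A minimal sketch of set-based category creation, using the "Genome-Decoded Organism" example from the text plus a second, hypothetical set to show a term falling into more than one category. Sets are assumed to be given as a mapping from set data name to element names.

```python
from collections import defaultdict


def category_info_from_sets(named_sets):
    """named_sets: set data name -> set elements.

    Returns (structure, dictionary): structure maps each root (set name) to its sorted
    leaves (element names); dictionary maps each element to the set names containing it.
    """
    structure = {name: sorted(elements) for name, elements in named_sets.items()}
    dictionary = defaultdict(set)
    for root, leaves in structure.items():
        for leaf in leaves:
            dictionary[leaf].add(root)
    return structure, dict(dictionary)


sets = {
    "Genome-Decoded Organism": {"Nematode", "Human", "Escherichia coli"},
    "Model Organism": {"Nematode", "Mouse"},  # hypothetical second set
}
structure, dictionary = category_info_from_sets(sets)
print(structure["Genome-Decoded Organism"])  # ['Escherichia coli', 'Human', 'Nematode']
print(sorted(dictionary["Nematode"]))         # ['Genome-Decoded Organism', 'Model Organism']
```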
[0164]
[Automatic creation of category dictionary information using existing MeSH term data]
Next, the automatic creation of category dictionary information using existing MeSH term data will be described in detail with reference to FIGS. 14 to 16. These are conceptual diagrams illustrating an example of this process in the present system of the embodiment.
[0165]
First, as shown in FIG. 14, through the processing of the MeSH term category structure information creating unit 102m, the dictionary information processing apparatus 100 creates category structure information based on data with a complex structure, such as the existing MeSH term data stored in the existing information storage file 106d or the like.
[0166]
In MeSH, the main structure of the MeSH terms is represented by a DAG structure of Dterms. As shown in FIG., this DAG structure of Dterms can be turned into a category structure by converting it into a simple tree structure without merges using the method described above. In addition, the Qterms that may be attached are defined for each Dterm, so a correspondence between Dterms and Qterms is defined. One approach is simply to ignore the relationship between Cterm and Qterm when creating the category dictionary information and storing it in the category dictionary information file 106b. Alternatively, as shown in FIG. 16, the correspondence between Dterms and Qterms can also be given a category structure. A Cterm is a term that corresponds to one or more Dterm-Qterm pairs and can be a candidate normal form. In this way, the MeSH term category structure information creating unit 102m creates category structure information from Dterms, Qterms, and Cterms. The category dictionary creating unit 102b then creates category dictionary information based on the category structure information and stores it in the category dictionary information file 106b.
This completes the process of automatically creating category dictionary information using the existing MeSH term data.
[0167]
[Automatic creation of category dictionary information using existing database]
Next, the automatic creation of category dictionary information using an existing database will be described in detail with reference to FIGS. 17 and 18. These are conceptual diagrams illustrating an example of this process in the present system of the embodiment.
[0168]
First, as shown in FIG. 17, through the processing of the database category structure information creating unit 102n, the dictionary information processing apparatus 100 creates, based on an existing database stored in the existing information storage file 106d or an external database stored in the external system 200, category structure information whose root node is the name of the existing database or the field name of a specific field it stores, and whose leaf nodes are the individual data items stored in that database or field. As shown in FIG. 17, in terms of work procedure the category dictionary is created after the category structure, but in terms of data dependency (that is, what each is created from), the category dictionary, like the category structure, is based on the existing database.
[0169]
Here, the existing database may be, for example, a database that stores characteristic structures, such as Prosite, Pfam, or SMART.
[0170]
Also, as shown in FIG. 18, when there is a field storing a finite set of controlled-vocabulary values, such as document names or expression site names, the category structure information may be created with the controlled-vocabulary field name as the root node, each controlled term as a leaf node, and the value of the name field as the normal form. The category dictionary creating unit 102b then creates category dictionary information based on the category structure information and stores it in the category dictionary information file 106b.
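The construction from a controlled-vocabulary field can be sketched as follows. This is a minimal Python example, assuming records are plain dictionaries, that the field name serves as the root node and each distinct controlled term as a leaf node, and that the normal form comes from a separate name field. The field names `expression_site` and `expression_site_name`, the function name, and the `root/leaf` path notation are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def controlled_field_entries(
    records: Iterable[Dict[str, str]], term_field: str, name_field: str
) -> List[Tuple[str, str]]:
    """Return sorted (normal form, category path) pairs.

    Root node: the controlled-vocabulary field name (term_field).
    Leaf nodes: the distinct controlled terms found in that field.
    Normal form: the value of name_field, falling back to the term itself.
    """
    entries = set()
    for rec in records:
        term = rec.get(term_field)
        if not term:
            continue  # record has no controlled term; nothing to register
        normal = rec.get(name_field) or term
        entries.add((normal, f"{term_field}/{term}"))
    return sorted(entries)


rows = [
    {"expression_site": "ES001", "expression_site_name": "liver"},
    {"expression_site": "ES002", "expression_site_name": "kidney"},
    {"expression_site": "ES001", "expression_site_name": "liver"},
    {"title": "record without a controlled term"},
]
for normal, category in controlled_field_entries(rows, "expression_site", "expression_site_name"):
    print(normal, "->", category)
```

Using a set removes the duplicates that naturally arise when many records carry the same controlled term.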
This completes the process of automatically creating category dictionary information using the existing database.
[0171]
[Automatic creation of category dictionary information using processing result data of existing analysis program]
Next, details of the automatic creation processing of category dictionary information using the processing result data of an existing analysis program will be described with reference to FIG. 19. FIG. 19 is a conceptual diagram illustrating an example of the automatic creation processing of category dictionary information using the processing result data of an existing analysis program in the present system of the present embodiment.
[0172]
First, as shown in FIG. 19, through the processing of the analysis program category structure information creation unit 102p, the dictionary information processing apparatus 100 creates, based on the processing result data of an existing analysis program executed by the analysis program unit 102e, category structure information whose root node is the name of the existing processing program and whose leaf nodes are the processing result data. The category dictionary creating unit 102b then creates category dictionary information based on the category structure information and stores it in the category dictionary information file 106b.
Thus, the automatic creation processing of the category dictionary information using the processing result data of the existing analysis program ends.
[0173]
[Dictionary information check processing for each entry]
Next, details of the dictionary information check processing for each entry will be described with reference to FIGS. 20 to 22. FIGS. 20 to 22 are conceptual diagrams illustrating an example of the entry-by-entry dictionary information check processing in the present system of the present embodiment.
[0174]
First, as shown in FIG. 20, through the processing of the entry unit check unit 102u, the dictionary information processing apparatus 100 checks, entry by entry, the notation dictionary information stored in the notation dictionary information file 106a and/or the category dictionary information stored in the category dictionary information file 106b against the check term list stored in the check term list file 106e. The check term list is a list of terms, such as prepositions, articles, and pronouns, that should not be registered as normal forms or different notation forms.
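An entry-by-entry check against a check term list might look like the following Python sketch. It assumes the notation dictionary is held as a mapping from each normal form to its list of different notation forms and that comparison is case-insensitive; the sample set `CHECK_TERMS` and the function name are hypothetical placeholders for the contents of the check term list file 106e.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sample of a check term list (articles, prepositions, pronouns).
CHECK_TERMS: Set[str] = {"a", "an", "the", "of", "in", "on", "it", "they", "this"}


def check_term_list(
    notation_dict: Dict[str, List[str]], check_terms: Set[str]
) -> List[Tuple[str, str, str]]:
    """Flag entries whose normal form or different notation form is a check term.

    Returns (normal form of the entry, offending term, role) triples.
    """
    problems = []
    for normal, variants in notation_dict.items():
        if normal.lower() in check_terms:
            problems.append((normal, normal, "normal form"))
        for variant in variants:
            if variant.lower() in check_terms:
                problems.append((normal, variant, "different notation form"))
    return problems


notation = {
    "interleukin 6": ["IL-6", "IL6"],
    "It": [],
    "tumor necrosis factor": ["TNF", "the"],
}
print(check_term_list(notation, CHECK_TERMS))
```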
[0175]
Also, as shown in FIG. 21, through the processing of the entry unit check unit 102u, the dictionary information processing apparatus 100 checks, entry by entry, the notation dictionary information stored in the notation dictionary information file 106a and/or the category dictionary information stored in the category dictionary information file 106b, based on the check patterns stored in the check pattern file 106f and on a check program. A check pattern registers a pattern that should not be used, such as a numeric expression or a symbol-string expression, written for example as a regular expression. The check program is a program that checks for a plurality of normal forms registered as another normal form. Further, as shown in FIG. 22, the check program may be a measurement program that measures, for each normal form and different notation form, items such as the character string length, the number of words, and the number of characters of each character type, checks whether each measured value falls within a predetermined normal range for that item, and outputs a check result for any value that does not.
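The check patterns and the measurement program can be sketched together. In this hypothetical Python example the regular expressions, the measured items (string length, word count, and digit count as one example of a character type), and the normal ranges are illustrative assumptions, not values from the source; `check_term` returns the reasons a term looks abnormal, and an empty list means it passed.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical check patterns: strings that should not be registered.
CHECK_PATTERNS = [
    re.compile(r"^\d+(\.\d+)?$"),  # a numeric expression only
    re.compile(r"^[^\w\s]+$"),     # a symbol string only
]

# Hypothetical normal range (inclusive) for each measured item.
NORMAL_RANGES: Dict[str, Tuple[int, int]] = {
    "length": (2, 60),
    "words": (1, 8),
    "digits": (0, 6),
}


def measure(term: str) -> Dict[str, int]:
    """Measure the items that the normal ranges refer to."""
    return {
        "length": len(term),
        "words": len(term.split()),
        "digits": sum(ch.isdigit() for ch in term),
    }


def check_term(term: str) -> List[str]:
    """Return reasons why the term looks abnormal (empty list if none)."""
    reasons = [f"matches pattern {p.pattern}" for p in CHECK_PATTERNS if p.search(term)]
    for item, value in measure(term).items():
        low, high = NORMAL_RANGES[item]
        if not low <= value <= high:
            reasons.append(f"{item}={value} outside [{low}, {high}]")
    return reasons


print(check_term("interleukin 6"))  # []
print(check_term("x"))  # ['length=1 outside [2, 60]']
```

In practice the normal ranges would be tuned per dictionary; gene symbols, for example, are legitimately short.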
This completes the dictionary information check processing for each entry.
[0176]
[Normal form mismatch check processing]
Next, details of the normal form mismatch check processing will be described with reference to FIG. 23. FIG. 23 is a conceptual diagram illustrating an example of the normal form mismatch check processing in the present system of the present embodiment.
First, as shown in FIG. 23, the dictionary information processing apparatus 100 checks whether a different notation form registered in the notation dictionary information stored in the notation dictionary information file 106a is also registered as another normal form. This makes it possible to detect duplicate registrations in the notation dictionary, in which a normal form has also been registered as a different notation form of some other normal form.
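The normal form mismatch check amounts to testing each different notation form for membership in the set of normal forms. A minimal Python sketch, assuming the notation dictionary is a mapping from each normal form to its list of different notation forms and that terms are compared as exact strings; the function name and sample entries are hypothetical.

```python
from typing import Dict, List, Tuple


def find_normal_form_mismatches(notation_dict: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (normal form, different notation form) pairs whose notation
    form is itself registered as another normal form."""
    normal_forms = set(notation_dict)
    mismatches = []
    for normal, variants in notation_dict.items():
        for variant in variants:
            if variant in normal_forms and variant != normal:
                mismatches.append((normal, variant))
    return mismatches


notation = {
    "TNF-alpha": ["TNF", "tumor necrosis factor alpha"],
    "TNF": ["TNF-a"],
    "IL-6": ["interleukin 6"],
}
print(find_normal_form_mismatches(notation))  # [('TNF-alpha', 'TNF')]
```

Exact matching misses case or plural variants; comparing normalized strings instead would catch more duplicates.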
This completes the normal-form mismatch check process.
[0177]
[Statistics check processing]
Next, details of the statistical check processing will be described with reference to FIGS. 24 and 25. FIGS. 24 and 25 are conceptual diagrams illustrating an example of the statistical check processing in the present system of the present embodiment.
[0178]
First, as shown in FIG. 24, through the processing of the statistical check unit 102s, the dictionary information processing apparatus 100 performs statistical processing to obtain statistics on the registration status and usage status of the normal forms, different notation forms, or categories registered in the notation dictionary information stored in the notation dictionary information file 106a and/or the category dictionary information stored in the category dictionary information file 106b, and checks whether the results of the statistical processing fall within a predetermined normal value range.
[0179]
Here, as statistical processing on the registration status, for example, as shown in FIG. 24, statistics may be computed on the number of normal forms per different notation form, the number of categories per normal form, the number of normal forms per category, and the like.
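The registration-status statistics listed above all reduce to counting distinct partners per key. A minimal Python sketch, assuming both dictionaries are given as lists of pairs, (normal form, different notation form) and (normal form, category); the function names, sample pairs, and the normal range passed to `out_of_range` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def distinct_counts(pairs: Iterable[Pair], key_pos: int) -> Dict[str, int]:
    """For each value at key_pos (0 or 1), count distinct partners at the other position."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for pair in pairs:
        groups[pair[key_pos]].add(pair[1 - key_pos])
    return {key: len(partners) for key, partners in groups.items()}


def out_of_range(stats: Dict[str, int], low: int, high: int) -> List[str]:
    """Keys whose count lies outside the inclusive normal range [low, high]."""
    return sorted(key for key, n in stats.items() if not low <= n <= high)


notation_pairs = [  # (normal form, different notation form)
    ("cluster of differentiation", "CD"),
    ("Crohn disease", "CD"),
    ("Crohn disease", "Crohn's disease"),
]
category_pairs = [  # (normal form, category)
    ("Crohn disease", "Disease"),
    ("TNF", "Gene"),
    ("TNF", "Protein"),
]

normals_per_variant = distinct_counts(notation_pairs, 1)
categories_per_normal = distinct_counts(category_pairs, 0)
normals_per_category = distinct_counts(category_pairs, 1)
print(out_of_range(normals_per_variant, 1, 1))  # ['CD']
```

A flagged ambiguous notation form such as "CD" is not necessarily an error, which is why the result is a check list for review rather than an automatic deletion.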
[0180]
As statistical processing on the usage status, for example, as shown in FIG. 25, the statistical check unit 102s may create a matrix by counting the number of dictionary lookup hits for each original data item of the document information stored in the document information file 106c and for each dictionary entry, and then perform statistical processing such as taking totals or examining distributions along the vertical or horizontal direction. When totaling along the vertical or horizontal direction, the statistical check unit 102s may simply sum the counts, or may count the number of nonzero cells. The statistical check unit 102s may also take the simple sum and the number of nonzero cells separately for each type of information (e.g., normal forms, notation dictionary names, information extracted by a parser, information on n-term relations). When calculating statistics, the statistical check unit 102s may calculate the maximum, minimum, average, variance, and the like along each of the vertical and horizontal directions, either line by line or over the entire table.
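The usage matrix and its row, column, and whole-table statistics can be sketched in plain Python. This is a hypothetical example, assuming the matrix is a non-empty list of rows (original data items) by columns (dictionary entries) holding hit counts, and using the population variance as the spread measure; the function names and sample counts are illustrative only.

```python
from statistics import mean, pvariance
from typing import Dict, List

Matrix = List[List[int]]  # rows: original data items, columns: dictionary entries


def summarize_line(values: List[int]) -> Dict[str, float]:
    """Statistics for one row, one column, or the whole table (must be non-empty)."""
    return {
        "sum": sum(values),
        "nonzero": sum(1 for v in values if v != 0),
        "max": max(values),
        "min": min(values),
        "mean": mean(values),
        "variance": pvariance(values),
    }


def summarize(matrix: Matrix):
    """Return per-row, per-column, and whole-table statistics."""
    rows = [summarize_line(row) for row in matrix]
    cols = [summarize_line(list(col)) for col in zip(*matrix)]
    whole = summarize_line([v for row in matrix for v in row])
    return rows, cols, whole


def unused_entries(matrix: Matrix) -> List[int]:
    """Column indices (dictionary entries) that were never hit in any document."""
    return [j for j, col in enumerate(zip(*matrix)) if not any(col)]


hits = [
    [2, 0, 1],  # document 1
    [0, 0, 3],  # document 2
]
rows, cols, whole = summarize(hits)
print(unused_entries(hits))  # [1]
```

A column whose nonzero count is 0 corresponds to an entry that is never used, one of the poor usage statuses the statistical check is meant to surface.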
[0181]
As statistical processing on the category dictionary, the statistical check unit 102s may count the number of extractions for each original data item of the document information stored in the document information file 106c and for each node of the category dictionary, create a matrix from these counts, and perform statistical processing such as taking totals or examining distributions along the vertical or horizontal direction. When totaling along the vertical or horizontal direction, the statistical check unit 102s may simply sum the counts or count the number of nonzero cells; it may likewise take the sum or the number of nonzero cells for each subtree. When calculating statistics, the statistical check unit 102s may calculate the maximum, minimum, average, variance, and the like along each of the vertical and horizontal directions, either line by line or over the entire table.
[0182]
In addition, the statistical check unit 102s may count, for each original data item and for each set of m information items, the number of times the set is extracted from a contiguous portion of the text. This makes it possible to check whether sets of terms whose co-location and order of appearance carry meaning are registered correctly.
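Counting ordered, contiguous m-item sets can be sketched as a sliding window over the extracted terms. This minimal Python example assumes that the dictionary lookup of each original data item yields a list of extracted normal forms in text order, and it treats "contiguous" as adjacent in that list; the function names, documents, and registered sets are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def contiguous_sets(hits: List[str], m: int) -> Counter:
    """Count each run of m (m >= 1) consecutive extracted terms, keeping their order."""
    return Counter(tuple(hits[i:i + m]) for i in range(len(hits) - m + 1))


def unseen_registered_sets(
    registered: Set[Tuple[str, ...]], per_document: Dict[str, List[str]], m: int
) -> List[Tuple[str, ...]]:
    """Registered m-item sets that never occur contiguously in any document."""
    total: Counter = Counter()
    for hits in per_document.values():
        total.update(contiguous_sets(hits, m))
    return sorted(s for s in registered if total[s] == 0)


docs = {
    "doc1": ["IL-6", "receptor", "binding"],
    "doc2": ["receptor", "IL-6"],
}
registered = {("IL-6", "receptor"), ("receptor", "IL-6"), ("IL-6", "binding")}
print(unseen_registered_sets(registered, docs, 2))  # [('IL-6', 'binding')]
```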
[0183]
The statistical check unit 102s may also perform statistical processing by counting, for each original data item, the number of words in locations that produced no dictionary lookup hit or from which no information was extracted. Similarly, it may count the number of normal forms that did not appear at all, or the number of normal forms that never became an element of an n-term relation.
Thus, the statistical check processing ends.
[0184]
[Co-occurrence check processing]
Next, details of the co-occurrence check processing will be described with reference to FIGS. 26 and 27. FIGS. 26 and 27 are conceptual diagrams illustrating an example of the co-occurrence check processing in the present system of the present embodiment.
[0185]
First, as shown in FIG. 26, through the processing of the co-occurrence check unit 102t, the dictionary information processing apparatus 100 calculates similarities based on co-occurrence relationships, such as notation dictionary entries that share a different notation form or categories that share a normal form. For example, when the example of FIG. 26 is applied to a notation dictionary (XXX being normal forms and the YYY group being different notation forms), normal form A and normal form B share the different notation form W and are therefore in a co-occurrence relationship. If normal forms A and B have exactly the same different notation forms, they are identical; if some of their different notation forms differ, they are similar. Likewise, when the example of FIG. 26 is applied to a category dictionary (XXX being categories and the YYY group being normal forms), category A and category B share the normal form W and are therefore in a co-occurrence relationship. If categories A and B have exactly the same normal forms, they are identical; if some of their normal forms differ, they are similar.
[0186]
Here, as shown in FIG. 27, the similarity may be expressed as the number of matching elements (in Example 1 of FIG. 27, the two elements X and W match, so the similarity is 2), or as the proportion of matching elements (in Example 2 of FIG. 27, 2 of a total of 13 elements match, so the similarity is 2/13).
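Both similarity measures can be sketched over sets. In this hypothetical Python example each key (a normal form or a category) maps to the set of its members; the proportion divides the shared elements by the number of distinct elements in both sets (the Jaccard index). The source's 2/13 example does not say whether its total counts distinct elements or all listed elements, so the union used here is an assumption, as are the function names and sample data.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def match_count(a: Set[str], b: Set[str]) -> int:
    """Number of shared elements."""
    return len(a & b)


def match_ratio(a: Set[str], b: Set[str]) -> float:
    """Shared elements divided by distinct elements overall (Jaccard, an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurring_pairs(groups: Dict[str, Set[str]]) -> List[Tuple[str, str, int, float, str]]:
    """Compare every pair of keys that shares at least one member."""
    result = []
    for (x, xs), (y, ys) in combinations(sorted(groups.items()), 2):
        shared = match_count(xs, ys)
        if shared:
            status = "same" if xs == ys else "similar"
            result.append((x, y, shared, match_ratio(xs, ys), status))
    return result


# Notation-dictionary reading: keys are normal forms, members are different notation forms.
groups = {"A": {"W", "X", "Y"}, "B": {"W", "X", "Z"}, "C": {"Q"}}
print(co_occurring_pairs(groups))  # [('A', 'B', 2, 0.5, 'similar')]
```

Pairs marked "same" are candidates for integration; pairs with a high ratio but marked "similar" are candidates for manual review.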
Thus, the co-occurrence check process ends.
[0187]
[Name identification processing using logic]
Next, details of the name identification processing using logic will be described with reference to FIG. 28. FIG. 28 is a conceptual diagram illustrating an example of the name identification processing using logic in the present system of the present embodiment.
First, as shown in FIG. 28, through the processing of the name identification processing unit 102f, the dictionary information processing apparatus 100 improves check accuracy by applying normalizations such as lowercasing and singularization when determining whether words are identical in each dictionary check item.
This completes the name identification processing using logic.
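The normalization used before comparing words can be sketched as follows. This is a deliberately crude, hypothetical Python example: it lowercases, collapses whitespace, and strips a plural ending from the last word with two suffix rules. It is not a real lemmatizer; it would wrongly turn "virus" into "viru", so a practical system would add an exception list or use a proper morphological analyzer.

```python
import re


def normalize_for_identity(word: str) -> str:
    """Lowercase, collapse whitespace, and crudely singularize the last word."""
    words = re.sub(r"\s+", " ", word.strip().lower()).split(" ")
    last = words[-1]
    if len(last) > 3 and last.endswith("ies"):
        last = last[:-3] + "y"  # antibodies -> antibody
    elif len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
        last = last[:-1]  # kinases -> kinase
    words[-1] = last
    return " ".join(words)


def same_term(a: str, b: str) -> bool:
    """Treat two words as identical if their normalized forms match."""
    return normalize_for_identity(a) == normalize_for_identity(b)


print(same_term("Protein  Kinases", "protein kinase"))  # True
```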
[0188]
[Check result output process]
Next, details of the check result output processing will be described with reference to FIG. 29. FIG. 29 is a conceptual diagram illustrating an example of the check result output processing in the present system of the present embodiment.
As shown in FIG. 29, through the processing of the process result output unit 102d, the dictionary information processing apparatus 100 outputs a check result from the dictionary information check unit 102c or the like to the output device 114 when that result falls outside a predetermined normal value range.
This completes the check result output process.
[0189]
[Other embodiments]
Although the embodiments of the present invention have been described above, the present invention is not limited to the above-described embodiments and may be implemented in various different embodiments within the scope of the technical idea described in the claims.
[0190]
For example, although the case where the dictionary information processing apparatus 100 performs the processing in a stand-alone form has been described, the apparatus may instead perform the processing in response to a request from a client terminal housed separately from the dictionary information processing apparatus 100 and return the processing result to that client terminal.
[0191]
Further, among the processes described in the embodiment, all or part of the processes described as being performed automatically may be performed manually, and all or part of the processes described as being performed manually may be performed automatically by a known method.
In addition, unless otherwise noted, the processing procedures, control procedures, specific names, information including parameters such as various registration data and search conditions, screen examples, and database configurations shown in the above description and drawings can be changed arbitrarily.
[0192]
Also, regarding the dictionary information processing apparatus 100, the illustrated components are functionally conceptual and do not necessarily need to be physically configured as illustrated.
For example, all or any part of the processing functions of each unit or device of the dictionary information processing apparatus 100, in particular each processing function performed by the control unit 102, can be realized by a CPU (Central Processing Unit) and a program interpreted and executed by the CPU, or can be realized as hardware using wired logic. The program is recorded on a recording medium described later and is mechanically read by the dictionary information processing apparatus 100 as needed.
[0193]
That is, a computer program that gives instructions to the CPU in cooperation with an OS (Operating System) to perform various processes is recorded in the storage unit 106, such as a ROM or an HD. This computer program is loaded into a RAM or the like and executed, and constitutes the control unit 102 in cooperation with the CPU. The computer program may also be recorded in an application program server connected to the dictionary information processing apparatus 100 via an arbitrary network 300, and all or part of it may be downloaded as needed.
[0194]
Further, the program according to the present invention can be stored in a computer-readable recording medium. Here, the “recording medium” includes any “portable physical medium” such as a flexible disk, a magneto-optical disk, a ROM, an EPROM, an EEPROM, a CD-ROM, an MO, or a DVD; any “fixed physical medium” such as a ROM, RAM, or HD built into various computer systems; and any “communication medium” that holds the program for a short period, such as a communication line or carrier wave used when the program is transmitted via a network represented by a LAN, a WAN, or the Internet.
[0195]
The “program” is a data processing method described in an arbitrary language or description method, and may be in any format, such as source code or binary code. The “program” is not necessarily limited to a single program; it also includes programs distributed as a plurality of modules or libraries, and programs that achieve their functions in cooperation with a separate program, typified by an OS (Operating System). Known configurations and procedures can be used for the specific configuration for reading the program in each apparatus described in the embodiments, the reading procedure, the installation procedure after reading, and the like.
[0196]
The various databases and the like stored in the storage unit 106 (the notation dictionary information file 106a through the check pattern file 106f) are held on storage devices, such as memory devices like RAM and ROM, fixed disk devices like hard disks, flexible disks, and optical disks, and store the various programs, tables, files, databases, web page files, and the like used for various processes and for providing websites.
[0197]
Further, the dictionary information processing apparatus 100 may be realized by connecting peripheral devices such as a printer, a monitor, and an image scanner to an information processing apparatus such as a known personal computer or workstation, and installing on that apparatus software (including programs and data) that implements the method of the present invention.
[0198]
Further, the specific form of distribution and integration of the dictionary information processing apparatus 100 is not limited to the illustrated one; all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads. For example, each database may be configured as an independent database device, or part of the processing may be realized using CGI (Common Gateway Interface).
[0199]
The network 300 has the function of interconnecting the dictionary information processing apparatus 100 and the external system 200, and may include any of, for example, the Internet, an intranet, a LAN (wired or wireless), a VAN, a personal computer communication network, a public telephone network (analog or digital), a private line network (analog or digital), a CATV network, a mobile circuit-switched or packet-switched network such as an IMT2000, GSM, or PDC/PDC-P system, a radio paging network, a local wireless network such as Bluetooth, a PHS network, and a satellite communication network such as CS, BS, or ISDB. That is, the present system can transmit and receive various data via any network, wired or wireless.
[0200]
【The invention's effect】
As described in detail above, according to the present invention, notation dictionary information that defines the correspondence between the normal form of each term and its different notation forms is created, category dictionary information that defines the category to which each normal form belongs is created, and the information stored in the notation dictionary information and/or the category dictionary information is checked. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can automate the creation of the various notation dictionaries and category dictionaries used in a document database search service and the checking of the created dictionaries.
[0201]
Further, thereby, it is possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can increase the efficiency and accuracy of dictionary creation.
[0202]
Further, according to the present invention, based on attribute information of each field constituting an existing database, it is determined whether each field is to be used as a normal form, as a different notation form, or not at all, and notation dictionary information is created from the fields of the existing database based on the result of this determination. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a notation dictionary from an existing database.
[0203]
Further, according to the present invention, based on the terms described in existing dictionary information (e.g., headwords of a dictionary and terms listed in columns such as abbreviations and synonyms), it is determined whether each term is to be used as a normal form, as a different notation form, or not at all, and notation dictionary information is created from the terms of the dictionary information based on the result of this determination. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a notation dictionary from existing dictionary information.
[0204]
Further, according to the present invention, based on terms described in existing Web information (e.g., information on an existing Web site, or information written on a Web site where participants can post entries for the purpose of collecting terms to be registered in a dictionary), it is determined whether each term is to be used as a normal form, as a different notation form, or not at all, and notation dictionary information is created from the terms of the Web information based on the result of this determination. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a notation dictionary from existing Web information.
[0205]
Further, according to the present invention, it is possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium capable of realizing and sharing dictionary information possessed by each person.
[0206]
Further, according to the present invention, category structure information is created based on existing structured data, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a category dictionary based on the classifications and the like defined by existing structured data.
[0207]
Further, according to the present invention, when there are a plurality of root nodes in the existing structured data, a virtual root node is added at a higher level to create category structure information. It is possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can more efficiently create a category dictionary based on the classification or the like.
[0208]
Further, according to the present invention, when a merge exists in the existing structured data, the corresponding partial structure is duplicated at the merge point to create category structure information with a simple tree structure without merges. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can more efficiently create a category dictionary based on the classifications and the like defined by existing structured data.
[0209]
Further, according to the present invention, based on existing set data, category structure information is created with the set data name as the root node and the set element names as the leaf nodes, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a category dictionary based on the information defined by existing set data.
[0210]
Further, according to the present invention, category structure information is created based on MeSH term data, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a category dictionary based on the medical terms defined by existing MeSH term data.
[0211]
Further, according to the present invention, based on an existing database, category structure information is created with the name of the existing database or the field name of a specific stored field as the root node and the data stored in the database or field as the leaf nodes, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a category dictionary based on the fields and stored data defined by the existing database.
[0212]
Further, according to the present invention, based on the processing result data of an existing analysis program, category structure information is created with the name of the existing processing program as the root node and the processing result data as the leaf nodes, and category dictionary information is created based on the created category structure information. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can efficiently create a category dictionary based on the processing result data of an existing analysis program.
[0213]
Further, according to the present invention, the notation dictionary information and/or the category dictionary information is checked for each entry based on at least one of a predetermined check term list, check program, and check pattern. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can automatically improve the quality of dictionary information.
[0214]
Further, according to the present invention, it is possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that make it easy to find inappropriate entries that have crept in during dictionary creation because of program bugs (faults), missing exception handling, or the like.
[0215]
Further, according to the present invention, it is possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can easily find an inappropriate entry due to an error in existing data used. it can.
[0216]
Further, according to the present invention, it is possible to provide a dictionary information processing device, a dictionary information processing method, a program, and a recording medium that can easily find an inappropriate entry as a text mining dictionary entry.
[0217]
Further, according to the present invention, it is checked whether a different notation form registered in the notation dictionary information is registered as another normal form. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can automate the checking of normal form mismatches.
[0218]
Further, according to the present invention, statistical processing is performed on the registration status and usage status of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and it is checked whether the results fall within a predetermined normal value range. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that can automatically improve the quality of dictionary information using statistical methods.
[0219]
Further, according to the present invention, even when the number of registered entries in the dictionary information becomes enormous, it is possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that make it easy to find entries with a poor registration status (for example, when the number of actual entries is 0) or a poor usage status (for example, when the number of accesses or extractions is 0).
[0220]
Further, according to the present invention, similarity is calculated based on co-occurrence relationships among the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information. It is therefore possible to provide a dictionary information processing apparatus, a dictionary information processing method, a program, and a recording medium that use this similarity to make it easy to check registration contents and to decide whether entries should be merged or abolished.
[Brief description of the drawings]
FIG. 1 is a principle configuration diagram showing a basic principle of the present invention.
FIG. 2 is a block diagram illustrating an example of a configuration of the present system to which the present invention is applied.
FIG. 3 is a block diagram illustrating an example of a configuration of a notation dictionary creation unit 102a to which the present invention is applied.
FIG. 4 is a block diagram showing an example of a configuration of a category dictionary creating unit 102b to which the present invention is applied.
FIG. 5 is a block diagram illustrating an example of a configuration of a dictionary information check unit 102c to which the present invention is applied.
FIG. 6 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates notation dictionary information from an existing database.
FIG. 7 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates notation dictionary information from an existing database.
FIG. 8 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates notation dictionary information from existing dictionary information.
FIG. 9 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates notation dictionary information from existing Web information.
FIG. 10 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from existing structured data.
FIG. 11 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from existing structured data.
FIG. 12 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from existing structured data.
FIG. 13 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from existing set data.
FIG. 14 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from existing MeSH term data.
FIG. 15 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from existing MeSH term data.
FIG. 16 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from existing MeSH term data.
FIG. 17 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from an existing database.
FIG. 18 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from an existing database.
FIG. 19 is a conceptual diagram illustrating an example of the process by which the present system of this embodiment automatically creates category dictionary information from the processing result data of an existing analysis program.
FIG. 20 is a conceptual diagram illustrating an example of the per-entry dictionary information check process of the present system in this embodiment.
FIG. 21 is a conceptual diagram illustrating an example of the per-entry dictionary information check process of the present system in this embodiment.
FIG. 22 is a conceptual diagram illustrating an example of the per-entry dictionary information check process of the present system in this embodiment.
FIG. 23 is a conceptual diagram illustrating an example of the normal form mismatch check process of the present system in this embodiment.
FIG. 24 is a conceptual diagram illustrating an example of the statistical check process of the present system in this embodiment.
FIG. 25 is a conceptual diagram illustrating an example of the statistical check process of the present system in this embodiment.
FIG. 26 is a conceptual diagram illustrating an example of the co-occurrence check process of the present system in this embodiment.
FIG. 27 is a conceptual diagram illustrating an example of the co-occurrence check process of the present system in this embodiment.
FIG. 28 is a conceptual diagram illustrating an example of the logic-based name identification process of the present system in this embodiment.
FIG. 29 is a conceptual diagram illustrating an example of the check result output process of the present system in this embodiment.
[Explanation of symbols]
100 Dictionary information processing apparatus
102 Control unit
102a Notation dictionary creation unit
102b Category dictionary creation unit
102c Dictionary information check unit
102d Processing result output unit
102e Analysis program unit
102f Merging unit
102g Field attribute determination unit
102h Dictionary term determination unit
102i Web term determination unit
102j Structured data category structure information creation unit
102k Set category structure information creation unit
102m MeSH term category structure information creation unit
102n Database category structure information creation unit
102p Analysis program category structure information creation unit
102r Normal form mismatch check unit
102s Statistics check unit
102t Co-occurrence check unit
102u Entry unit check unit
104 Communication control interface unit
106 Storage unit
106a Notation dictionary information file
106b Category dictionary information file
106c Document information file
106d Existing information storage file
106e Check term list file
106f Check pattern file
108 I/O control interface
112 Input device
114 Output device
200 External system
300 Network

Claims

1. A dictionary information processing apparatus comprising:
notation dictionary creating means for creating notation dictionary information that defines the correspondence between the normal form of each term and its different notation forms;
category dictionary creating means for creating category dictionary information that defines the category to which each normal form belongs; and
dictionary information checking means for checking the information stored in the notation dictionary information and/or the category dictionary information.

2. The dictionary information processing apparatus according to claim 1, wherein the notation dictionary creating means further comprises field attribute determining means for determining, based on the attribute information of each field constituting an existing database, whether the field is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the fields of the existing database based on the determination result of the field attribute determining means.

3. The dictionary information processing apparatus according to claim 1, wherein the notation dictionary creating means further comprises dictionary term determining means for determining, based on the terms described in existing dictionary information, whether each term is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the terms of the dictionary information based on the determination result of the dictionary term determining means.

4. The dictionary information processing apparatus according to claim 1, wherein the notation dictionary creating means further comprises Web term determining means for determining, based on the terms described in existing Web information, whether each term is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the terms of the Web information based on the determination result of the Web term determining means.

5. The dictionary information processing apparatus according to any one of claims 1 to 4, wherein the category dictionary creating means further comprises structured data category structure information creating means for creating category structure information based on existing structured data, and the category dictionary information is created based on the category structure information created by the structured data category structure information creating means.

6. The dictionary information processing apparatus according to claim 5, wherein, when the existing structured data has a plurality of root nodes, the structured data category structure information creating means adds a virtual root node above those root nodes to create the category structure information.

7. The dictionary information processing apparatus according to claim 5, wherein, when the existing structured data contains a merge, the structured data category structure information creating means duplicates the partial structure corresponding to the merge to create category structure information having a simple tree structure without merges.
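The virtual root node and merge duplication described above can be illustrated with a short Python sketch, under assumptions that the claims do not state. The structured data is taken to be an acyclic parent-to-children mapping in which every node appears as a key. The label `"(virtual root)"` and the `(name, [subtrees])` tuple representation are hypothetical. A node reached from several parents is expanded once under each of them, which copies its subtree and leaves a simple tree.

```python
VIRTUAL_ROOT = "(virtual root)"  # hypothetical label for the added root node

def to_category_tree(children):
    """Convert a parent -> children mapping (assumed acyclic, every node a key)
    into a simple tree of (name, [subtrees]) tuples."""
    child_nodes = {c for kids in children.values() for c in kids}
    roots = [n for n in children if n not in child_nodes]

    def expand(node):
        # A node with several parents is expanded once per parent,
        # which duplicates its subtree and removes the merge.
        return (node, [expand(c) for c in children.get(node, [])])

    if len(roots) == 1:
        return expand(roots[0])
    # Several roots: hang them all under one virtual root node.
    return (VIRTUAL_ROOT, [expand(r) for r in roots])

structure = {
    "Enzyme": ["Kinase"],
    "Signaling molecule": ["Kinase"],
    "Kinase": ["PKA"],
    "PKA": [],
}
print(to_category_tree(structure))
```

Duplicating subtrees can grow the tree sharply when merges are deeply nested. The recursion also assumes the input has no cycles.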

8. The dictionary information processing apparatus according to claim 1, wherein the category dictionary creating means further comprises set category structure information creating means for creating, based on existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names, and the category dictionary information is created based on the category structure information created by the set category structure information creating means.

9. The dictionary information processing apparatus according to claim 1, wherein the category dictionary creating means further comprises MeSH term category structure information creating means for creating category structure information based on MeSH term data, and the category dictionary information is created based on the category structure information created by the MeSH term category structure information creating means.

10. The dictionary information processing apparatus according to claim 1, wherein the category dictionary creating means further comprises database category structure information creating means for creating, based on an existing database, category structure information whose root node is the name of the existing database or the field name of a specific field stored in it, and whose leaf nodes are the individual data items stored in that database or field, and the category dictionary information is created based on the category structure information created by the database category structure information creating means.
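For the field-name variant of the database category structure described above, a minimal Python sketch might look like the following. It assumes that database records are available as dictionaries and that duplicate values collapse into a single leaf. The function name and the sample records are hypothetical.

```python
def database_category(records, field_name):
    """Build (root, leaves): the root is the field name and the leaves are the
    distinct values stored in that field, in first-seen order."""
    leaves, seen = [], set()
    for record in records:
        value = record.get(field_name)
        if value is not None and value not in seen:
            seen.add(value)
            leaves.append(value)
    return (field_name, leaves)

records = [
    {"gene": "TP53", "organism": "human"},
    {"gene": "Trp53", "organism": "mouse"},
    {"gene": "BRCA1", "organism": "human"},
]
print(database_category(records, "organism"))  # ('organism', ['human', 'mouse'])
```

The database-name variant would follow the same pattern, with the database name as the root and each stored data item as a leaf.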

11. The dictionary information processing apparatus according to any one of claims 1 to 10, wherein the category dictionary creating means further comprises analysis program category structure information creating means for creating, based on the processing result data of an existing analysis program, category structure information whose root node is the name of the existing processing program and whose leaf nodes are the processing result data, and the category dictionary information is created based on the category structure information created by the analysis program category structure information creating means.

12. The dictionary information processing apparatus according to any one of claims 1 to 11, wherein the dictionary information checking means further comprises entry unit checking means for checking the notation dictionary information and/or the category dictionary information entry by entry based on at least one of a check term list, a check program, and a check pattern.
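A hedged Python sketch of an entry-by-entry check that uses a check term list and check patterns. The check-program option is not shown. The notation dictionary is assumed to be a mapping from each normal form to its different notation forms. The stop terms and the stray-whitespace regular expression are hypothetical examples of list and pattern contents, and term matching is case-insensitive by assumption.

```python
import re

def check_entries(notation_dict, check_terms, check_patterns):
    """Check every normal form and different notation form, entry by entry.

    notation_dict maps a normal form to its list of different notation forms.
    Returns (normal_form, offending_string, reason) triples.
    """
    terms = {t.lower() for t in check_terms}
    patterns = [re.compile(p) for p in check_patterns]
    problems = []
    for normal, variants in notation_dict.items():
        for form in [normal, *variants]:
            if form.lower() in terms:
                problems.append((normal, form, "check term"))
            for pattern in patterns:
                if pattern.search(form):
                    problems.append((normal, form, "pattern " + pattern.pattern))
    return problems

entries = {"TP53": ["p53", "the"], "insulin": ["INS", "insulin "]}
stop_terms = ["the", "and"]   # hypothetical check term list
patterns = [r"^\s|\s$"]       # hypothetical pattern: stray leading/trailing whitespace
for problem in check_entries(entries, stop_terms, patterns):
    print(problem)
```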

13. The dictionary information processing apparatus according to any one of claims 1 to 12, wherein the dictionary information checking means further comprises normal form inconsistency checking means for checking whether a different notation form registered in the notation dictionary information is also registered as another normal form.
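The normal form inconsistency check described above reduces to a lookup. In the minimal Python sketch below, each different notation form is tested against the set of registered normal forms. The mapping-based dictionary layout and the exact, case-sensitive string comparison are assumptions made for illustration.

```python
def normal_form_conflicts(notation_dict):
    """Return (form, owner) pairs where a different notation form registered
    under `owner` is itself registered as a normal form."""
    conflicts = []
    for owner, variants in notation_dict.items():
        for form in variants:
            if form != owner and form in notation_dict:
                conflicts.append((form, owner))
    return conflicts

notation_dict = {
    "TP53": ["p53", "tumor protein p53"],
    "p53": ["P53"],
    "BRCA1": ["BRCA-1"],
}
print(normal_form_conflicts(notation_dict))  # [('p53', 'TP53')]
```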

14. The dictionary information processing apparatus according to any one of claims 1 to 13, wherein the dictionary information checking means further comprises statistics checking means for performing statistical processing on the registration status and usage status of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and for checking whether the result of the statistical processing falls within a predetermined normal value range.

15. The dictionary information processing apparatus according to any one of claims 1 to 14, wherein the dictionary information checking means further comprises co-occurrence checking means for calculating a similarity based on the co-occurrence relations of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information.

16. A dictionary information processing method comprising:
a notation dictionary creating step of creating notation dictionary information that defines the correspondence between the normal form of each term and its different notation forms;
a category dictionary creating step of creating category dictionary information that defines the category to which each normal form belongs; and
a dictionary information check step of checking the information stored in the notation dictionary information and/or the category dictionary information.

17. The dictionary information processing method according to claim 16, wherein the notation dictionary creating step further includes a field attribute determining step of determining, based on the attribute information of each field constituting an existing database, whether the field is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the fields of the existing database based on the determination result of the field attribute determining step.

18. The dictionary information processing method according to claim 16, wherein the notation dictionary creating step further includes a dictionary term determining step of determining, based on the terms described in existing dictionary information, whether each term is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the terms of the dictionary information based on the determination result of the dictionary term determining step.

19. The dictionary information processing method according to claim 16, wherein the notation dictionary creating step further includes a Web term determining step of determining, based on the terms described in existing Web information, whether each term is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the terms of the Web information based on the determination result of the Web term determining step.

20. The dictionary information processing method according to claim 16, wherein the category dictionary creating step further includes a structured data category structure information creating step of creating category structure information based on existing structured data, and the category dictionary information is created based on the category structure information created in the structured data category structure information creating step.

21. The dictionary information processing method according to claim 20, wherein, when the existing structured data has a plurality of root nodes, the structured data category structure information creating step adds a virtual root node above those root nodes to create the category structure information.

22. The dictionary information processing method according to claim 20, wherein, when the existing structured data contains a merge, the structured data category structure information creating step duplicates the partial structure corresponding to the merge to create category structure information having a simple tree structure without merges.

23. The dictionary information processing method according to claim 16, wherein the category dictionary creating step further includes a set category structure information creating step of creating, based on existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names, and the category dictionary information is created based on the category structure information created in the set category structure information creating step.

24. The dictionary information processing method according to claim 16, wherein the category dictionary creating step further includes a MeSH term category structure information creating step of creating category structure information based on MeSH term data, and the category dictionary information is created based on the category structure information created in the MeSH term category structure information creating step.

25. The dictionary information processing method according to claim 16, wherein the category dictionary creating step further includes a database category structure information creating step of creating, based on an existing database, category structure information whose root node is the name of the existing database or the field name of a specific field stored in it, and whose leaf nodes are the individual data items stored in that database or field, and the category dictionary information is created based on the category structure information created in the database category structure information creating step.

26. The dictionary information processing method according to claim 16, wherein the category dictionary creating step further includes an analysis program category structure information creating step of creating, based on the processing result data of an existing analysis program, category structure information whose root node is the name of the existing processing program and whose leaf nodes are the processing result data, and the category dictionary information is created based on the category structure information created in the analysis program category structure information creating step.

27. The dictionary information processing method according to any one of claims 16 to 26, wherein the dictionary information check step further includes an entry unit check step of checking the notation dictionary information and/or the category dictionary information entry by entry based on at least one of a check term list, a check program, and a check pattern.

28. The dictionary information processing method according to any one of claims 16 to 27, wherein the dictionary information check step further includes a normal form inconsistency check step of checking whether a different notation form registered in the notation dictionary information is also registered as another normal form.

29. The dictionary information processing method according to any one of claims 16 to 28, wherein the dictionary information check step further includes a statistics check step of performing statistical processing on the registration status and usage status of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and of checking whether the result of the statistical processing falls within a predetermined normal value range.

30. The dictionary information processing method according to any one of claims 16 to 29, wherein the dictionary information check step further includes a co-occurrence check step of calculating a similarity based on the co-occurrence relations of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information.

31. A program for causing a computer to execute a dictionary information processing method comprising:
a notation dictionary creating step of creating notation dictionary information that defines the correspondence between the normal form of each term and its different notation forms;
a category dictionary creating step of creating category dictionary information that defines the category to which each normal form belongs; and
a dictionary information check step of checking the information stored in the notation dictionary information and/or the category dictionary information.

32. The program according to claim 31, wherein the notation dictionary creating step further includes a field attribute determining step of determining, based on the attribute information of each field constituting an existing database, whether the field is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the fields of the existing database based on the determination result of the field attribute determining step.

33. The program according to claim 31, wherein the notation dictionary creating step further includes a dictionary term determining step of determining, based on the terms described in existing dictionary information, whether each term is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the terms of the dictionary information based on the determination result of the dictionary term determining step.

34. The program according to any one of claims 31 to 33, wherein the notation dictionary creating step further includes a Web term determining step of determining, based on the terms described in existing Web information, whether each term is to be treated as a normal form, as a different notation form, or not used, and the notation dictionary information is created from the terms of the Web information based on the determination result of the Web term determining step.

35. The program according to any one of claims 31 to 34, wherein the category dictionary creating step further includes a structured data category structure information creating step of creating category structure information based on existing structured data, and the category dictionary information is created based on the category structure information created in the structured data category structure information creating step.

36. The program according to claim 35, wherein, when the existing structured data has a plurality of root nodes, the structured data category structure information creating step adds a virtual root node above those root nodes to create the category structure information.

37. The program according to claim 35 or 36, wherein, when the existing structured data contains a merge, the structured data category structure information creating step duplicates the partial structure corresponding to the merge to create category structure information having a simple tree structure without merges.

38. The program according to any one of claims 31 to 37, wherein the category dictionary creating step further includes a set category structure information creating step of creating, based on existing set data, category structure information whose root node is the set data name and whose leaf nodes are the set element names, and the category dictionary information is created based on the category structure information created in the set category structure information creating step.

39. The program according to any one of claims 31 to 38, wherein the category dictionary creating step further includes a MeSH term category structure information creating step of creating category structure information based on MeSH term data, and the category dictionary information is created based on the category structure information created in the MeSH term category structure information creating step.

40. The program according to any one of claims 31 to 39, wherein the category dictionary creating step further includes a database category structure information creating step of creating, based on an existing database, category structure information whose root node is the name of the existing database or the field name of a specific field stored in it, and whose leaf nodes are the individual data items stored in that database or field, and the category dictionary information is created based on the category structure information created in the database category structure information creating step.

41. The program according to claim 31, wherein the category dictionary creating step further includes an analysis program category structure information creating step of creating, based on the processing result data of an existing analysis program, category structure information whose root node is the name of the existing processing program and whose leaf nodes are the processing result data, and the category dictionary information is created based on the category structure information created in the analysis program category structure information creating step.

42. The program according to any one of claims 31 to 41, wherein the dictionary information check step further includes an entry unit check step of checking the notation dictionary information and/or the category dictionary information entry by entry based on at least one of a check term list, a check program, and a check pattern.

43. The program according to any one of claims 31 to 42, wherein the dictionary information check step further includes a normal form inconsistency check step of checking whether a different notation form registered in the notation dictionary information is also registered as another normal form.

44. The program according to any one of claims 31 to 43, wherein the dictionary information check step further includes a statistics check step of performing statistical processing on the registration status and usage status of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information, and of checking whether the result of the statistical processing falls within a predetermined normal value range.

45. The program according to any one of claims 31 to 44, wherein the dictionary information check step further includes a co-occurrence check step of calculating a similarity based on the co-occurrence relations of the normal forms, different notation forms, or categories registered in the notation dictionary information and/or the category dictionary information.

46. A computer-readable recording medium on which the program according to any one of claims 31 to 45 is recorded.