JP2007026155A - Distributed development system of word information

Info

Publication number: JP2007026155A
Application number: JP2005208238A
Authority: JP
Inventors: Yuichi Kojima; 裕一小島
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2005-07-19
Filing date: 2005-07-19
Publication date: 2007-02-01

Abstract

PROBLEM TO BE SOLVED: To provide a distributed development system for word information that offers a flexible dictionary development environment.
SOLUTION: A unique word ID is generated for each word and, using it as a key, the information accompanying the word that each application individually requires, such as a reading, a translation, and word structure, is managed. A word can therefore be registered without filling in all of its information. In addition, because word information can be shared over a network, the system provides the basis for a distributed user dictionary development environment that multiple applications can use in common.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a distributed development system for word information comprising a language analysis system, which performs language analysis on an input document and processes the analysis result for a predetermined purpose before outputting it, and a word information server, which is connected to the language analysis system via a network and provides word information to it.

In computer systems, a user dictionary is a mechanism that lets users tailor to their own needs the language analysis processing applied to the documents they input.

In such a user dictionary, the user usually adds vocabulary according to personal preferences and usage patterns.

However, there is a limit to what one person can achieve by developing a dictionary alone; for example, users B and C may also want to use the vocabulary added by user A.

To meet such needs, dictionary formats have been widely published, and text-format data conforming to those formats is made available on networks. The so-called pdic format is one example.

Further, in Patent Document 1, dictionary development is carried out under a membership system, and members are motivated to contribute by being given privileges according to their degree of activity.
Patent Document 1: JP 2003-173409 A

However, these conventional techniques consider only the sharing of dictionary information for a specific application. They have difficulty handling a case where, for example, vocabulary added to a dictionary for application X could also be used by application Y with only a small amount of additional information. As a result, it has been difficult to attract a wide range of voluntary dictionary developers to a single dictionary development environment.

The present invention has been made in view of these circumstances, and its object is to provide a distributed development system for word information that offers a flexible dictionary development environment.

The present invention provides a distributed development system for word information comprising a language analysis system, which performs language analysis on an input document and processes the analysis result for a predetermined purpose before outputting it, and a word information server, which is connected to the language analysis system via a network and provides word information to it. The language analysis system comprises: morphological analysis processing means for receiving document information and performing morphological analysis; word dictionary means, used by the morphological analysis processing means, for holding, for each word, at least a word ID, a notation, part-of-speech information, and a plurality of items of word-accompanying information; output information generation means for generating output information from the notation, part-of-speech information, and word-accompanying information acquired for each word using the word ID string received from the morphological analysis processing means; dictionary editing means for registering, deleting, and updating words in the word dictionary; word information acquisition means, called from the dictionary editing means, for sending the word information server a search request that combines notation, part-of-speech information, word ID, and word-accompanying information, and acquiring from the server the information on words that match the conditions; word ID acquisition means, called from the dictionary editing means, for acquiring a word ID from the word information server; and a word information registration unit, called from the dictionary editing means, for sending notation, part-of-speech information, and word-accompanying information to the word information server in association with a word ID. The word information server comprises: word database means for holding, as information for each word, a word ID, a notation, part-of-speech information, and a plurality of items of word-accompanying information; word ID assigning means for issuing a new word ID in response to a request from the language analysis system; and word database management means for, in response to a request from the language analysis system, registering, deleting, and updating in the word database means the notation, part-of-speech information, and word-accompanying information keyed by word ID, and for performing searches whose conditions combine word ID, notation, part-of-speech information, and word-accompanying information.

Further, the language analysis system holds, as the word-accompanying information, the word IDs of synonyms of the word indicated by a word ID.

Further, the language analysis system holds, as the word-accompanying information, the word IDs of the constituent words (the word structure) of the word indicated by a word ID.

Further, the language analysis system additionally comprises creator ID list holding means for holding a list of predetermined creator IDs, and automatic update means for calling the word information acquisition means at predetermined intervals, acquiring word information using the creator ID list, and sending the result to the dictionary editing means; the dictionary editing means registers the word information received from the automatic update means in the word dictionary.

Further, the word information server additionally sets a creator ID for each of the notation, the part-of-speech information, and the plurality of items of word-accompanying information held for each word, and further comprises creator ID assigning means for issuing a new creator ID in response to a request from the language analysis system.

Therefore, according to the present invention, a unique word ID is generated for each word and used as a key to manage the information that accompanies the word and that each application individually requires, such as the reading (furigana), translations, and word structure. A word can thus be registered without filling in all of its information. In addition, because word information can be shared over a network, the invention provides the basis for a distributed user dictionary development environment that multiple applications can use in common.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows an example of a network system according to an embodiment of the present invention. In this network system, a plurality of user terminal devices TM1 to TMn, a word information server device ST for managing word information, a search server device SK for managing search information, and a document database device SD for storing and managing the documents to be searched are connected via a local area network LAN.

Each of the user terminal devices TM1 to TMn includes a search client for requesting search services from the search server device SK, and a language analysis system for converting the search target information (document information, search conditions, etc.) that the user enters into the search client into the information the search server device SK applies when searching. This language analysis system acquires various information from the word information server device ST.

FIG. 2 shows an example of the configuration of the language analysis system.

In the figure, the input document buffer 1 stores the character string the user entered into the search client and outputs it to the morphological analysis processing unit 2. The morphological analysis processing unit 2 refers to the word dictionary unit 3, divides the input document information into its constituent words, and converts them into a string of word IDs. Any well-known morphological analysis method can be used here; for example, a method that adopts the word division result that minimizes the number of independent words contained in the target character string. Because each word in the word string obtained by this division has a word ID, the morphological analysis result (the word string) can be replaced with a string of word IDs.

The word dictionary unit 3 holds, for each word, at least a word ID (a number uniquely assigned to each word), a notation, part-of-speech information, and a plurality of items of word-accompanying information (information that accompanies the word, such as its furigana reading and corresponding translations).

The output information generation unit 4 generates output information from the notation, part-of-speech information, and word-accompanying information acquired for each word using the word ID string received from the morphological analysis processing unit 2. The output information is output to the search client via the output information buffer 5.

The dictionary editing unit 6 registers, deletes, and updates words in the word dictionary unit 3. The word ID acquisition unit 7 is called from the dictionary editing unit 6 and acquires a word ID from the word information server device ST via the local area network LAN.

The word information registration unit 8 is called from the dictionary editing unit 6 and sends notation, part-of-speech information, and word-accompanying information, in association with a word ID, to the word information server device ST via the local area network LAN.

The word information acquisition unit 9 is started by the automatic update unit 10 and acquires from the word information server device ST, via the local area network LAN, the word information registered under the creator IDs contained in a specified creator ID list. The word information acquisition unit 9 also sends a request to issue a new word ID to the word information server device ST via the local area network LAN and acquires the word ID returned by the word information server device ST.

The automatic update unit 10 uses the word information acquisition unit 9 to acquire, at a predetermined interval, the information on words that have a creator ID held in the creator ID list holding unit 11. Of that word information, it stores in the word dictionary unit 3 the information on words that hold word-structure and synonym information.

The creator ID list holding unit 11 holds a list of creator IDs, each of which is uniquely assigned to a user. Through a user interface, the user can save any creator IDs in the creator ID list holding unit 11.

The communication control unit 12 handles the communication that the word ID acquisition unit 7, the word information registration unit 8, and the word information acquisition unit 9 carry out with the word information server device ST via the local area network LAN.

FIG. 3 shows an example of the configuration of the word information server device ST.

In the figure, the word database unit 21 holds, in a predetermined format, a word ID, a notation, part-of-speech information, and a plurality of items of word-accompanying information as the information for each word. In response to requests from the language analysis system, the word database management unit 22 registers, deletes, and updates in the word database unit 21 the notation, part-of-speech information, and word-accompanying information keyed by word ID, and performs searches whose conditions combine word ID, notation, part-of-speech information, and word-accompanying information.

The word ID assigning unit 23 receives a word ID issue request over the network, creates a new word ID, and returns it to the requester. For example, each time it receives a request, it generates a word ID by counting up from an initial value of 1.

Similarly, the creator ID assigning unit 24 receives a creator ID issue request over the network, creates a new creator ID, and returns it to the requester. For example, each time it receives a request, it generates a creator ID by counting up from an initial value of 1.
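The counting scheme described for the word ID assigning unit 23 and the creator ID assigning unit 24 can be sketched as follows. This is only an illustration: the class name `IdIssuer`, the zero-padded seven-digit word ID, and the "C" plus five digits creator ID format are assumptions modelled on the example IDs that appear elsewhere in this description, not a format the text specifies.

```python
import itertools


class IdIssuer:
    """Issues sequential IDs, counting up from an initial value of 1."""

    def __init__(self, prefix: str = "", width: int = 0) -> None:
        self._counter = itertools.count(1)  # 1, 2, 3, ...
        self._prefix = prefix
        self._width = width

    def issue(self) -> str:
        # Zero-pad the counter to the (assumed) fixed width and add the prefix.
        return self._prefix + str(next(self._counter)).zfill(self._width)


# Hypothetical formats: 7-digit word IDs, "C" + 5-digit creator IDs.
word_id_issuer = IdIssuer(width=7)
creator_id_issuer = IdIssuer(prefix="C", width=5)
```

Keeping the IDs as strings preserves leading zeros such as those in the word ID 0006712.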

The word database management unit 22 manages data consisting of four-element tuples of word ID, creator ID, property name, and property value, as shown in Table 2, and, as in ordinary database management, provides functions for deleting, updating, and searching the database contents. Its operation is the same as that of a general database.
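As a rough sketch of the four-element tuple store described above, the following in-memory class supports registration, search on any combination of fields, update, and deletion. The names `Record` and `WordDatabase`, the property names used in the test data, and the list-based storage are assumptions made for illustration; a real server would presumably sit on an ordinary database.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    word_id: str
    creator_id: str
    prop_name: str
    prop_value: str


class WordDatabase:
    def __init__(self) -> None:
        self._records: List[Record] = []

    def register(self, record: Record) -> None:
        # Identical tuples are stored only once.
        if record not in self._records:
            self._records.append(record)

    def search(self, word_id: Optional[str] = None,
               creator_id: Optional[str] = None,
               prop_name: Optional[str] = None,
               prop_value: Optional[str] = None) -> List[Record]:
        # A field left as None acts as a wildcard.
        wanted = (word_id, creator_id, prop_name, prop_value)
        return [r for r in self._records
                if all(w is None or w == v for w, v in zip(wanted, r))]

    def update(self, word_id: str, creator_id: str,
               prop_name: str, new_value: str) -> None:
        # Replace the value of an existing property, or add it if absent.
        self.delete(word_id=word_id, creator_id=creator_id, prop_name=prop_name)
        self.register(Record(word_id, creator_id, prop_name, new_value))

    def delete(self, **conditions: Optional[str]) -> int:
        hits = self.search(**conditions)
        self._records = [r for r in self._records if r not in hits]
        return len(hits)
```

Because every property is a separate tuple, a word can be registered with only the properties that are known at the time, which is the flexibility the invention aims for.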

The communication control unit 26 handles the communication that the word database management unit 22, the word ID assigning unit 23, and the creator ID assigning unit 24 carry out with other terminals via the local area network LAN.

User information is registered in the user information management unit 25 by the creator ID assigning unit 24, and can also be registered by a command from an administrator user (not shown).

FIG. 4 shows an example of the structure of the word dictionary. This word dictionary is registered in the word dictionary unit 3 and the word database unit 21. Normally, the word dictionary unit 3 holds only the information on words registered by the user of the user terminal device TM1 to TMn that contains the language analysis system, whereas the word database unit 21 of the word information server device ST holds the word information registered by all users.

Each item of word information registered in this word dictionary consists of: a word ID identifying the word; a creator ID representing the user who registered the word; a notation, which is the character string that makes up the word; the part of speech of the word; synonym word IDs, which are the word IDs of the word's synonyms, if it has any; and word-structure word IDs, which, when the word's notation is a concatenation of the notations of several other words, list the word IDs of those constituent words in the order in which they appear in the notation.
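One way to picture a single dictionary entry with these fields is the following data class. The field names, the creator ID "C00001", and the part-of-speech label are assumptions chosen for illustration; the word ID, synonym ID, and word-structure IDs are the example values used later in this description for "プロレスリング".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordEntry:
    word_id: str                 # unique identifier of the word
    creator_id: str              # user who registered the word
    notation: str                # character string of the word
    part_of_speech: str
    synonym_ids: List[str] = field(default_factory=list)    # may be empty
    structure_ids: List[str] = field(default_factory=list)  # constituents, in notation order


entry = WordEntry(
    word_id="1234567",
    creator_id="C00001",             # hypothetical creator
    notation="プロレスリング",
    part_of_speech="general noun",   # assumed label
    synonym_ids=["2345673"],
    structure_ids=["1274567", "0006712"],
)
```

The two list fields default to empty, reflecting that an entry need not carry synonym or word-structure information.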

With the above configuration, suppose a user of one of the user terminal devices TM1 to TMn uses the search client on that terminal to search the documents registered in the document database device SD. If, for example, the search condition given by the user is "プロレス" (pro wrestling) AND "東京" (Tokyo), the search client passes "プロレス" to the language analysis system and obtains the two word strings {プロ, レスリング} and {プロレス}. It also passes "東京" to the language analysis system and obtains {東京}.

Assume here that the operators available on the search server device SK are the AND operator, the OR operator, the NOT operator, and "?", which matches any one word or zero words.

In this case, the search client generates the following search expression from the word strings above and passes it to the search server device SK:
((プロ ?? レスリング) OR プロレス) AND 東京

As a result, the search server device SK searches the document database device SD based on the search expression it received, and a document such as "これがプロのレスリングだ in 東京ドーム" ("This is professional wrestling, in Tokyo Dome") is hit.
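To illustrate why "プロ ?? レスリング" hits this document, the following sketch matches a pattern of words against a document that has already been split into words, treating each "?" as zero or one arbitrary word. The function names and the particular word segmentation of the example document are assumptions; the description does not say how the search server device SK implements matching.

```python
from typing import List


def matches_at(pattern: List[str], words: List[str], p: int, w: int) -> bool:
    """True if pattern[p:] matches the words starting at position w."""
    if p == len(pattern):
        return True
    if pattern[p] == "?":
        # "?" consumes either no word or exactly one arbitrary word.
        return matches_at(pattern, words, p + 1, w) or (
            w < len(words) and matches_at(pattern, words, p + 1, w + 1))
    return (w < len(words) and words[w] == pattern[p]
            and matches_at(pattern, words, p + 1, w + 1))


def contains(pattern: List[str], words: List[str]) -> bool:
    """True if the pattern matches somewhere in the word sequence."""
    return any(matches_at(pattern, words, 0, start)
               for start in range(len(words) + 1))


# Assumed segmentation of "これがプロのレスリングだ".
document = ["これ", "が", "プロ", "の", "レスリング", "だ"]
hit = contains(["プロ", "?", "?", "レスリング"], document)
```

Here the single word "の" between "プロ" and "レスリング" is absorbed by one "?", and the other "?" matches zero words, so two consecutive "?" act as a proximity window of at most two words.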

In the language analysis system, when "プロレスリング" (professional wrestling) is stored in the input document buffer 1, the morphological analysis processing unit 2 compares the three word division results "プロ/レスリング", "プロレス/リング", and "プロレスリング", and selects "プロレスリング" because it has the fewest independent words. Accordingly, the word ID "1234567" (see FIG. 4) is passed to the output information generation unit 4.
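The selection among these candidate divisions can be sketched with a small dynamic program. Note the simplification: the description minimizes the number of independent words, whereas this sketch simply minimizes the total number of dictionary words, which gives the same answer for this example. The function name and the five-word lexicon are assumptions.

```python
from typing import List, Optional, Set


def segment_min_words(text: str, lexicon: Set[str]) -> Optional[List[str]]:
    """Split text into lexicon words using as few words as possible."""
    n = len(text)
    # best[k] holds the shortest known segmentation of text[:k], or None.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = text[start:end]
            if prefix is not None and piece in lexicon:
                candidate = prefix + [piece]
                current = best[end]
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    return best[n]


lexicon = {"プロ", "レスリング", "プロレス", "リング", "プロレスリング"}
result = segment_min_words("プロレスリング", lexicon)  # → ["プロレスリング"]
```

If the single-word entry were missing from the lexicon, the sketch would fall back to one of the two-word divisions; it returns None when no division into lexicon words exists.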

Using the word ID string it received, the output information generation unit 4 first generates the possible synonym combinations as multiple word ID strings. Suppose "プロレスリングプロレスリング" has been analyzed and the word ID string {1234567, 1234567} is passed to the output information generation unit 4; the resulting word ID strings are {1234567, 1234567}, {1234567, 2345673}, {2345673, 1234567}, and {2345673, 2345673}.

Next, each word that can be replaced by its word structure is so replaced. The example above then becomes {1274567, 0006712, 1274567, 0006712}, {1274567, 0006712, 2345673}, {2345673, 1274567, 0006712}, and {2345673, 2345673}.
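The two expansion steps just described can be reproduced with a short sketch. The dictionaries `SYNONYMS` and `STRUCTURE` stand in for lookups in the word dictionary unit 3 and contain only the values from this example; the function names are assumptions.

```python
from itertools import product
from typing import Dict, List

# Example data from the description: 1234567 has the synonym 2345673
# and the word structure (1274567, 0006712).
SYNONYMS: Dict[str, List[str]] = {"1234567": ["2345673"]}
STRUCTURE: Dict[str, List[str]] = {"1234567": ["1274567", "0006712"]}


def synonym_combinations(word_ids: List[str]) -> List[List[str]]:
    """Every way of replacing each word by itself or one of its synonyms."""
    choices = [[wid] + SYNONYMS.get(wid, []) for wid in word_ids]
    return [list(combo) for combo in product(*choices)]


def expand_structure(word_ids: List[str]) -> List[str]:
    """Replace each word that has a word structure by its constituents."""
    expanded: List[str] = []
    for wid in word_ids:
        expanded.extend(STRUCTURE.get(wid, [wid]))
    return expanded


combos = synonym_combinations(["1234567", "1234567"])
expanded = [expand_structure(combo) for combo in combos]
```

With this data, `combos` holds the four synonym combinations and `expanded` the four word ID strings after word-structure replacement, in the same order as listed in the text.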

Finally, the output information generation unit 4 uses each word ID to acquire the notation information and stores the resulting notation strings in the output information buffer 5.

FIG. 5 shows an example of the processing in this case.

First, when the user enters into the search client a search condition consisting of words W(start)...W(end) and operators O(start)...O(end) (process 101), the search client sends each of the words W(start)...W(i)...W(end) contained in the search condition to the language analysis system.

The language analysis system performs morphological analysis on each individual word W(i) and obtains the word string M(i, start)...M(i, j)...M(i, end) (processes 102 and 103). Each M(i, j) is then replaced with multiple notation groups SW(i, j, n, m) by the process of replacement with synonym notation groups (process 104).

The language analysis system sends these SW(i, j, n, m) back to the search client, and the search client uses the notation groups to rewrite each W(i) in the search condition into the form {(SW(i, j, start, start) ?? SW(i, j, start, 1) ?? ...) OR (SW(i, j, 1, start) ?? SW(i, j, 1, 1) ?? ...) OR ... (SW(i, j, end, start) ?? SW(i, j, end, end))} (process 105).

In this form, joining the constituent words of word W(i) with the character string "??" serves as a substitute for a proximity operation; for each constituent word, its synonyms are joined with "OR", and the lower-level constituent words that make up a synonym are likewise joined with "??".

After the search client has performed this rewriting for all of the words (NO loop of decision 106), it creates a search expression from the rewritten results (process 107). The search expression in this case is (RW(start) O(start) RW(1) O(1) ... O(end) RW(end)), where RW(i) denotes the rewritten form of W(i).
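The rewriting and assembly performed by the search client can be sketched as string construction. The function names are assumptions, and the sketch assumes one operator between each pair of adjacent words, which is one reading of the index notation above. Each word is given as a list of alternatives, and each alternative as a list of constituent notations.

```python
from typing import List


def rewrite_word(alternatives: List[List[str]]) -> str:
    """Join constituents with '??' and alternatives with 'OR'."""
    groups = ["(" + " ?? ".join(alt) + ")" if len(alt) > 1 else alt[0]
              for alt in alternatives]
    return "(" + " OR ".join(groups) + ")" if len(groups) > 1 else groups[0]


def build_expression(rewritten: List[str], operators: List[str]) -> str:
    """Interleave rewritten words with the user's operators."""
    parts = [rewritten[0]]
    for op, rw in zip(operators, rewritten[1:]):
        parts += [op, rw]
    return " ".join(parts)


rw_wrestling = rewrite_word([["プロ", "レスリング"], ["プロレス"]])
rw_tokyo = rewrite_word([["東京"]])
expression = build_expression([rw_wrestling, rw_tokyo], ["AND"])
# expression == "((プロ ?? レスリング) OR プロレス) AND 東京"
```

This reproduces the search expression of the earlier "プロレス" AND "東京" example.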

The search client then passes the search expression to the search server device SK (process 108), obtains the search result from the search server device SK (process 109), and presents the search result to the user (process 110).

FIG. 6 shows an example of the processing that expands M(i, j) into SW(i, j, n, m).

First, the word ID X representing M(i, j) is received (process 201), and the synonym IDs of X, Y(start)...Y(end), are acquired from the word dictionary (process 202).

Next, for each Y(n), its word structure SY(n, start)...SY(n, end) is acquired from the word dictionary (processes 203 and 204, NO loop of decision 205).

Then the notations are acquired from the word dictionary using the acquired SY(n, m) (process 206) and returned to the calling process as SW(n, m) (process 207).
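The flow of FIG. 6 might be sketched as below. Several points are assumptions: the function name, the choice to treat X itself as the first candidate alongside its synonyms (suggested by the earlier synonym-combination example), and the notations "プロ" for 1274567 and "プロレス" for 2345673, which the description does not state explicitly.

```python
from typing import Dict, List


def expand_to_notations(x: str,
                        synonyms: Dict[str, List[str]],
                        structure: Dict[str, List[str]],
                        notation: Dict[str, str]) -> List[List[str]]:
    # Processes 201-202: take X and look up its synonym IDs Y(n).
    candidates = [x] + synonyms.get(x, [])
    result: List[List[str]] = []
    for y in candidates:
        # Processes 203-205: word structure SY(n, m) of each Y(n);
        # a word without a structure stands for itself.
        parts = structure.get(y, [y])
        # Process 206: look up the notation of each constituent.
        result.append([notation[p] for p in parts])
    # Process 207: return the notation groups SW(n, m).
    return result


synonyms = {"1234567": ["2345673"]}
structure = {"1234567": ["1274567", "0006712"]}
notation = {
    "1234567": "プロレスリング",
    "2345673": "プロレス",    # assumed notation
    "1274567": "プロ",        # assumed notation
    "0006712": "レスリング",
}
sw = expand_to_notations("1234567", synonyms, structure, notation)
```

Under these assumptions `sw` contains the notation groups [プロ, レスリング] and [プロレス], the same two word strings that appear in the search example above.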

In this embodiment, the users of the user terminal devices TM1 to TMn can edit the word dictionary registered in the word dictionary unit 3 as needed. The results of that editing are reflected in the contents of the word database unit 21 of the word information server device ST.

The operation of the dictionary editing unit 6 is described below, taking as examples the addition of a new word by the user, the addition of word information, and automatic update.

Suppose the user wants to add the word "アルティメットレスリング" (ultimate wrestling). The user operates the dictionary editing unit 6 and enters the general noun "アルティメットレスリング" and the word-structure information "アルティメット" and "レスリング" as the word to be registered.

First, the dictionary editing unit 6 passes "アルティメットレスリング", "アルティメット", and "レスリング" to the word information acquisition unit 9, which obtains the word information for each from the word information server device. In this case, the results are "アルティメットレスリング: no matching word", "レスリング: 0006712", and "アルティメット: 4445566".

Next, the dictionary editing unit 6 calls the word information acquisition unit 9 with the word IDs 0006712 and 4445566 in turn and presents the results to the user. The user uses them to judge whether "アルティメット: 4445566", one part of the division of "アルティメットレスリング", is the correct constituent word, and sends the dictionary editing unit "OK" if it is, or a request to present the next candidate if it is not. If there is no next candidate, the constituent is handled as a new-word registration, as described later.

Once "アルティメット" has been settled, the same procedure is applied to "レスリング", and the complete word structure is obtained. Here, the word structure 4445566-0006712 described above is assumed to be correct.

Next, the dictionary editing unit 6 passes the word IDs of the word-structure information, 0006712 and 4445566, to the word information acquisition unit 9 and searches for words that hold them as word-structure word IDs. In this example, no word holds both of these word IDs as its word structure.

As a result, the dictionary editing unit 6 judges that no possible synonym, such as "アルティメット・レスリング" or "レスリングアルティメット", exists, and treats "アルティメットレスリング" as an entirely new word. Specifically, it calls the word ID acquisition unit 7 to acquire a new word ID from the word information server device ST, and assigns the word ID obtained (for example, 9934121) to "アルティメットレスリング".

The dictionary editing unit 6 then passes the word ID 9934121, the notation "アルティメットレスリング", the part of speech "general noun", and the word structure "0006712, 4445566" to the word information registration unit, which passes them to the word information server on the network as a word registration request. The word information registration unit also manages its own creator ID, whose initial value is "-1".

When the creator ID is "-1", the word information registration unit 8 sends a creator ID issue request to the word information server device ST and replaces the initial value with the creator ID it obtains. A creator ID is always attached to a word registration request. If the creator ID is "C11243", the word registration request is as shown in FIG. 7(a).
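The handling of the initial creator ID "-1" can be sketched as follows. `StubServer` is a stand-in invented for this illustration, as are the class and method names and the dictionary layout of the request; the actual request format is the one shown in FIG. 7(a), which is not reproduced here.

```python
from typing import Dict, List


class StubServer:
    """Minimal stand-in for the word information server device ST."""

    def __init__(self) -> None:
        self.issued = 0
        self.requests: List[Dict[str, str]] = []

    def issue_creator_id(self) -> str:
        self.issued += 1
        return "C11243"  # fixed value, matching the example in the text

    def register(self, request: Dict[str, str]) -> None:
        self.requests.append(request)


class WordInfoRegistrar:
    UNASSIGNED = "-1"  # initial creator ID

    def __init__(self, server: StubServer) -> None:
        self._server = server
        self.creator_id = self.UNASSIGNED

    def register(self, word_id: str, properties: Dict[str, str]) -> Dict[str, str]:
        # Obtain a creator ID from the server only on first use.
        if self.creator_id == self.UNASSIGNED:
            self.creator_id = self._server.issue_creator_id()
        # The creator ID is always attached to the registration request.
        request = {"word_id": word_id, "creator_id": self.creator_id, **properties}
        self._server.register(request)
        return request


server = StubServer()
registrar = WordInfoRegistrar(server)
first = registrar.register("9934121", {
    "notation": "アルティメットレスリング",
    "part_of_speech": "general noun",
    "word_structure": "0006712,4445566",
})
```

Later registrations from the same registrar reuse the stored creator ID, so the server is asked to issue one only once.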

Suppose the user wants to record that the already registered word "ウルティメット" and the already registered word "アルティメット" are synonyms. The user enters "アルティメット" and the synonym information "ウルティメット" into the dictionary editing unit 6 as additional information.

If the word information for both "アルティメット" and "ウルティメット" can be acquired, the dictionary editing unit 6 issues a word information addition request as shown in FIG. 7(b). In this example the addition is assumed to have been made by a different creator, so a new creator ID has been assigned.

When the word dictionary unit 3 is updated automatically, on the other hand, the automatic update unit 10 uses the creator ID list holding unit 11.

The user registers in advance, in the creator ID list holding unit 11, the creator IDs of creators (users) who share the user's fields of interest and the creator IDs of creators who produce trustworthy word information.

The creator ID list held in the creator ID list holding unit 11 is simply a sequence of creator IDs, for example {C11243, C17212, C00001, C00007}.

During an automatic update, the automatic update unit 10 uses the word information acquisition unit 9 to search the word information server device ST for word information carrying any of the creator IDs held in the creator ID list holding unit 11 ("C11243", "C17212", "C00001", or "C00007"). From the resulting group of word IDs, it selects only those that have "word structure" and "synonym" as property names, and sends the corresponding word information to the dictionary editing unit 6 as the search result. For words that have no synonym or no word structure, the value "NULL" (empty value) is assumed to be set.

Because the dictionary editing unit 6 receives the word information with everything in place, including the word ID, it registers the information in the word dictionary unit 3 directly. Duplicate word information, however, is not registered.
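The automatic update flow above can be sketched as a filter followed by a de-duplicating insert. This is an illustration under stated assumptions: records are plain dictionaries, `None` stands for the NULL value, a record qualifies when it carries both property names, and "duplicate" is taken to mean a record identical to one already in the dictionary. None of these details is specified in the text.

```python
def auto_update(server_records, creator_id_list, word_dictionary):
    """Copy trusted, fully keyed records into word_dictionary; return added word IDs."""
    trusted = set(creator_id_list)
    added = []
    for record in server_records:
        # Keep only word information created by a listed creator.
        if record["creator_id"] not in trusted:
            continue
        # Keep only records that carry both property names (value may be None).
        if "word_structure" not in record or "synonym" not in record:
            continue
        # Duplicate word information is not registered.
        if record in word_dictionary:
            continue
        word_dictionary.append(record)
        added.append(record["word_id"])
    return added


creator_ids = ["C11243", "C17212", "C00001", "C00007"]
records = [
    {"creator_id": "C11243", "word_id": "9934121",
     "notation": "アルティメットレスリング",
     "word_structure": ["0006712", "4445566"], "synonym": None},
    # Hypothetical record from a creator who is not on the list.
    {"creator_id": "C99999", "word_id": "1000001",
     "notation": "例", "word_structure": None, "synonym": None},
    # Hypothetical record without the two property names.
    {"creator_id": "C00001", "word_id": "1000002", "notation": "例二"},
]
dictionary = []
first_run = auto_update(records, creator_ids, dictionary)
second_run = auto_update(records, creator_ids, dictionary)
```

Running the update twice adds nothing the second time, which is the behavior the duplicate check is meant to give a periodic update.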

As a result, a system that uses the creator ID list described above can make use of the vocabulary item "アルティメットレスリング" and of the synonym relationship between "ウルティメット" and "アルティメット", as shown in FIG. 7(b).

In this way, the user can register word information as needed, and can correct or update registered word information as needed.

In addition, a unique word ID is generated for each word and used as the key for managing the information that accompanies the word and that each application needs individually, such as the reading, translation, and word structure. A word can therefore be registered without filling in all of its information. Because word information can also be shared over a network, these features together provide the basis for a distributed user dictionary development environment that multiple applications can use in common.
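The idea of keying per-application information on the word ID, so that a word can exist with only part of its information filled in, can be sketched as a sparse property store. The class name, method names, property names, and the reading value are illustrative assumptions, not part of the described system.

```python
from collections import defaultdict


class WordPropertyStore:
    """Sparse store: word ID -> {property name -> value}."""

    def __init__(self):
        self._properties = defaultdict(dict)

    def set_property(self, word_id, name, value):
        self._properties[word_id][name] = value

    def get_property(self, word_id, name):
        # Missing words and missing properties both yield None;
        # using .get avoids creating empty entries on lookup.
        return self._properties.get(word_id, {}).get(name)

    def known_properties(self, word_id):
        return sorted(self._properties.get(word_id, {}))


store = WordPropertyStore()
store.set_property("9934121", "notation", "アルティメットレスリング")
store.set_property("9934121", "part_of_speech", "general noun")
# A kana conversion application might add only the reading it needs.
store.set_property("9934121", "reading", "あるてぃめっとれすりんぐ")
```

A translation application could later attach a translation under the same word ID without touching the properties other applications rely on.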

In addition, assigning a unique word ID to each word, and managing synonym information by means of that ID, makes it possible to designate the exact word intended as a synonym.

In addition, assigning a unique word ID to each word, and enabling word structure information to be searched by means of that ID, makes it possible to designate words precisely with respect to synonyms.

In addition, because creator IDs can be handled, the user can designate a group of creators who specialize in a particular field or who have at least a certain level of skill, and can thereby acquire word information that reflects the user's preferences and needs.

In addition, holding a list of creator IDs and enabling updates based on that list makes it possible to update the word dictionary automatically.

The present invention can be applied in the same way to search systems and translation systems.

A block diagram showing an example of a network system according to an embodiment of the present invention.
A block diagram showing an example of the configuration of the language analysis system.
A block diagram showing an example of the configuration of the word information server device ST.
A schematic diagram showing an example of the structure of the word dictionary.
A flowchart showing an example of the processing of the user terminal device.
A flowchart showing an example of the processing that expands M(i, j) into SW(i, j, n, m).
A schematic diagram showing an example of update information.

Explanation of symbols

TM1 to TMn User terminal device
ST Word server device
SK Search server device
SD Document database device
1 Input document buffer
2 Morphological analysis processing unit
3 Word dictionary unit
4 Output information generation unit
5 Output information buffer
6 Dictionary editing unit
7 Word ID acquisition unit
8 Word information registration unit
9 Word information acquisition unit
10 Automatic update unit
11 Creator ID list holding unit
12, 26 Communication control unit
21 Word database unit
22 Word database management unit
23 Word ID giving unit
24 Creator ID giving unit

Claims

1. A distributed development system for word information, comprising a language analysis system that performs language analysis on an input document and processes and outputs the analysis result for a predetermined purpose, and a word information server that is connected to the language analysis system via a network and provides word information to the language analysis system, wherein
the language analysis system includes:
morphological analysis processing means for receiving document information and performing morphological analysis;
word dictionary means that is used by the morphological analysis processing means and holds, for each word, at least a word ID, a notation, part-of-speech information, and a plurality of items of word-accompanying information;
output information generating means for generating output information using the notation, part-of-speech information, and word-accompanying information acquired for each word from the word ID string received from the morphological analysis processing means;
dictionary editing means for registering, deleting, and updating words in the word dictionary;
word information acquisition means that is called from the dictionary editing means, sends to the word information server a search request combining notation, part-of-speech information, word ID, and word-accompanying information, and acquires from the word information server information on the words that match the condition;
word ID acquisition means that is called from the dictionary editing means and acquires a word ID from the word information server; and
word information registration means that is called from the dictionary editing means and sends the notation, part-of-speech information, and word-accompanying information to the word information server in association with the word ID, and
the word information server includes:
word database means for holding, as information for each word, a word ID, a notation, part-of-speech information, and a plurality of items of word-accompanying information;
word ID giving means for newly issuing a word ID in response to a request from the language analysis system; and
word database management means for, in response to a request from the language analysis system, registering, deleting, and updating notation, part-of-speech information, and word-accompanying information in the word database means using the word ID as a key, and for performing searches that use a combination of word ID, notation, part-of-speech information, and word-accompanying information as the search condition.

2. The distributed development system for word information according to claim 1, wherein the language analysis system has, as the word-accompanying information, the word ID of a synonym of the word indicated by a word ID.

3. The distributed development system for word information according to claim 1, wherein the language analysis system has, as the word-accompanying information, the word IDs that make up the word structure of the word indicated by a word ID.

4. The distributed development system for word information according to claim 1, wherein the language analysis system further includes:
creator ID list holding means for holding a list of predetermined creator IDs; and
automatic updating means for calling the word information acquisition means at predetermined time intervals, acquiring word information using the creator ID list, and sending the result to the dictionary editing means,
and wherein the dictionary editing means registers the word information received from the automatic updating means in the word dictionary.

5. The distributed development system for word information according to claim 1, wherein the word information server
further sets a creator ID for each of the notation, the part-of-speech information, and the plurality of items of word-accompanying information held as information for each word,
and further includes creator ID giving means for newly issuing a creator ID in response to a request from the language analysis system.