JP3893600B1

JP3893600B1 - Base database generation method, base list generation method and apparatus, and computer program

Info

Publication number: JP3893600B1
Application number: JP2006079401A
Authority: JP
Inventors: 和雄吹谷; 介渡篠原; 文裕森谷
Original assignee: Sematics
Current assignee: Sematics
Priority date: 2006-03-22
Filing date: 2006-03-22
Publication date: 2007-03-14
Anticipated expiration: 2026-03-22
Also published as: JP2007257191A

Abstract

[Problem] To obtain a method for generating semantic phase concept data that enables a computer to perform semantic structure analysis of text.
[Solution] A method for generating a base database of words by a computer having storage means that stores each word in association with one or more definition sentences explaining its meaning. The method comprises: a step of reading a word from the storage means; a step of reading, from the storage means, the definition sentences stored in association with the read word; a step of dividing the read definition sentences into words by morphological analysis; a step of determining whether each divided word corresponds to a base; a step of associating each word determined to be a base with the read word as a base of that word; and a step of recursively performing the above steps on any divided word that does not correspond to a base, until a word corresponding to a base is reached.
[Selected drawing] Figure 6

Description

An object of the present invention is to provide a method and an apparatus for generating semantic phase concept data used in computer-based semantic structure analysis of text, and for generating a base list used to generate that data, together with a computer program.

In recent years, with advances in computer technology, research on computer processing for text analysis has been actively pursued. High-precision text analysis by computer is expected to be applied to, among other things, systems that can retrieve similar texts. If a computer could analyze the semantic concepts of the text itself, rather than analyzing text structure by word-level matching, it would be possible to eliminate hits that occur merely because a document contains the search keyword, as happens with conventional search systems, and to detect only texts that are close in meaning to the query text. Applying computer-based semantic concept analysis to a search system in this way yields search results that conventional keyword search could not provide. Such processing could also be applied to systems that automatically summarize a document made up of a large amount of text (for example, a research paper) to a predetermined number of characters, simply by specifying the number of characters, while preserving the meaning of the paper.
If a system that performs text analysis by computer as described above could be realized, a great contribution to industrial development could be expected. For this reason, a highly accurate analysis technique and a computer program implementing it are needed.

Various inventions have been made concerning computer-based text analysis, and one well-known technique among them is the word thesaurus dictionary. A word thesaurus dictionary defines synonyms and near-synonyms of words, and highly accurate analysis results can be obtained from the data (word groups) it contains. It is therefore necessary to generate a highly accurate word thesaurus dictionary. One example of a method for generating such a dictionary is as follows. A known method analyzes the text contained in a huge amount of document data (for example, several hundred thousand newspaper articles) and generates the dictionary by analyzing the relationships (such as the degree of co-occurrence) between the words contained in that text (see, for example, Patent Document 1). The invention described in Patent Document 1 is characterized by using a statistical technique to generate word thesaurus dictionary data. For each word contained in document data such as newspaper articles, the frequency with which it appears in the document data is measured, the probability that each pair of words appears in the same document (the co-occurrence probability) is calculated, and this probability is used to compute the semantic distance between words by vector operations.
The word thesaurus dictionary is the data obtained by quantifying the structure based on these inter-word distances. Using a word thesaurus dictionary generated in this way makes it possible to identify statistically close documents from the words contained in the document to be analyzed. A computer can therefore determine, for a given document, another document whose structure is similar in terms of the words it contains.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2001-331515 (JP 2001-331515 A)

Data (a word thesaurus dictionary) generated using the above invention inevitably contains deviation (error). This is because the frequency with which words appear in the document data used for generation is affected by the attributes of those documents. In the case of newspaper articles, such attributes are bound up with the content of the articles themselves and are therefore difficult to exclude. Consequently, when the above invention is used, the generated data must be normalized to correct the deviation. Even with normalization, however, the deviation cannot be eliminated entirely, and the content and interrelations of the document data (for example, newspaper articles) used to obtain word frequencies are reflected directly in the quality of the information that is statistically processed. Moreover, because the method takes no account of the meaning of each word, words that are semantically unrelated but happen to be statistically associated are recognized as "related words", and the word thesaurus dictionary is generated on that basis.

Thus, when text analysis is performed using a word thesaurus dictionary generated by the above statistical technique, the accuracy of the analysis results is degraded by the attributes and biases of the document data used to generate the dictionary data (the word thesaurus in the above example). It is difficult to improve analysis accuracy with text analysis based on the above technique.

Accordingly, an object of the present invention is to eliminate the drawbacks of text analysis means based on statistical techniques, and to provide a method for generating a base database, which is semantic phase concept data capable of realizing text analysis with excellent accuracy, as well as a method for generating the base list used for the base database, and an apparatus and a program therefor.

The present invention is a method for generating a base database of words by a computer having storage means that stores each word in association with one or more definition sentences explaining its meaning. Its main feature is that it comprises: a step of reading a word from the storage means of the computer; a step of reading, from the storage means, the definition sentences stored in association with the read word; a step of dividing the read definition sentences into words by morphological analysis; a step of determining whether each divided word corresponds to a base; a step of associating each word determined in the determination step to be a base with the read word as a base of that word; and a step of recursively performing the above steps on any divided word that does not correspond to a base, until a word corresponding to a base is reached.
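The recursive method above can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `resolve_bases`, `glosses`, `base_list`, and `tokenize` are hypothetical, whitespace splitting stands in for morphological analysis, and the `visited` set is an added safeguard against circular definitions.

```python
from typing import Callable, Dict, List, Set


def resolve_bases(
    word: str,
    glosses: Dict[str, List[str]],
    base_list: Set[str],
    tokenize: Callable[[str], List[str]],
) -> Set[str]:
    """Collect the bases reached by recursively expanding the definitions of `word`."""
    found: Set[str] = set()
    visited: Set[str] = set()  # assumption: guard against circular definitions

    def visit(w: str) -> None:
        if w in visited:
            return
        visited.add(w)
        if w in base_list:
            found.add(w)  # the word itself is a base: stop descending here
            return
        # Not a base: read its definition sentences, split them, and recurse.
        for sentence in glosses.get(w, []):
            for token in tokenize(sentence):
                visit(token)

    visit(word)
    return found


# Toy dictionary (illustrative data only).
glosses = {
    "puppy": ["young dog"],
    "dog": ["domestic animal"],
    "young": ["not old"],
}
base_list = {"animal", "old", "not", "domestic"}
print(sorted(resolve_bases("puppy", glosses, base_list, str.split)))
```

In this sketch, a word that is neither a base nor a headword contributes nothing. The source does not say how such words should be handled.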

The invention also provides a method for generating a base list used for generating a base database of words, performed by a computer having first storage means that stores each word in association with one or more definition sentences explaining its meaning, second storage means that stores each word in association with its frequency of appearance, and third storage means that stores the base list. The method is characterized in that the computer performs: a step of reading a word from the first storage means; a definition sentence reading step of reading, from the first storage means, the definition sentences stored in association with the read word; a word dividing step of dividing the read definition sentences into words; a step of registering the divided words in the second storage means; a step of reading the definition sentences corresponding to the divided words from the first storage means; a step of recursively performing the word dividing, word registration, and definition sentence reading on the definition sentences thus read; and a step of extracting, as bases, those words stored in the second storage means that have been registered many times, and registering them in the base list in the third storage means.
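The base list generation method can be sketched as follows. This is a sketch under assumptions rather than the method as claimed. The source only says that words "registered many times" become bases, so the cutoff `top_n` is a hypothetical parameter. The names `build_base_list`, `glosses`, and `tokenize` are also hypothetical. Expanding each word's definitions only once is an assumption made so that the recursion terminates.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


def build_base_list(
    glosses: Dict[str, List[str]],
    tokenize: Callable[[str], List[str]],
    top_n: int,
) -> List[str]:
    """Count words appearing in recursively expanded definitions; keep the most frequent."""
    counts: Counter = Counter()  # plays the role of the second storage means
    expanded: Set[str] = set()   # assumption: expand each word's definitions once

    def expand(word: str) -> None:
        if word in expanded:
            return
        expanded.add(word)
        for sentence in glosses.get(word, []):
            for token in tokenize(sentence):
                counts[token] += 1  # register the divided word
                expand(token)       # then read and process its own definitions

    for headword in glosses:
        expand(headword)
    # The returned list plays the role of the base list in the third storage means.
    return [word for word, _ in counts.most_common(top_n)]


# Toy dictionary (illustrative data only).
glosses = {
    "puppy": ["young dog"],
    "kitten": ["young cat"],
    "dog": ["animal"],
    "cat": ["animal"],
    "young": ["not old"],
}
print(build_base_list(glosses, str.split, top_n=2))
```

A frequency threshold could be used in place of `top_n`. The source leaves the selection criterion open.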

According to the present invention, the "bases" that make up the meaning of each word can be identified, so that the semantic concepts constituting each word can be handled quantitatively. This makes it possible to use a computer for text analysis based on bases, and to perform semantic concept analysis that conventional text analysis could not achieve.

Embodiments of the base database generation method according to the present invention are described below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of a base database generation apparatus according to the present invention. The base database generation apparatus 1 is a computer on which a base database generation program 2, which executes the base database generation method according to the present invention, is installed. Each functional block shown in FIG. 1 is realized by the base database generation program and the hardware of the computer operating in cooperation.

In FIG. 1, the base database generation apparatus 1 comprises a dictionary data storage unit 10, a base database storage unit 30, a dictionary data reading unit 11, a morphological analysis unit 12, a notation variation correction unit 13, an exploration information storage unit 14, a base determination unit 15, a base registration unit 16, an end condition determination unit 17, and a base list 18.

The dictionary data storage unit 10 stores dictionary data in which word data serving as "headwords" (headword data) is associated with sentence data serving as "definition sentences" (definition sentence data). A definition consists of one or more sentences explaining the meaning of the headword. That is, definition sentence data is a group of sentence data explaining the meaning of the headword data. The dictionary data storage unit 10 is built on an external storage device (such as a hard disk drive) of the computer that executes the program according to the present invention, and the dictionary data is stored there in advance. The dictionary data must allow headword data to be read, and must allow the definition sentence data stored in association with a given headword to be read by specifying that headword. Provided it can be stored in the hardware of the computer as described above, its storage method, format, and data structure are not limited.

FIG. 2 shows an example of the dictionary data stored in the dictionary data storage unit 10. In FIG. 2, the dictionary data 100 comprises headword data 101, reading data 111 indicating the reading (pronunciation) of the headword data 101, definition sentence data 102 consisting of sentences explaining the meaning of the headword data 101, and a read flag 112. Each item of headword data 101 is stored in association with a read flag 112. After the base generation program according to the present invention reads an item of headword data 101, it inserts flag data (for example, "*") into the read flag to indicate that the program has already processed that headword. Because the definition sentence data 102 is stored in the dictionary data storage unit 10 with the headword data 101 as its index, specifying one item of headword data 101 allows all the definition sentence data 102 stored in association with it to be read.

The storage means implementing the dictionary data storage unit 10 may be an external storage device such as a CD-ROM instead of the hard disk drive given as an example above. The present invention obtains the same effect regardless of the medium, provided the device allows the predetermined read processing described above to be performed on the dictionary data 100 at the instruction of the computer program according to the present invention.

The base database storage unit 30 is a storage device that stores the base database generated by the program according to the present invention. The base database stored in the base database storage unit 30 has a table structure whose index is word data corresponding to the headword data 101 and whose fields are the word data 18a contained in the base list 18.

FIG. 3 shows an example of the base database stored in the storage unit 30. In FIG. 3, the base database 400 has a table structure with words 401 as the index and bases 402 as the fields. For each word 401, the value "1" is stored under every base 402 that is involved in the semantic makeup of that word, and "0" is stored under every base 402 that is not. The base database 400 may have any structure that allows associations between words and bases to be registered, and is not limited to the example shown in FIG. 3.
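The table in FIG. 3 can be pictured as a mapping from each word to a row of 0/1 values, one per base. The sketch below assumes a plain in-memory dictionary. The names `make_row` and `database` and the sample words are hypothetical and are not taken from the figure.

```python
from typing import Dict, List, Set


def make_row(bases_of_word: Set[str], base_list: List[str]) -> List[int]:
    """Return one table row: 1 under each base related to the word, 0 elsewhere."""
    return [1 if base in bases_of_word else 0 for base in base_list]


# Column order follows the base list (illustrative data only).
base_list = ["animal", "old", "not"]
database: Dict[str, List[int]] = {
    "puppy": make_row({"animal", "not", "old"}, base_list),
    "dog": make_row({"animal"}, base_list),
}
print(database["dog"])  # [1, 0, 0]
```

Because every word is described over the same fixed set of columns, rows can be compared numerically. This is one way to read the source's remark that semantic concepts become quantitatively tractable.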

The dictionary data reading unit 11 performs predetermined query processing on the dictionary data 100. The predetermined query processing consists of reading headword data 101, reading the definition sentence data 102 stored in association with the headword data 101, inserting a value into the read flag 112 of the headword data 101 that has been read, and adding that headword data 101 to the explored data in the exploration information storage unit 14. The unit also adds 1 to the exploration hierarchy information each time it reads definition sentence data 102.

The morphological analysis unit 12 performs morphological analysis on the definition sentence data 102 read from the dictionary data 100 and divides it into word data.

Morphological analysis is explained here. Morphological analysis is processing that divides a document into morphemes, the smallest units of words, and determines the part of speech of each morpheme. The morphological analysis unit 12 performs word segmentation, reading assignment, and part-of-speech assignment on the definition sentence data 102, and then, for words that conjugate, attaches the conjugation type, the conjugated form, and the base form. The morphological analysis unit 12 thereby generates, from the definition sentence data 102, word data that can be used in the subsequent processing. The present invention can be carried out using existing, well-known morphological analysis software. Software usable for the morphological analysis unit 12 includes software that prepares dictionary data containing all the words of a given language, determines whether the words in the document to be analyzed match words in that dictionary data, and splits off each word that matches, as well as software that divides text into morphemes and determines parts of speech arithmetically, using statistical probability techniques based on a large corpus.
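The dictionary-matching approach mentioned above can be illustrated with a toy longest-match segmenter. This sketch covers segmentation only. It does not assign readings, parts of speech, or conjugation information, and it is not the analyzer used in the patent. The function name `longest_match_segment` and the sample lexicon are hypothetical.

```python
from typing import List, Set


def longest_match_segment(text: str, lexicon: Set[str]) -> List[str]:
    """Greedily split `text` into the longest lexicon entries found from left to right."""
    tokens: List[str] = []
    max_len = max(map(len, lexicon), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: emit a single character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


print(longest_match_segment("若い犬だ", {"若い", "犬"}))  # ['若い', '犬', 'だ']
```

Greedy matching can choose the wrong split when entries overlap. Practical analyzers score competing segmentations, which is where the statistical techniques mentioned in the text come in.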

The notation variation correction unit 13 corrects notation variation in the word data that the morphological analysis unit 12 has generated by dividing the definition sentence data 102. Notation variation means that words with the same pronunciation and meaning are written differently. For example, 「インターフェース」 ("interface") may also be written 「インタフェース」 or 「インターフェイス」. To correct such variation, the notation variation correction unit 13 matches the notation of the word data divided from the definition sentence data 102 to that of the headword data 101. For example, if the headword data 101 uses the notation 「インターフェース」 and the divided word data is 「インタフェース」, the unit converts it to 「インターフェース」.
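One simple way to realize this correction is a lookup table from known variants to the headword notation. The source does not say how variants are detected, so the table `variants` and the function `normalize` below are assumptions for illustration only.

```python
from typing import Dict

# Hypothetical variant table: each key is a variant spelling,
# each value is the notation used by the headword data.
variants: Dict[str, str] = {
    "インタフェース": "インターフェース",
    "インターフェイス": "インターフェース",
}


def normalize(token: str, table: Dict[str, str]) -> str:
    """Return the headword notation for `token`, or `token` itself if no variant is known."""
    return table.get(token, token)


print(normalize("インタフェース", variants))  # インターフェース
```

Tokens without a known variant pass through unchanged, so this step can be applied to every token after segmentation.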

The exploration information storage unit 14 stores an unexplored list generated by associating the divided word data with the exploration hierarchy information. FIG. 4 shows a specific example of the unexplored list. In FIG. 4, the unexplored list 700 has a table structure consisting of unexplored data 701 and the exploration hierarchy information 702 associated with it. The exploration hierarchy information is used to count how many times definition sentence data 102 has been read recursively when definition sentence data 102 is read for headword data 101 that has undergone base determination.

The exploration information storage unit 14 also stores explored data (not shown). The explored data is a file in which the headword data 101 and unexplored data 701 that have undergone base determination in the base determination unit 15 are accumulated as processing proceeds. The explored data is used to avoid performing base determination on the same data more than once. Accordingly, the storage format of both the unexplored list and the explored data is not limited to the above, provided the program according to the present invention, described later, can perform the predetermined processing on them.

The base determination unit 15 determines whether the headword data 101 read by the dictionary data reading unit 11 corresponds to a base by querying the base list 18. Headword data 101 found by this query to be contained in the base list 18 is passed to the base registration unit 16. The base determination unit 15 also adds the headword data 101 on which the query was performed to the explored data stored in the exploration information storage unit 14.
The base determination unit 15 further reads an item of unexplored data 701 stored in the unexplored list 700, deletes it from the unexplored list 700, and then checks whether it is contained in the explored data stored in the exploration information storage unit 14. If the check shows that the unexplored data 701 is not contained in the explored data, the unit adds it to the explored data and performs the query described above. If the query shows that the unexplored data 701 is contained in the base list 18, the unit passes it to the base registration unit 16. If the check shows that the unexplored data 701 is already contained in the explored data, the unit reads the next item of unexplored data 701 from the unexplored list 700.
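The handling of the unexplored list and the explored data can be sketched as below. The source does not fix the order in which entries are taken from the unexplored list, so the first-in, first-out `deque` is an assumption. The names `next_candidate`, `unexplored`, and `explored` are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Set, Tuple


def next_candidate(
    unexplored: Deque[Tuple[str, int]],
    explored: Set[str],
    base_list: Set[str],
) -> Optional[Tuple[str, int, bool]]:
    """Pop entries until one not yet explored is found.

    Returns (word, level, is_base), or None when the unexplored list is empty.
    """
    while unexplored:
        word, level = unexplored.popleft()  # read the entry and delete it from the list
        if word in explored:
            continue                        # already judged: read the next entry
        explored.add(word)                  # record it before querying the base list
        return word, level, word in base_list
    return None


# Illustrative data only: (word, exploration hierarchy level) pairs.
unexplored = deque([("dog", 1), ("dog", 1), ("animal", 2)])
explored = {"puppy"}
print(next_candidate(unexplored, explored, {"animal"}))  # ('dog', 1, False)
```

A word returned with `is_base` set to `True` would be handed to base registration. The duplicate `"dog"` entry is skipped on the next call because it is already in the explored data.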

The base list 18 is a file stored in advance in the memory of the computer, in which the word data corresponding to bases is held in list form. A base is a word representing an elementary concept from which the semantic concepts of words are composed, and is determined in advance by a human. The base list 18 contains one or more items of word data that are bases. Provided its format allows the result (present or absent) to be determined when the base determination unit 15 queries it with a given item of word data, its storage method and information structure are not limited to this one, and the effect of the present invention is obtained equally with other structures. FIG. 5 shows an example of the base list 18. In FIG. 5, the base list 18 has a file structure that stores one or more items of word data 18a, each being a word that is a base.

The base registration unit 16 stores in the base database 400 the headword data 101 or unexplored data 701 that the base determination unit 15 has determined to be a base. The structure of the base database is as already described. The base registration unit 16 adds the headword data 101 or unexplored data 701 to the index (word 401) of the base database, enters 1 in the field of the base 402 associated with that index that is the same as the base handled in the base determination processing, and enters 0 in the fields of the bases 402 that do not correspond.

The end condition determination unit 17 queries whether any unexplored data 701 exists in the unexplored list 700. If this query extracts no unexplored data 701 (that is, if not a single item of unexplored data 701 exists), the unit clears the explored data and sets the exploration hierarchy information to zero. The end condition determination unit 17 also queries whether the dictionary data 100 contains any headword data 101 whose read flag 112 is blank. If there is no headword data 101 with a blank read flag 112, the unit terminates the operation of the program according to the present invention.

The base database generation method carried out by the base database generation apparatus 1 having the above functions is described next. FIG. 6 is a flowchart showing the processing flow of the computer program according to the present invention that executes the base database generation method.

First, step 201 is headword reading. In step 201, the dictionary data reading unit 11 reads the headword data 101 at the head of the storage area of the dictionary data 100 stored in the dictionary data storage unit 10, and adds "*" to its read flag 112. Headword data 101 is read in the order in which it is stored in the dictionary data 100.

Step 202 is base determination. In step 202, the base determination unit 15 queries whether the headword data 101 read above matches any word data contained in the base list 18. If the query extracts the headword data 101 from the base list 18, it corresponds to a base, and the headword data 101 is passed to the base registration unit 16. If the query does not extract the headword data 101, it does not correspond to a base, and the headword data 101 is passed to the dictionary data reading unit 11. The base determination unit 15 adds the queried headword data 101 to the explored data.

Step 204 is base registration. In step 204, the base registration unit 16 adds the headword data 101 passed from the base determination unit 15 to the index (word 401) of the base database 400 stored in the base database storage unit 30, enters "1" under the base 402 whose word data is identical to the word data in the base list 18 used in the base determination, and enters "0" under the other bases 402.

Step 213 is the all-headwords end determination process. In step 213, the end condition determination unit 17 queries whether any read flag 112 stored in association with the headword data 101 is blank. If a blank exists, the corresponding headword data 101 has not yet undergone the base determination process, so the program returns to step 201. If no read flag 112 is blank, the program ends.
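
The interplay between the read flag 112 (step 201) and the end check (step 213) can be sketched as follows. The record layout and the name `read_next_headword` are assumptions made for illustration; only the "*" marker and the stored-order traversal come from the description.

```python
# Toy stand-in for the dictionary data 100: headword data 101 plus read flag 112.
entries = [
    {"headword": "行う", "read_flag": ""},
    {"headword": "インターフェース", "read_flag": ""},
]

def read_next_headword(entries):
    """Return the first unread headword in stored order and flag it with "*".

    Returns None when no blank flag remains, which is the end condition of step 213.
    """
    for entry in entries:
        if entry["read_flag"] == "":
            entry["read_flag"] = "*"
            return entry["headword"]
    return None

order = []
while (word := read_next_headword(entries)) is not None:
    order.append(word)   # the base determination of step 202 would run here
```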

Next, the processing performed when the headword data 101 is found not to be a base in step 202 is described.

Step 203 is the definition sentence reading process. In step 203, the dictionary data reading unit 11 reads the definition sentence data 102 stored in association with the headword data 101 read earlier.

Step 205 is the morphological analysis process. In step 205, the morphological analysis unit 12 performs morphological analysis on the definition sentence data 102 read in step 203. This analysis divides the definition sentence data 102 into word data.

Step 206 is the notation variation correction process. In step 206, the notation variation correction unit 13 corrects the notation of the word data generated by the morphological analysis unit 12 so that it matches the notation used in the headword data 101.

FIG. 7 shows an example of the result obtained by steps 205 and 206, namely the result of applying the above processing to the definition sentence data 106 shown in FIG. 1. As shown in FIG. 7, the analysis result data 600 associates, with each word data (morpheme) divided from the definition sentence 106, its dictionary form, reading, part-of-speech class, part-of-speech subclass, whether it conjugates and the name of the conjugation type, and whether it has a conjugated form and the name of that form. Morphemes that do not directly contribute to the semantic concept of a word are removed from the analysis result data 600: non-independent words (for example, particles and auxiliary verbs), formal nouns (for example, "こと"), and auxiliary predicates (for example, "いる", "ある", "ない"). The dictionary forms of the remaining morphemes are "電子計算機", "異なる", "機器", "装置", "あいだ", "接続", "する", "交信", "制御", "可能だ", "装置", and "ソフトウェア". Step 206 applies the notation variation correction to these, and the resulting word data "電子計算機", "異なる", "機器", "装置", "間", "接続", "する", "交信", "制御", "可能だ", "装置", and "ソフトウェア" are stored as unsearched data 701 in the unsearched list 700 of the search information storage unit 14.
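
The filtering and correction described for steps 205 and 206 can be sketched as below. This is an illustration only: the tokens are written by hand rather than produced by a real morphological analyzer, the part-of-speech tags and the `EXCLUDED_POS` set are assumed labels, and the lookup table `VARIANTS` is one assumed way of implementing the correction. The two variant pairs in it (あいだ to 間, コンピュータ to コンピューター) are the ones that appear in the description.

```python
# Hand-written (dictionary_form, part_of_speech) pairs; tags are illustrative.
tokens = [
    ("電子計算機", "名詞"),
    ("と", "助詞"),         # particle: removed
    ("異なる", "動詞"),
    ("あいだ", "名詞"),
    ("こと", "形式名詞"),    # formal noun: removed
    ("いる", "補助用言"),    # auxiliary predicate: removed
    ("。", "記号"),          # punctuation: removed
]

# Categories treated as not contributing to the semantic concept (assumed tag names).
EXCLUDED_POS = {"助詞", "助動詞", "形式名詞", "補助用言", "記号"}

# Assumed correction table mapping a variant spelling to the headword spelling.
VARIANTS = {"あいだ": "間", "コンピュータ": "コンピューター"}

def content_words(tokens):
    """Drop excluded morphemes, then normalize the spelling of what remains."""
    kept = [form for form, pos in tokens if pos not in EXCLUDED_POS]
    return [VARIANTS.get(form, form) for form in kept]

words = content_words(tokens)
```

The returned list is what would be stored as unsearched data 701.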

If the base determination process applied to word data divided from the definition sentence data 102 finds that the word data is again not a base, that word data is used to read definition sentence data 102 once more. The sequence from the definition sentence reading process to the base determination process therefore has to be repeated until word data that is a base is reached, which means that step 203 is processed recursively. The search hierarchy information makes the number of recursions measurable: it is set to "1" for word data divided from the first definition sentence data 102 that was read, and 1 is added each time definition sentence data 102 is read recursively.

Step 207 is the searched determination process. In step 207, the base determination unit 15 reads unsearched data 701 stored in the unsearched list 700 together with the search hierarchy information 702 stored in association with it, deletes that unsearched data 701 from the unsearched list 700, and then checks whether the unsearched data 701 is already contained in the searched data (not shown) stored in the search information storage unit 14. If the searched data contains no matching word data, the unsearched data 701 has not yet undergone the base determination process, so it is added to the searched data and processing moves to step 208.

Step 208 is the base determination process. In step 208, the base determination unit 15 performs the base determination process on the unsearched data 701 read in the preceding process. The details are the same as in step 202 described above, and the following step 209 performs the same processing as the base registration process of step 204 described above.

Step 210 is the search end determination process. In step 210, the end condition determination unit 17 determines whether the search hierarchy information 702 (see FIG. 4) that the base determination unit 15 read from the search information storage unit 14 in the preceding process is at or below a predetermined value. Here the predetermined value is "5". Limiting the number of recursive definition sentence reads by means of the search hierarchy information serves only to make the invention efficient to carry out; the predetermined value is not limited to this one, and the limit is not an essential requirement of the invention.

Step 211 is the headword confirmation process. In step 211, the end condition determination unit 17 queries whether the unsearched data 701 used in the above processing is contained in the dictionary data 100 as headword data 101.

Step 212 is the end determination process. In step 212, the end condition determination unit 17 queries whether one or more unsearched data 701 exist in the unsearched list 700 stored in the search information storage unit 14. If no unsearched data 701 remains, the searched data is cleared and the value of the search hierarchy information is reset to zero.

Step 213 is the all-headwords end determination process, as already described.

The processing from step 207 to step 212 is now explained with a concrete example. In step 207, the base determination unit 15 reads the unsearched data 701 stored at the head of the unsearched list 700 in the search information storage unit 14 ("電子計算機", electronic computer, in FIG. 4) together with the search hierarchy information stored in association with it ("1" in FIG. 4). The base determination unit 15 then queries whether the unsearched data 701 it has read is contained in the searched data.

In the next step 208, the base determination unit 15 performs the base determination process on the word "電子計算機", which is the unsearched data 701. Because "電子計算機" is not contained in the base list 18, it is determined not to be a base (N at 208). In the next step 210, the end condition determination unit 17 determines whether the value of the search hierarchy information meets the end condition, "5 or more". As noted above, the search hierarchy information is "1", so the end condition is not met (N at 210). In the next step 211, the end condition determination unit 17 queries whether "電子計算機" exists in the headword data 101. The dictionary data 100 contains "電子計算機" as the headword 107 (Y at 211), so the word data "電子計算機" is passed to the dictionary data reading unit 11 and processing returns to step 203.

Step 203 is then performed again. In step 203, the dictionary data reading unit 11 reads the definition sentence data 102 whose headword data 101 is "電子計算機" and adds 1 to the search hierarchy information. In FIG. 1, the definition sentence data 102 read at this point is the definition sentence 108 (see FIG. 2). Next, the morphological analysis of step 205 divides the sentence into "コンピュータ", "の", "こと", and "。". Removing the non-independent words "の" and "。" and the formal noun "こと" leaves "コンピュータ", to which step 206 applies the notation variation correction. The result, "コンピューター" (computer), is appended to the unsearched list 700 of the search information storage unit 14 together with its search hierarchy information (value "2"). FIG. 4 shows an example of the unsearched list 700 after "コンピューター" has been added.

Next, in step 207, a query checks whether the word data "異なる" (to differ), now at the head of the unsearched data 701 in the unsearched list, is contained in the searched data. It is not, so processing moves to step 208. In step 208, the base determination unit 15 determines whether the word data "異なる" is contained in the base list 18. Because "異なる" is not in the base list 18, the word data is determined not to be a base (N).

Next, step 210 performs the search end determination process. The end condition determination unit 17 determines whether the search hierarchy information of "異なる" meets the end condition. The search hierarchy information is "1", so the end condition is not met, and processing moves to step 211.

Next, in step 211, the dictionary data reading unit 11 checks whether "異なる" exists as a headword 101 in the dictionary data 100. Because "異なる" does not exist in the headword data 101, processing moves to step 212 (N).

Next, in step 212, the end condition determination unit 17 determines whether word data exists in the unsearched list 700 stored in the search information storage unit 14. Word data does exist in the unsearched list, so processing moves to step 207 (N at 212). The above processing continues recursively in this way until no unsearched data 701 remains in the unsearched list, that is, until the list is empty. If step 207 finds no unsearched data 701 in the unsearched list, the searched data is cleared and processing moves to step 213.
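
The loop from step 207 to step 212 can be sketched as a queue-driven search. This is a minimal illustration, not the patented implementation. The toy `DICTIONARY` holds definitions that are assumed to be already tokenized and normalized (steps 205 and 206), `BASE_LIST` is a four-word stand-in for the base list 18, and all function and variable names are hypothetical. The depth check follows the worked example, which treats a search hierarchy value of 5 or more as the end condition.

```python
from collections import deque

BASE_LIST = ["行う", "機器", "色", "味"]   # toy stand-in for the base list 18
DICTIONARY = {                            # headword -> content words of its definition
    "行う": ["する"],
    "インターフェース": ["電子計算機", "異なる", "機器", "装置", "間", "接続",
                     "する", "交信", "制御", "可能だ", "装置", "ソフトウェア"],
    "電子計算機": ["コンピューター"],
    "コンピューター": ["電子計算機"],
}
MAX_DEPTH = 5   # the "predetermined value" of step 210

def build_base_database(dictionary, base_list, max_depth=MAX_DEPTH):
    base_db = {}
    for headword in dictionary:                        # steps 201 and 213
        row = [0] * len(base_list)
        searched = {headword}                          # searched data, reset per headword
        if headword in base_list:                      # step 202
            row[base_list.index(headword)] = 1         # step 204
        else:
            # Steps 203 to 206: first-level words enter the unsearched list at depth 1.
            unsearched = deque((w, 1) for w in dictionary[headword])
            while unsearched:                          # step 212: stop when the list is empty
                word, depth = unsearched.popleft()     # step 207: read and delete
                if word in searched:
                    continue                           # already examined
                searched.add(word)
                if word in base_list:                  # step 208
                    row[base_list.index(word)] = 1     # step 209
                elif depth < max_depth and word in dictionary:   # steps 210 and 211
                    # Step 203 again: read the definition one level deeper.
                    unsearched.extend((w, depth + 1) for w in dictionary[word])
        if any(row):                                   # only words that reached a base are indexed
            base_db[headword] = row
    return base_db

base_db = build_base_database(DICTIONARY, BASE_LIST)
```

In this toy data "電子計算機" and "コンピューター" define each other; the searched set is what stops the loop from following that cycle indefinitely, and neither word reaches a base.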

Performing the base determination process in this way on all headword data 101 contained in the dictionary data 100, and on all word data divided from the definition sentence data 102 stored in association with the headword data 101, produces the base database shown in FIG. 3. As FIG. 3 shows, the value "1" is added in the base database to each base 402 (行う, 機器, 色, 味, ... Xn) that applies to a word 401. In other words, the bases 402 to which each word 401 (行う, インターフェース, word 1, word 2, ... word m) is related can be expressed numerically. Because the relation between each word and the bases that form its semantic concept is expressed as numbers, the base database makes it easy to process the semantic concept of a word by mathematical methods.
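
As an illustration of what processing by mathematical methods could look like, the rows of the base database can be treated as vectors and compared. The description does not say which method is used; cosine similarity below is one common choice, assumed purely for illustration, and the three rows are invented toy data.

```python
import math

# Invented rows over four bases; each value says whether the word relates to that base.
rows = {
    "単語1": [1, 1, 0, 0],
    "単語2": [1, 0, 0, 0],
    "単語3": [0, 0, 1, 1],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

shared = cosine(rows["単語1"], rows["単語2"])     # the two words share one base
disjoint = cosine(rows["単語1"], rows["単語3"])   # the two words share no base
```

Words that share bases score above zero, and words with no base in common score exactly zero.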

The embodiment described above has the effect that semantic concept data for words can be constructed automatically from dictionary data.

The invention is not limited to the embodiment described above, and the end determination condition can be modified in various ways without departing from the gist of the invention.

Another embodiment of the invention is described with reference to the drawings. FIG. 8 is a functional block diagram showing the configuration of a base database generation apparatus according to the invention. The base database generation apparatus 1a is implemented by a computer on which the base database generation program 2a, which executes the base database generation method of the invention, is installed. Each functional block shown in FIG. 8 is realized by the base database generation program and the hardware of the computer cooperating to execute the processing. The parts that differ from the base database generation apparatus of FIG. 1 described above are the search information storage unit 14a, the base determination unit 15a, and the base-unreachable database storage unit 40. The remaining blocks, namely the dictionary data storage unit 10, dictionary data reading unit 11, morphological analysis unit 12, notation variation correction unit 13, end condition determination unit 17, base list 18, base registration unit 16, and base database storage unit 30, are the same functional blocks as in the base database generation apparatus 1 used in Embodiment 1.

The base-unreachable database storage unit 40 stores a base-unreachable database. This database registers word data that the base database generation method of the invention cannot trace to even a single base, taken from the headword data 101 registered in the dictionary data 100 shown in FIG. 2 and from the unsearched data 701 (see FIG. 4) obtained by dividing the definition sentence data 102. As long as the base-unreachable database can be queried with headword data 101 and unsearched data 701, its creation method, recording method, and data structure do not matter.

The search information storage unit 14a stores an unsearched list that associates three items: word data divided from definition sentence data 102; the headword data 101 associated with the definition sentence data 102 from which that word data was generated, that is, the word data it was read from (hereinafter "original word data"); and the search hierarchy information. FIG. 13 shows a concrete example of the unsearched list stored in the search information storage unit 14a. In FIG. 13, the unsearched list 700a has a table structure that can store unsearched data 701, search hierarchy information 702, and original word data 703 in association with one another. As shown in FIGS. 13(a), (b), and (c), an unsearched list 700a is generated and stored for each original word data 703. The storage format of the unsearched list 700a is not limited to the one described here, provided that it supports the processing performed by the program of the invention described later.
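
One way to picture the unsearched list 700a is as a record with three fields, grouped by original word. The sketch below is an assumption about representation only, since the description explicitly leaves the storage format open; the class name, field names, and `enqueue` helper are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass(frozen=True)
class UnsearchedEntry:
    word: str     # unsearched data 701
    depth: int    # search hierarchy information 702
    origin: str   # original word data 703

# One list per original word, as in FIGS. 13(a) to (c).
unsearched_lists = defaultdict(deque)

def enqueue(words, depth, origin):
    """Store the words obtained from the definition of `origin` at the given depth."""
    for word in words:
        unsearched_lists[origin].append(UnsearchedEntry(word, depth, origin))

# Words from the definition of the headword, at depth 1.
enqueue(["電子計算機", "異なる", "機器"], 1, "インターフェース")
# The word from the definition of "電子計算機", one level deeper.
enqueue(["コンピューター"], 2, "電子計算機")
```

Keeping the original word with each entry lets a later step tell which definition a word came from.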

The search information storage unit 14a also accumulates searched data (not shown). The searched data is a file in which the headword data 101 and unsearched data 701 that have undergone the base determination process in the base determination unit 15a are accumulated as processing proceeds. It is used to avoid repeating the base determination process on the same data. Accordingly, the storage formats of both the unsearched list and the searched data are not limited to those described above, provided that they support the processing performed by the program of the invention described later.

The base determination unit 15a determines whether the headword data 101 read by the dictionary data reading unit 11 is a base by querying the base list 18. Headword data 101 that this query finds in the base list 18 is passed to the base registration unit 16. The base determination unit 15a also reads unsearched data 701 stored in the unsearched list 700a, deletes that unsearched data 701 from the unsearched list 700a, and then checks whether it is contained in the searched data stored in the search information storage unit 14a. If the check shows that the unsearched data 701 is not in the searched data, the unsearched data 701 is added to the searched data and the query against the base list 18 is performed. If the query finds the unsearched data 701 in the base list 18, the unsearched data 701 is passed to the base registration unit 16. The unsearched data 701 is deleted from the unsearched list 700a. The base determination unit 15a also accesses the base-unreachable database storage unit 40 and queries whether given word data is stored in the base-unreachable database. If the check shows that the unsearched data 701 is already in the searched data, new unsearched data 701 is read from the unsearched list 700a.

The processing flow of the base database generation method performed with the base database generation apparatus 1a is described next. FIG. 9 is a flowchart showing the processing flow of the base database generation program.

Step 501 is the same headword reading process as step 201 of Embodiment 1. In step 501, the dictionary data reading unit 11 reads the headword data 101 at the head of the storage area of the dictionary data 100 stored in the dictionary data storage unit 10, and inserts "*" into the read flag 112. The headword data 101 are read in the order in which they are stored in the dictionary data 100.

Step 502 is the base database confirmation process. In step 502, the base determination unit 15a queries whether the headword data 101 read in step 501 is stored in the index of the base database 400. If the headword data 101 does not exist in the index, processing moves to step 503; if it exists, processing moves to step 521.

Step 503 is the base determination process and performs the same processing as step 202 of Embodiment 1. In step 503, the base determination unit 15a queries whether the headword data 101 matches any word data contained in the base list 18. If the query extracts the headword data 101 from the base list 18, the headword data 101 is a base and is passed to the base registration unit 16.

The steps above are now explained concretely using the dictionary data 100 of FIG. 2 as an example. The headword 103 "行う" (to perform), stored at the head of the headword data 101, is read (step 501), and a check is made as to whether the headword 103 is stored in the base database. When the program starts, no word is registered as an index in the base database 400, so the headword 103 ("行う") is determined not to exist in the index (N at step 502). In the following step 503, the base list 18 shown in FIG. 5 contains the headword 103 "行う", so it is determined to be a base and processing moves to step 504.

Step 504 is the base registration process and performs the same processing as step 204 of Embodiment 1. The following step 521 is the all-headwords end determination process and performs the same processing as step 213 of Embodiment 1.

In step 521, the end condition determination unit 17 queries whether any read flag 112 stored in association with the headword data 101 is blank. In this example the query finds a blank, which means that headword data 101 not yet subjected to the base determination process exists and the base determination process for all headword data is not finished, so processing moves to step 501.

Next, in step 501, the headword data 101 stored after the headword 103, "インターフェース" (interface, headword 105), is read, and step 502 determines whether the headword 105 exists in the base database 400. It does not, so processing moves to step 503. Step 503 performs the base determination process on the headword 105. Because the headword 105 is not contained in the base list 18, it is determined not to be a base, and processing moves to step 505 (N at 503).

Step 505 is the base-unreachable determination process. In step 505, the base determination unit 15a queries the base-unreachable database (not shown) stored in the base-unreachable database storage unit 40, using the headword data 101 (headword 105). The word "インターフェース" does not exist in the base-unreachable database, so as a result of this process, processing moves to step 506.
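
The order of the three lookups in steps 502, 503, and 505 can be sketched as a small routing function. The function name and the use of Python sets for the three stores are assumptions. The branch taken when a headword is found in the base-unreachable database is not described in this passage, so the sketch returns `None` for that case rather than guessing a step number.

```python
def next_step(headword, base_db_index, base_list, unreachable_db):
    """Return the step number that follows the lookups for one headword."""
    if headword in base_db_index:    # step 502: already registered in the base database 400
        return 521
    if headword in base_list:        # step 503: the headword is itself a base
        return 504
    if headword in unreachable_db:   # step 505: known to reach no base
        return None                  # following branch not described in this passage
    return 506                       # read the definition sentence

base_list = {"行う", "機器"}          # toy stand-in for the base list 18
first = next_step("行う", set(), base_list, set())
second = next_step("インターフェース", {"行う"}, base_list, set())
```

With these inputs, "行う" is routed to step 504 and "インターフェース" to step 506, matching the walkthrough in the text.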

Step 506 is the definition sentence reading process. Step 506 and the following steps 507 and 508 perform the same processing as steps 203, 205, and 206 described above, respectively. The unsearched data 701 generated by step 508 is stored in the search information storage unit 14a as the unsearched list 700a. As shown in FIG. 13(a), the unsearched list 700a stores the unsearched data 701, the search hierarchy information 702, and the original word data 703 of that unsearched data 701 in association with one another.

Step 509 is the searched determination process. In step 509, the base determination unit 15a reads unsearched data 701 stored in the unsearched list 700a together with the search hierarchy information 702 stored in association with it, deletes that unsearched data 701 from the unsearched list 700a, and then checks whether the unsearched data 701 is already contained in the searched data (not shown) stored in the search information storage unit 14a. If the searched data contains no matching word data, the unsearched data 701 has not yet undergone the base determination process, so it is added to the searched data and processing moves to step 510.

Step 510 is the base registration confirmation process. In step 510, the base determination unit 15a queries whether the unsearched data 701 ("電子計算機") taken from the unsearched list 700a is already stored in the word data 401, which is the index of the base database 400. The query shows that "電子計算機" does not exist in the base database 400, so processing moves to step 512.

Step 512 is the base determination process. It performs the same processing as step 208 of Embodiment 1 described above, so its description is omitted. Because "電子計算機" is not contained in the base list 18, processing moves to step 514.

Step 514 is the base-unreachable determination process and performs the same processing as step 505 described above. In step 514, "電子計算機" does not exist in the base-unreachable database, so processing moves to step 515.

Step 515 is the search end determination process and performs the same processing as step 210 of Embodiment 1. In step 515, the search hierarchy information associated with "電子計算機" is "1", so the end condition is not met, and processing moves to step 516.

Step 516 is the headword confirmation process and performs the same processing as step 211 of Embodiment 1. In step 516, "電子計算機" exists in the headword data 101, so processing moves to step 506 (N). In step 506, the definition sentence data 102 corresponding to "電子計算機" is read and 1 is added to the search hierarchy information.

Steps 507 and 508 are performed using the interpretation sentence data 102 (interpretation sentence 108) read in step 506, and the unsearched list 700a shown in FIG. 13(b) is generated in the search information storage unit 14a.
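Steps 506 to 508 can be pictured as appending records to a queue. The following is a minimal sketch, not the patented implementation: the record layout (`UnsearchedEntry` with a `source` field), the `expand` helper, and the whitespace-based `tokenize` stand-in for morphological analysis and notation correction are all assumptions introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class UnsearchedEntry:
    word: str    # unsearched data
    level: int   # search hierarchy information
    source: str  # headword whose interpretation sentence produced this word


def tokenize(sentence):
    # Hypothetical stand-in for morphological analysis (step 507)
    # and notation correction (step 508).
    return sentence.split()


def expand(headword, level, dictionary, unsearched):
    """Read the interpretation sentence of `headword`, add 1 to the level,
    and append each word of the sentence to the unsearched list."""
    next_level = level + 1
    for word in tokenize(dictionary[headword]):
        unsearched.append(UnsearchedEntry(word, next_level, headword))
    return next_level


# Toy dictionary: one headword defined by a single word.
toy_dictionary = {"electronic computer": "computer"}
queue = deque()
expand("electronic computer", 1, toy_dictionary, queue)
```

With the toy dictionary, the queue ends up holding one record for “computer” at level 2, which mirrors the state the walkthrough reaches before step 509.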

In step 509, the unsearched data 701 (“computer”), which was generated from the interpretation sentence 108 and stored in the unsearched list (FIG. 13(b)), is read out together with its associated search hierarchy information (value “2”), and the searched determination process is performed. Since the word data “computer” is not included in the searched data, the unsearched data 701 (“computer”) is added to the searched data and the process proceeds to step 510. In the base registration confirmation process of step 510, “computer” is determined not to exist in the base database 400, and the process proceeds to step 512.

In the base determination process of step 512, “computer” is not included in the base list 18, so this word data is added to the searched data and the process proceeds to step 514.

In step 514, “computer” does not exist in the non-basisable database, so the process proceeds to step 515 (N). In step 515, the search hierarchy information is “2”, so the end condition is not satisfied and the process proceeds to step 516. In step 516, “computer” is confirmed to exist in the headword data 101, so the process proceeds to step 506 (N). In step 506, the interpretation sentence data 102 (interpretation sentence 110) corresponding to the headword 109 “computer” is read from the dictionary data 100. Through steps 507 and 508, the word data “electronic computer” is stored in the unsearched list together with its search hierarchy information (value “3”), and the process proceeds to step 510. In step 509, “electronic computer” is determined to exist in the searched data, so the process proceeds to step 517.

Step 517 is the same-level determination process. In step 517, the base determination unit 15a determines whether the unsearched list 700a stored in the search information storage unit 14a contains any other word data that has the same source word data as the source word data (“computer”) of the word data currently being processed (“electronic computer”). At this stage the unsearched list contains no word data whose calling source word is “computer”, so the process proceeds to step 518.
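The same-level determination of step 517 amounts to asking whether any remaining unsearched entry came from the same source word. A minimal sketch, assuming each entry is a `(word, level, source_word)` tuple; both the tuple layout and the name `has_sibling` are hypothetical.

```python
def has_sibling(unsearched, source_word):
    """Return True if any remaining entry was produced from `source_word`."""
    return any(src == source_word for _word, _level, src in unsearched)


# Toy list: two words taken from the interpretation sentence of "interface".
remaining = [("different", 1, "interface"), ("device", 1, "interface")]
print(has_sibling(remaining, "interface"))
print(has_sibling(remaining, "computer"))
```

When the check fails, as it does for “computer” in the walkthrough, no sibling can still supply a base, which is why the flow moves on to step 518.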

In step 518, the base determination unit 15a queries whether the source word data “computer” is already stored in the word 401, which is the index of the base database 400. The source word data “computer” does not exist in the index of the base database 400, so the process proceeds to step 519. Step 519 is the non-basisable database registration process. In step 519, the base determination unit 15a additionally stores the source word data “computer” in the non-basisable database 40, and the process proceeds to step 520.

Step 520 is the end determination process. In step 520, the end condition determination unit 17 queries whether any word data is stored in the unsearched list 700a held in the search information storage unit 14a. In this embodiment word data is stored in the unsearched list, so the process proceeds to step 517.

In step 517, a query is made as to whether the unsearched list contains word data that has the same source word data as the word registered in the non-basisable database 40 by the non-basisable database registration process (step 519). In this embodiment, the unsearched list contains no other word data that has the word data “electronic computer” (FIG. 13(c)) as its calling source word data, so the process proceeds to step 518.

In step 518, a check is made as to whether the word data “electronic computer” is stored in the base database 400. Since it is not, the process proceeds to step 519. In step 519, the base determination unit 15a stores the word data “electronic computer” in the non-basisable database. The end determination process is then performed in step 520.

In step 520, it is determined that word data exists in the unsearched list, so the process proceeds to step 517. In step 517, it is determined whether the unsearched list contains word data that has the same calling source word as the word registered by the non-basisable database registration process. The unsearched list is found to contain word data with the same source word, and the process proceeds to step 509.

In step 509, the word data at the head storage position of the unsearched list is read out together with its search hierarchy information, and the searched determination process is performed. The word data read out, “different”, is determined to be unsearched. Next, in step 510, the base registration confirmation process is performed on “different”. Since “different” does not exist in the word 401 of the base database 400, the process proceeds to step 512. The base determination process of step 512 is performed on “different”, and the word data “different” is additionally stored in the searched data. Since “different” is determined not to exist in the base list 18, the process proceeds to step 514.

In step 514, the non-basisability determination process is performed on the word data “different”. It is determined that “different” is not stored in the non-basisable database, so the process proceeds to step 515.

The search hierarchy information of the word data “different” is 1, so the search end determination process of step 515 passes the process to step 516. In step 516, a check is made as to whether “different” is stored in the headword data 101 of the dictionary data 100. Since “different” is determined not to exist in the headword data 101, the process proceeds to step 517.

In step 517, a check is made as to whether the unsearched list contains other word data whose source word is “interface”, the source word of the word data “different”. Such word data is found in the unsearched list, and the process returns to step 509. Next, in step 509, the base determination unit 15a reads the word data “device” stored at the head storage position of the unsearched list and performs the searched confirmation process to check whether it exists in the searched data. Since “device” is determined not to exist in the searched data, it is added to the searched data and the process proceeds to step 510.

Next, in step 510, the base determination unit 15a performs the base determination process on the word data “device”. Since “device” is determined to be included in the base list, the process proceeds to step 513. In step 513, the base determination unit 15a adds “device” as a word 401 of the base database 400, records “1” in the base 402 that corresponds to “device”, and records “0” in each base 402 that does not correspond to “device”.

Next, in step 511, the base determination unit 15a checks whether “interface”, the source word data of the word data “device” determined above to be a base, is stored in the word 401 of the base database. If it is not stored in the word 401, that is, in the index data, the source word data “interface” is additionally stored as an index of the base database 400. In the bases 402 of the newly stored word 401 “interface”, “1” is recorded at the position corresponding to the word data “device” determined to be a base and “0” is recorded at the other positions, and the process proceeds to step 517.
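Steps 513 and 511 both write a row of 0/1 flags, one flag per base. A minimal sketch of that row update, assuming the base database is a dictionary keyed by word and the base list is an ordered list; the three-entry base list and the name `register_base` are hypothetical.

```python
def register_base(base_db, base_list, word, base_word):
    """Create the row for `word` if missing (all zeros) and set the flag
    at the position of `base_word` to 1."""
    row = base_db.setdefault(word, [0] * len(base_list))
    row[base_list.index(base_word)] = 1
    return row


# Hypothetical base list; only "device" appears in the walkthrough.
base_list = ["device", "information", "person"]
base_db = {}
register_base(base_db, base_list, "device", "device")     # step 513
register_base(base_db, base_list, "interface", "device")  # step 511
```

Because `setdefault` keeps an existing row, a word whose interpretation sentence contains several bases accumulates several 1 flags in the same row.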

In step 517, as already described, it is determined whether the unsearched list contains word data whose source word data is “interface”, the source word of the word data “device”. In this way, the process is executed recursively until the base determination process has been completed for the word data in the unsearched list.
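The walkthrough above shows two outcomes: “interface” reaches the base “device”, while “electronic computer” and “computer” define each other and end up in the non-basisable database. The sketch below reproduces those two outcomes with a plain recursive function. It is a simplified reformulation, not the queue-driven flow of steps 505 to 520: `resolve`, the whitespace `tokenize`, the path-based cycle check, and the depth limit `max_level=4` are all assumptions.

```python
def tokenize(sentence):
    # Hypothetical stand-in for morphological analysis.
    return sentence.split()


def resolve(word, dictionary, base_list, max_level=4, _level=0, _path=()):
    """Return the set of bases reachable from `word`.
    An empty set means the word could not be reduced to any base."""
    if word in base_list:
        return {word}
    # Stop on circular definitions, on the depth limit, and on words
    # that are not headwords of the dictionary.
    if word in _path or _level >= max_level or word not in dictionary:
        return set()
    bases = set()
    for token in tokenize(dictionary[word]):
        bases |= resolve(token, dictionary, base_list,
                         max_level, _level + 1, _path + (word,))
    return bases


toy_dictionary = {
    "electronic_computer": "computer",
    "computer": "electronic_computer",
    "interface": "different device",
}
print(resolve("interface", toy_dictionary, {"device"}))
print(resolve("electronic_computer", toy_dictionary, {"device"}))
```

A word whose result is empty corresponds to a candidate for the non-basisable database. The source keeps that list persistently so later headwords can skip the search, which this sketch does not model.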

Next, the base list generation method according to the present invention is described. FIG. 10 is a functional block diagram showing the configuration of a base list generation device used in the base list generation method. The base list generation device is implemented by a computer on which a computer program that executes the base list generation method according to the present invention is installed. Each functional block shown in FIG. 10 is realized by the base list generation program according to the present invention and the hardware of the computer cooperating to execute the processing.

In FIG. 10, the base list generation device comprises a base list generation program 3a, a dictionary data storage unit 10, a base list storage 20a, and a word frequency database 50. The dictionary data storage unit 10 is the same as the one used in the first and second embodiments and stores the dictionary data 100 (see FIG. 2). The word frequency database accumulates the information generated by the processing of the base list generation program. The base list database 20a stores the base list generated by carrying out the present invention.

FIG. 12 shows an example of the structure of the word database stored in the word frequency database 50. In FIG. 12, the word frequency database 800 has a table structure with fields that record a word 801 contained in the information read from the dictionary data 100 and the frequency 802 with which the word 801 appears in the dictionary data 100. The present invention can nevertheless be carried out with any structure capable of recording the appearance frequency of each word, as described later.

Next, an embodiment of the base list generation program is described. FIG. 11 is a flowchart showing the flow of processing of the base list generation program. In FIG. 11, step 601, executed first, is the headword reading process. In this process, the dictionary data reading unit 31 reads the headword data 101 of the dictionary data 100 stored in the dictionary data storage unit 10. Step 601 is the interpretation sentence reading process. In this process, the interpretation sentence data 102 stored in association with the headword data 101 read above is read out and passed to the morphological analysis unit 32, 1 is added to the search hierarchy information, and the process proceeds to step 602. The initial value of the search hierarchy information is zero.

Step 603 is the morphological analysis process. In step 603, the morphological analysis unit 32 performs morphological analysis on the interpretation sentence data 102. This morphological analysis is the same as that used in the first and second embodiments. Step 604 is the notation variation correction process. In step 604, the morphological analysis unit 32 corrects notation variations in the word data generated from the interpretation sentence data 102. This correction is the same as the notation variation correction used in the first and second embodiments. The word data generated by step 604 is stored as unsearched data in the unsearched list held in the search information storage unit 33, together with the search hierarchy information associated with that unsearched data. The structure of the unsearched list, the unsearched data, and the search hierarchy information are equivalent to those used in the first and second embodiments already described.

Step 605 is the searched determination process. In step 605, the word frequency registration unit 34 reads unsearched data and its search hierarchy information from the unsearched list, queries whether the unsearched data is included in the searched data, and deletes the unsearched data from the unsearched list. If the unsearched data is included in the searched data, the process proceeds to step 609. If it is not, the process proceeds to step 606.
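Step 605 can be sketched as popping the head of a queue and testing it against a set of words already seen. This is an illustration under assumptions: the `(word, level)` tuple layout and the name `pop_if_new` are hypothetical, and adding the word to the searched set at this point is carried over from step 509 of the second embodiment rather than stated for step 605.

```python
from collections import deque


def pop_if_new(unsearched, searched):
    """Remove the head entry from the unsearched list.
    Return it if the word has not been searched yet, otherwise None."""
    word, level = unsearched.popleft()
    if word in searched:
        return None              # already searched: go to step 609
    searched.add(word)           # assumption: mark as searched here
    return word, level           # continue with step 606


queue = deque([("device", 1), ("device", 2), ("computer", 2)])
seen = set()
first = pop_if_new(queue, seen)
second = pop_if_new(queue, seen)
```

Here `first` is the entry for “device” and `second` is `None`, because the second occurrence of “device” is skipped.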

Step 606 is the word frequency registration process. In step 606, the word frequency registration unit 34 checks whether the word data that is the unsearched data is already recorded in the word frequency database 50. If it is not yet recorded, the word data is added to the word 800 of the word frequency database 50, that is, to the index, and the value “1” is recorded in its frequency 802. If the unsearched data is already recorded in the word frequency database 50, 1 is added to the frequency 802 associated with the corresponding word 801. After the frequency 802 has been recorded, the process proceeds to step 607.
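The registration rule of step 606 (insert with frequency 1, otherwise add 1) maps directly onto a dictionary update. A minimal sketch, assuming the word frequency database is an in-memory `dict` from word to count; the name `register_frequency` is hypothetical.

```python
def register_frequency(freq_db, word):
    """Record one occurrence of `word` and return its updated frequency."""
    if word not in freq_db:
        freq_db[word] = 1    # first occurrence: new index entry
    else:
        freq_db[word] += 1   # known word: add 1 to its frequency
    return freq_db[word]


freq_db = {}
register_frequency(freq_db, "device")
register_frequency(freq_db, "device")
register_frequency(freq_db, "computer")
```

After these three calls `freq_db` maps “device” to 2 and “computer” to 1. As the text notes, any structure that can hold a per-word count would serve equally well.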

Step 607 is the end determination process. In step 607, the end condition determination unit 35 determines whether the search hierarchy information read from the unsearched list together with the unsearched data is equal to or greater than “4”, the end condition. If the search hierarchy information is less than 4, the process proceeds to step 608. If it is 4 or more, the process proceeds to step 609.

Step 608 is the headword confirmation process. In step 608, the dictionary data reading unit 31 reads unsearched data from the unsearched list held in the search information storage unit 33 and queries whether corresponding data exists in the headword data 101. If the unsearched data exists in the headword data 101, it is deleted from the unsearched list, 1 is added to the search hierarchy information, and the interpretation sentence data 102 stored in association with that headword data 101 is read out and passed to the morphological analysis unit 32.

If the unsearched data does not exist in the headword data 101, it is deleted from the unsearched list and the process proceeds to step 609.
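Steps 607 and 608 together decide what happens to a word after its frequency has been recorded. The sketch below expresses that decision as a small function. The threshold 4 comes from step 607, while the function name `next_action`, the string labels, and the dictionary-as-`dict` representation are assumptions.

```python
MAX_LEVEL = 4  # end condition stated for step 607


def next_action(word, level, dictionary):
    """Decide how to continue after the frequency of `word` is recorded."""
    if level >= MAX_LEVEL:
        return "stop"    # depth limit reached: go to step 609
    if word in dictionary:
        return "expand"  # headword: read its sentence, add 1 to the level
    return "drop"        # not a headword: go to step 609


toy_dictionary = {"computer": "electronic computer"}
print(next_action("computer", 2, toy_dictionary))
print(next_action("computer", 4, toy_dictionary))
print(next_action("different", 1, toy_dictionary))
```

The depth limit bounds the work per headword even when definitions are circular, so this flow needs no separate cycle check.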

Step 609 is the end determination process. In step 609, the end condition determination unit 35 queries whether unsearched data exists in the unsearched list. If unsearched data exists, the process proceeds to step 606. If none exists, the searched data is cleared, the search hierarchy information is reset to zero, and the process proceeds to step 610.

Step 610 is the overall end determination process. In step 610, the end condition determination unit 35 queries whether any read flag 112 stored in association with the headword data 101 is blank. A blank flag means that the corresponding headword data 101 has not yet been processed, so the program returns to step 601. If no read flag 112 is blank, the program ends.
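The overall end determination of step 610 can be sketched as a scan for the first headword whose read flag is still blank. This assumes the flags are held in a `dict` from headword to a flag string, with an empty string meaning blank; that representation and the name `next_unread` are hypothetical, and when the flag is set is not stated in this passage.

```python
def next_unread(read_flags):
    """Return the first headword whose read flag is blank, or None
    when every headword has been processed."""
    for headword, flag in read_flags.items():
        if not flag:
            return headword  # unprocessed headword: return to step 601
    return None              # no blank flag: the program can end


flags = {"computer": "1", "interface": ""}
print(next_unread(flags))
```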

Step 611 is the base list extraction process. In step 611, the base list registration unit 36 sorts the words by the frequency stored in the word frequency database, extracts the word data for the words 801 ranked in the top 500, and registers the extracted word data in the base list database. Besides extraction with a fixed threshold as above, the word data may be extracted from the word frequency database by a statistical method.
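The extraction of step 611 is a sort by frequency followed by a cut at rank 500. A minimal sketch, assuming the word frequency database is a `dict` from word to count; the name `extract_base_list` and the alphabetical tie-break are assumptions, since the source does not say how equal frequencies are ordered.

```python
def extract_base_list(freq_db, top_n=500):
    """Return the `top_n` most frequent words, most frequent first."""
    ranked = sorted(freq_db.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _count in ranked[:top_n]]


freq_db = {"device": 5, "different": 2, "information": 5, "computer": 1}
print(extract_base_list(freq_db, top_n=2))  # ['device', 'information']
```

Replacing the fixed cut with a statistical criterion, as the text allows, would only change how `ranked` is truncated.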

As described above, bases can be generated using the dictionary data 100.

As described above, the present invention has the effect of providing a method for extracting the semantic concepts of words that serves as the foundation for building a highly accurate word thesaurus based on those semantic concepts when interpreting natural language.

FIG. 1 is a functional block diagram showing an embodiment of the base database generation device according to the present invention.
FIG. 2 is a diagram showing an example of the dictionary data used in the base database generation method according to the present invention.
FIG. 3 is a diagram showing an example of the base database generated by the base database generation method according to the present invention.
FIG. 4 is a diagram showing an example of an unsearched list that can be used in the present invention.
FIG. 5 is a diagram showing an example of a base list that can be used in the present invention.
FIG. 6 is a flowchart showing an example of the processing of a computer program that implements the base database generation method according to the present invention.
FIG. 7 is a diagram showing an example of the result of the morphological analysis process used in the present invention.
FIG. 8 is a functional block diagram showing another embodiment of the base database generation device according to the present invention.
FIG. 9 is a flowchart showing an example of the processing of another computer program that implements the base database generation method according to the present invention.
FIG. 10 is a functional block diagram showing an embodiment of the base list generation device according to the present invention.
FIG. 11 is a flowchart showing an example of the processing of a computer program that implements the base list generation method according to the present invention.
FIG. 12 is a diagram showing an example of the structure of the word frequency database generated by the automatic base list creation method according to the present invention.
FIG. 13 is a diagram showing another example of an unsearched list that can be used in the present invention.

Explanation of symbols

1 Base database generation device
100 Dictionary data
101 Headword data
102 Interpretation sentence data
200 Base list
400 Base database
401 Word
402 Base
600 Analysis result data
700 Word frequency database

Claims

A method for generating a base database of words by a computer comprising first storage means for storing a word in association with one or more interpretation sentences explaining the meaning of the word, and second storage means for storing the word in association with a base,
the method comprising, by the computer:
a step of reading the word from the first storage means;
an interpretation sentence reading step of reading, from the first storage means, the interpretation sentence stored in association with the word read in the preceding step;
a word division step of dividing the interpretation sentence into words;
a base determination step of determining whether the word read from the first storage means or a word divided from the interpretation sentence corresponds to a base;
a base association step of, when the base determination step determines that the word corresponds to a base, associating the word with the base and storing it in the second storage means; and
a recursion step of, when the base determination step determines that the word does not correspond to a base, recursively performing the interpretation sentence reading step, the word division step, and the base determination step using the word.

The base database generation method according to claim 1, wherein the recursion step includes:
a determination step of determining whether the base determination process has been performed on all words divided from the interpretation sentence;
a step of, when the same word appears again as a determination target in the base determination step, reading the base already associated with that word from the second storage means; and
a step of associating the word used to read the interpretation sentence with the read base and storing it in the second storage means.

A method for generating a base list used for a base database of words, by a computer comprising first storage means for storing a word in association with one or more interpretation sentences explaining the meaning of the word, second storage means for storing the word in association with the appearance frequency of the word, and third storage means for storing the base list,
the method comprising, by the computer:
a step of reading a word from the first storage means;
an interpretation sentence reading step of reading, from the first storage means, the interpretation sentence stored in association with the read word;
a word division step of dividing the read interpretation sentence into words;
a step of registering the divided words in the second storage means;
a step of reading, from the first storage means, the interpretation sentence corresponding to a divided word;
a step of recursively performing the word division, the word registration, and the interpretation sentence reading on the read interpretation sentence; and
a step of extracting only specific words using the word frequency information stored in the second storage means and storing them in the third storage means.

A base database generation device comprising:
first storage means for storing a word in association with one or more interpretation sentences explaining the meaning of the word;
second storage means for storing the word in association with a base;
means for reading the word from the first storage means;
interpretation sentence reading means for reading, from the first storage means, the interpretation sentence stored in association with the read word;
word dividing means for dividing the read interpretation sentence into words;
base determination means for determining whether the word read from the first storage means or a word divided from the interpretation sentence corresponds to a base;
base associating means for associating the word with a base and storing it in the second storage means when the base determination means determines that the word corresponds to a base; and
recursion means for recursively operating the interpretation sentence reading means, the word dividing means, and the base determination means using the word when the base determination means determines that the word does not correspond to a base.

The base database generation device according to claim 4, further comprising:
determination means for determining whether the processing by the base determination means has been performed on all words divided from the interpretation sentence;
reading means for reading the base associated with the word from the second storage means when the determination means determines that the word has already been processed; and
means for associating the word used to read the interpretation sentence with the read base and storing it in the second storage means.

First storage means for associating and storing a word and one or more interpretations explaining the meaning of the word;
Second storage means for storing the word and the appearance frequency of the word in association with each other;
Means for reading a word from the first storage means;
Means for reading an interpretation sentence stored in association with the word read from the first storage means;
Means for dividing the read-out sentence into words;
Means for registering the word in the second storage means;
If the word is already registered in the second storage means, means for adding 1 to the frequency information of the same word already registered;
And a third storage means for extracting and storing only specific words using the word frequency information stored in the second storage means.

A computer program that causes a computer comprising first storage means for storing a word and one or more interpretation sentences explaining the meaning of the word in association with each other, and second storage means for storing the word in association with a base, to operate as:
Means for reading out the word from the first storage means;
Means for reading out, from the first storage means, the interpretation sentence stored in association with the read word;
Word dividing means for dividing the read interpretation sentence into words;
Base determining means for determining whether a word read from the first storage means or a word divided from the interpretation sentence corresponds to a base;
Base associating means for associating a word determined by the base determining means to be a base with the read word as its base and storing the association in the second storage means; and
Recursion means for, when the base determining means determines that a divided word does not correspond to a base, reading the interpretation sentence corresponding to that word from the first storage means and recursively applying the word dividing means to the read interpretation sentence and the base determining means to the divided words, until the words divided from an interpretation sentence are bases.

8. The computer program according to claim 7, further causing the computer to operate as:
Determining means for determining whether processing by the base determining means has already been performed on the words divided by the dividing means;
Reading means for reading the base associated with the word from the second storage means when the determining means determines that the word has already been processed; and
Means for storing the read base in the second storage means in association with the word that was used to read the interpretation sentence from which the divided word originated.

A computer program that causes a computer to operate as:
First storage means for storing a word and one or more interpretation sentences explaining the meaning of the word in association with each other;
Second storage means for storing the word and the appearance frequency of the word in association with each other;
Means for sorting the words according to the appearance frequency;
Third storage means for extracting and storing the sorted words;
Means for reading a word from the first storage means;
Means for reading the interpretation sentence stored in association with the read word;
Means for dividing the read interpretation sentence into words;
Means for registering the divided words in the second storage means;
Means for reading the interpretation sentence corresponding to each divided word from the first storage means; and
Means for recursively performing the dividing process, the registering process, and the process of reading an interpretation sentence.
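
The recursive registration and frequency sort in this claim can be sketched in Python under stated assumptions. The `visited` set, which reads each word's interpretation sentences only once so that circular definitions terminate, is a choice of this sketch and is not stated in the claim. The names `register_recursively` and `ranked_words`, the toy data, the whitespace split, and the alphabetical tie-break are likewise hypothetical.

```python
from collections import Counter


def split_words(sentence):
    """Stand-in for morphological analysis: split on whitespace."""
    return sentence.split()


def register_recursively(word, dictionary, freq, visited):
    """Divide the interpretation sentences of `word`, count the parts, and recurse."""
    if word in visited or word not in dictionary:
        return  # already read, or no interpretation sentence to read
    visited.add(word)
    for sentence in dictionary[word]:
        for part in split_words(sentence):
            freq[part] += 1  # register the divided word
            register_recursively(part, dictionary, freq, visited)


def ranked_words(freq):
    """Sort words by descending frequency, breaking ties alphabetically."""
    return [word for word, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))]


dictionary = {
    "puppy": ["young dog"],
    "dog": ["animal that barks"],
    "young": ["not old"],
    "old": ["not young"],  # circular with "young" on purpose
}
freq = Counter()
register_recursively("puppy", dictionary, freq, set())
ranking = ranked_words(freq)  # role of the third storage means
```

Starting from "puppy", the traversal follows "young" into "old" and back, stops at the repeated word, and leaves "not" and "young" at the top of the ranking.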