JP2000259627A

JP2000259627A - Device and method for deciding relation between natural language sentences, retrieving device and method utilizing the deciding device and method and recording medium

Info

Publication number: JP2000259627A
Application number: JP11060046A
Authority: JP
Inventors: Kimiaki Shudo; 公昭首藤; Yasuo Koyama; 泰男小山
Original assignee: AI SOFT KK
Current assignee: AI SOFT KK
Priority date: 1999-03-08
Filing date: 1999-03-08
Publication date: 2000-09-22

Abstract

PROBLEM TO BE SOLVED: To provide a technique for appropriately determining the similarity between two sentences by grammar-level processing and for retrieving sentences based on that determination. SOLUTION: Words are extracted from an input character string A consisting of m words a1 to am and from a comparison character string B consisting of n words b1 to bn, and the distance t(ai, bj) between words is obtained by referring to a synonym dictionary 36 or the like. Using the word distances t(ai, bj) and the word omission costs r and q (steps S515 to S525), the distance d(ai, bj) between word strings is computed successively for all word strings that can be formed while preserving word order (step S535). Once the distance d(am, bn) between the word strings containing all the words has been obtained, the similarity s(am, bn) between the character strings is computed from d(am, bn), and the similarity of the character strings is determined from that value.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to techniques for handling natural language, and more particularly to a device and a method for determining the relationship between two natural language sentences, and to a recording medium on which a function for performing that determination is recorded.

[0002]

[Prior Art] The languages humans use for communication are called natural languages, in contrast to artificial languages such as those used for programming. A natural language is best regarded as the totality of the communication carried out in it, and it cannot be understood as a combination of a small number of logical principles. Every language has a so-called grammar, but it is well known that a grammar is merely an attempt to organize part of the vast body of rules present in a natural language by means of a few easily understood rules, and that it does not describe the language completely.

[0003] In Japan, technology for handling natural language has developed in its own way, for example in the form of kana-kanji conversion. To obtain the mixed kana-kanji text that the user expects from the kana input, methods such as example-based conversion and conversion using dependency relations have recently been realized in addition to grammatical analysis. Such methods make it possible to convert the kana string 「あつい」 (atsui, "hot") differently in 「夏が暑い」 ("the summer is hot") and in 「お茶が熱い」 ("the tea is hot").

[0004] Other major technologies relating to natural language include natural language text retrieval, machine translation and, more recently, the generation of summaries. Natural language text retrieval is based on searching the target character string for a string that exactly matches the search term. In addition, searches that combine the results for several search terms by logical OR and logical AND, and searches for conceptually similar words using a thesaurus, have been put to practical use. As an example, suppose the phrase 「東京のうまい店」 ("good restaurants in Tokyo") is entered as the sentence to search for (hereinafter the "search key sentence") in order to search web pages on the Internet. If the concept expressions 「東京」 ("Tokyo") and 「うまい店」 ("good restaurant") are simply used as search keys in a compound-term search, descriptions that use terms such as 「首都圏」 ("the capital region") or 「都内」 ("within Tokyo"), which can be regarded as synonyms of 「東京」 or as expressions of broader or narrower concepts, cannot be retrieved. Likewise, for 「うまい店」, words such as 「名店」 ("renowned restaurant") and 「グルメ」 ("gourmet") must be prepared as search keys. Retrieval of natural language text of this kind is extremely useful, for example, for searching web pages across the worldwide Internet or large accumulated collections of papers.

[0005] Proposals concerning such retrieval include 「インデックス文の類似性に基づく映像検索」 ("Video retrieval based on the similarity of index sentences"; Ichiro Yamada et al., paper presented at the first specialist session of the 5th International Symposium of the National Institute for Japanese Language (国立国語研究所), August 1997) and 「構文付きコーパスの作成と類似用例検索システムへの応用」 ("Creation of a syntactically annotated corpus and its application to a similar-example retrieval system"; Yasuaki Hyodo et al., 「自然言語処理」 (Natural Language Processing), Vol. 3, No. 2, August 1997). These papers use the 「分類語彙表」 (a classified vocabulary list) compiled by the same institute and perform retrieval that takes word-to-word similarity into account for nouns and verbs. They also judge the similarity between individual words and the similarity of whole sentences on the premise that the syntactic patterns match, as in 「○○が、△△を、□□する」 ("○○ does □□ to △△").

[0006] In machine translation, on the other hand, several approaches have been proposed depending on how similar the languages are. Between languages of the same family that share the fundamental rules of grammar, such as German and French, a reasonable degree of translation is possible even with methods based on substituting the elements that make up a sentence. By contrast, translation between English, an inflectional language, and Japanese, an agglutinative language, is known to be difficult to achieve by word substitution. Structure-to-structure translation via syntactic parsing has therefore been attempted, but it suffers from many problems, such as the difficulty of narrowing down the many possible parses, and has not produced satisfactory results. Against this background, attention has recently turned to methods that collect a large number of examples translated by human translators and, given a sentence to translate, retrieve a sentence similar to it and translate by referring to its translation and substituting words. In this case, a sentence whose structure is close to that of the sentence to be translated (a sentence with high similarity) is retrieved from a large number of example sentences.

[0007] Considering processing such as retrieval, translation and summarization of natural language text, analysis of the meaning expressed by the text will ultimately be necessary. Alternatively, as a technique that stops short of meaning, one can prepare a large number of examples of natural language expressions and refer to them. For the former, partly because semantic rules are difficult to define, semantic inference using neural networks, expert systems and the like have been proposed. For the latter, large-scale example dictionaries and dependency dictionaries have become available in recent years, and proposals have been made, for example, to convert 「夏が暑い」 ("the summer is hot") and 「お茶が熱い」 ("the tea is hot") correctly in kana-kanji conversion.

[0008]

[Problems to Be Solved by the Invention] However, conventional natural language processing still cannot perform retrieval and translation adequately. To perform highly accurate retrieval on natural language text, the meaning expressed by the text would ultimately have to be handled, but no technique for handling meaning simply has yet reached practical use, so it cannot be applied directly to retrieval or translation at present. On the other hand, techniques on the level of conventional compound-term search cannot handle large volumes of natural language data with good accuracy.

[0009] As noted above, retrieval methods that take word-to-word similarity into account have been proposed, but they consider only concept expressions such as nouns and verbs, and a sentence with a similar meaning is likely to be missed unless its syntax matches. Taking the phrase 「東京のうまい店」 ("good restaurants in Tokyo") as an example, even if 「首都圏の名店」 ("renowned restaurants in the capital region") and 「都内のグルメガイド」 ("gourmet guide to Tokyo") can be retrieved, 「東京２３区における名店」 ("renowned restaurants in the 23 wards of Tokyo") and 「東京にだってうまい店」 ("good restaurants even in Tokyo") probably cannot. Conversely, 「東京以外のうまい店」 ("good restaurants outside Tokyo") is likely to be retrieved even though it should not be.

[0010] An object of the present invention is to solve these problems and to propose a technique that, without going deeply into the recognition of meaning, appropriately determines the relationship between natural language sentences mainly by grammar-level processing and retrieves natural language sentences based on that determination.

[0011]

[Means for Solving the Problems and Their Operation and Effects] The following series of inventions was made to solve at least part of the above problems; ultimately, all of them are based on a natural language sentence determination device. The natural language sentence relation determination device of the present invention receives a first sentence to be judged and a second sentence whose relationship with the first sentence is to be determined, each being a sentence in a given language that expresses coherent content, and determines the relationship between the first sentence and the second sentence using constituent units, that is, units that make up sentences in the language and are classified as having a coherent meaning. The device comprises: relation information storage means that stores, for concept expressions (constituent units classified as expressing semantic concepts) and framework expressions (constituent units extracted as corresponding to expressions that support the framework of sentence structure), information representing at least the relationships between concept expressions and between framework expressions; constituent unit extraction means that extracts the constituent units from the first sentence and the second sentence; and relation determination means that determines the relationship between the first sentence and the second sentence by referring to the information stored in the relation information storage means and judging, among the constituent units extracted from the two sentences, the relationships between concept expressions and between framework expressions while taking the correspondence of word order into account.

[0012] Here, a "sentence" means a linguistic expression that conveys some semantic content through a group of words. It includes complete expressions having a subject and a predicate, as well as expressions in units of phrases and expressions consisting of combinations of one-word sentences. For example, the complete sentence 「私は東京のうまい店を知りたい。」 ("I want to know good restaurants in Tokyo."), phrases such as 「東京のうまい店」 ("good restaurants in Tokyo") and 「うまい店」 ("good restaurant"), and combinations of one-word sentences such as 「先生、こんにちは」 ("Teacher, hello") are all included in "sentence".

[0013] This device extracts the concept expressions and framework expressions contained in the first and second sentences, refers to information representing at least the relationships between concept expressions and between framework expressions, and determines the relationship between the first sentence and the second sentence while taking into account the word-order correspondence of the expressions contained in the two sentences.

[0014] Here, the "relationship between the first sentence and the second sentence" means a connection recognized between the two sentences. It includes, for example, similarity or difference in meaning (the sentences mean the same, are similar, differ, are opposite, and so on) and similarity or difference in usage. Usage here covers distinctions that can be made grammatically, such as literary versus colloquial style and plain versus polite style, as well as distinctions that cannot be made by grammatical rules alone, such as sentences in the standard language versus sentences in dialect, sentences written by men versus sentences written by women, and sentences written by people in their twenties versus sentences written by people in their fifties. A combination of several such aspects may also be treated as the "relationship between the first sentence and the second sentence". A particular relationship (for example, similarity) may be specified in advance, or the user may specify the relationship to be determined before the determination is made.

[0015] A concept expression is a constituent unit that makes up sentences in a given language and, among the constituent units classified as having a coherent meaning, is one that expresses a semantic concept. Concept expressions include concept words such as nouns, verbs and adjectives, and combinations of such concept words. A framework expression is likewise a constituent unit classified as having a coherent meaning, but one extracted as corresponding to an expression that supports the framework of sentence structure. The inventor collected such framework expressions extensively for Japanese, an agglutinative language, and published them as 「日本語の文構造のわく組を与える表現−機能カテゴリーと接続ルール−」 ("Expressions that provide the framework of Japanese sentence structure: functional categories and connection rules"; Fukuoka University Research Institute Bulletin (福岡大学総合研究所報) No. 63, March 1983) and 「日本語の文構造のわく組を与える表現−構造的意味情報の整理−」 ("Expressions that provide the framework of Japanese sentence structure: organizing structural semantic information"; Fukuoka University Research Institute Bulletin No. 63, March 1983). Framework expressions include the relational expressions classified in these papers (expressions that represent relations between concepts within a single sentence, such as case relations and causal relations; in Japanese, case particles, conjunctive particles and expressions equivalent to them) and the Japanese auxiliary predicate expressions (助述表現) that convey modality information in a broad sense (the speaker's or writer's judgment and attitude, tense, aspect, negation, voice and so on).

[0016] The "relationship between framework expressions" means a connection recognized between one framework expression and another. The kinds of relationship include, for example, similarity or difference in meaning (the framework expressions mean the same, are similar, differ, are opposite, and so on) and similarity or difference in the attributes of the framework expressions. These attributes include ones that can be distinguished grammatically, such as tense and voice, degree of conjecture, degree of emphasis or restriction, affirmative versus negative, literary versus colloquial, and plain versus polite, as well as ones that cannot be distinguished by grammatical rules alone, such as standard language versus dialect, men's versus women's speech, and the age group that uses the expression. Of course, a combination of several of these kinds may also be treated as the "relationship between framework expressions".

[0017] In such a natural language sentence relation determination device, when the relationship between the first sentence and the second sentence is determined, it is also desirable to allow differences in the order of appearance of the extracted constituent units as part of the word-order correspondence, because in comparing natural language sentences the order of the words is often not identical. When such differences in order of appearance are allowed, it is also desirable, in order to simplify the determination processing, to prohibit crossing between two or more correspondences between constituent units.
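The prohibition on crossing correspondences can be illustrated with a small check. The following is a minimal sketch, not the patented procedure: it assumes each correspondence is written as a pair (i, j) meaning that unit i of the first sentence corresponds to unit j of the second, and the function name `has_crossing` is hypothetical.

```python
from itertools import combinations
from typing import Iterable, Tuple


def has_crossing(pairs: Iterable[Tuple[int, int]]) -> bool:
    """Return True if any two correspondences cross.

    Two correspondences (i1, j1) and (i2, j2) cross when their order in the
    first sentence is the reverse of their order in the second sentence.
    """
    return any(
        (i1 - i2) * (j1 - j2) < 0
        for (i1, j1), (i2, j2) in combinations(list(pairs), 2)
    )


# Units 0, 1, 2 of the first sentence map to units 0, 2, 3 of the second:
# one unit of the second sentence is skipped, but nothing crosses.
print(has_crossing([(0, 0), (1, 2), (2, 3)]))  # False
# The first two units are swapped between the sentences: the lines cross.
print(has_crossing([(0, 1), (1, 0)]))  # True
```

Excluding crossing correspondences is what lets the comparison proceed from left to right through both sentences, which is presumably the simplification the text refers to.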

[0018] Further, in such a natural language sentence relation determination device, the relationships between concept expressions and the relationships between framework expressions among the constituent units of the first and second sentences may each be judged by referring to the information stored in the relation information storage means, and the relationship between the first and second sentences may be determined by using the results for concept expressions and the results for framework expressions while maintaining the word-order correspondence. With this configuration, the relationships between concept expressions and between framework expressions can be judged separately, which simplifies the processing.

[0019] When the relationships between constituent units are judged, a constituent unit extracted from one of the sentences does not necessarily have a counterpart in the other sentence. It is also possible that several constituent units in one sentence stand in a given relationship to a single constituent unit in the other. In such cases, judging that the corresponding constituent unit has been omitted may allow the relationship between the two sentences to be determined more correctly as a whole. This can be done by setting in advance, as an omission value, the value to be used when no corresponding constituent unit exists; assigning, while maintaining the word-order correspondence, a relation value based on the relationship to those pairs of constituent units extracted from the first and second sentences that stand in a given relationship; and evaluating the assigned relation values together with the omission value. This evaluation yields a relation value for the first and second sentences as a whole, and the determination is made according to its magnitude.
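One way to picture the evaluation of relation values together with an omission value is an edit-distance style dynamic program. The sketch below is an illustration under stated assumptions, not the computation defined by the invention: it treats relation values as distances (0 for identical units, larger for less related units), uses a single uniform omission value, and keeps word order by considering only non-crossing correspondences. The names `sentence_distance`, `toy_relation_value` and `TOY_VALUES`, and all numeric values, are hypothetical.

```python
from typing import Callable, Sequence


def sentence_distance(
    first: Sequence[str],
    second: Sequence[str],
    relation_value: Callable[[str, str], float],
    omission_value: float,
) -> float:
    """Smallest total cost over order-preserving correspondences."""
    m, n = len(first), len(second)
    # d[i][j]: best cost for the first i units of `first` and the first j of `second`.
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + omission_value
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + omission_value
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                # pair unit i with unit j
                d[i - 1][j - 1] + relation_value(first[i - 1], second[j - 1]),
                # unit i has no counterpart in the second sentence
                d[i - 1][j] + omission_value,
                # unit j has no counterpart in the first sentence
                d[i][j - 1] + omission_value,
            )
    return d[m][n]


# Hypothetical relation values, expressed as distances.
TOY_VALUES = {("東京", "都内"): 0.25, ("店", "名店"): 0.25}


def toy_relation_value(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return TOY_VALUES.get((x, y), TOY_VALUES.get((y, x), 1.0))


key = ["東京", "の", "うまい", "店"]
candidate = ["都内", "の", "名店"]
print(sentence_distance(key, candidate, toy_relation_value, omission_value=0.5))  # 1.0
```

In this toy run the cheapest correspondence pairs 東京 with 都内, の with の and 店 with 名店, and treats うまい as omitted, giving 0.25 + 0 + 0.5 + 0.25 = 1.0.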

[0020] The omission value used when no corresponding constituent unit exists may be set to a uniform value, or it may be varied according to the length of the sentences whose relationship is being determined. If a sentence with many constituent units is regarded as highly redundant, it is desirable to set the value so that the effect of an omission is evaluated as smaller. It is also desirable to set the omission value according to the importance of the constituent unit, since this allows the relationship between sentences to be determined in a more substantive way. Furthermore, different omission values may be set depending on whether the constituent unit that is absent from the other sentence is a concept expression or a framework expression. When the relationship to be determined is, for example, the similarity between sentences, the omission of a concept expression is considered to affect the similarity judgment strongly, so the omission of a concept expression may be weighted more heavily than the omission of a framework expression. Conversely, when the relationship to be determined concerns the structure of the expression rather than its semantic content, it is preferable to make the omission value for framework expressions the larger one.
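The options for setting the omission value can be combined in a single function. This is only a sketch: the base values, the square-root scaling by sentence length and the `importance` multiplier are all assumptions chosen for illustration, and the name `omission_value` is hypothetical. The base values shown weight concept expressions more heavily, as suggested for similarity judgments; a structure-oriented comparison would swap them.

```python
import math

# Hypothetical base values for a similarity judgment.
BASE_OMISSION = {"concept": 1.0, "framework": 0.5}


def omission_value(kind: str, sentence_length: int, importance: float = 1.0) -> float:
    """Cost of a constituent unit that has no counterpart in the other sentence.

    kind:            "concept" or "framework"
    sentence_length: number of constituent units in the sentence; longer
                     sentences are treated as more redundant (assumed
                     square-root scaling)
    importance:      multiplier for units judged more or less important
    """
    if kind not in BASE_OMISSION:
        raise ValueError(f"unknown kind of constituent unit: {kind!r}")
    if sentence_length < 1:
        raise ValueError("sentence_length must be at least 1")
    return BASE_OMISSION[kind] * importance / math.sqrt(sentence_length)


print(omission_value("concept", 4))     # 0.5
print(omission_value("framework", 4))   # 0.25
print(omission_value("concept", 16))    # 0.25
```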

[0021] When the relationship between sentences is determined by judging the relationships between their constituent units, it is normally sufficient to consider relationships between concept expressions and between framework expressions, and there is little need to define relationships between concept expressions and framework expressions in advance. It is desirable, however, to store as information the relationships between conceptual affixes and concept words, in addition to the relationships between concept words that express a semantic concept on their own. For example, the conceptual affix 「新」 ("new-") and the concept word 「新しい」 ("new") are better evaluated as being similar. The same applies to the conceptual affix 「的」 ("-like", "-style") and the concept word 「スタイル」 ("style").

[0022] As already stated, various relationships between sentences can be determined. When the relationship is similarity, information representing the degree of similarity between concept expressions and between framework expressions can be stored. In that case, the similarity of the first and second sentences is determined by referring to the stored degrees of similarity between concept expressions and between framework expressions. Determining similarity as the relationship between sentences has the widest range of application, for example in natural language text retrieval, in example sentence retrieval for translation, and in the transformation or compression (summarization) of sentences.

[0023] A simple way to make this similarity determination is to store, as the information representing the degree of similarity, pairs of concept expressions and pairs of framework expressions together with similarity value data that expresses numerically the semantic similarity between the paired expressions, and to make the determination using the magnitude of these similarity values.

[0024] The similarity value data may take values in the range 0 to 1 that approach 1 as the degree of similarity increases. From these values, the distances between concept expressions and between framework expressions are computed, and the degree of similarity between the first and second sentences may be obtained by identifying the combination for which the sum of these distances is smallest. Judging the degree of similarity from the sum of distances provides a concrete picture for discussing the similarity between two sentences and makes it easier to understand. It also makes it easier to apply conventionally known techniques such as pattern matching.
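The stored pairs with their similarity values, and the step from similarity to distance, might look as follows. This is a minimal sketch: the table entries and their values are invented for illustration, and the conversion distance = 1 - similarity is an assumption, since the passage says only that a distance is computed from the similarity value. The names `SIMILARITY`, `similarity` and `distance` are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical similarity value data: unordered pairs of expressions mapped
# to a value in [0, 1], where 1 means the highest degree of similarity.
SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"東京", "都内"}): 0.75,    # concept expressions
    frozenset({"店", "名店"}): 0.75,      # concept expressions
    frozenset({"の", "における"}): 0.5,   # framework expressions
}


def similarity(x: str, y: str) -> float:
    """Look up the stored similarity; identical expressions score 1, unknown pairs 0."""
    if x == y:
        return 1.0
    return SIMILARITY.get(frozenset({x, y}), 0.0)


def distance(x: str, y: str) -> float:
    """Assumed conversion: the more similar two expressions, the shorter the distance."""
    return 1.0 - similarity(x, y)


print(distance("東京", "都内"))  # 0.25
print(distance("東京", "店"))    # 1.0
```

Distances of this form can then be summed over the paired units of two sentences, and the pairing with the smallest sum taken as the measure of how similar the sentences are.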

[0025] Using the natural language sentence relation determination device that determines the similarity between sentences, a natural language sentence retrieval device can be constructed that retrieves, from a plurality of search target sentences, a sentence similar to a search key sentence given as the key for the search. That is, this natural language sentence retrieval device comprises: the natural language sentence relation determination device described above, which determines the similarity between sentences; first sentence specifying means that specifies the search key sentence as the first sentence; second sentence specifying means that selects one sentence at a time from the plurality of search target sentences and specifies it as the second sentence; determination execution means that supplies the specified first and second sentences to the natural language sentence relation determination device and causes it to perform the similarity determination; and selection means that stores the determination results of the natural language sentence relation determination device in association with the supplied second sentences and selects, from the plurality of search target sentences, the second sentence most similar to the search key sentence given as the first sentence.

[0026] This natural language sentence retrieval device selects sentences one at a time from the plurality of search target sentences, has the determination device determine the similarity between each selected sentence and the search key sentence, stores the results, and selects the sentence most similar to the search key sentence from the similarity results obtained for the search target sentences. With this configuration, the similarity between two sentences can be determined on the basis of the similarity between concept expressions and between framework expressions while taking the word-order correspondence into account, so the sentence most similar to the search key sentence can be retrieved easily.

【００２７】自然言語文関係判定装置の文と文との関係
の判定するための構成を、語句と語句との関係を判定す
る語句関係判定装置に応用することも可能である。即
ち、本発明の語句関係判定装置は、意味概念を表わす表
現である概念表現につき、少なくとも該概念表現同士の
関係を表わす情報を記憶した辞書と、第１の語句と第２
の語句とを入力する入力手段と、該入力された第１の語
句および第２の語句から該語句を構成する単語を抽出す
る抽出手段と、該抽出された第１の語句を構成する単語
と該第２の語句を構成する単語との関係を、前記辞書を
参照して判断する判断手段と、該判断手段による判断結
果に基づいて前記第１の語句と前記第２の語句との関係
を判定する判定手段とを備えた装置であって、前記第１
の語句または第２の語句のうちの少なくとも一方には、
２以上の単語の結合により１のまとまった意味概念を表
わす表現である複合表現を含み、該複合表現と、該複合
表現に対応する表現との関係を評価する評価手段を備
え、前記判定手段は、該評価手段による評価結果を考慮
して、前記第１の語句と前記第２の語句との関係を判定
する手段であることを要旨としている。The configuration by which the natural language sentence relation determination device determines the relationship between sentences can also be applied to a phrase relation determination device that determines the relationship between phrases. That is, the phrase relation determination device of the present invention comprises: a dictionary storing, for concept expressions (expressions that represent semantic concepts), at least information indicating the relationship between the concept expressions; input means for inputting a first phrase and a second phrase; extraction means for extracting, from the input first phrase and second phrase, the words constituting each phrase; judging means for judging, by referring to the dictionary, the relationship between the extracted words constituting the first phrase and the words constituting the second phrase; and determination means for determining the relationship between the first phrase and the second phrase on the basis of the judgment result of the judging means. At least one of the first phrase and the second phrase includes a composite expression, that is, an expression in which two or more words combine to represent a single unified semantic concept. The device further comprises evaluation means for evaluating the relationship between the composite expression and an expression corresponding to it, and the determination means determines the relationship between the first phrase and the second phrase in consideration of the evaluation result of the evaluation means.

【００２８】このような構成を採れば、複合表現と複合
表現に対応する表現との関係を正確に判定することがで
きる。With such a configuration, the relationship between the composite expression and the expression corresponding to the composite expression can be accurately determined.

【００２９】更に、上記の各装置に対応した方法の発明
として、自然言語文関係判定方法自然言語文検索方
法などを請求項１７ないし２２に記載した通り、考える
ことができる。Further, as an invention of a method corresponding to each of the above apparatuses, a natural language sentence relation determination method and a natural language sentence search method can be considered as described in claims 17 to 22.

【００３０】同様に、上記の各方法に対応した記録媒体
の発明として、自然言語文の関係を判定するプログラ
ムを記録した記録媒体自然言語文を検索するプログラ
ムを記録した記録媒体などを請求項２３ないし２８に記
載した通り、考えることができる。Similarly, as inventions of recording media corresponding to each of the above methods, a recording medium recording a program for determining the relationship between natural language sentences, a recording medium recording a program for retrieving a natural language sentence, and the like can be considered, as described in claims 23 to 28.

【００３１】[0031]

【発明の他の態様】本願発明は、専用機として構成して
も良いし、汎用性の高いパーソナルコンピュータなどで
実現しても良い。また、記録媒体に記憶された各機能を
実現するプログラムは、ネットワークに接続されたサー
バなどに保存・記憶しておき、必要に応じて実行用のマ
シンにダウンロードして利用することも可能である。こ
うしたサーバの形態あるいはサーバからプログラムを公
衆送信する場合も、本願の媒体の一形態とみなすことが
できる。Other Embodiments of the Invention: The present invention may be configured as a dedicated machine or may be realized on a general-purpose personal computer or the like. The program that realizes each function and is stored on the recording medium may also be kept on a server or the like connected to a network and downloaded to the executing machine for use as needed. Such a server, or the public transmission of the program from such a server, can also be regarded as one form of the medium of the present application.

【００３２】[0032]

【発明の実施の形態】以上説明した本発明の構成および
作用を一層明らかにするために、以下本発明の実施の形
態を実施例に基づき説明する。図１は、本発明の自然言
語文関係判定装置の一例である文間類似度判定装置１Ａ
のハードウェアの構成を示す。この第１実施例としての
文間類似度判定装置１Ａは、類似度検索エンジン１０Ａ
と外部装置９０とを備え、類似度検索エンジン１０Ａ
は、当該エンジン１０Ａに入力された、一定のまとまり
を持った内容を表わす第１の文と第２の文とが類似する
程度（以下、文字列間類似度という）を判定する。以
下、第１の文を文字で表わしたものを「入力文字列」
と、第２の文を文字で表わしたものを「対比文字列」と
読み替えて説明する。文字列間類似度の判定処理は、類
似度検索エンジン１０Ａ内部のコンピュータにより実行
される。DESCRIPTION OF THE PREFERRED EMBODIMENTS: In order to further clarify the configuration and operation of the present invention described above, embodiments of the present invention will be described below on the basis of examples. FIG. 1 shows the hardware configuration of an inter-sentence similarity determination device 1A, which is an example of the natural language sentence relation determination device of the present invention. The inter-sentence similarity determination device 1A of this first embodiment includes a similarity search engine 10A and an external device 90. The similarity search engine 10A determines the degree to which a first sentence and a second sentence input to the engine 10A, each expressing content with a certain unity, are similar (hereinafter referred to as the similarity between character strings). In the following description, the first sentence expressed in characters is referred to as the "input character string" and the second sentence expressed in characters as the "comparison character string". The process of determining the similarity between character strings is executed by a computer inside the similarity search engine 10A.

【００３３】コンピュータは、各種演算処理を実行する
ためのＣＰＵ２２を中心に、バス３５により相互に接続
された次の各部を備えている。ＲＯＭ２４は、ＣＰＵ２
２で各種演算処理を実行するのに必要なプログラムや参
照データなどを予め格納しているメモリであり、後述す
る文字列間類似度の判定の実行に関するプログラムを格
納する。ＲＡＭ２６は、ＣＰＵ２２で各種演算処理を実
行するのに必要な各種データを一時的に格納するための
メモリである。The computer is built around a CPU 22 that executes various arithmetic processes and includes the following components, which are interconnected by a bus 35. The ROM 24 is a memory that stores in advance the programs and reference data the CPU 22 needs to execute various arithmetic processes, including the program for determining the similarity between character strings described later. The RAM 26 is a memory for temporarily storing the various data the CPU 22 needs while executing various arithmetic processes.

【００３４】ハードディスクコントローラ（ＨＤＣ）３
０は、外部記憶装置としてのハードディスク１０ａへの
信号出力を制御する。ハードディスク１０ａには、必要
に応じてＲＡＭ２６にロードされて実行される各種プロ
グラムや、デバイスドライバの形式やモジュールの形式
で提供されるプログラム、あるいは国語辞書や後述する
類義語辞書３６等の各種辞書などが記憶されている。勿
論、ＲＯＭ２４やＣＤ−ＲＯＭ等（図示せず）に、上記
したと同様な各種プログラムや必要な参照データなどを
記憶しておき、これら各種プログラムや参照データをロ
ードすることにより、コンピュータに実行させることも
可能である。The hard disk controller (HDC) 30 controls signal output to a hard disk 10a serving as an external storage device. The hard disk 10a stores various programs that are loaded into the RAM 26 and executed as necessary, programs provided in the form of device drivers or modules, and various dictionaries such as a Japanese language dictionary and the synonym dictionary 36 described later. Of course, similar programs and the necessary reference data may instead be stored in the ROM 24, on a CD-ROM or the like (not shown), and loaded from there for execution by the computer.

【００３５】入力インタフェース２０は、外部装置９０
からのデータや文字列の入力を司り、出力インタフェー
ス３４は、外部装置９０およびプリンタ９２へのデータ
や文字列の出力を制御する。即ち、類似度検索エンジン
１０Ａは、図示しないケーブルを用いて外部装置９０と
接続されており、外部装置９０との間でデータや文字の
情報の入出力を行なう。勿論、入力インタフェース２０
を介してキーボードや手書き文字認識ボード等を接続
し、所望の文字列を入力可能な構成としても差し支えな
い。The input interface 20 handles the input of data and character strings from the external device 90, and the output interface 34 controls the output of data and character strings to the external device 90 and a printer 92. That is, the similarity search engine 10A is connected to the external device 90 by a cable (not shown) and exchanges data and character information with it. Of course, a keyboard, a handwritten character recognition board or the like may be connected via the input interface 20 so that a desired character string can be input.

【００３６】なお、入力インタフェース２０は、文字や
データをコード情報の形で入力するが、これ以外の形態
で入力可能な構成としてもよい。例えば、音声情報や文
字の形状に関する情報を入力するためのインタフェース
を設け、入力された情報を、ＣＰＵ２２が判読可能なデ
ジタル情報に変換し、これを音声認識や文字認識により
文字列に変換してから入力する構成などを考えることが
できる。The input interface 20 receives characters and data in the form of code information, but it may also be configured to accept input in other forms. For example, an interface for inputting voice information or information on the shapes of characters may be provided, the input information converted into digital information readable by the CPU 22, and that digital information converted into a character string by voice recognition or character recognition before being input.

【００３７】ディスプレイコントローラ（ＤＣ）２８
は、表示装置としての液晶ディスプレイ１０ｂへの信号
出力を制御する。また、シリアル入出力インタフェース
（ＳＩＯ）３２は、モデム９４を介して公衆電話回線Ｐ
ＴＬに接続されており、この公衆電話回線ＰＴＬを介し
て、コンピュータ１０ｃを外部のネットワークＮＷに接
続することができる。さらに、特定のサーバーＳＶにア
クセスして、必要なプログラムやデータをハードディス
ク１０ａにダウンロードすることも可能である。The display controller (DC) 28 controls signal output to a liquid crystal display 10b serving as a display device. The serial input/output interface (SIO) 32 is connected to a public telephone line PTL via a modem 94, and the computer 10c can be connected to an external network NW via this public telephone line PTL. It is also possible to access a specific server SV and download necessary programs and data to the hard disk 10a.

【００３８】本実施例では、外部装置９０を、ＣＰＵや
ＲＯＭ，ＲＡＭ等からなるコンピュータ９０ｃやハード
ディスク９０ａ，ディスプレイ９０ｂ，キーボード９０
ｄ等を備えるデスクトップ型のパソコンとしている。従
って、外部装置９０は、各種のアプリケーションプログ
ラムをインストールすることにより、種々の機能を実行
可能な装置となる。例えば、ワープロ機能を実現する文
書作成装置、電話回線を通じて文字情報を授受するデー
タ通信装置をはじめ、１の言語で作成された文章を他の
言語に翻訳する翻訳装置、入力された文字列と同一ない
し近似する文字列を有する情報を検索する情報検索装
置、作成した文書中からの特定の文字列の検索や保存さ
れているファイルからの所望のファイルの検索を実行す
る文字列検索装置、文章の要約文をコンピュータにより
作成する要約作成装置などを考えることができる。勿
論、パソコン以外の装置であっても、文字列を含むデー
タの情報を出力する機能を備えた装置であれば、外部装
置９０とすることができる。なお、本実施例では、外部
装置９０も、類似度検索エンジン１０Ａと同様に、図示
しないモデムを介して公衆電話回線ＰＴＬに接続されて
いる。In this embodiment, the external device 90 is a desktop personal computer that includes a computer 90c made up of a CPU, ROM, RAM and the like, a hard disk 90a, a display 90b, a keyboard 90d and so on. The external device 90 can therefore execute various functions once the corresponding application programs are installed. Examples include a document creation device that implements a word-processing function; a data communication device that sends and receives character information over a telephone line; a translation device that translates text written in one language into another language; an information retrieval device that searches for information containing a character string identical or similar to an input character string; a character string search device that searches a created document for a specific character string or searches stored files for a desired file; and a summarization device that generates a summary of a text by computer. Of course, a device other than a personal computer can also serve as the external device 90, provided that it can output data containing character strings. In this embodiment, the external device 90 is, like the similarity search engine 10A, connected to the public telephone line PTL via a modem (not shown).

【００３９】次に、このようなハードウェアを用いて実
行される文字列間類否判定処理の内容について説明す
る。図２は、文字列間類否判定処理が実行される際の、
類似度検索エンジン１０Ａと外部装置９０との間の情報
の流れを示す説明図である。類似度検索エンジン１０Ａ
は、外部装置９０から送出された文字列を入力し、この
入力文字列との類否判断の対象となる対比文字列を参照
する。本実施例では、対比文字列の情報を、外部装置９
０のハードディスク９０ａ内に格納しているため、対比
文字列の参照先をハードディスク９０ａとしている。勿
論、対比文字列が読み取り可能に格納されている場所で
あれば、類似度検索エンジン１０Ａのハードディスク１
０ａやサーバーＳＶ等、どこを参照しても差し支えな
い。Next, the character string similarity determination processing executed on this hardware will be described. FIG. 2 is an explanatory diagram showing the flow of information between the similarity search engine 10A and the external device 90 when the character string similarity determination processing is executed. The similarity search engine 10A receives the character string sent from the external device 90 and refers to the comparison character string against which this input character string is to be judged. In this embodiment, the comparison character strings are stored on the hard disk 90a of the external device 90, so the hard disk 90a is the reference destination. Of course, any location where the comparison character string is stored in readable form may be referenced, such as the hard disk 10a of the similarity search engine 10A or the server SV.

【００４０】類似度検索エンジン１０Ａは、ハードディ
スク１０ａ内の類義語辞書３６を参照して、入力文字列
と対比文字列との類似度を判定する。この類義語辞書３
６の内容については後述する。類似度検索エンジン１０
Ａは、入力文字列と対比文字列との類似度の判定結果を
外部装置９０へ出力する。従って、外部装置９０は、自
己が保持する２つの文字列につき、当該文字列間の類否
判定の結果を利用可能となる。The similarity search engine 10A determines the similarity between the input character string and the comparison character string by referring to the synonym dictionary 36 on the hard disk 10a. The contents of the synonym dictionary 36 will be described later. The similarity search engine 10A outputs the result of the similarity determination to the external device 90. The external device 90 can therefore make use of the result of the similarity determination for the two character strings that it holds.

【００４１】次に、文字列間類否判定処理の処理手順を
図３の文字列間類否判定ルーチンを参照しつつ説明す
る。文字列間類否判定ルーチンは、文字列が入力される
旨の信号を外部装置９０から受領したときに起動する。
図３に示すように、本ルーチンが起動されると、まず、
外部装置９０から送られてきた文字列の情報を入力し
（ステップＳ１００）、文字列間の類似度を比較する対
象である対比文字列を参照する処理を行なう（ステップ
Ｓ１２０）。Next, the procedure of the character string similarity determination processing will be described with reference to the character string similarity determination routine of FIG. 3. The routine starts when a signal indicating that a character string will be input is received from the external device 90. As shown in FIG. 3, when the routine starts, it first inputs the character string information sent from the external device 90 (step S100) and then refers to the comparison character string whose similarity to the input is to be evaluated (step S120).

【００４２】次に、入力文字列と対比文字列のそれぞれ
につき、各文字列を構成する構成単位を文法情報ととも
に抽出する処理を行なう（ステップＳ１３０）。本実施
例では、まとまった意味を持つ表現、例えば、意味概念
を表わす概念表現や文構造の枠組を支える枠組み表現を
１の構成単位として抽出する。概念表現としての構成単
位には、概念語のほか、複数の単語の組み合わせにより
意味概念を表わすものも含む。枠組み表現としての構成
単位には、関係表現や助述表現等の機能的表現がある。Next, for each of the input character string and the comparison character string, the constituent units that make up the string are extracted together with their grammatical information (step S130). In this embodiment, an expression with a unified meaning, for example a concept expression representing a semantic concept or a framework expression supporting the framework of the sentence structure, is extracted as one constituent unit. Constituent units that are concept expressions include not only concept words but also combinations of several words that together represent a semantic concept. Constituent units that are framework expressions include functional expressions such as relational expressions and auxiliary expressions.

【００４３】この概念表現や機能的表現の抽出処理は、
この表現に関する文法情報を格納した国語辞書を参照す
ることにより行なわれる。この国語辞書には、文法情報
として、各表現が概念語，接辞，関係表現や助述表現の
うちのいずれに該当するかが記憶されている。これらの
文法情報は、その読みをインデックスとして参照するこ
とができる。This extraction of concept expressions and functional expressions is performed by referring to a Japanese language dictionary that stores grammatical information about the expressions. As grammatical information, the dictionary records whether each expression is a concept word, an affix, a relational expression or an auxiliary expression. This grammatical information can be looked up using the reading of the expression as an index.

【００４４】ここで、概念語と機能的表現につき、図４
を参照しつつ説明する。概念語とは、それ自体で何らか
の意味概念を表わす語をいい、主として自立語がこれに
該当する。例えば、名詞の「バス」、動詞の「来る」、
形容詞の「美しい」等は、概念語の範疇に属する。一
方、機能的表現とは、それ自体では意味概念を表わさな
いが、概念語に付随して概念語が表わす概念の意味的役
割を限定する働きをする表現をいい、助詞や助動詞のよ
うな附属語の他、接頭語や接尾語のような接辞、および
これら以外の付随的表現、例えば、関係表現や助述表現
を表わす語等がこれに該当する。例えば、主体を表わす
助詞の「が」や場所や手段を表わす助詞の「で」、受け
身を表わす助動詞の「れる」や推量を表わす助動詞の
「らしい」、接頭語の「新」、接尾語の「難い（がた
い）」等は、機能的表現の範疇に属する。Here, concept words and functional expressions are described with reference to FIG. 4. A concept word is a word that expresses some semantic concept by itself; it is mainly independent words that fall into this class. For example, the noun 「バス」 ("bus"), the verb 「来る」 ("come") and the adjective 「美しい」 ("beautiful") are concept words. A functional expression, on the other hand, does not express a semantic concept by itself but accompanies a concept word and restricts the semantic role of the concept that word expresses. Functional expressions include attached words such as particles and auxiliary verbs, affixes such as prefixes and suffixes, and other accompanying expressions such as words that serve as relational expressions or auxiliary expressions. For example, the particle 「が」 marking the subject, the particle 「で」 marking a place or means, the auxiliary verb 「れる」 expressing the passive, the auxiliary verb 「らしい」 expressing conjecture, the prefix 「新」 ("new") and the suffix 「難い（がたい）」 ("difficult to") are functional expressions.

【００４５】関係表現とは、１の文中において、概念語
と概念語との間に用いられることにより、格関係，因果
関係などの概念語間の関係を表わす表現をいい、前述し
た助詞の「で」の他、「によって」のような原因や手段
を表わす連語や「において」のような場所を表わす連語
等がこれに該当する。助述表現とは、主として述語であ
る概念語の後に用いられて、述語がそれ自体で持ってい
る意味内容を変化させる表現をいう。例えば、「かもし
れない」や「なければならない」のような連語、推量を
表わす助動詞「べし」の連体形と断定を表わす助動詞
「だ」の終止形とが結合した「べきだ」のような語等が
これに該当する。A relational expression is an expression used between one concept word and another within a sentence to express a relationship between them, such as a case relationship or a causal relationship. Besides the particle 「で」 mentioned above, it includes collocations such as 「によって」 ("by means of"), which expresses a cause or means, and 「において」 ("in", "at"), which expresses a place. An auxiliary expression is an expression used mainly after a concept word that serves as a predicate, and it changes the meaning that the predicate has by itself. Examples include collocations such as 「かもしれない」 ("may") and 「なければならない」 ("must"), and words such as 「べきだ」 ("should"), which combines the attributive form of the auxiliary verb 「べし」 expressing conjecture with the terminal form of the auxiliary verb 「だ」 expressing assertion.

【００４６】概念語と機能的表現との具体的関係を図４
に示す。図４（Ａ）に示すように、文字列が概念語だけ
で構成されている場合には、それぞれの概念語の持つ意
味内容が別個独立に表象されるため、文字列全体が表わ
す意味内容は多義的となる。例えば、「バス」と「来
る」という２つの概念語で構成された「バス来る」とい
う文字列は、「バスが来る」や「バスも来る」，「バス
で来る」，「バスによって来る」等のいずれの意味内容
を示すのか明らかでない。FIG. 4 shows concrete relationships between concept words and functional expressions. As shown in FIG. 4(A), when a character string consists only of concept words, the meaning of each concept word is represented separately and independently, so the meaning of the character string as a whole is ambiguous. For example, for the character string 「バス来る」, which consists of the two concept words 「バス」 ("bus") and 「来る」 ("come"), it is not clear whether it means 「バスが来る」 ("the bus comes"), 「バスも来る」 ("the bus also comes"), 「バスで来る」 ("come by bus"), 「バスによって来る」 ("come by means of a bus") or something else.

【００４７】一方、図４（Ｂ）に示すように、２つの概
念語の間に機能的表現の１つである関係表現が存在する
場合には、関係表現は、直前の概念語の持つ意味上の働
きを特定する働きをする。例えば、「バス」という概念
語は、その後に「が」という機能的表現が置かれた場合
には「主体としてのバス」の意味を表わすこととなり、
「によって」という関係表現が置かれた場合には「交通
手段としてのバス」の意味を、「によって」という関係
表現が置かれた場合には「交通手段としてのバス」若し
くは「場所としてのバス」の意味を表わすこととなる。
即ち、概念語は、関係表現のような機能的表現と結びつ
いて初めて文の一構造となるのである。On the other hand, as shown in FIG. 4(B), when a relational expression, which is one kind of functional expression, stands between two concept words, it specifies the semantic role of the concept word immediately before it. For example, the concept word 「バス」 ("bus") means "the bus as the subject" when followed by the functional expression 「が」, "the bus as a means of transportation" when followed by the relational expression 「によって」, and either "the bus as a means of transportation" or "the bus as a place" when followed by the relational expression 「で」. In other words, a concept word becomes a structural part of a sentence only when it is combined with a functional expression such as a relational expression.

【００４８】このように機能的表現によって先の概念語
の意味が特定されることにより、先の概念語から後の概
念語へ概念の有機的結合が生じ、文全体としてまとまっ
た１つの意味内容を表わすことになる。例えば、「バス
が来る」という文は、文全体として「バスが動いて自分
の存在する場所にやって来る」ということを意味し、
「バスによって来る」という文は、文全体として「ある
人が、数ある交通手段のうちバスという交通手段を用い
て自分の存在する場所にやって来る」ことを意味する。
また、それ自体で「交通手段」および「場所」という２
つの意味内容を有していた「バスで」という文字列は、
その後に「来る」という概念語が用いられることによっ
て、「交通手段としてのバス」の意味に限定され、この
結果、「バスで来る」という文は、「バスによって来
る」という文と極めて近似した意味を表わすものとな
る。Once the functional expression has fixed the meaning of the preceding concept word in this way, the concept of the preceding word is linked organically to that of the following word, and the sentence as a whole expresses a single unified meaning. For example, the sentence 「バスが来る」 as a whole means that "a bus moves and arrives at the place where one is", while the sentence 「バスによって来る」 as a whole means that "some person, choosing the bus from among the many means of transportation, arrives at the place where one is". Likewise, the character string 「バスで」, which by itself could mean either "means of transportation" or "place", is restricted to "the bus as a means of transportation" when it is followed by the concept word 「来る」. As a result, the sentence 「バスで来る」 has a meaning very close to that of the sentence 「バスによって来る」.

【００４９】また、図４（Ｃ）に示すように、述語であ
る概念語の後に機能的表現の１つである助述表現が存在
する場合には、助述表現は、直前の述語である概念語の
持つ意味内容を変化させる働きをする。例えば、「来
る」という概念語は、その後に「かもしれない」という
助述表現が置かれた場合には、「来る」という行為に関
する推定の意味を表わすこととなり、「べきだ」という
助述表現が置かれた場合には、「来る」という行為が義
務である旨の意味を表わすこととなる。Also, as shown in FIG. 4(C), when an auxiliary expression, which is one kind of functional expression, follows a concept word that serves as a predicate, it changes the meaning of that predicate. For example, the concept word 「来る」 ("come") expresses a conjecture about the act of coming when followed by the auxiliary expression 「かもしれない」 ("may"), and expresses that the act of coming is an obligation when followed by the auxiliary expression 「べきだ」 ("should").

【００５０】このように、２つの文が同じ概念語を用い
ていても、用いられている機能的表現が異なることによ
り、全く意味の違う文となることがある一方で、時には
近似した意味を持つ文となる場合もある。即ち、機能的
表現は、文の持つ意味を大きく左右する役割を果たして
いるのである。Thus, even when two sentences use the same concept words, a difference in the functional expressions used can give them completely different meanings, while in other cases the sentences end up with similar meanings. In other words, functional expressions play a major role in determining the meaning of a sentence.

【００５１】図３に説明を戻す。ステップＳ１３０にお
いて入力文字列と対比文字列から各文字列を構成する構
成単位を概念語，関係表現，助述表現，接辞に分けて抽
出した後、これらの抽出された各構成単位同士の類似度
（以下、語間類似度という）を判定する単語間類似度判
定処理を行なう（ステップＳ１４０）。なお、以下の説
明では、文を構成する各構成単位を、説明の便宜上、広
義の「単語」と呼ぶものとする。次に、この判定結果に
基づいて、入力文字列を構成する各単語列と対比文字列
を構成する単語列との間の類似度（以下、単語列間類似
度という）を判定する単語列間類似度判定処理を行なう
（ステップＳ１５０）。これらの処理の詳細については
後述する。次に、判定された単語列間類似度を示す数値
を判定結果として外部装置９０に出力する処理を行なっ
て（ステップＳ１６０）、本ルーチンを終了する。Returning to FIG. 3: after the constituent units of the input character string and the comparison character string have been extracted in step S130, classified into concept words, relational expressions, auxiliary expressions and affixes, an inter-word similarity determination process is performed to determine the similarity between the extracted constituent units (hereinafter referred to as the inter-word similarity) (step S140). In the following description, each constituent unit of a sentence is called a "word" in a broad sense for convenience. Next, on the basis of this result, an inter-word-string similarity determination process is performed to determine the similarity between the word strings of the input character string and the word strings of the comparison character string (hereinafter referred to as the similarity between word strings) (step S150). The details of these processes are described later. Finally, the numerical value indicating the determined similarity between word strings is output to the external device 90 as the determination result (step S160), and the routine ends.

【００５２】次に、図３のステップＳ１４０の単語間類
似度判定処理の詳細につき、図５から図８を参照しつつ
説明する。図５および図６は、単語間類似度判定ルーチ
ンを示すフローチャートである。本ルーチンは、入力文
字列および対比文字列を構成する単語が、各文字列から
文法情報とともに抽出されたときに起動する。以後、説
明の便宜を図るため、入力文字列Ａからは、「単語ａ１
／単語ａ２／単語ａ３／…／単語ａｉ／…／単語ａｍ
（記号／は単語の区切りを、英字ｉは文字列Ａ中におけ
る単語の序数を、英字ｊは文字列Ｂ中における単語の序
数を、それぞれ示す。以下同じ）」という総数ｍ個の単
語を、対比文字列Ｂからは、「単語ｂ１／単語ｂ２／単
語ｂ３／…／単語ｂｊ／…／単語ｂｎ」という総数ｎ個
の単語を、それぞれ抽出したものとして説明する。Next, the inter-word similarity determination process of step S140 in FIG. 3 will be described in detail with reference to FIGS. 5 to 8. FIGS. 5 and 6 are flowcharts showing the inter-word similarity determination routine. The routine starts when the words constituting the input character string and the comparison character string have been extracted from each string together with their grammatical information. For convenience, the following description assumes that a total of m words, "word a1 / word a2 / word a3 / ... / word ai / ... / word am", have been extracted from the input character string A, and that a total of n words, "word b1 / word b2 / word b3 / ... / word bj / ... / word bn", have been extracted from the comparison character string B. Here the symbol / marks a word boundary, the letter i is the ordinal number of a word in character string A, and the letter j is the ordinal number of a word in character string B; the same applies below.

【００５３】本ルーチンが起動されると、まず、入力文
字列Ａについての単語の序数ｉを値１にセットするとと
もに（ステップＳ２００）、対比文字列Ｂについての単
語の序数ｊを値１にセットする処理を行なう（ステップ
Ｓ２１０）。これによって、語間類似度の判定対象は、
単語ａ１と単語ｂ１に特定される。When the routine starts, the ordinal number i of the word in the input character string A is first set to 1 (step S200), and the ordinal number j of the word in the comparison character string B is set to 1 (step S210). The inter-word similarity is therefore determined first for the pair of word a1 and word b1.

【００５４】次に、単語ａ１と単語ｂ１が、ともに接辞
であるか否かを判断し（ステップＳ２２０）、ともに接
辞でない場合には、ともに概念語であるか否かを判断す
る処理を行なう（ステップＳ２２５）。ともに概念語で
もない場合には、ともに関係表現であるか否かを判断し
（ステップＳ２３０）、ともに関係表現でもない場合に
は、ともに助述表現であるか否かを判断する処理を行な
う（ステップＳ２３５）。単語ａ１と単語ｂ１が、とも
に接辞，ともに概念語，ともに関係表現，ともに助述表
現のいずれかである場合には、単語ａ１について類義語
辞書３６を参照し（ステップＳ２４０）、類義語として
単語ｂ１が登録されているか否かを判断する処理を行な
う（ステップＳ２４５）。Next, it is determined whether word a1 and word b1 are both affixes (step S220); if not, it is determined whether both are concept words (step S225). If they are not both concept words, it is determined whether both are relational expressions (step S230), and if not, whether both are auxiliary expressions (step S235). If word a1 and word b1 are both affixes, both concept words, both relational expressions or both auxiliary expressions, the synonym dictionary 36 is consulted for word a1 (step S240), and it is determined whether word b1 is registered as a synonym of word a1 (step S245).

【００５５】類義語辞書３６の構造について図７および
図８を参照しつつ説明する。本実施例において、類義語
辞書３６は、図７に示す概念語類義語辞書３６ａと図８
に示す機能的表現類義語辞書３６ｂを備え、概念語類義
語辞書３６ａは、概念語とこの概念語に類似する意味を
持つ語（以下、概念類似語という）の情報を、機能的表
現類義語辞書３６ｂは、機能的表現とこの機能的表現に
類似する意味を持つ語（以下、機能類似語という）の情
報を格納する。図７および図８に示すように、概念語類
義語辞書３６ａおよび機能的表現類義語辞書３６ｂは、
検索用の見出しであるインデックスに対応して、各概念
語や各機能的表現に関する文字情報および品詞情報を五
十音順に格納するとともに、これらの各概念語や各機能
的表現に対応する概念類似語や機能類似語の文字情報，
品詞情報を格納する。なお、類義語辞書３６は、概念語
と機能的表現に関する情報をまとめた１の辞書としても
よく、また、概念語，接辞，関係表現，機能表現という
格納される単語の種類ごとに別々の辞書を設ける構成と
しても差し支えない。The structure of the synonym dictionary 36 will be described with reference to FIGS. 7 and 8. In this embodiment, the synonym dictionary 36 comprises a concept word synonym dictionary 36a, shown in FIG. 7, and a functional expression synonym dictionary 36b, shown in FIG. 8. The concept word synonym dictionary 36a stores information on concept words and on words with meanings similar to them (hereinafter referred to as concept similar words), and the functional expression synonym dictionary 36b stores information on functional expressions and on words with meanings similar to them (hereinafter referred to as function similar words). As shown in FIGS. 7 and 8, each dictionary stores, keyed by an index that serves as the search heading, the character information and part-of-speech information of each concept word or functional expression in the order of the Japanese syllabary, together with the character information and part-of-speech information of the corresponding concept similar words or function similar words. The synonym dictionary 36 may instead be a single dictionary combining the information on concept words and functional expressions, or separate dictionaries may be provided for each type of stored word: concept words, affixes, relational expressions and functional expressions.

【００５６】併せて、概念語類義語辞書３６ａおよび機
能的表現類義語辞書３６ｂは、概念語と各概念類似語お
よび機能的表現と各機能類似語とが意味上類似する度合
いを示す語間類似度の数値データを格納する。本実施例
では、語間類似度を「０≦Ｇ≦１」の範囲の数値を用い
て表わし、数値が１に近づくほど意味の類似する程度が
高いものと定義している。例えば、前述した「バス」と
いう概念語については、概念類似語として「車」という
語の情報が、「０．３」という比較的低い語間類似度の
値とともに登録されており、「私」という概念語につい
ては、概念類似語として「僕」という語の情報が、
「０．９」という高い語間類似度の値とともに登録され
ている。In addition, the concept word synonym dictionary 36a and the functional expression synonym dictionary 36b store numerical inter-word similarity data indicating the degree to which each concept word and its concept similar words, and each functional expression and its function similar words, are similar in meaning. In this embodiment, the inter-word similarity G is expressed as a value in the range 0 ≦ G ≦ 1, and a value closer to 1 is defined as indicating a higher degree of semantic similarity. For example, for the concept word 「バス」 ("bus") mentioned above, the word 「車」 ("car") is registered as a concept similar word with a relatively low inter-word similarity of 0.3, and for the concept word 「私」 ("I"), the word 「僕」 (also "I") is registered as a concept similar word with a high inter-word similarity of 0.9.
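
One way to picture the synonym dictionary 36 described above is as a mapping from each headword to its similar words and their inter-word similarity values. The sketch below is illustrative only: the names `SynonymEntry`, `CONCEPT_SYNONYMS` and `lookup_similarity` are hypothetical, the part-of-speech labels are assumptions, and only the two example entries given in the text (バス/車 at 0.3 and 私/僕 at 0.9) come from the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SynonymEntry:
    word: str            # character information of the similar word
    part_of_speech: str  # part-of-speech information (labels are assumed)
    similarity: float    # inter-word similarity G, with 0 <= G <= 1

    def __post_init__(self) -> None:
        # Reject values outside the range stated in the description.
        if not 0.0 <= self.similarity <= 1.0:
            raise ValueError("similarity must lie in [0, 1]")


# Hypothetical in-memory form of the concept word synonym dictionary 36a.
CONCEPT_SYNONYMS: Dict[str, Dict[str, SynonymEntry]] = {
    "バス": {"車": SynonymEntry("車", "noun", 0.3)},
    "私": {"僕": SynonymEntry("僕", "pronoun", 0.9)},
}


def lookup_similarity(headword: str, candidate: str) -> Optional[float]:
    """Return the registered similarity, or None if candidate is not registered."""
    entry = CONCEPT_SYNONYMS.get(headword, {}).get(candidate)
    return entry.similarity if entry is not None else None
```

Returning `None` for an unregistered pair leaves it to the caller to decide which value to record in that case.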

【００５７】なお、概念語類義語辞書３６ａには、概念
語以外の語も概念類似語として登録されている。例え
ば、文中において概念語と同様の意味や用法で用いられ
る接辞（以下、概念語性接辞という）も登録されてい
る。この概念語性接辞には、例えば、「新」や「大」，
「実」という接頭語があり、これらは、「新しい」や
「大きな」，「実際の」という形容詞と同様に、直後の
名詞を修飾する形で用いられる（例えば、「新企画」と
「新しい企画」，「大発見」と「大きな発見」，「実
話」と「実際の話」）。このため、概念語類義語辞書３
６ａには、「新しい」や「大きな」，「実際の」という
形容詞としての概念語に対応する概念類似語として、
「新」や「大」，「実」という接頭語が登録されてい
る。このことは、機能的表現類義語辞書３６ｂについて
も同様であり、「新」や「大」，「実」という接辞とし
ての機能的表現に対応する機能類似語として、「新し
い」や「大きな」，「実際の」という概念語が登録され
ている。Words other than concept words are also registered as concept similar words in the concept word synonym dictionary 36a. For example, affixes that are used in a sentence with the same meaning and usage as a concept word (hereinafter referred to as concept-word-like affixes) are registered. Concept-word-like affixes include the prefixes 「新」, 「大」 and 「実」, which modify the noun that immediately follows them in the same way as the adjectives 「新しい」 ("new"), 「大きな」 ("big") and 「実際の」 ("actual"): compare 「新企画」 with 「新しい企画」 ("new plan"), 「大発見」 with 「大きな発見」 ("great discovery"), and 「実話」 with 「実際の話」 ("true story"). For this reason, the concept word synonym dictionary 36a registers the prefixes 「新」, 「大」 and 「実」 as concept similar words of the adjectival concept words 「新しい」, 「大きな」 and 「実際の」. The same applies to the functional expression synonym dictionary 36b, which registers the concept words 「新しい」, 「大きな」 and 「実際の」 as function similar words of the affixes 「新」, 「大」 and 「実」.

【００５８】このことに関連して、ステップＳ２２０で
は、単語ａｉと単語ｂｊの双方が接辞でない場合であっ
ても、一方が概念語性接辞で一方が概念語である場合に
は、ともに接辞であるとみなして、類義語辞書３６を参
照することとしている。勿論、ステップＳ２２５におい
て、単語ａｉと単語ｂｊの双方が概念語ではないが、一
方が概念語で一方が概念語性接辞である場合に、ともに
概念語とみなして、類義語辞書３６を参照することとし
てもよい。従って、概念語性接辞と概念語との間におい
ても、妥当性の高い語間類似度を求めることが可能とな
る。In this connection, in step S220, even when word ai and word bj are not both affixes, if one is a concept-word-like affix and the other is a concept word, both are regarded as affixes and the synonym dictionary 36 is consulted. Alternatively, of course, in step S225, when word ai and word bj are not both concept words but one is a concept word and the other is a concept-word-like affix, both may be regarded as concept words and the synonym dictionary 36 consulted. In either case, a well-founded inter-word similarity can be obtained even between a concept-word-like affix and a concept word.

【００５９】図５に説明を戻す。ステップＳ２４５にお
いて、単語ａ１についての類義語として単語ｂ１が登録
されていると判断した場合には、類義語辞書３６に記憶
された語間類似度の値を単語ａ１と単語ｂ１との間の語
間類似度として記憶する処理を行なう（ステップＳ２５
０）。本実施例では、単語ａｉと単語ｂｊとの語間類似
度をｔ（ａｉ，ｂｊ）として表わす。従って、単語ａ１
と単語ｂ１との間の語間類似度は、ｔ（ａ１，ｂ１）と
表わされる。Returning to FIG. 5: if it is determined in step S245 that the word b1 is registered as a synonym for the word a1, the inter-word similarity value stored in the synonym dictionary 36 is stored as the inter-word similarity between the word a1 and the word b1 (step S250). In the present embodiment, the inter-word similarity between the word ai and the word bj is written t(ai, bj). Accordingly, the inter-word similarity between the word a1 and the word b1 is written t(a1, b1).

【００６０】ステップＳ２４５において、単語ａ１につ
いての類義語として単語ｂ１が登録されていないと判断
した場合、またはステップＳ２３５において、単語ａ１
と単語ｂ１が、ともに概念語（一方が概念語性接辞であ
る場合を除く），ともに接辞，ともに関係表現，ともに
助述表現のいずれでもない場合には、語間類似度の値と
して０（ゼロ）を記憶する処理を行なう（ステップＳ２
５５）。この語間類似度の値は、ＲＡＭ２６上の単語間
情報記録テーブルＧＴに記録される。If it is determined in step S245 that the word b1 is not registered as a synonym for the word a1, or if it is determined in step S235 that the words a1 and b1 are not both concept words (except when one of them is a concept-word affix), not both affixes, not both relational expressions, and not both adjunct expressions, the value 0 (zero) is stored as the inter-word similarity (step S255). This inter-word similarity value is recorded in the inter-word information recording table GT on the RAM 26.

【００６１】次に、ステップＳ２５０およびＳ２５５で
設定された語間類似度の値に基づいて、単語と単語との
間の距離（以下、語間距離という）を求める処理を行な
う（ステップＳ２６０）。本実施例では、単語ａｉと単
語ｂｊとの語間距離を、単語ａｉと単語ｂｊとの語間類
似度の値の補数の２倍値、即ち、２｛１−ｔ（ａｉ，ｂ
ｊ）｝として表わす。従って、単語ａ１と単語ｂ１との
間の語間距離は、２｛１−ｔ（ａ１，ｂ１）｝と表わさ
れる。この語間距離の値は、ＲＡＭ２６上の単語間情報
記録テーブルＧＴに記録される。Next, the distance between the two words (hereinafter referred to as the inter-word distance) is obtained from the inter-word similarity value set in step S250 or S255 (step S260). In this embodiment, the inter-word distance between the word ai and the word bj is twice the complement of their inter-word similarity, that is, 2{1 − t(ai, bj)}. Accordingly, the inter-word distance between the word a1 and the word b1 is 2{1 − t(a1, b1)}. This inter-word distance value is recorded in the inter-word information recording table GT on the RAM 26.

【００６２】この結果、語間距離は「０≦Ｇ≦２」とい
う数値範囲となり、この数値が０に近づくほど単語間の
距離が近いものとなる。例えば、前述した語間類似度の
値が「０．３」である「バス」という概念語と「車」と
いう語の場合、語間距離は「１．４」という比較的遠い
距離を示す値となり、語間類似度の値が「０．９」であ
る「私」という概念語と「僕」という語の場合、語間距
離は「０．２」という近い距離を示す値となる。As a result, the inter-word distance G lies in the range 0 ≦ G ≦ 2, and the closer the value is to 0, the closer the two words are. For example, for the concept word "bus" (「バス」) and the word "car" (「車」), whose inter-word similarity mentioned above is 0.3, the inter-word distance is 1.4, indicating a relatively long distance; for the concept word 「私」 and the word 「僕」 (both first-person pronouns meaning "I"), whose inter-word similarity is 0.9, the inter-word distance is 0.2, indicating a short distance.
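The conversion from inter-word similarity to inter-word distance can be sketched in a few lines of Python. The function name `word_distance` and the range check are illustrative assumptions and are not part of the specification; only the formula 2{1 − t} and the two sample similarities come from the text.

```python
def word_distance(similarity: float) -> float:
    """Inter-word distance 2 * (1 - t) for an inter-word similarity t in [0, 1]."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return 2.0 * (1.0 - similarity)


# The two pairs quoted in the text (similarities 0.3 and 0.9).
bus_car = word_distance(0.3)        # approximately 1.4
pronoun_pair = word_distance(0.9)   # approximately 0.2
```

Because the values are floating-point numbers, comparisons against 1.4 and 0.2 are best made with a small tolerance.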

【００６３】次に、対比文字列Ｂについての単語の序数
ｊに値１を加え（ステップＳ２６５）、序数ｊと対比文
字列Ｂの単語総数ｎとを比較し、序数ｊが総数ｎを超え
たと判断するまでステップＳ２２０に戻って上記の処理
を繰り返す（ステップＳ２７０）。序数ｊが総数ｎを超
えたと判断した場合には、入力文字列Ａ中の単語ａ１に
ついては対比文字列Ｂの各単語との類似度の判定が完了
したものとして、入力文字列Ａについての単語の序数ｉ
に値１を加え（ステップＳ２７５）、序数ｉと入力文字
列Ａの単語総数ｍとを比較する（ステップＳ２８０）。
序数ｊが総数ｎを超えていない場合には、ステップＳ２
１０に戻って上記の処理を繰り返す。序数ｊが総数ｎを
超えている場合には、入力文字列Ａ中の全ての単語と対
比文字列Ｂの全ての単語との間における類似度の判定が
完了したものとして、本ルーチンを終了する。Next, 1 is added to the ordinal number j of the word in the comparison character string B (step S265), the ordinal number j is compared with the total number n of words in the comparison character string B, and the process returns to step S220 and repeats the above processing until it is determined that the ordinal number j exceeds the total number n (step S270). When it is determined that the ordinal number j exceeds the total number n, the similarity between the word a1 in the input character string A and every word in the comparison character string B is regarded as determined, 1 is added to the ordinal number i of the word in the input character string A (step S275), and the ordinal number i is compared with the total number m of words in the input character string A (step S280).
If the ordinal number i does not exceed the total number m, the process returns to step S210 and repeats the above processing. If the ordinal number i exceeds the total number m, the similarity between every word in the input character string A and every word in the comparison character string B is regarded as determined, and this routine ends.
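The nested loop described above, which visits every word pair and records a similarity and a distance for it, might look roughly like the following Python sketch. The names `build_word_table` and `toy_similarity` are hypothetical; in particular, `toy_similarity` is only a stand-in for the part-of-speech check and synonym dictionary lookup of steps S220 to S255, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Dict, List, Tuple


def build_word_table(
    a_words: List[str],
    b_words: List[str],
    similarity: Callable[[str, str], float],
) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Return {(i, j): (t, distance)} for every 1-based word pair (ai, bj)."""
    table: Dict[Tuple[int, int], Tuple[float, float]] = {}
    for i, a in enumerate(a_words, start=1):        # outer loop over string A
        for j, b in enumerate(b_words, start=1):    # inner loop over string B
            t = similarity(a, b)                    # inter-word similarity t(ai, bj)
            table[(i, j)] = (t, 2.0 * (1.0 - t))    # inter-word distance 2{1 - t}
    return table


def toy_similarity(a: str, b: str) -> float:
    """Assumed placeholder: 1.0 for identical tokens, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


gt = build_word_table(["x", "y", "z"], ["x", "z"], toy_similarity)
```

The dictionary keyed by (i, j) plays the role of the inter-word information recording table GT; a two-dimensional list would serve equally well.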

【００６４】この単語間類似度判定処理が２つの文字列
について実際に行なわれた場合について説明する。図９
は、「新日米防衛協定締結のための指針については」と
いう入力文字列Ａと「新しい日米の協力ガイドラインに
関して」という対比文字列Ｂについて、単語間類似度判
定処理が行なわれた後の単語間情報記録テーブルＧＴの
様子を示す。The case where this inter-word similarity determination process is actually carried out on two character strings will now be described. FIG. 9 shows the state of the inter-word information recording table GT after the process has been carried out on the input character string A 「新日米防衛協定締結のための指針については」 ("As for the guidelines for concluding the new Japan-US defense agreement") and the comparison character string B 「新しい日米の協力ガイドラインに関して」 ("Regarding the new Japan-US cooperation guidelines").

【００６５】入力文字列Ａは、図３のステップＳ１３０
の単語抽出処理により、接頭辞である「新」，名詞であ
る「日米」，サ変名詞である「防衛」，名詞である「協
定」，サ変名詞である「締結」，関係表現である「のた
めの」，名詞である「指針」，関係表現である「につい
て」および関係表現である「は」という総数９個の単語
に区分される。以下、これらの各単語をそれぞれ単語ａ
１，単語ａ２，単語ａ３，単語ａ４，単語ａ５，単語ａ
６，単語ａ７，単語ａ８，単語ａ９として説明する。一
方、対比文字列Ｂは、形容詞である「新しい」，名詞で
ある「日米」，関係表現である「の」，サ変名詞である
「協力」，名詞である「ガイドライン」および関係表現
である「に関して」という総数６個の単語に区分され
る。以下、これらの各単語をそれぞれ単語ｂ１，単語ｂ
２，単語ｂ３，単語ｂ４，単語ｂ５，単語ｂ６として説
明する。By the word extraction process of step S130 in FIG. 3, the input character string A is divided into a total of nine words: the prefix 「新」 ("new"), the noun 「日米」 ("Japan-US"), the sa-hen noun (verbal noun) 「防衛」 ("defense"), the noun 「協定」 ("agreement"), the sa-hen noun 「締結」 ("conclusion"), the relational expression 「のための」 ("for"), the noun 「指針」 ("guideline"), the relational expression 「について」 ("about"), and the relational expression 「は」 (topic marker). These words are hereinafter referred to as words a1, a2, a3, a4, a5, a6, a7, a8, and a9, respectively. The comparison character string B, on the other hand, is divided into a total of six words: the adjective 「新しい」 ("new"), the noun 「日米」 ("Japan-US"), the relational expression 「の」 ("of"), the sa-hen noun 「協力」 ("cooperation"), the noun 「ガイドライン」 ("guideline"), and the relational expression 「に関して」 ("regarding"). These words are hereinafter referred to as words b1, b2, b3, b4, b5, and b6, respectively.

【００６６】単語間情報記録テーブルＧＴには、これら
の９個の各単語と６個の各単語の全ての組み合わせにつ
いての語間類似度と語間距離が記録されている。例え
ば、入力文字列Ａ中の「新しい」という単語ａ１と対比
文字列Ｂ中の「新」という単語ｂ１との間のデータを記
録する欄（図９の表においてｉの値が１でｊの値が１の
場合）には、語間類似度の値として、概念語類義語辞書
３６ａへの登録値である「１．０」という値が、語間距
離の値として、「２×（１−１．０）」という計算式の
演算値である「０．０」という値が、それぞれ記録され
ている。The inter-word information recording table GT records the inter-word similarity and the inter-word distance for every combination of these nine words and six words. For example, in the cell for the word a1 「新」 ("new", prefix) in the input character string A and the word b1 「新しい」 ("new", adjective) in the comparison character string B (i = 1 and j = 1 in the table of FIG. 9), the value 1.0, which is the value registered in the concept word synonym dictionary 36a, is recorded as the inter-word similarity, and the value 0.0, obtained as 2 × (1 − 1.0), is recorded as the inter-word distance.

【００６７】一方、入力文字列Ａ中の「新しい」という
単語ａ１と対比文字列Ｂ中の「日米」という単語ｂ２と
の間のデータを記録する欄（図９の表においてｉの値が
１でｊの値が２の場合）には、単語ａ１と単語ｂ２とは
それぞれ接辞と名詞であり、類義語辞書３６が参照され
ないので、語間類似度の値として「０．０」という最低
値が記録されている。この結果、語間距離の値として、
「２×（１−０．０）」という計算式の演算値である
「２．０」という最高値が、それぞれ記録されている。On the other hand, in the cell for the word a1 「新」 in the input character string A and the word b2 「日米」 ("Japan-US") in the comparison character string B (i = 1 and j = 2 in the table of FIG. 9), the words a1 and b2 are an affix and a noun, respectively, so the synonym dictionary 36 is not consulted and the lowest value, 0.0, is recorded as the inter-word similarity. As a result, the highest value, 2.0, obtained as 2 × (1 − 0.0), is recorded as the inter-word distance.

【００６８】次に、図３のステップＳ１５０の単語列間
類似度判定処理の詳細につき、図１０から図２６を参照
しつつ説明する。図１０は、単語列間類似度判定ルーチ
ンＡを示すフローチャートである。本ルーチンは、ステ
ップＳ１４０の単語間類似度判定処理において判定され
た文字列を構成する各単語の語間距離の値から単語列同
士の類似度の値を求めるルーチンであり、単語間類似度
判定処理の終了とともに起動する。Next, the inter-word-string similarity determination process of step S150 in FIG. 3 will be described in detail with reference to FIGS. 10 to 26. FIG. 10 is a flowchart showing the inter-word-string similarity determination routine A. This routine obtains the similarity between word strings from the inter-word distances, determined in the inter-word similarity determination process of step S140, of the words that make up the character strings, and it starts when the inter-word similarity determination process ends.

【００６９】本ルーチンが起動されると、まず、一方の
単語列の単語と類似する単語が他方の文字列に存在しな
い場合における距離の加算値を設定する脱落コスト設定
処理を行ない（ステップＳ３００）、次に、単語列間の
距離を演算する単語列間距離演算処理を行なう（ステッ
プＳ３２０）。最後に、単語列間の距離の値を用いて文
字列間の類似度を演算する演算処理を行なって（ステッ
プＳ３４０）、本ルーチンを終了し、次の判定結果出力
処理（図５のステップＳ１６０）に移る。以下、本ルー
チンの３つのステップを、それぞれ「脱落コスト設定処
理」，「単語列間距離演算処理」，「単語列間類似度の
演算処理」として、詳細に説明する。When this routine is started, a dropout cost setting process is performed first, which sets the amount added to the distance when no word similar to a word in one word string exists in the other character string (step S300). Next, an inter-word-string distance calculation process is performed to calculate the distance between word strings (step S320). Finally, a calculation process that computes the similarity between the character strings from the distance between the word strings is performed (step S340), and the routine ends and proceeds to the determination result output process (step S160 in FIG. 5). The three steps of this routine are described in detail below as the "dropout cost setting process", the "inter-word-string distance calculation process", and the "inter-word-string similarity calculation process".

【００７０】まず、ステップＳ３００の脱落コスト設定
処理について、図１１および図１２を参照しつつ説明す
る。図１１は、脱落コスト設定ルーチンを示すフローチ
ャートであり、図１２は、脱落コスト設定処理により座
標軸が設定された距離グラフＹＧを示す。距離グラフＹ
Ｇは、文字列間の類否を判断する前提として、相互に類
似しているとすべき単語とそうでない単語とを区別して
表わすためのグラフであり、その横軸には入力文字列Ａ
を構成する各単語ａ１〜ａｍが、縦軸には対比文字列Ｂ
を構成する各単語ｂ１〜ｂｍが割り付けられている。First, the dropout cost setting process of step S300 will be described with reference to FIGS. 11 and 12. FIG. 11 is a flowchart showing the dropout cost setting routine, and FIG. 12 shows a distance graph YG whose coordinate axes have been set by the dropout cost setting process. The distance graph YG distinguishes, as a premise for judging the similarity between character strings, the words that should be treated as similar to each other from those that should not. The words a1 to am of the input character string A are assigned to its horizontal axis, and the words b1 to bn of the comparison character string B are assigned to its vertical axis.

【００７１】図１１の脱落コスト設定設定ルーチンは、
図１０の単語列間類似度判定ルーチンＡの起動に伴って
起動する。以後、入力文字列Ａ中の単語ａ１から単語ａ
ｉまでの単語列と対比文字列Ｂ中の単語ｂ１から単語ｂ
ｊまでの単語列との間の距離を、ｄ（ａｉ，ｂｊ）とし
て説明する。The dropout cost setting routine of FIG. 11 starts when the inter-word-string similarity determination routine A of FIG. 10 starts. In the following, the distance between the word string from the word a1 to the word ai in the input character string A and the word string from the word b1 to the word bj in the comparison character string B is written d(ai, bj).

【００７２】本ルーチンが起動されると、まず、入力文
字列Ａ中における単語の序数ｉと入力文字列Ｂ中におけ
る単語の序数ｊの値を０（ゼロ）にセットし（ステップ
Ｓ４００）、ｄ（ａ０，ｂ０）の値、即ち、入力文字列
Ａと対比文字列Ｂとの対比前における入力文字列Ａと対
比文字列Ｂとの間の距離の値を０（ゼロ）として設定す
る処理を行なう（ステップＳ４１０）。この処理によ
り、距離グラフＹＧにおける文字列間の距離を計測する
開始点が、距離グラフＹＧ上の原点Ｏ（オー）として決
定される（図１２のを参照）。When this routine is started, the ordinal number i of the word in the input character string A and the ordinal number j of the word in the comparison character string B are first set to 0 (zero) (step S400), and the value of d(a0, b0), that is, the distance between the input character string A and the comparison character string B before any comparison is made, is set to 0 (zero) (step S410). This fixes the starting point for measuring the distance between character strings as the origin O of the distance graph YG (see FIG. 12).

【００７３】次に、入力文字列Ａ中における単語ａの序
数ｉの値を１にセットした後（ステップＳ４２０）、こ
のときのｄ（ａｉ，ｂｊ）の値を、ｄ｛ａ（ｉ−１），
ｂｊ｝の値に単語ａｉの脱落コストｒの値を加えたもの
に設定する処理を行なう（ステップＳ４３０）。この処
理は、「単語ｂ１から単語ｂｊまでの単語列の中に、単
語ａｉと意味の類似する単語が存在しなかった場合に
は、単語ａｉの直前の単語までの単語列と単語ｂ１から
単語ｂｊまでの単語列との距離に距離ｒを付加する」と
いうことを意味する。例えば、単語ａの序数ｉの値が１
の場合には、ｄ（ａ１，ｂ０）の値として、ｄ（ａ０，
ｂ０）の値である０（ゼロ）に単語ａ１の脱落コストｒ
の値を加えた「ｒ」という値が設定される。この処理に
より、距離グラフＹＧ上において、原点Ｏ（オー）から
横軸上の単語ａ１までの距離が「ｒ」として設定される
（図１２のを参照）。Next, the ordinal number i of the word a in the input character string A is set to 1 (step S420), and d(ai, bj) is then set to the value of d{a(i−1), bj} plus the dropout cost r of the word ai (step S430). This means: "if no word similar in meaning to the word ai exists in the word string from the word b1 to the word bj, the distance r is added to the distance between the word string up to the word immediately before the word ai and the word string from the word b1 to the word bj." For example, when the ordinal number i is 1, d(a1, b0) is set to r, namely the value 0 (zero) of d(a0, b0) plus the dropout cost r of the word a1. As a result, the distance from the origin O to the word a1 on the horizontal axis of the distance graph YG is set to r (see FIG. 12).

【００７４】次に、入力文字列Ａ中における単語ａの序
数ｉの値に１を加え（ステップＳ４３５）、序数ｉの値
が、入力文字列Ａの単語の総数ｍの値を超えているか否
かを判断し（ステップＳ４４０）、序数ｉの値が総数ｍ
の値を超えていると判断されるまで、ステップＳ４３０
に戻って上記の処理を繰り返す。例えば、ステップＳ４
３５で序数ｉの値が２とされた場合には、ステップＳ４
３０の演算処理により、ｄ（ａ２，ｂ０）の値が、先に
求めたｄ（ａ１，ｂ０）の値ｒに脱落コストの値ｒを付
加した２ｒという値に設定される。この結果、距離グラ
フＹＧ上において、原点Ｏ（オー）から横軸上の単語ａ
２までの距離が「２ｒ」として設定される（図１２の
を参照）。Next, 1 is added to the ordinal number i of the word a in the input character string A (step S435), and it is determined whether the ordinal number i exceeds the total number m of words in the input character string A (step S440); the process returns to step S430 and repeats the above processing until the ordinal number i is determined to exceed the total number m. For example, when the ordinal number i is set to 2 in step S435, the calculation of step S430 sets d(a2, b0) to 2r, namely the previously obtained value r of d(a1, b0) plus the dropout cost r. As a result, the distance from the origin O to the word a2 on the horizontal axis of the distance graph YG is set to 2r (see FIG. 12).

【００７５】このような繰り返し処理により、ｄ（ａ
１，ｂ０）からｄ（ａｍ，ｂ０）までの値が設定され
る。この結果、距離グラフＹＧの横軸の各単語は、入力
文字列Ａの各単語の脱落コストｒの値に等分されて割り
付けられる（図１２のを参照）。Through this repeated processing, the values from d(a1, b0) to d(am, b0) are set. As a result, the words of the input character string A are laid out along the horizontal axis of the distance graph YG at equal intervals of the dropout cost r (see FIG. 12).

【００７６】ステップＳ４４０で序数ｉの値が総数ｍの
値を超えていると判断した場合には、入力文字列Ａ中に
おける単語の序数ｉの値を０（ゼロ）に、入力文字列Ｂ
中における単語の序数ｉの値を１にそれぞれセットした
後（ステップＳ４５０）、このときのｄ（ａｉ，ｂｊ）
の値を、ｄ｛ａｉ，ｂ（ｊ−１）｝の値に単語ｂｊの脱
落コストｑの値を加えたものに設定する処理を行なう
（ステップＳ４６０）。この処理は、「単語ａ１から単
語ａｉまでの単語列の中に、単語ｂｊと意味の類似する
単語が存在しなかった場合には、単語ｂｊの直前の単語
までの単語列と単語ａ１から単語ｂｊまでの単語列との
距離に距離ｑを付加する」ということを意味する。例え
ば、単語ｂの序数ｊの値が１の場合には、ｄ（ａ０，ｂ
１）の値として、ｄ（ａ０，ｂ０）の値である０（ゼ
ロ）に単語ｂ１の脱落コストｑの値を加えた「ｑ」とい
う値が設定される。この処理により、距離グラフＹＧ上
において、原点Ｏ（オー）から縦軸上の単語ｂ１までの
距離が「ｑ」として設定される（図１２のを参照）。If it is determined in step S440 that the ordinal number i exceeds the total number m, the ordinal number i of the word in the input character string A is set to 0 (zero) and the ordinal number j of the word in the comparison character string B is set to 1 (step S450), and d(ai, bj) is then set to the value of d{ai, b(j−1)} plus the dropout cost q of the word bj (step S460). This means: "if no word similar in meaning to the word bj exists in the word string from the word a1 to the word ai, the distance q is added to the distance between the word string up to the word immediately before the word bj and the word string from the word a1 to the word ai." For example, when the ordinal number j is 1, d(a0, b1) is set to q, namely the value 0 (zero) of d(a0, b0) plus the dropout cost q of the word b1. As a result, the distance from the origin O to the word b1 on the vertical axis of the distance graph YG is set to q (see FIG. 12).

【００７７】次に、対比文字列Ｂ中における単語ｂの序
数ｊの値に１を加え（ステップＳ４６５）、序数ｊの値
が、対比文字列Ｂの単語の総数ｎの値を超えているか否
かを判断し（ステップＳ４７０）、序数ｊの値が総数ｎ
の値を超えていると判断されるまで、ステップＳ４６０
に戻って上記の処理を繰り返す。例えば、ステップＳ４
６５で序数ｊの値が２とされた場合には、ステップＳ４
６０の演算処理により、ｄ（ａ０，ｂ２）の値が、先に
求めたｄ（ａ０，ｂ１）の値ｑに脱落コストの値ｑを付
加した２ｑという値に設定される。この結果、距離グラ
フＹＧ上において、原点Ｏ（オー）から縦軸上の単語ｂ
２までの距離が「２ｑ」として設定される（図１２の
を参照）。Next, 1 is added to the ordinal number j of the word b in the comparison character string B (step S465), and it is determined whether the ordinal number j exceeds the total number n of words in the comparison character string B (step S470); the process returns to step S460 and repeats the above processing until the ordinal number j is determined to exceed the total number n. For example, when the ordinal number j is set to 2 in step S465, the calculation of step S460 sets d(a0, b2) to 2q, namely the previously obtained value q of d(a0, b1) plus the dropout cost q. As a result, the distance from the origin O to the word b2 on the vertical axis of the distance graph YG is set to 2q (see FIG. 12).

【００７８】このような繰り返し処理により、ｄ（ａ
０，ｂ０）からｄ（ａ０，ｂｎ）までの値が設定され
る。この結果、距離グラフＹＧの縦軸の各単語は、対比
文字列Ｂの各単語の脱落コストｑの値に等分されて割り
付けられる（図１２のを参照）。Through this repeated processing, the values from d(a0, b0) to d(a0, bn) are set. As a result, the words of the comparison character string B are laid out along the vertical axis of the distance graph YG at equal intervals of the dropout cost q (see FIG. 12).

【００７９】ステップＳ４７０で序数ｊの値が総数ｎの
値を超えていると判断した場合には、本ルーチンを終了
し、次の単語列間距離演算処理（図１０のステップＳ３
２０）へ移る。この結果、距離グラフＹＧ上において各
単語ａ１〜ａｍおよび各単語ｂ１〜ｂｎが横軸および縦
軸に割り付けられる位置が確定される。If it is determined in step S470 that the ordinal number j exceeds the total number n, this routine ends and the process proceeds to the inter-word-string distance calculation process (step S320 in FIG. 10). As a result, the positions at which the words a1 to am and the words b1 to bn are assigned on the horizontal and vertical axes of the distance graph YG are fixed.

【００８０】なお、本実施例では、脱落コストｒ，ｑの
値を「１」とするが、比較される文中における単語の脱
落の頻度や重要性に応じ、これ以外の数値を採用するも
のとしても差し支えない。In this embodiment, the dropout costs r and q are both set to 1, but other values may be adopted according to the frequency and importance of word omissions in the sentences being compared.
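The dropout cost setting routine amounts to filling the first row and first column of a distance table. The following Python sketch illustrates this under the assumption that the table is held as a list of lists indexed `d[i][j]`; the function name `init_boundaries` and the use of `None` for not-yet-computed cells are illustrative choices, not taken from the specification.

```python
from typing import List, Optional


def init_boundaries(m: int, n: int, r: float = 1.0, q: float = 1.0) -> List[List[Optional[float]]]:
    """Set d(a0, b0) = 0, d(ai, b0) = i * r and d(a0, bj) = j * q; leave the rest unset."""
    d: List[List[Optional[float]]] = [[None] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0                       # step S410: origin O
    for i in range(1, m + 1):           # steps S420 to S440: horizontal axis
        d[i][0] = d[i - 1][0] + r
    for j in range(1, n + 1):           # steps S450 to S470: vertical axis
        d[0][j] = d[0][j - 1] + q
    return d


# Nine words in string A, six in string B, both dropout costs equal to 1.
boundary = init_boundaries(9, 6)
```

With r = q = 1 the boundary values are simply the word ordinals, which is why the axes of the distance graph are evenly spaced.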

【００８１】以上のように各単語が割り付けられた距離
グラフＹＧ上における、各単語列間の距離の表わし方に
つき、図１３に基づいて説明する。入力文字列Ａが総数
ｍ個の単語から、対比文字列Ｂが総数ｎ個の単語から構
成される場合には、始点である原点Ｏから終点である座
標点（ａｍ，ｂｎ）までの長さが入力文字列Ａと対比文
字列Ｂとの文字列間の距離となる。この距離が最短とな
る場合を白色の矢印で、距離が最長となる場合を斜線付
きの矢印で示す。この「距離が最長となる場合」とは、
対比文字列Ｂ中に、入力文字列Ａを構成する各単語と類
似する単語が全く存在しない場合である。この場合に
は、単語ａ１から単語ａｍまでの各単語および単語ｂ１
から単語ｂｍまでの各単語が全て脱落していることにな
るので、入力文字列Ａと対比文字列Ｂとの距離は、「脱
落コストｒ×ｍ個＋脱落コストｑ×ｎ個」の値である
「ｍｒ＋ｑｎ」となる。How the distance between word strings is represented on the distance graph YG, with the words assigned as described above, will be explained with reference to FIG. 13. When the input character string A consists of m words and the comparison character string B consists of n words, the length of the path from the origin O, the starting point, to the coordinate point (am, bn), the end point, is the distance between the input character string A and the comparison character string B. The case in which this distance is shortest is indicated by a white arrow, and the case in which it is longest by a hatched arrow. The distance is longest when the comparison character string B contains no word similar to any word of the input character string A. In that case every word from a1 to am and every word from b1 to bn is dropped, so the distance between the input character string A and the comparison character string B is "dropout cost r × m words + dropout cost q × n words", that is, mr + qn.
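As a worked instance of the longest-distance case, the expression can be evaluated for the example strings used in this description (nine and six words) with the dropout costs of this embodiment (r = q = 1). The symbol d_max is introduced here only for illustration.

```latex
d_{\max} = m\,r + n\,q,
\qquad
m = 9,\; n = 6,\; r = q = 1
\;\Longrightarrow\;
d_{\max} = 9 \cdot 1 + 6 \cdot 1 = 15
```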

【００８２】次に、ステップＳ３２０の単語列間距離演
算処理Ａについて、図１４から図２４までを参照しつつ
説明する。この単語列間距離演算処理Ａでは、各単語間
の語間距離の値や各単語の脱落コストｒ，ｑの値のよう
な個々の単語に関する情報を用いて、入力文字列Ａを構
成する各単語列と対比文字列Ｂを構成する各単語列との
距離を求める。この処理手順を、図１４の単語列間距離
演算ルーチンＡに示す。本ルーチンは、脱落コスト設定
処理の終了とともに起動する。Next, the inter-word-string distance calculation process A of step S320 will be described with reference to FIGS. 14 to 24. This process obtains the distance between each word string making up the input character string A and each word string making up the comparison character string B, using information about individual words such as the inter-word distances and the dropout costs r and q. The procedure is shown in the inter-word-string distance calculation routine A of FIG. 14. This routine starts when the dropout cost setting process ends.

【００８３】本ルーチンが起動されると、まず、入力文
字列Ａ中における単語の序数ｉの値を１にセットし（ス
テップＳ５００）、と入力文字列Ｂ中における単語の序
数ｊの値を１にセットする（ステップＳ５１０）。When this routine is started, the ordinal number i of the word in the input character string A is first set to 1 (step S500), and the ordinal number j of the word in the comparison character string B is set to 1 (step S510).

【００８４】次に、入力文字列Ａ中の単語ａ１から単語
ａｉまでの単語列と対比文字列Ｂ中の単語ｂ１から単語
ｂｊまでの単語列との間の距離であるｄ（ａｉ，ｂｊ）
の値を求める。この値は、以下の要領で求められる。ま
ず、３つの値Ｘ１，Ｘ２，Ｘ３を求める処理を行なう
（ステップＳ５１５，Ｓ５２０，Ｓ５２５）。値Ｘ１
は、単語ａｉの直前の単語までの単語列と単語ｂｊの直
前の単語までの単語列との間の距離であるｄ｛ａ（ｉ−
１），ｂ（ｊ−１）｝の値に、単語ａｉと単語ｂｊとの
語間距離である２｛１−ｔ（ａｉ，ｂｊ）｝の値を加え
ることにより求める（以下、この値をＸ１値という）。
値Ｘ２は、単語ａｉの直前の単語までの単語列と単語ｂ
ｊまでの単語列との間の距離であるｄ｛ａ（ｉ−１），
ｂｊ｝の値に、単語ａｉの脱落コストｒの値を加えるこ
とにより求める（以下、この値をＸ２値という）。値Ｘ
３は、単語ａｉまでの単語列と単語ｂｊの直前の単語ま
での単語列との間の距離であるｄ｛ａｉ，ｂ（ｊ−
１）｝の値に、単語ｂｊの脱落コストｑの値を加えるこ
とにより求める（以下、この値をＸ３値という）。Next, d(ai, bj), the distance between the word string from the word a1 to the word ai in the input character string A and the word string from the word b1 to the word bj in the comparison character string B, is obtained as follows. First, three values X1, X2, and X3 are calculated (steps S515, S520, and S525). X1 is obtained by adding the inter-word distance 2{1 − t(ai, bj)} between the word ai and the word bj to d{a(i−1), b(j−1)}, the distance between the word string up to the word immediately before the word ai and the word string up to the word immediately before the word bj (hereinafter referred to as the X1 value). X2 is obtained by adding the dropout cost r of the word ai to d{a(i−1), bj}, the distance between the word string up to the word immediately before the word ai and the word string up to the word bj (hereinafter referred to as the X2 value). X3 is obtained by adding the dropout cost q of the word bj to d{ai, b(j−1)}, the distance between the word string up to the word ai and the word string up to the word immediately before the word bj (hereinafter referred to as the X3 value).

【００８５】次に、これらのＸ１値からＸ３値までの値
のうちの最も小さい値をｄ（ａｉ，ｂｊ）の値としてＲ
ＡＭ２６上の単語列間距離記録テーブルＤＬに記憶し
（ステップＳ５３５，Ｓ５４０）、この値を、入力文字
列Ａ中の単語ａ１から単語ａｉまでの単語列と対比文字
列Ｂ中の単語ｂ１から単語ｂｊまでの単語列との間の距
離として決定する。即ち、単語ａ１から単語ａｉまでの
単語列と対比文字列Ｂ中の単語ｂ１から単語ｂｊまでの
単語列との間の距離を求める際に、単語ａ１から「単語
ａｉの直前の単語」までの距離、または単語ｂ１から
「単語ｂｊの直前の単語」までの距離しか考慮しないの
で、入力文字列Ａと対比文字列Ｂとの間において単語間
の類否関係が２組以上交差することは、必然的に禁止さ
れる。Next, the smallest of the X1, X2, and X3 values is stored as the value of d(ai, bj) in the inter-word-string distance recording table DL on the RAM 26 (steps S535 and S540), and this value is taken as the distance between the word string from the word a1 to the word ai in the input character string A and the word string from the word b1 to the word bj in the comparison character string B. In other words, when this distance is obtained, only the distance from the word a1 up to "the word immediately before the word ai", or from the word b1 up to "the word immediately before the word bj", is taken into account, so two or more similarity relationships between words of the input character string A and the comparison character string B are necessarily prevented from crossing each other.
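Written compactly, the selection of the smallest of the three candidate values corresponds to the following recurrence, together with the boundary values fixed by the dropout cost setting process. This is only a restatement of the steps described above in standard notation; the subscripted symbols are a notational convenience.

```latex
\begin{aligned}
X_1 &= d(a_{i-1}, b_{j-1}) + 2\{1 - t(a_i, b_j)\},\\
X_2 &= d(a_{i-1}, b_j) + r,\\
X_3 &= d(a_i, b_{j-1}) + q,\\
d(a_i, b_j) &= \min\{X_1, X_2, X_3\},\\
d(a_0, b_0) &= 0,\quad d(a_i, b_0) = i\,r,\quad d(a_0, b_j) = j\,q .
\end{aligned}
```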

【００８６】次に、対比文字列Ｂ中における単語ｂの序
数ｊの値に１を加え（ステップＳ５５０）、序数ｊの値
が、対比文字列Ｂの単語の総数ｎの値を超えているか否
かを判断し（ステップＳ５６０）、序数ｊの値が総数ｎ
の値を超えていると判断されるまで、ステップＳ５１５
に戻って上記の処理を繰り返す。これによって、入力文
字列Ａ中の単語ａ１までの単語列と対比文字列Ｂ中の単
語ｂ１から単語ｂｊまでの各単語列との距離が順次求め
られる。Next, 1 is added to the ordinal number j of the word b in the comparison character string B (step S550), and it is determined whether the ordinal number j exceeds the total number n of words in the comparison character string B (step S560); the process returns to step S515 and repeats the above processing until the ordinal number j is determined to exceed the total number n. In this way, the distances between the word string up to the word a1 in the input character string A and each word string from the word b1 to the word bj in the comparison character string B are obtained in turn.

【００８７】ステップＳ５６０で序数ｊの値が総数ｎの
値を超えていると判断した場合には、入力文字列Ａ中に
おける単語の序数ｉの値に１を加え（ステップＳ５７
０）、序数ｉの値が入力文字列Ａ中における単語の総数
ｍの値を超えていると判断されるまで、ステップＳ５１
０に戻って上記の処理を繰り返す。これによって、入力
文字列Ａ中の単語ａ１から単語ａｉまでの各単語列と対
比文字列Ｂ中の単語ｂ１から単語ｂｊまでの各単語列と
の距離が順次求められ、最後に、単語ａ１から単語ａｍ
までの単語列と単語ｂ１から単語ｂｎまでの単語列との
間の距離ｄ（ａｍ，ｂｎ）の値が求められる。If it is determined in step S560 that the ordinal number j exceeds the total number n, 1 is added to the ordinal number i of the word in the input character string A (step S570), and the process returns to step S510 and repeats the above processing until the ordinal number i is determined to exceed the total number m of words in the input character string A. In this way, the distances between each word string from the word a1 to the word ai in the input character string A and each word string from the word b1 to the word bj in the comparison character string B are obtained in turn, and finally the distance d(am, bn) between the word string from the word a1 to the word am and the word string from the word b1 to the word bn is obtained.

【００８８】即ち、単語列間の距離ｄ（ａｉ，ｂｊ）
は、語順の対応関係を考慮して、全ての単語ａ１〜ａ
ｍ，ｂ１〜ｂｎについて求められ、この際、図５のステ
ップＳ２５０で設定された単語ａ１〜ａｍ，ｂ１〜ｂｎ
についての語間類似度ｔ（ａｉ，ｂｊ）の値が用いられ
る。例えば、単語ａ１までの単語列と単語ｂ１から単語
ｂ２までの単語列との距離であるｄ（ａ１，ｂ２）や単
語ａ１から単語ａ２までの単語列と単語ｂ１までの単語
列との距離であるｄ（ａ２，ｂ１），単語ａ１から単語
ａ２までの単語列と単語ｂ１から単語ｂ２までの単語列
との距離であるｄ（ａ２，ｂ２）を求める場合には、単
語ａ１と単語ｂ１との語間類似度ｔ（ａ１，ｂ１）の値
が用いられる。また、単語ａ１から単語ａ７までの単語
列と単語ｂ１から単語ｂ３までの単語列との距離である
ｄ（ａ７，ｂ３）や単語ａ１から単語ａ６までの単語列
と単語ｂ１から単語ｂ４までの単語列との距離であるｄ
（ａ６，ｂ４），単語ａ１から単語ａ７までの単語列と
単語ｂ１から単語ｂ４までの単語列との距離であるｄ
（ａ７，ｂ４）を求める場合には、文字列中での出現順
が異なる単語同士である単語ａ６と単語ｂ３との語間類
似度ｔ（ａ６，ｂ３）の値が用いられる。That is, the distance d(ai, bj) between word strings is obtained for all the words a1 to am and b1 to bn while taking the correspondence of word order into account, and the inter-word similarity values t(ai, bj) set for the words a1 to am and b1 to bn in step S250 of FIG. 5 are used in doing so. For example, the inter-word similarity t(a1, b1) between the word a1 and the word b1 is used when obtaining d(a1, b2), the distance between the word string up to the word a1 and the word string from the word b1 to the word b2; d(a2, b1), the distance between the word string from the word a1 to the word a2 and the word string up to the word b1; and d(a2, b2), the distance between the word string from the word a1 to the word a2 and the word string from the word b1 to the word b2. Likewise, the inter-word similarity t(a6, b3) between the words a6 and b3, which appear at different positions in their respective character strings, is used when obtaining d(a7, b3), the distance between the word string from the word a1 to the word a7 and the word string from the word b1 to the word b3; d(a6, b4), the distance between the word string from the word a1 to the word a6 and the word string from the word b1 to the word b4; and d(a7, b4), the distance between the word string from the word a1 to the word a7 and the word string from the word b1 to the word b4.

【００８９】ステップＳ５７０で序数ｉの値が総数ｍの
値を超えていると判断した場合には、入力文字列Ａを構
成する単語列と対比文字列Ｂを構成する単語列との距離
ｄ（ａｍ，ｂｎ）が求められたとして、本ルーチンを終
了し、次の文字列間類似度の演算処理（図１０のステッ
プＳ３４０）へ移る。If it is determined in step S570 that the ordinal number i exceeds the total number m, the distance d(am, bn) between the word string making up the input character string A and the word string making up the comparison character string B is regarded as obtained, this routine ends, and the process proceeds to the inter-character-string similarity calculation process (step S340 in FIG. 10).
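Putting the boundary setting and the three-way minimum together, the whole distance calculation can be sketched as a small dynamic-programming routine in Python. The function name `word_string_distance` and the representation of the similarities as an m × n list of lists are assumptions made for this sketch; the conversion of d(am, bn) into a similarity value in step S340 is deliberately left out, because its formula is not given in this part of the description.

```python
from typing import List


def word_string_distance(t: List[List[float]], r: float = 1.0, q: float = 1.0) -> List[List[float]]:
    """Fill the table d[i][j] = d(ai, bj); t[i-1][j-1] holds the similarity t(ai, bj)."""
    m = len(t)
    n = len(t[0]) if m else 0
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):                 # dropout costs along the A axis
        d[i][0] = d[i - 1][0] + r
    for j in range(1, n + 1):                 # dropout costs along the B axis
        d[0][j] = d[0][j - 1] + q
    for i in range(1, m + 1):                 # outer loop over i (steps S510, S570)
        for j in range(1, n + 1):             # inner loop over j (steps S515, S550, S560)
            x1 = d[i - 1][j - 1] + 2.0 * (1.0 - t[i - 1][j - 1])  # pair ai with bj
            x2 = d[i - 1][j] + r                                   # drop ai
            x3 = d[i][j - 1] + q                                   # drop bj
            d[i][j] = min(x1, x2, x3)
    return d


# Two identical two-word strings, and a 2 x 3 case with no similar words at all.
same = word_string_distance([[1.0, 0.0], [0.0, 1.0]])
unrelated = word_string_distance([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
```

The bottom-right cell of the returned table is d(am, bn). For the unrelated case it equals mr + nq, the longest distance discussed earlier, and for identical strings it is 0.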

【００９０】以上の単語列間距離演算処理の内容を距離
グラフＹＧを用いつつ具体例に即して説明する。ここで
は、単語列間距離演算処理が、前述した「新／日米／防
衛／協定／締結／のための／指針／について／は」とい
う入力文字列Ａと「新しい／日米／の／協力／ガイドラ
イン／に関して」という対比文字列Ｂに対して行なわれ
た場合を例にとって説明する。前述したように、入力文
字列Ａは、単語ａ１から単語ａ９までの９個の単語を、
対比文字列Ｂは単語ｂ１から単語ｂ６までの６個の単語
をそれぞれ含む単語列とされている。The inter-word-string distance calculation process described above will now be explained through a specific example using the distance graph YG. The example applies the process to the input character string A 「新／日米／防衛／協定／締結／のための／指針／について／は」 and the comparison character string B 「新しい／日米／の／協力／ガイドライン／に関して」 described earlier. As described above, the input character string A is a word string of nine words, a1 to a9, and the comparison character string B is a word string of six words, b1 to b6.

【００９１】この２つの単語列について、ステップＳ５
００およびステップＳ５１０の処理により序数ｉの値と
序数ｊの値とがともに１にセットされた場合には、ステ
ップＳ５１５からステップＳ５３５までの演算処理によ
り、ｄ（ａ１，ｂ１）の値が求められる。この演算の過
程および結果を一時的に記憶した演算バッファＥＴの様
子を図１５に示す。即ち、ステップＳ５１５の演算処理
によりＸ１値として０（ゼロ）が、ステップＳ５２０の
演算処理によりＸ２値として２が、ステップＳ５２５の
演算処理によりＸ３値として２が、それぞれ求められ、
演算バッファＥＴには、ｄ（ａ１，ｂ１）の値として、
このうちの最小の値であるＸ１値の０（ゼロ）が記憶さ
れている。For these two word strings, when the ordinal numbers i and j are both set to 1 by the processing of steps S500 and S510, the value of d(a1, b1) is obtained by the calculations of steps S515 to S535. FIG. 15 shows the state of the calculation buffer ET, which temporarily stores the intermediate steps and the result of this calculation. That is, 0 (zero) is obtained as the X1 value in step S515, 2 as the X2 value in step S520, and 2 as the X3 value in step S525, and the calculation buffer ET stores the smallest of these, the X1 value 0 (zero), as the value of d(a1, b1).

【００９２】このｄ（ａ１，ｂ１）の０（ゼロ）という
値が、「新」という単語ａ１からなる単語列と「新し
い」という単語ｂ１からなる単語列との距離となる。こ
のことを図１６の距離グラフＹＧを参照しつつ説明す
る。This value 0 (zero) of d(a1, b1) is the distance between the word string consisting of the word a1 「新」 and the word string consisting of the word b1 「新しい」. This is explained with reference to the distance graph YG of FIG. 16.

【００９３】On the distance graph YG, the distance between the word string consisting of word a1 「新」 and the word string consisting of word b1 「新しい」 is represented as a path from the origin O to the coordinates (a1, b1). As shown in FIG. 16, there are three such paths: a first path that goes directly from the origin O to (a1, b1), a second path that reaches (a1, b1) via the coordinates (0, b1), and a third path that reaches (a1, b1) via the coordinates (a1, 0). The X1 value computed in step S515 is the distance along the first path; the X2 value computed in step S520 and the X3 value computed in step S525 are the distances along the second and third paths, respectively.

【００９４】The distance needed to reach (a1, b1) along the first path is 0 (zero), which is smaller than 2 (= r + q), the distance needed along the second or third path. The X1 value of 0 (zero) for the first path therefore becomes d(a1, b1), the distance between the word string consisting of word a1 and the word string consisting of word b1. The path that gives this value is shown by the hatched arrow in FIG. 16.

【００９５】Next, the processing performed when step S550 sets ordinal j to 2 is described. The arithmetic processing of steps S515 to S525 yields 3 as the X1 value, 3 as the X2 value, and 1 as the X3 value. The smallest of these, the X3 value of 1, is temporarily stored in the operation buffer ET as the value of d(a1, b2). FIG. 17 shows this state.

【００９６】This value of 1 for d(a1, b2) is the distance between the word string consisting of word a1 「新」 and the word string consisting of words b1 and b2, 「新しい／日米」 (new / Japan-US). This is explained with reference to the distance graph YG of FIG. 18.

【００９７】As shown in FIG. 18, there are three paths from the origin O to the coordinates (a1, b2): a first path that goes from the origin O via (0, b1) to (a1, b2), a second path that goes via (0, b1) and (0, b2) to (a1, b2), and a third path that goes via (a1, b1) to (a1, b2). The X1 value computed in step S515 is the distance along the first path; the X2 value computed in step S520 and the X3 value computed in step S525 are the distances along the second and third paths, respectively.

【００９８】In other words, the first path corresponds to measuring the distance between the two word strings with word b1 「新しい」 of 「新しい／日米」 treated as a word omitted from the word string consisting of word a1 「新」, and with word b2 「日米」 treated as similar to word a1 「新」. The second path corresponds to measuring the distance with words b1 「新しい」 and b2 「日米」 both treated as words omitted from the word string consisting of word a1 「新」, and with word a1 「新」 likewise treated as a word omitted from the word string 「新しい／日米」 consisting of words b1 and b2. The third path corresponds to measuring the distance with word a1 「新」 and word b1 「新しい」 treated as similar to each other, and with word b2 「日米」 treated as a word omitted from the word string consisting of word a1.

【００９９】The distance needed to reach (a1, b2) along the third path is 1, which is smaller than both 3 (= q + 2), the distance needed along the first path, and 3 (= 2q + r), the distance needed along the second path. The X3 value of 1 for the third path therefore becomes d(a1, b2), the distance between the word string consisting of word a1 and the word string consisting of words b1 and b2. The path that gives this value is shown by the hatched arrow in FIG. 18.
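
The comparison in this paragraph can be written as one worked equation. The form of the recurrence is an assumption reconstructed from the values quoted in the text, with r = q = 1 and t(a1, b2) = 2 for the dissimilar pair:

```latex
\begin{aligned}
d(a_1,b_2) &= \min\bigl\{\, d(a_0,b_1)+t(a_1,b_2),\;\; d(a_0,b_2)+r,\;\; d(a_1,b_1)+q \,\bigr\} \\
           &= \min\{\, q+2,\;\; 2q+r,\;\; 0+q \,\} \;=\; \min\{3,\,3,\,1\} \;=\; 1 .
\end{aligned}
```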

【０１００】By repeating this arithmetic processing, the distances between the word string consisting of word a1 「新」 and each of the following word strings are obtained: the word string consisting of word b1, 「新しい」; the word string consisting of words b1 and b2, 「新しい／日米」; the word string of words b1 to b3, 「新しい／日米／の」; the word string of words b1 to b4, 「新しい／日米／の／協力」; the word string of words b1 to b5, 「新しい／日米／の／協力／ガイドライン」; and the word string of words b1 to b6, 「新しい／日米／の／協力／ガイドライン／に関して」. FIG. 19 shows the inter-word-string distance recording table DL with these distance values stored.

【０１０１】Once the distances between the word string consisting of word a1 「新」 and each word string of the comparison character string B have been stored, the processing of steps S570, S580, and S510 computes the distances between the word string consisting of words a1 and a2, 「新／日米」, and each word string of the comparison character string B. First, ordinal i is set to 2 and ordinal j to 1, and the distance between the word string 「新／日米」 and the word string consisting of word b1 「新しい」 is calculated.

【０１０２】In this case, the X2 value of 1 is taken as the distance d(a2, b1) between the two word strings. This is the distance obtained by treating word a1 「新」 and word b1 「新しい」 as similar to each other and treating word a2 「日米」 as a word omitted from the word string consisting of word b1.

【０１０３】Next, when step S550 sets ordinal j to 2, the distance between the word string consisting of words a1 and a2, 「新／日米」, and the word string consisting of words b1 and b2, 「新しい／日米」, is calculated. FIG. 20 shows the operation buffer ET storing the course and result of this calculation, and FIG. 21 shows the paths on the distance graph YG that can be taken according to the result.

【０１０４】The path on the distance graph for the X1 value adopts the value 0 (zero) of d(a1, b1) obtained in FIGS. 15 and 16 and treats word a2 「日米」 of 「新／日米」 and word b2 「日米」 of 「新しい／日米」 as similar. The path for the X2 value adopts the value 1 of d(a1, b2) obtained in FIGS. 17 and 18 and treats word a2 「日米」 of 「新／日米」 as omitted from the word string 「新しい／日米」. The path for the X3 value adopts the value 1 of the previously obtained d(a2, b1) and treats word b2 「日米」 of 「新しい／日米」 as omitted from the word string 「新／日米」.

【０１０５】As shown by the hatched arrow in FIG. 21, the distance needed to reach the coordinates (a2, b2) is shortest when the X1 value is adopted, and this X1 value of 0 (zero) becomes d(a2, b2), the distance between the word string consisting of words a1 and a2 and the word string consisting of words b1 and b2.

【０１０６】By repeating the above processing, the distances between all pairs of word strings of the input character string A and the comparison character string B are finally obtained. FIG. 22 shows the inter-word-string distance recording table DL with all of the obtained values of d(ai, bj) stored. The parentheses following each distance value give the expression used to calculate it, and the mark 「※」 indicates that two or more expressions yielded the same value and tied for the minimum. The value 5.8 of d(a9, b6) in the table DL of FIG. 22 is the distance between the word string containing all words a1 to a9 of the input character string A and the word string containing all words b1 to b6 of the comparison character string B; that is, it is the distance between the input character string A and the comparison character string B.
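
The repeated processing that fills the table can be sketched in Python as follows. This is an illustrative sketch, not the patent's routine: the recurrence is reconstructed from the worked values above, r = q = 1 is assumed, and `toy_t` is a hypothetical stand-in for the synonym-dictionary lookup that returns 0 for identical or listed-similar words and 2 otherwise.

```python
def distance_table(a, b, t, r=1.0, q=1.0):
    """Fill d[i][j] = distance between a[:i] and b[:j], keeping word order."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # boundary: every word of A omitted
        d[i][0] = d[i - 1][0] + r
    for j in range(1, n + 1):          # boundary: every word of B omitted
        d[0][j] = d[0][j - 1] + q
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            x1 = d[i - 1][j - 1] + t(a[i - 1], b[j - 1])  # words paired
            x2 = d[i - 1][j] + r                          # a_i omitted
            x3 = d[i][j - 1] + q                          # b_j omitted
            d[i][j] = min(x1, x2, x3)
    return d


# Hypothetical similarity list standing in for the synonym dictionary.
SIMILAR = {("新", "新しい")}


def toy_t(x, y):
    return 0.0 if x == y or (x, y) in SIMILAR else 2.0


table = distance_table(["新", "日米"], ["新しい", "日米"], toy_t)
print(table[1][1], table[1][2], table[2][1], table[2][2])
```

For the first two words of each string, the sketch gives d(a1, b1) = 0, d(a1, b2) = 1, d(a2, b1) = 1, and d(a2, b2) = 0, the values stated in the preceding paragraphs.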

【０１０７】How the value 5.8 of d(a9, b6) constitutes the distance between the input character string A and the comparison character string B is explained more concretely using the distance graphs YG of FIGS. 23 and 24. As shown in FIG. 23, each coordinate on the distance graph YG carries the distance value recorded in the inter-word-string distance recording table DL of FIG. 22 for the corresponding pair of compared word strings.

【０１０８】For example, the value "1" at the coordinates (a2, b1) is the value of d(a2, b1), the distance between the word string 「新／日米」 and the word string 「新しい」. The coordinates (a9, b6) (hereinafter, the end-point coordinates), at the intersection of a9, where the last word 「は」 of the input character string A is located, and b6, where the last word 「に関して」 of the comparison character string B is located, carry the value 5.8 of d(a9, b6).

【０１０９】Saying that "the distance between the input character string A and the comparison character string B is 5.8" means that "to travel from the origin O, the start point, along some path to the end-point coordinates (a9, b6), a distance of 5.8 must be covered." One such path is shown by the arrows in FIG. 24. The distance that must be covered from the start point to the end point is called the "total travel distance," and the distance that must be covered from one coordinate to the next is called the "section travel distance."

【０１１０】In FIG. 24, a horizontal arrow means that the word ai at the coordinates of the arrowhead was treated as omitted and the X2 value was taken as the distance between the word strings; a vertical arrow means that the word bj at the coordinates of the arrowhead was treated as omitted and the X3 value was taken; and a diagonal arrow means that the words ai and bj at the coordinates of the arrowhead were evaluated as similar to each other and the X1 value was taken. For example, the distance 4.8 at (a8, b6), one of the coordinates at the head of a diagonal arrow, is an X1 value, that is, a value from the expression that includes the inter-word similarity as a term (see FIG. 22).
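
The relation between arrows and X1/X2/X3 choices can be illustrated by walking a filled table backwards from the end point. The following Python sketch is an assumption-laden illustration: the table is hard-coded from the reconstructed recurrence with r = q = 1, `toy_t` is a hypothetical word-distance function, and ties are broken in favor of the diagonal, which the patent does not specify.

```python
import math


def trace_arrows(d, a, b, t, r=1.0, q=1.0):
    """Return the arrow types from the origin to (a_m, b_n)."""
    i, j = len(a), len(b)
    arrows = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
            d[i][j], d[i - 1][j - 1] + t(a[i - 1], b[j - 1])
        ):
            arrows.append("diagonal")    # X1 chosen: a_i paired with b_j
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(d[i][j], d[i - 1][j] + r):
            arrows.append("horizontal")  # X2 chosen: a_i omitted
            i -= 1
        else:
            arrows.append("vertical")    # X3 chosen: b_j omitted
            j -= 1
    return arrows[::-1]


def toy_t(x, y):
    # Hypothetical: 0 for identical words or the pair 新 / 新しい, else 2.
    return 0.0 if x == y or (x, y) == ("新", "新しい") else 2.0


a = ["新", "日米", "は"]
b = ["新しい", "日米"]
# d[i][j] for a[:i] versus b[:j], filled with r = q = 1 and toy_t.
d = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
    [3.0, 2.0, 1.0],
]
arrows = trace_arrows(d, a, b, toy_t)
print(arrows)
```

In this small case the trailing 「は」 has no counterpart, so the walk ends with a horizontal arrow after two diagonal ones.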

【０１１１】In other words, the path from the start point to the end-point coordinates shown in FIG. 24 represents the case in which the following pairs are evaluated as similar to each other: word a1 「新」 and word b1 「新しい」; word a2 「日米」 and word b2 「日米」; word a6 「のための」 and word b3 「の」; word a7 「指針」 and word b5 「ガイドライン」; and word a8 「について」 and word b6 「に関して」. Word a3 「防衛」, word a4 「協定」, word a5 「締結」, and word a9 「は」 are evaluated as having no similar word in the comparison character string B, and word b4 「協力」 is evaluated as having no counterpart in the input character string A. FIG. 25 shows the relationships between words that this path implies.

【０１１２】As shown in FIGS. 24 and 25, this path consumes section travel distances of "0 (zero)", "0 (zero)", "1", "1", "1", "0.6", "1", "0", "0", and "1" between successive coordinates from the start point to the end point, and the sum of the consumed section travel distances is the total travel distance of 5.8.

【０１１３】Paths other than the one shown in FIG. 24 also reach the end-point coordinates, but the candidates can be narrowed down to a single path by changing the range of inter-word similarity values or the omission cost values.

【０１１４】The inter-word-string distance calculation processing A has been described above. Next, the inter-character-string similarity calculation processing A (step S340 in FIG. 10), which starts when that processing ends, is described with reference to the inter-character-string similarity calculation routine A of FIG. 26. First, the omission cost r for all m words of the input character string A is added to the omission cost q for all n words of the comparison character string B; this sum, which is the maximum possible distance between the input character string A and the comparison character string B, is set as U. Next, the distance d(am, bn) between the word strings containing all words, obtained by the inter-word-string distance calculation processing, is divided by U, and the quotient is set as V. Then the complement of V (that is, 1 − V) is taken as the similarity s(am, bn) between the input character string A and the comparison character string B, the routine ends, and processing moves to the next step.

【０１１５】In the example above, the total of the omission costs r and q for the 15 words is 15, and the distance d(a9, b6) between the word strings containing all words is 5.8, so the similarity between the character strings is 0.61. The closer this value is to 1, the higher the semantic similarity between the character strings is judged to be. The similarity determined numerically in this way is output as the determination result (step S160 in FIG. 3), and the similarity determination processing for the character strings ends.
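
The arithmetic of this step can be checked with a short Python sketch. The function name and signature are hypothetical; the computation follows the U, V, and complement steps described for routine A, with r = q = 1 as in the example.

```python
def similarity(d_mn, m, n, r=1.0, q=1.0):
    """Similarity s(am, bn) from the full-string distance d(am, bn)."""
    u = m * r + n * q      # U: maximum possible distance (every word omitted)
    v = d_mn / u           # V: distance normalized to the range 0..1
    return 1.0 - v         # complement of V


# Worked example from the text: m = 9, n = 6, d(a9, b6) = 5.8.
s = similarity(5.8, m=9, n=6)
print(round(s, 2))
```

With these inputs U is 15 and the similarity rounds to 0.61, the value given in the text.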

【０１１６】For the words extracted from the input character string and the comparison character string, the inter-sentence similarity determination device 1A of the first embodiment described above refers to the synonym dictionary 36 to determine the similarity between concept words and the similarity between functional expressions, and uses these similarity values to determine the similarity between the character strings while taking the correspondence of word order into account. The device can therefore accurately determine the degree of semantic similarity between two sentences that differ in word order.

【０１１７】In addition, because the similarity between character strings is determined using the similarity values between relational expressions and between auxiliary-predicate expressions, similarity can be judged accurately even between sentences that consist of concept words denoting the same semantic concepts combined with relational or auxiliary-predicate expressions that express different frameworks.

【０１１８】Furthermore, while this embodiment takes the correspondence of word order between sentences into account, it prohibits two or more pairs of similar words from crossing each other between the character strings. This simplifies the processing needed to determine the similarity between character strings.

【０１１９】In addition, by using the omission costs r and q when determining the similarity between word strings, the device considers both the case in which a word similar to a given word is absent from the other word string and the case in which it is present, and adopts the value that gives the higher similarity as the similarity between the word strings. The similarity between two sentences can therefore be judged correctly for the sentences as a whole.

【０１２０】In the similarity determination processing described above, when a word in one character string has no similar word in the other character string, the distance between word strings was calculated using the same omission costs r and q for every word, regardless of the type of word. Alternatively, omission costs of different values can be set according to the type or importance of the word treated as omitted. Such a configuration is described below with reference to FIGS. 27 to 35.

【０１２１】FIG. 27 is a flowchart showing the inter-word-string similarity determination routine B. This routine performs nearly the same processing as the inter-word-string similarity routine A of FIG. 10, but differs in that it performs word importance setting processing (step S600), which routine A does not. Corresponding to this difference, the subsequent inter-word-string distance calculation processing B (step S620) and inter-character-string similarity calculation processing B also differ from the inter-word-string distance calculation processing A of FIG. 14 and the inter-character-string similarity calculation processing A of FIG. 26 in the parts related to the values of the omission costs r and q used in the calculation.

【０１２２】The procedure and content of the word importance setting processing are shown in the word importance setting routine of FIG. 28. Because this routine performs almost the same processing as the omission cost setting routine of FIG. 11, the last two digits of each corresponding step number are the same as in FIG. 11.

【０１２３】For the input character string A, this routine sets the value of d(ai, 0), for ordinal i of word ai from 1 to the total m, to the value of d{a(i−1), 0} plus the importance value w(ai) of word ai (step S730). This processing means: "if the word string from word b1 to word bj contains no word similar in meaning to word ai, the importance value w(ai) of word ai is added to the distance between the word string up to the word immediately before ai and the word string from word b1 to word bj."

【０１２４】For example, when the ordinal i of word a is 1, the value of d(a1, b0) is set to "w(ai)", that is, the value 0 (zero) of d(a0, b0) plus the importance value w(ai) of word ai. As a result of this processing, as shown in FIG. 29, the distance from the origin O to word a1 on the horizontal axis of the distance graph YG is set to "w(ai)".

【０１２５】Likewise, for the comparison character string B, the value of d(0, bj), for ordinal j of word bj from 1 to the total n, is set to the value of d{0, b(j−1)} plus the importance value w(bj) of word bj (step S765). As a result of this processing, the distance from the origin O to word b1 on the vertical axis of the distance graph YG is set to "w(bj)" (see FIG. 29).

【０１２６】By repeating the processing of steps S730 and S765 (steps S740 and S770), the values from d(a1, b0) to d(am, b0) and from d(a0, b0) to d(a0, bn) are set. As a result, the words a1 to am and b1 to bn on the horizontal and vertical axes of the distance graph YG are laid out at intervals that correspond to the importance values w(ai) and w(bj) of the individual words, as shown in FIG. 29.
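
The axis values set by steps S730 and S765 are running sums of the importance values, which a minimal Python sketch can show. The function name and the sample weights are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def weighted_axes(w_a, w_b):
    """Return ([d(a0,0) .. d(am,0)], [d(0,b0) .. d(0,bn)]) as running sums."""
    row = [0.0] + list(accumulate(w_a))   # d(ai, 0) = d(a(i-1), 0) + w(ai)
    col = [0.0] + list(accumulate(w_b))   # d(0, bj) = d(0, b(j-1)) + w(bj)
    return row, col


# Hypothetical importance values: content words 1.0, function words lower.
row, col = weighted_axes([1.0, 1.0, 0.4], [1.0, 0.2])
print(row, col)
```

Words with low importance sit closer to their predecessors on the axis, which is the uneven spacing the paragraph describes for FIG. 29.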

【０１２７】The importance values w(ai) and w(bj) of the words are stored in the Japanese-language dictionary described earlier and are extracted, together with each word's character information and grammatical information, when words are extracted from each character string in step S130 of FIG. 3. FIG. 30 shows how the importance values w(ai) and w(bj) are stored in the Japanese-language dictionary.

【０１２８】In this embodiment, the importance of a word is expressed as a number in the range 0 ≤ w(ai), w(bj) ≤ 1, defined so that the closer the number is to 1, the more strongly the word affects the meaning of the sentence. The importance of concept words such as nouns is set higher than that of functional expressions such as relational expressions, and importance also differs according to the type of functional expression. For example, the binding particle 「は」, which mainly attaches after another particle, and the case particle 「の」, which sits between two nouns and indicates apposition, can be omitted without greatly changing the meaning of the sentence, so their importance values are lower than those of other functional expressions.

【０１２９】When the importance values w(ai) and w(bj), which are added when a word is missing, have been set and the word importance setting routine ends, the inter-word-string distance calculation routine B shown in FIG. 31 is started. Because this routine performs almost the same processing as the inter-word-string distance calculation routine A of FIG. 14, the last two digits of each corresponding step number are the same as in FIG. 14.

【０１３０】Like the inter-word-string distance calculation routine A, this routine takes the smallest of the X1, X2, and X3 values as the distance between word strings. In calculating the X2 and X3 values, however, it uses the word importance values w(ai) and w(bj) in place of the omission costs r and q used in routine A (steps S820 and S825). Consequently, which of the three values is the minimum chosen as the distance d(ai, bj) differs from the result in routine A.
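
The effect of replacing uniform omission costs with per-word importance values can be sketched in Python. This is an illustrative reconstruction, not the patent's routine B: the function name, the exact-match `toy_t`, and the importance of 1.0 for the content word are assumptions, while 0.2 for 「の」 is the figure the text gives for that word.

```python
def weighted_distance(a, b, t, w):
    """Distance between word lists a and b with per-word omission weights w."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w[a[i - 1]]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w[b[j - 1]]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            x1 = d[i - 1][j - 1] + t(a[i - 1], b[j - 1])  # words paired
            x2 = d[i - 1][j] + w[a[i - 1]]                # a_i omitted
            x3 = d[i][j - 1] + w[b[j - 1]]                # b_j omitted
            d[i][j] = min(x1, x2, x3)
    return d[m][n]


def toy_t(x, y):
    # Hypothetical word distance: 0 for identical words, 2 otherwise.
    return 0.0 if x == y else 2.0


a, b = ["日米"], ["日米", "の"]
weighted = weighted_distance(a, b, toy_t, {"日米": 1.0, "の": 0.2})
uniform = weighted_distance(a, b, toy_t, {"日米": 1.0, "の": 1.0})
print(weighted, uniform)
```

Omitting the low-importance particle 「の」 costs 0.2 instead of 1, so the weighted distance is smaller than the uniform one for the same pair of word strings.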

【０１３１】図３２は、前述した「新日米防衛協定締結
のための指針については」という入力文字列Ａと「新し
い日米の協力ガイドラインに関して」という対比文字列
Ｂについて、単語列間類似度判定処理Ｂが行なわれた後
の単語列間距離記録テーブルＤＬの様子を示す。図２２
に示した単語列間距離演算ルーチンＡによる結果と比較
すると、入力文字列Ａの「のための」という単語ａ６，
「について」という単語ａ８および「は」という単語ａ
９や対比文字列Ｂの「の」という単語ｂ３，「に関し
て」という単語ｂ６に関しては、これらに類似する単語
が他方の文字列に存在しないものとみなされた場合に、
脱落コストｒ，ｑとして設定されていた１よりもよりも
小さい値が、直前の単語までの単語列の距離に対して付
加される。従って、これらの単語が欠落している場合の
演算値であるＸ２値やＸ３値が、３つの値のうちの最小
値となりやすくなり、この結果、図２２のテーブルと比
べて、Ｘ２値やＸ３値が単語列間の距離ｄ（ａｉ，ｂ
ｊ）として決定される頻度が多くなり、入力文字列Ａ中
の全ての単語ａ１〜ａ９を含む単語列と対比文字列Ｂ中
の全ての単語ｂ１〜ａ６を含む単語列との距離の値、即
ち、入力文字列Ａと対比文字列Ｂとの文字列間の距離ｄ
（ａ９，ｂ６）の値も、値３．６というより小さい値と
なる。FIG. 32 shows the state of the inter-word-string distance recording table DL after the inter-word-string similarity determination processing B has been performed on the input character string A, "新日米防衛協定締結のための指針については" ("as for the guidelines for concluding a new Japan-US defense agreement"), and the comparison character string B, "新しい日米の協力ガイドラインに関して" ("regarding the new Japan-US cooperation guidelines"). Compared with the result of the word string distance calculation routine A shown in FIG. 22, for the word a6 "のための" ("for"), the word a8 "について" ("about"), and the word a9 "は" (topic particle) of the input character string A, and for the word b3 "の" ("of") and the word b6 "に関して" ("regarding") of the comparison character string B, a value smaller than the value 1 that was set as the dropout costs r and q is added to the distance of the word strings up to the immediately preceding word when no similar word is deemed to exist in the other character string. The X2 and X3 values, which are the values calculated for the case where these words are missing, therefore become more likely to be the minimum of the three values. As a result, compared with the table of FIG. 22, the X2 or X3 value is more frequently adopted as the distance d(ai, bj) between word strings, and the distance between the word string containing all the words a1 to a9 of the input character string A and the word string containing all the words b1 to b6 of the comparison character string B, that is, the distance d(a9, b6) between the input character string A and the comparison character string B, also takes a smaller value, 3.6.
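The recurrence behind the table in FIG. 32 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's routine itself: it assumes the inter-word distance is 2{1 - t(ai, bj)}, that X2 is the case where ai is dropped and X3 the case where bj is dropped, and that the boundary rows accumulate the importance values. The function names and the toy word lists are hypothetical.

```python
from typing import Callable, Sequence


def word_string_distance(
    a: Sequence[str],
    b: Sequence[str],
    t: Callable[[str, str], float],
    w: Callable[[str], float],
) -> float:
    """Return d(am, bn) with word order preserved.

    t(x, y): inter-word similarity in [0, 1].
    w(x): importance of word x, used as the cost of dropping it.
    """
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    # Assumed boundary: a prefix compared with nothing costs the
    # summed importance of its words.
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w(a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            x1 = d[i - 1][j - 1] + 2.0 * (1.0 - t(a[i - 1], b[j - 1]))
            x2 = d[i - 1][j] + w(a[i - 1])  # assumed: ai has no counterpart
            x3 = d[i][j - 1] + w(b[j - 1])  # assumed: bj has no counterpart
            d[i][j] = min(x1, x2, x3)
    return d[m][n]


# Toy data (hypothetical): exact-match similarity, cheap function word.
def exact(x: str, y: str) -> float:
    return 1.0 if x == y else 0.0


def importance(x: str) -> float:
    return 0.2 if x == "の" else 1.0


a_words = ["日米", "の", "指針"]
b_words = ["日米", "指針"]
weighted = word_string_distance(a_words, b_words, exact, importance)
uniform = word_string_distance(a_words, b_words, exact, lambda _: 1.0)
```

With these toy inputs the weighted distance is 0.2 and the uniform one is 1.0, which mirrors how a cheaper function-word omission shortens the overall path.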

【０１３２】この単語列間距離記録テーブルＤＬに記録
された全ての単語列同士の距離の値を記した距離グラフ
ＹＷの様子を図３３に示す。座標（ａ３，ｂ２）から座
標（ａ３，ｂ３）への移動を意味する「の」という単語
ｂ３の欠落や、座標（ａ８，ｂ６）から座標（ａ９，ｂ
６）への移動を意味する「は」という単語ａ９の欠落に
よる「区間移動距離」がより小さくなっており、このこ
とが全体移動距離の減少に寄与していることがわかる。FIG. 33 shows a distance graph YW on which the distance values between all the word strings recorded in the inter-word-string distance recording table DL are plotted. The "section movement distance" for the omission of the word b3 "の", which corresponds to the move from coordinates (a3, b2) to coordinates (a3, b3), and for the omission of the word a9 "は", which corresponds to the move from coordinates (a8, b6) to coordinates (a9, b6), has become smaller, and it can be seen that this contributes to the reduction in the overall movement distance.

【０１３３】距離グラフＹＷ上に示した矢印のパスが意
味する単語間の関係を図３４に示す。既述した脱落コス
トｒ，ｑが一律の場合を示す図２５と比較すると、図３
４の場合には、図２５で脱落とみなされていた「協定」
という単語ａ４と「協力」という単語ｂ４とが相互に類
似関係があるものとみなされる一方、相互に類似関係が
あるものとみなされていた「のための」という単語ａ６
と「の」という単語ｂ３とが、それぞれ他方の文字列か
ら脱落しているとみなされている。前者は、「の」とい
う単語ｂ３が脱落した場合の加算値が値１から値０．２
に減ったことにより、ｄ（ａ４，ｂ４）の値においてＸ
１値が最も小さい値となったことに起因するものであ
る。後者は、「のための」という単語ａ６および「の」
という単語ｂ３が脱落した場合の加算値が、それぞれ値
１から値０．４、値１から値０．２に減ったことによ
り、ｄ（ａ６，ｂ３）の値においてＸ３値が最も小さい
値となったことに起因するものである（図３２を参
照）。FIG. 34 shows the relationships between words represented by the path of arrows on the distance graph YW. Compared with FIG. 25, which shows the case where the dropout costs r and q are uniform, in FIG. 34 the word a4 "協定" ("agreement") and the word b4 "協力" ("cooperation"), which were regarded as dropped in FIG. 25, are regarded as similar to each other, whereas the word a6 "のための" and the word b3 "の", which were regarded as similar to each other, are each regarded as missing from the other character string. The former results from the fact that the value added when the word b3 "の" is dropped decreased from 1 to 0.2, so that the X1 value became the smallest value for d(a4, b4). The latter results from the fact that the values added when the word a6 "のための" and the word b3 "の" are dropped decreased from 1 to 0.4 and from 1 to 0.2, respectively, so that the X3 value became the smallest value for d(a6, b3) (see FIG. 32).

【０１３４】以上の単語列間距離演算処理Ｂの終了後に
続けて行なわれる、文字列間類似度の演算処理Ｂの内容
を、図３５の文字列間類似度演算ルーチンＢに示す。図
２６の文字列間類似度演算ルーチンＡでは、入力文字列
Ａと対比文字列Ｂとの距離の最大値を、入力文字列Ａ中
の単語の総数ｍ個分の脱落コストｒ，ｑに対比文字列Ｂ
中の単語ｂの総数ｎ個分の脱落コストｑを加えた値と
し、この値で全ての単語を含んだ単語列同士の距離ｄ
（ａｍ，ｂｎ）の値を除算して、当該文字列間で想定さ
れる最長距離に対して実際に求められた距離ｄ（ａｍ，
ｂｎ）が占める割合を算出していた。この点、単語脱落
の場合のコストを単語の重要度に応じて設定する本構成
の場合には、入力文字列Ａと対比文字列Ｂとの距離の最
大値は、入力文字列と対比文字列とを構成する個々の単
語についての重要度の値ｗ（ａｉ），ｗ（ｂｊ）の合計
値となる。そこで、この値をＵとし（ステップＳ１２１
０）、この値でｄ（ａｍ，ｂｎ）の値を除算した値につ
いての１の補数を求めることにより（ステップＳ１２３
０，Ｓ１２５０）、入力文字列Ａと対比文字列Ｂとの文
字列間の類似度ｓ（ａｍ，ｂｎ）を求めている。The character string similarity calculation processing B, which is performed following the end of the word string distance calculation processing B described above, is shown as the character string similarity calculation routine B in FIG. 35. In the character string similarity calculation routine A of FIG. 26, the maximum value of the distance between the input character string A and the comparison character string B was taken to be the dropout costs r, q for all m words in the input character string A plus the dropout cost q for all n words b in the comparison character string B, and the distance d(am, bn) between the word strings containing all the words was divided by this value to calculate the ratio of the actually obtained distance d(am, bn) to the longest distance assumed between the character strings. In the present configuration, in which the cost of a dropped word is set according to the importance of the word, the maximum value of the distance between the input character string A and the comparison character string B is instead the total of the importance values w(ai) and w(bj) of the individual words constituting the input character string and the comparison character string. This total is therefore set as U (step S1210), and the similarity s(am, bn) between the input character string A and the comparison character string B is obtained by dividing d(am, bn) by U and subtracting the quotient from 1 (steps S1230 and S1250).

【０１３５】前述の例では、１５個の各単語についての
重要度の値ｗ（ａｉ），ｗ（ｂｊ）の総計値は１５であ
り、全ての単語を含んだ単語列同士の距離ｄ（ａ９，ｂ
６）の値は３．６であるため、文字列間の類似度は値
０．７６となる。従って、脱落コストｒ，ｑを一律に設
定した場合と比べ、入力文字列Ａと対比文字列Ｂとは、
より文字列間の意味上の類似度が高いものと判定されて
いる。In the above example, the total of the importance values w(ai) and w(bj) of the 15 words is 15, and the distance d(a9, b6) between the word strings containing all the words is 3.6, so the similarity between the character strings is 0.76 (1 - 3.6/15). Accordingly, compared with the case where the dropout costs r and q are set uniformly, the input character string A and the comparison character string B are judged to have a higher semantic similarity.
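The calculation in the two preceding paragraphs can be sketched as follows. This is a minimal illustration assuming only what the text states: U is the total importance of all words in both strings, and the similarity is 1 minus d(am, bn) divided by U. The function names are hypothetical.

```python
from typing import Iterable


def max_distance(weights_a: Iterable[float], weights_b: Iterable[float]) -> float:
    """U: total importance of every word in both strings (step S1210)."""
    return sum(weights_a) + sum(weights_b)


def string_similarity(d_mn: float, u: float) -> float:
    """s(am, bn) = 1 - d(am, bn) / U (steps S1230 and S1250)."""
    if u <= 0.0:
        raise ValueError("U must be positive")
    return 1.0 - d_mn / u


# Figures from the example in the text: d(a9, b6) = 3.6 and U = 15.
s = string_similarity(3.6, 15.0)
```

For the figures given in the text, `s` evaluates to approximately 0.76.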

【０１３６】このように、単語の種類や重要度に応じ
て、脱落コストに異なる値を設定する構成を採ることに
より、類否判断の対象とされる文の性質に応じて、適切
な文間の類似度を判断することが可能となる。例えば、
日本語文字列同士の意味の類否を判断する場合には、概
念語の相違により文の意味が大きく異なるので、概念語
の脱落コストを高くすることが望ましい。一方、外部装
置９０が和英翻訳する翻訳装置の場合には、文の構造が
文の意味を大きく左右するので、機能的表現の脱落コス
トを高くすることで、正確な翻訳を担保することができ
る。As described above, by setting different dropout cost values according to the type and importance of a word, the similarity between sentences can be judged appropriately for the nature of the sentences being compared. For example, when judging whether Japanese character strings are similar in meaning, a difference in concept words greatly changes the meaning of a sentence, so it is desirable to raise the dropout cost of concept words. On the other hand, when the external device 90 is a translation device that translates from Japanese into English, the sentence structure largely determines the meaning of the sentence, so raising the dropout cost of functional expressions helps ensure accurate translation.
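One way to picture this application-dependent setting is a table of cost profiles keyed by word class. The sketch below is purely illustrative: the profile names, the two word classes as dictionary keys, and every numeric value are assumptions made for the example and do not come from the patent.

```python
from typing import Dict

# Hypothetical cost tables; the numeric values are illustrative only.
DROP_COST_PROFILES: Dict[str, Dict[str, float]] = {
    # Meaning comparison between Japanese strings: concept words dominate.
    "meaning_comparison": {"concept": 1.0, "functional": 0.3},
    # Example lookup for translation: sentence structure dominates.
    "translation_lookup": {"concept": 0.5, "functional": 1.0},
}


def drop_cost(word_class: str, profile: str) -> float:
    """Cost of treating a word of the given class as missing."""
    return DROP_COST_PROFILES[profile][word_class]
```

Keeping the costs in data rather than in the distance routine lets the same routine serve both uses by switching profiles.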

【０１３７】なお、以上説明した文字列間類否判定処理
では、入力文字列および対比文字列内の機能的表現の有
無や類否に着目して、文字列間の類似度を判定するが、
この処理を、複合表現を含む文字列に関する類否の判定
に応用することも可能である。ここで、複合表現とは、
２以上の単語の結合により１のまとまった意味概念を表
わす表現をいい、例えば、「解析手法」や「新製品」等
の表現が複合表現に該当する。以下、この応用例につい
て説明する。The character string similarity determination processing described above judges the similarity between character strings by focusing on the presence or absence, and the similarity, of functional expressions in the input character string and the comparison character string. This processing can also be applied to judging the similarity of character strings that contain compound expressions. Here, a compound expression is an expression in which two or more words combine to express a single unified concept; for example, "解析手法" ("analysis technique") and "新製品" ("new product") are compound expressions. This application is described below.

【０１３８】まず、入力文字列として「解析手法」とい
う複合表現からなる文字列が、対比文字列として「分析
の方法」という複合表現を含まない文字列が、それぞれ
類似度検索エンジン１０Ａに入力された場合について説
明する。概念語類義語辞書３６ａには、「解析」という
概念語と「分析」という概念語とが類似する旨の情報
が、語間類似度の値０．７とともに、「手法」という概
念語と「方法」という概念語とが類似する旨の情報が、
語間類似度の値０．７とともに、それぞれ記憶されてい
る。First, consider the case where a character string consisting of the compound expression "解析手法" ("analysis technique") is input to the similarity search engine 10A as the input character string, and the character string "分析の方法" ("method of analysis"), which does not contain a compound expression, is input as the comparison character string. The concept word synonym dictionary 36a stores information that the concept words "解析" and "分析" (both roughly "analysis") are similar, together with an inter-word similarity of 0.7, and information that the concept words "手法" ("technique") and "方法" ("method") are similar, together with an inter-word similarity of 0.7.

【０１３９】この語間類似度の値に基づいて単語列間の
類似度が判定（図３のステップＳ１５０）されるが、本
応用例においては、文字列を構成する単語の連続が複合
表現に該当する場合には、単語同士の類似度に、入力文
字列中の複合表現と対比文字列中の複合表現に相当する
表現との間の類似度を加味して、入力文字列と対比文字
列との類似度を判定することとしている。例えば、上例
の場合には、「解析−分析」，「手法−方法」という対
応関係のみならず、「解析手法−分析の方法」という対
応関係について、表現間の類似度を判定する。The similarity between word strings is judged on the basis of these inter-word similarity values (step S150 in FIG. 3). In this application, when a sequence of words constituting a character string is a compound expression, the similarity between the input character string and the comparison character string is judged by taking into account, in addition to the similarity between individual words, the similarity between the compound expression in the input character string and the expression in the comparison character string that corresponds to it. In the above example, the similarity between expressions is judged not only for the correspondences "解析"-"分析" and "手法"-"方法" but also for the correspondence "解析手法"-"分析の方法".

【０１４０】次に、対比文字列にのみ存在する「の」と
いう単語について、国語辞書に格納された文法情報を参
照する。国語辞書には、「の」という語の種類が「同格
を表わす助詞」である旨および所定の名詞と名詞の間に
用いられた場合には省略可能な旨が、文法情報として格
納されている。これらの情報を得ることにより、「解析
手法−分析の方法」という表現間の類似度は、「解析手
法−分析方法」という表現間の類似度と同じであると判
断する。この場合には、「の」という単語に類似する単
語が入力文字列に存在しないことを理由として、脱落コ
ストｑを付加しない。従って、「解析手法」という単語
列と「分析の方法」という単語列との間の類似度は、
「解析手法」という単語列と「分析方法」という単語列
との間の類似度と同じ値となる。Next, for the word "の", which exists only in the comparison character string, the grammatical information stored in the Japanese dictionary is consulted. The Japanese dictionary stores, as grammatical information, that "の" is a "particle expressing apposition" and that it can be omitted when used between certain nouns. On the basis of this information, the similarity between the expressions "解析手法" and "分析の方法" is judged to be the same as the similarity between the expressions "解析手法" and "分析方法". In this case, the dropout cost q is not added even though the input character string contains no word similar to "の". Accordingly, the similarity between the word string "解析手法" and the word string "分析の方法" has the same value as the similarity between the word string "解析手法" and the word string "分析方法".
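The rule for the omissible particle can be sketched as a dropout-cost function that returns zero when the grammatical information allows the omission. This is a hedged illustration: the word representation as (surface, part of speech) pairs, the set standing in for the dictionary's grammatical information, and the simplification of "certain nouns" to any noun are all assumptions for the example.

```python
from typing import Sequence, Tuple

Word = Tuple[str, str]  # (surface form, part of speech); hypothetical representation

# Hypothetical stand-in for the dictionary's grammatical information:
# particles that may be omitted when they sit between two nouns.
OMISSIBLE_BETWEEN_NOUNS = {"の"}


def drop_cost(words: Sequence[Word], k: int, default_cost: float = 1.0) -> float:
    """Cost of treating words[k] as having no counterpart in the other string."""
    surface, _ = words[k]
    has_neighbors = 0 < k < len(words) - 1
    if surface in OMISSIBLE_BETWEEN_NOUNS and has_neighbors:
        # Simplification: any noun on both sides qualifies.
        if words[k - 1][1] == "noun" and words[k + 1][1] == "noun":
            return 0.0
    return default_cost


phrase = [("分析", "noun"), ("の", "particle"), ("方法", "noun")]
costs = [drop_cost(phrase, k) for k in range(len(phrase))]
```

Here `costs` is `[1.0, 0.0, 1.0]`, so dropping "の" adds nothing and "分析の方法" scores the same as "分析方法" against "解析手法".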

【０１４１】なお、複合表現において省略されている単
語は、「の」や「のための」等のような機能的表現に限
るものではなく、例えば、「解析手法」と「解析する手
法」という場合における「する」という単語のような一
定の名詞に接続されて用いられるサ変動詞の語幹の一部
ないし活用語尾や、「新製品」と「新しい製品」という
場合における「しい」のような形容詞の語幹の一部ない
し活用語尾であってもよい。The words omitted in a compound expression are not limited to functional expressions such as "の" and "のための". They may also be part of the stem or the inflectional ending of a sa-row irregular (suru) verb attached to certain nouns, such as "する" in the pair "解析手法" and "解析する手法", or part of the stem or the inflectional ending of an adjective, such as "しい" in the pair "新製品" and "新しい製品".

【０１４２】このような構成を採れば、助詞，用言の語
幹の一部や活用語尾を省略せずに表現した文字列と、こ
れらを省略して同義に用いる複合表現との間の類似度を
正確に判定することができる。With this configuration, the similarity between a character string written without omitting particles or parts of the stems or inflectional endings of declinable words and a compound expression that omits them with the same meaning can be judged accurately.

【０１４３】なお、入力文字列や対比文字列は、「解析
手法」や「分析の方法」以外の他の表現を伴っても差し
支えない。例えば、「素材の解析手法」，「素材の分析
の方法」という文字列でもよい。また、対比文字列が
「解析の手法」である場合のように、入力文字列を構成
する概念語と対比文字列を構成する概念語とが一致する
場合でもよい。The input character string and the comparison character string may also include expressions other than "解析手法" and "分析の方法"; for example, they may be the character strings "素材の解析手法" ("analysis technique for materials") and "素材の分析の方法" ("method of analysis of materials"). The concept words of the input character string and those of the comparison character string may also coincide, as when the comparison character string is "解析の手法".

【０１４４】また、入力文字列の複合表現に相当する対
比文字列中の表現が、他の複合表現である場合にも、上
記構成を適用することが可能である。例えば、入力文字
列として「解析手法」という複合表現が、対比文字列と
して「分析方法」という複合表現が、それぞれ入力され
た場合には、「解析−分析」，「手法−方法」という対
応関係のみならず、「解析手法−分析方法」という表現
間の類似度を判定し、「解析−分析」間の語間類似度の
値０．７，「手法−方法」間の語間類似度の値０．７
に、表現間の類似度の値として所定値を付加し、「単語
同士が類似する程度以上に両文字列が類似する」と判定
することも望ましい。The above configuration can also be applied when the expression in the comparison character string that corresponds to the compound expression of the input character string is itself a compound expression. For example, when the compound expression "解析手法" is input as the input character string and the compound expression "分析方法" as the comparison character string, it is also desirable to judge the similarity between the expressions "解析手法" and "分析方法" in addition to the correspondences "解析"-"分析" and "手法"-"方法", to add a predetermined value as the similarity between expressions to the inter-word similarity of 0.7 for "解析"-"分析" and the inter-word similarity of 0.7 for "手法"-"方法", and thereby to judge that the two character strings are more similar than the individual words are to each other.

【０１４５】以上は、文字列として、語句、即ち、言葉
の一区切りを入力した場合を例にとって説明したが、入
力される文字列は、複合表現を含む文字列であればよ
く、主語や述語を備える完結した表現を文字列として入
力した場合にも、上記と同様の効果を得ることができ
る。こうすれば、一方の文に用いられた複合表現と、こ
の複合表現に対応する他方の文中の表現との類似度を、
正確に判定することができるので、文全体としての類似
度もより正確なものとなる。The above description took as an example the case where a phrase, that is, a single segment of language, is input as the character string. However, the input character string need only contain a compound expression, and the same effect is obtained when a complete expression having a subject and a predicate is input. In that case, the similarity between a compound expression used in one sentence and the corresponding expression in the other sentence can be judged accurately, so the similarity of the sentences as a whole also becomes more accurate.

【０１４６】次に、本発明の第２実施例について説明す
る。第２実施例は、データ検索装置１Ｂに関するもので
あり、第１実施例のハードウェア構成と同一の構成によ
り実現される。このデータ検索装置１Ｂの概要を図３６
に示した。このデータ検索装置１Ｂは、文字列の類似度
を判定しながら検索を行なう検索エンジン１０Ｂと文字
列を入力する外部装置９０とから構成されている。検索
エンジン１０Ｂと外部装置９０の内部構成は、第１実施
例と同様である。Next, a second embodiment of the present invention will be described. The second embodiment relates to a data search device 1B and is realized with the same hardware configuration as the first embodiment. FIG. 36 shows an outline of the data search device 1B. The data search device 1B consists of a search engine 10B, which performs searches while judging the similarity of character strings, and an external device 90, which inputs character strings. The internal configurations of the search engine 10B and the external device 90 are the same as in the first embodiment.

【０１４７】第１実施例の文間類似度判定装置１Ａと比
べると、第１実施例の類似度判定エンジン１０Ａが、与
えられた２つの文の類似度を判定してこれを出力してい
たのに対して、第２実施例の検索エンジン１０Ｂは、外
部装置９０から与えられる自然言語の文（以下、検索キ
ー文という）を入力し、この文と類似度の高い文を、検
出してこれを出力する機能を有する点で異なっている。
また、検索の対象となる複数の検索対象文（以下、デー
タと呼ぶ）は、外部装置９０のハードディスク９０a内
に保存されているデータである。Compared with the inter-sentence similarity determination device 1A of the first embodiment, the difference is as follows: the similarity determination engine 10A of the first embodiment judges the similarity of two given sentences and outputs the result, whereas the search engine 10B of the second embodiment receives a natural language sentence from the external device 90 (hereinafter, the search key sentence), detects sentences that are highly similar to it, and outputs them. The multiple search target sentences to be searched (hereinafter, the data) are stored on the hard disk 90a of the external device 90.

【０１４８】第２実施例における処理の概要を図３７に
示す。検索エンジン１０Ｂは、このルーチンが起動され
ると、まず検索しようとしている検索キー文を構成する
文字列を入力する処理を行なう（ステップＳ９００）。
この処理（ステップＳ９００）は、第１実施例における
文字列入力処理（図３、ステップＳ１００）と同一であ
る。こうして検索キー文を入力した後、外部装置９０内
に保存されたデータから検索対象文を一つ取り出す処理
を行なう（ステップＳ９３０）。次に、取り出した一つ
の検索対象文と、検索キー文との類似度を判定し、類似
する文字列を検出する処理を行なう（ステップＳ９４
０）。この処理は、第１実施例の単語間類似度判定処理
（図３、ステップＳ１４０），単語列間類似度判定処理
（図３、ステップＳ１５０）と同様である。即ち、概念
語同士の類似度や機能的表現同士の類似度から、各文字
列を構成する単語列間の距離を求め、全ての単語を含む
単語列同士の距離から文字列間の類似度を演算により算
出し、距離検索キー文と、取り出した一つの検索対象文
との類似度を数値として求めるのである。求めた類似度
の数値データは、ＲＡＭ２６上の所定領域に一時的に記
憶される。FIG. 37 shows an outline of the processing in the second embodiment. When this routine is started, the search engine 10B first inputs the character string constituting the search key sentence (step S900). This processing (step S900) is the same as the character string input processing of the first embodiment (FIG. 3, step S100). After the search key sentence has been input, one search target sentence is taken out of the data stored in the external device 90 (step S930). Next, the similarity between the extracted search target sentence and the search key sentence is judged in order to detect similar character strings (step S940). This processing is the same as the inter-word similarity determination processing (FIG. 3, step S140) and the inter-word-string similarity determination processing (FIG. 3, step S150) of the first embodiment. That is, the distances between the word strings constituting each character string are obtained from the similarity between concept words and the similarity between functional expressions, the similarity between the character strings is calculated from the distance between the word strings containing all the words, and the similarity between the search key sentence and the extracted search target sentence is thus obtained as a numerical value. The resulting similarity value is temporarily stored in a predetermined area of the RAM 26.

【０１４９】以上の処理の後、検索対象文がまだハード
ディスク９０ａに残っているか否かを判断し（ステップ
Ｓ９５０）、残っていれば、上述したステップＳ９３０
に戻って検索対象文を取り出す処理から再度実行する。
もはや類似度を判定する検索対象文が残っていない場合
には、各検索対象文との類似度の数値データが記憶され
たＲＡＭ２６上の所定領域を参照し、最も文間の類似度
が高いと判定された検索対象文を、検索結果として出力
する（ステップＳ９６０）。なお、類似度の判定は、数
値として表わされているので、一定の数値以上の類似度
を示した検索対象文をすべて出力するものとしても良
い。あるいは、類似度の高い方から所定数の検索対象文
を出力するものとしても良い。更に、総ての検索対象文
を、類似度の値の順に並べ替え、類似度の高いものから
順次に出力する構成としても良い。After the above processing, it is determined whether any search target sentences remain on the hard disk 90a (step S950). If any remain, the processing returns to step S930 described above and is executed again from the extraction of a search target sentence. When no search target sentences remain to be judged, the predetermined area of the RAM 26 in which the similarity values for the search target sentences are stored is consulted, and the search target sentence judged to have the highest similarity is output as the search result (step S960). Since the similarity is expressed as a numerical value, all search target sentences whose similarity is at or above a given value may be output instead. Alternatively, a predetermined number of search target sentences may be output in descending order of similarity, or all search target sentences may be sorted by similarity value and output in order from the highest.
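The search loop and its output variants (best match, threshold, top N, full ranking) can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function names are hypothetical, and `word_overlap` is a simple stand-in similarity used only to make the example runnable. It is not the similarity measure described in this document.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def search(
    key: str,
    targets: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: Optional[float] = None,
    top_k: Optional[int] = None,
) -> List[Tuple[float, str]]:
    """Score every target against the key; return (score, sentence), best first."""
    # Corresponds to looping over steps S930 to S950 until no target remains.
    scored = [(similarity(key, s), s) for s in targets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] >= threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return scored


def word_overlap(x: str, y: str) -> float:
    """Stand-in similarity: Jaccard overlap of whitespace-separated tokens."""
    xs, ys = set(x.split()), set(y.split())
    return len(xs & ys) / len(xs | ys) if xs | ys else 1.0


data = ["new cooperation guidelines", "new guidelines", "weather report"]
key_sentence = "new cooperation guidelines"
best = search(key_sentence, data, word_overlap, top_k=1)
above = search(key_sentence, data, word_overlap, threshold=0.5)
ranked = search(key_sentence, data, word_overlap)
```

Passing `top_k=1` gives the single best match of step S960, while `threshold` and the unrestricted call correspond to the alternative output modes mentioned in the text.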

【０１５０】かかる第２実施例のデータ検索装置１Ｂで
は、複数の検索対象文の一つ一つと、検索キー文との類
似度を判定して、類似度の高い文を検索の結果として出
力することができる。この検索キー文との類似度を判定
する際、概念語同士の類似度に、文の枠組みを与える表
現である機能的表現同士の類似度を加味した上で、文間
の語順の対応関係を考慮しつつ判定するので、自然言語
文の検索を精度良く行なうことができる。The data search device 1B of the second embodiment judges the similarity between each of the multiple search target sentences and the search key sentence, and can output highly similar sentences as the search result. Because the similarity to the search key sentence is judged by combining the similarity between concept words with the similarity between functional expressions, which give a sentence its framework, while taking the correspondence of word order between the sentences into account, natural language sentences can be searched with good accuracy.

【０１５１】なお、外部装置９０は、検索エンジン１０
Ｂによる検索結果を受けて、これを単に表示するものと
しても良いし、この検索結果を使って翻訳などの処理を
行なうものとしても良い。前者の構成では、例えば、多
数の論文の抄録の中から、検索しようとした検索キー文
に類似度の高い論文を表示する構成が考えられる。ある
いは、インターネット上の膨大な数のホームページの概
要を説明した多数の要約文の中から、検索しようとした
検索キー文と類似度の高いホームページを探して、これ
を表示する構成などにも適用することができる。後者、
即ち翻訳の場合は、翻訳しようとする文（検索キー文）
に対して、この文とよく似た文を、予め用意した翻訳文
の中から検索し、得られた翻訳文の中の概念語を置き換
えることにより訳文を得るという手法が知られている。
したがって、検索エンジン１０Ｂにより、予め用意した
訳文の一つを検索し、その後、検索した訳文の概念語
を、翻訳しようとする文の概念語の訳語により置き換え
ることにより、翻訳を行なうものとすれば良い。予め用
意した訳文から一致度の高い訳文を検出する場合には、
概念語の類似度より枠組み表現である機能的表現の類似
度の方が重要と考えられるので、本実施例の検索エンジ
ン１０Ｂは、この点で極めて有用である。尚、検索エン
ジン１０Ｂが類似度を判断する際、概念語の類似度と機
能的表現の類似度とのいずれを重視するかは、アプリケ
ーションにより適宜調整すれば良い。論文やホームペー
ジの検索の場合には概念語の比重が重く、訳文の検索の
場合には機能的表現の比重を重くしておくことも好適で
ある。The external device 90 may simply display the search result received from the search engine 10B, or it may use the search result for further processing such as translation. In the former case, one possible configuration displays, from among the abstracts of a large number of papers, the papers that are highly similar to the search key sentence. The device can likewise be applied to a configuration that finds and displays, from among a large number of summaries describing the vast number of web pages on the Internet, the pages that are highly similar to the search key sentence. In the latter case, translation, a known technique obtains a translation by searching a set of translated sentences prepared in advance for a sentence closely resembling the sentence to be translated (the search key sentence) and then replacing the concept words in the retrieved translation. The search engine 10B can therefore be used to retrieve one of the translations prepared in advance, after which the concept words of the retrieved translation are replaced with the translations of the concept words of the sentence to be translated. When detecting a closely matching translation among translations prepared in advance, the similarity of functional expressions, which form the framework, is considered more important than the similarity of concept words, so the search engine 10B of this embodiment is particularly useful in this respect. Whether the search engine 10B gives more weight to the similarity of concept words or to that of functional expressions when judging similarity can be adjusted as appropriate for the application. It is also suitable to weight concept words more heavily when searching papers or web pages, and to weight functional expressions more heavily when searching translations.

【０１５２】以上、本発明の実施の形態を第１，第２実
施例を用いて説明した。なお、本実施例の単語間類似度
判定処理および単語列間類似度判定処理においては、文
字列間の構造に関する類似度を、文字列に含まれている
機能的表現同士の近似性を比較することにより判定する
が、この判定手法は、機能的表現の「概念語と結びつい
て文の一構造を形成する性質」と「文の持つ意味を大き
く左右する役割」に着目したことによるものである。従
って、単語間類似度判定処理および単語列間類似度判定
処理は、本実施例に記載された方法に限るものではな
く、機能的表現のような文構造の枠組みを支える表現に
着目した他の判定手法を採用することも可能である。例
えば、文字列中での機能的表現の有無，文字列に用いら
れている機能的表現の位置や種類等についても文字列間
の類似度判定の要素としてもよい。The embodiments of the present invention have been described above using the first and second embodiments. In the inter-word similarity determination processing and the inter-word-string similarity determination processing of these embodiments, the structural similarity between character strings is judged by comparing how close the functional expressions contained in the character strings are to each other. This approach rests on two properties of functional expressions: they combine with concept words to form a unit of sentence structure, and they strongly influence the meaning of a sentence. The inter-word similarity determination processing and the inter-word-string similarity determination processing are therefore not limited to the methods described in these embodiments, and other determination methods that focus on expressions supporting the framework of sentence structure, such as functional expressions, may also be adopted. For example, the presence or absence of functional expressions in a character string, and the position and type of the functional expressions used in it, may also serve as factors in judging the similarity between character strings.

【０１５３】また、本実施例では、類似度検索エンジン
１０Ａや検索エンジン１０Ｂ等を外部装置９０とは別の
装置として設けることにより文間類似度判定装置１Ａを
構成するが、外部装置９０と検索エンジン１０Ａや１０
Ｂとを一体として文間類似度判定装置１Ａやデータ検索
装置１Ｂを構成するものとしても差し支えない。例え
ば、文字列間類否判定処理や類似文字列検出処理を実行
するためのプログラムを外部装置９０にインストールし
たり、公衆電話回線ＰＴＬを通じて外部装置９０にダウ
ンロードすることにより、外部装置９０自体で文間類似
度判定装置１Ａと同じ機能を実現することが可能とな
る。In this embodiment, the inter-sentence similarity determination device 1A is configured by providing the similarity search engine 10A, the search engine 10B, and the like as devices separate from the external device 90. However, the external device 90 may be integrated with the search engine 10A or 10B to form the inter-sentence similarity determination device 1A or the data search device 1B. For example, by installing a program that executes the character string similarity determination processing and the similar character string detection processing on the external device 90, or by downloading it to the external device 90 over the public telephone line PTL, the external device 90 itself can provide the same functions as the inter-sentence similarity determination device 1A.

【０１５４】本実施例の類似度検索エンジン１０Ａや検
索エンジン１０Ｂ等は、外部装置９０から文字列を入力
し、入力文字列に関する類似度の判定結果を外部装置９
０に出力する構成としているが、キーボード等の入力手
段を検索エンジン１０Ａや検索エンジン１０Ｂ自体に備
えることにより文字列を入力可能な構成としたり、ディ
スプレイ等の表示手段を用いて判定結果を表示可能な構
成としても差し支えない。The similarity search engine 10A, the search engine 10B, and the like of this embodiment receive a character string from the external device 90 and output the similarity determination result for the input character string to the external device 90. However, the search engine 10A or 10B may itself be provided with input means such as a keyboard so that character strings can be input directly, and with display means such as a display so that the determination result can be shown.

【０１５５】また、本発明を実施する他の形態として、
上述の文字列間類否判定プログラム等をコンピュータに
よる読み取り可能に記録した、ＦＤ，ＣＤ−ＲＯＭやＲ
ＯＭチップ等の記録媒体を考えることができる。この記
録媒体に格納された情報をコンピュータ内にインストー
ルすることで、コンピュータは、ＣＰＵからの命令に基
づいて文字列間類否判定プログラム等を実行可能な状態
となり、上記した文間類似度判定装置１Ａやデータ検索
装置１Ｂと同様の機能を実現する。従って、上記と同様
の効果を奏することができる。As another mode of carrying out the present invention, a recording medium such as an FD, a CD-ROM, or a ROM chip on which the above character string similarity determination program and the like are recorded in computer-readable form can be considered. By installing the information stored on this recording medium in a computer, the computer becomes able to execute the character string similarity determination program and the like on the basis of instructions from the CPU, and provides the same functions as the inter-sentence similarity determination device 1A and the data search device 1B described above. The same effects as described above can therefore be obtained.

【０１５６】これらの媒体は、例えば図１に示したフレ
キシブルディスク装置ＦＤＤにより読み取られて類似度
判定エンジン１０Ａ等に送信され、その内部の主記憶に
展開して実行される。なお、こうした媒体によらず、サ
ーバーＳＶに置かれたプログラムをネットワークＮＷを
介してモデムから読み込み、主記憶に展開して実行する
ものとしてもよい。These media are read by, for example, the flexible disk drive FDD shown in FIG. 1, sent to the similarity determination engine 10A or the like, loaded into its internal main memory, and executed. Instead of using such a medium, a program placed on the server SV may be read through a modem via the network NW, loaded into main memory, and executed.

【０１５７】なお、本実施例では、文字列間の意味上の
類似度を判定するが、これ以外の文字列間の関係を判定
することも可能である。例えば、２つの文字列中におけ
る機能的表現の相違に着目することで、２つの文字列同
士の強調や限定，推定の程度の相違や，時制の相違等の
関係などを判定することができる。また、これらの関係
のうちのいくつかを使用者が任意に選択することによ
り、選択された関係についての判定を文間類似度判定装
置１０が実行する構成としてもよい。In this embodiment, the semantic similarity between character strings is judged, but other relationships between character strings can also be judged. For example, by focusing on the differences between the functional expressions in two character strings, relationships such as differences in the degree of emphasis, restriction, or conjecture, and differences in tense, can be judged. The user may also select any of these relationships, with the inter-sentence similarity determination device 10 performing the determination for the selected relationships.

【０１５８】また、本実施例では、単語列と単語列との
間の類似度を判定する際、単語間の類否関係が文字列間
で交差することを禁止するが、この交差を許容する構成
を採ることも可能である。例えば、２個の単語幅の範囲
内で交差を許容する場合、図14に示した単語列間距離演
算処理において、注目単語ａｉ、ｂｊの２個前の単語
までの単語列間距離であるｄ｛ａ（ｉ−２），ｂ（ｊ−
２）｝の値、単語ａｉと単語ｂ（ｊ−１）との語間距
離である２｛１−ｔ（ａｉ，ｂ（ｊ−１））｝の値、お
よび単語ａｉ，単語ｂ（ｊ−１）と交差した関係に有
る単語ａ（ｉ−１）と単語ｂｊとの語間距離である２
｛１−ｔ（ａ（ｉ−１），ｂｊ）｝の値という３つの値
の和をＸ４値として求め、Ｘ１値、Ｘ２値、Ｘ３値お
よびＸ４値のうちの最小値を、単語ａｉまでの単語列と
単語ｂｊまでの単語列との間の距離ｄ（ａｉ，ｂｊ）と
すればよい。同様の考え方で、４個の単語幅以内に交差
を許容する場合は、Ｘ１値からＸ７２値までの７２個の
候補の中から最小値を選べばよい。このように、本発明
を一定の幅のなかで交差を許容する構成に拡張すること
も好適である。In this embodiment, when judging the similarity between word strings, correspondences between words are not allowed to cross between the character strings, but a configuration that allows such crossing is also possible. For example, when crossing is allowed within a width of two words, the word string distance calculation processing shown in FIG. 14 obtains an X4 value as the sum of three values: d{a(i-2), b(j-2)}, the distance between the word strings up to the words two before the words of interest ai and bj; 2{1-t(ai, b(j-1))}, the inter-word distance between the word ai and the word b(j-1); and 2{1-t(a(i-1), bj)}, the inter-word distance between the word a(i-1) and the word bj, which stand in a crossed relationship to ai and b(j-1). The minimum of the X1, X2, X3, and X4 values is then taken as the distance d(ai, bj) between the word string up to the word ai and the word string up to the word bj. By the same reasoning, when crossing is allowed within a width of four words, the minimum is chosen from 72 candidates, X1 to X72. Extending the present invention in this way to allow crossing within a fixed width is also suitable.
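The two-word crossing case can be sketched by adding the X4 candidate to the three-way minimum. This is a sketch under stated assumptions: X1 uses the inter-word distance 2{1 - t(ai, bj)}, X2 and X3 are taken to be the cases where ai and bj are dropped at uniform costs r and q respectively, and the boundary rows accumulate those costs. The function names and the toy words are hypothetical.

```python
from typing import Callable, Sequence


def distance_with_crossing(
    a: Sequence[str],
    b: Sequence[str],
    t: Callable[[str, str], float],
    r: float = 1.0,
    q: float = 1.0,
    allow_crossing: bool = True,
) -> float:
    """Word-string distance d(am, bn), optionally allowing adjacent pairs to cross."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + r  # assumed boundary
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + q  # assumed boundary
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            candidates = [
                d[i - 1][j - 1] + 2.0 * (1.0 - t(a[i - 1], b[j - 1])),  # X1
                d[i - 1][j] + r,  # X2 (assumed: ai dropped)
                d[i][j - 1] + q,  # X3 (assumed: bj dropped)
            ]
            if allow_crossing and i >= 2 and j >= 2:
                # X4: ai pairs with b(j-1) and a(i-1) pairs with bj.
                candidates.append(
                    d[i - 2][j - 2]
                    + 2.0 * (1.0 - t(a[i - 1], b[j - 2]))
                    + 2.0 * (1.0 - t(a[i - 2], b[j - 1]))
                )
            d[i][j] = min(candidates)
    return d[m][n]


def exact(x: str, y: str) -> float:
    """Toy similarity: 1 for identical words, 0 otherwise."""
    return 1.0 if x == y else 0.0


crossed = distance_with_crossing(["x", "y"], ["y", "x"], exact)
uncrossed = distance_with_crossing(["x", "y"], ["y", "x"], exact, allow_crossing=False)
```

For two swapped words, the crossing variant yields a distance of 0.0, whereas the order-preserving variant yields 2.0 because one word on each side must be dropped.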

【０１５９】以上本発明の実施の形態を実施例に基づい
て説明したが、本発明はこうした実施例に何等限定され
るものではなく、本発明の要旨を逸脱しない範囲内にお
いて種々なる様態で実施し得ることは勿論である。The embodiments of the present invention have been described above on the basis of examples, but the present invention is in no way limited to these examples and can of course be carried out in various modes without departing from the gist of the invention.

[Brief description of the drawings]

【図１】本発明の実施例である文間類似度判定装置のハ
ードウェアの構成を示す説明図である。FIG. 1 is an explanatory diagram illustrating a hardware configuration of an inter-sentence similarity determination apparatus according to an embodiment of the present invention.

【図２】文字列間類否判定処理が実行される際の、類似
度検索エンジン１０Ａと外部装置９０との間の情報の流
れを示す説明図である。FIG. 2 is an explanatory diagram showing a flow of information between a similarity search engine 10A and an external device 90 when an inter-character-string similarity determination process is executed.

【図３】文字列間類否判定ルーチンを示すフローチャー
トである。FIG. 3 is a flowchart illustrating a character string similarity determination routine.

【図４】概念語と機能的表現の役割を説明するブロック
図である。FIG. 4 is a block diagram illustrating roles of a concept word and a functional expression.

【図５】単語間類似度判定ルーチンを示すフローチャー
トである。FIG. 5 is a flowchart illustrating an inter-word similarity determination routine.

【図６】単語間類似度判定ルーチンを示すフローチャー
トである。FIG. 6 is a flowchart illustrating an inter-word similarity determination routine.

【図７】概念語類義語辞書３６ａの構造を示す説明図で
ある。FIG. 7 is an explanatory diagram showing the structure of a concept word synonym dictionary 36a.

【図８】機能的表現類義語辞書３６ｂの構造を示す説明
図である。FIG. 8 is an explanatory diagram showing the structure of a functional expression synonym dictionary 36b.

【図９】単語間情報記録テーブルＧＴに語間類似度およ
び語間距離が記録された様子を示す説明図である。FIG. 9 is an explanatory diagram showing a state where an inter-word similarity and an inter-word distance are recorded in an inter-word information recording table GT.

FIG. 10 is a flowchart showing an inter-word-string similarity determination routine A.

FIG. 11 is a flowchart showing a dropout cost setting routine.

FIG. 12 is an explanatory diagram showing a distance graph YG whose coordinate axes have been set by the dropout cost setting process.

FIG. 13 is an explanatory diagram showing how the distance between word strings is represented on the distance graph YG.

FIG. 14 is a flowchart showing a word string distance calculation routine A.

FIG. 15 is an explanatory diagram showing the state of a calculation buffer ET that stores the process and result of calculating the value of d(a1, b1).

FIG. 16 is an explanatory diagram showing the paths from the origin O to the coordinates (a1, b1) on the distance graph YG.

FIG. 17 is an explanatory diagram showing the state of the calculation buffer ET that stores the process and result of calculating the value of d(a1, b2).

FIG. 18 is an explanatory diagram showing the paths from the origin O to the coordinates (a1, b2) on the distance graph YG.

FIG. 19 is an explanatory diagram showing the state of an inter-word-string distance recording table DL that stores the distances between the word string consisting of word a1 and each word string of the comparison character string B.

FIG. 20 is an explanatory diagram showing the state of the calculation buffer ET that stores the process and result of calculating the value of d(a2, b2).

FIG. 21 is an explanatory diagram showing the paths from the origin O to the coordinates (a2, b2) on the distance graph YG.

FIG. 22 is an explanatory diagram showing the state of the inter-word-string distance recording table DL that stores the distances between each word string of the input character string A and each word string of the comparison character string B.

FIG. 23 is an explanatory diagram showing the distance graph YG when the distances between all word strings recorded in the inter-word-string distance recording table DL are assigned to its coordinates.

FIG. 24 is an explanatory diagram showing the distance between the input character string A and the comparison character string B as a path on the distance graph YG.

FIG. 25 is an explanatory diagram showing which words were regarded as dropped and which were regarded as being in a similarity relationship as a result of the word string distance calculation processing A.

FIG. 26 is a flowchart showing an inter-character-string similarity calculation routine A.

FIG. 27 is a flowchart showing an inter-word-string similarity determination routine B.

FIG. 28 is a flowchart showing a word importance setting routine.

FIG. 29 is an explanatory diagram showing a distance graph YW whose coordinate axes have been set by the dropout cost setting process.

FIG. 30 is an explanatory diagram showing the importance values of words stored in the Japanese language dictionary.

FIG. 31 is a flowchart showing a word string distance calculation routine B.

FIG. 32 is an explanatory diagram showing the state of an inter-word-string distance recording table DW that stores the distances between each word string of the input character string A and each word string of the comparison character string B.

FIG. 33 is an explanatory diagram showing the distance between the input character string A and the comparison character string B as a path on the distance graph YW.

FIG. 34 is an explanatory diagram showing which words were regarded as dropped and which were regarded as being in a similarity relationship as a result of the word string distance calculation processing B.

FIG. 35 is a flowchart showing an inter-character-string similarity calculation routine B.

FIG. 36 is an explanatory diagram showing a data search device 1B as a second embodiment.

FIG. 37 is a flowchart showing a data search routine.

[Explanation of symbols]

1A: inter-sentence similarity determination apparatus
1B: data search device
10A: similarity search engine
10B: search engine
10a: hard disk
10b: liquid crystal display
10c: computer
20: input interface
22: CPU
24: ROM
26: RAM
34: output interface
35: bus
36: synonym dictionary
36a: concept word synonym dictionary
36b: functional expression synonym dictionary
90: external device
90a: hard disk
90b: display
90c: computer
90d: keyboard
92: printer
94: modem
FDD: flexible disk device
NW: network
PTL: public telephone line
SV: server

Claims

[Claims]

1. A natural language sentence relation determination device that receives, as sentences in a predetermined language each representing content having a certain unity, a first sentence to be evaluated and a second sentence whose relationship with the first sentence is to be determined, and that determines the relationship between the first sentence and the second sentence by using constituent units that constitute sentences in the language and are classified as each having a unified meaning, the device comprising: relation information storage means for storing, with respect to conceptual expressions, which are classified among the constituent units as units representing semantic concepts, and framework expressions, which are extracted as constituent units corresponding to expressions supporting the framework of the sentence structure, at least information representing the relationships between the conceptual expressions and between the framework expressions; constituent unit extraction means for extracting the constituent units from the first sentence and the second sentence; and relation determination means for determining the relationship between the first sentence and the second sentence by referring to the information stored in the relation information storage means and determining, in consideration of the correspondence in word order, the relationships between the conceptual expressions and between the framework expressions among the constituent units constituting the extracted first and second sentences.

2. The natural language sentence relation determination device according to claim 1, wherein the framework expressions whose relationships are stored in the relation information storage means are relational expressions, which are expressions representing relationships between concepts, such as case relations and causal relations, with respect to conceptual expressions representing semantic concepts.

3. The natural language sentence relation determination device, wherein the framework expressions whose relationships are stored in the relation information storage means are auxiliary expressions that give the sentence broadly defined information such as judgment, attitude, and tense.

4. The natural language sentence relation determination device according to claim 1, wherein the relation determination means is means for performing the determination while taking into account, as the correspondence in word order, differences in the order of appearance of the extracted constituent units.

5. The natural language sentence relation determination device according to claim 1, wherein the relation determination means is means for performing the determination by allowing, as the correspondence in word order, differences in the order of appearance of the extracted constituent units, while prohibiting two or more correspondences between constituent units from crossing one another.

6. The natural language sentence relation determination device according to claim 1, wherein the relation determination means comprises: first determination means for determining the relationships between the conceptual expressions among the constituent units constituting the first and second sentences by referring to the information stored in the relation information storage means; second determination means for determining the relationships between the framework expressions among the constituent units constituting the first and second sentences by referring to the information stored in the relation information storage means; and comprehensive determination means for determining the relationship between the first and second sentences by using the determination results of the first and second determination means while taking the correspondence in word order into account.

7. The natural language sentence relation determination device according to claim 6, wherein the comprehensive determination means comprises: dropout value setting means for setting in advance, for each constituent unit extracted from the first or second sentence, a value to be used when no corresponding constituent unit exists in the other sentence, as a dropout value; relation value assigning means for assigning, while taking the correspondence in word order into account, a value based on the relationship, as a relation value, to those constituent units extracted from the first and second sentences that have a predetermined relationship; and inter-sentence relation value calculating means for evaluating the assigned relation values and the set dropout values to obtain a relation value between the first and second sentences.

8. The natural language sentence relation determination device according to claim 7, wherein the dropout value setting means is means for setting the dropout value to a value according to the importance of the constituent unit.

9. The drop-out value setting unit is a unit that sets a different drop-out value depending on whether a constituent unit that does not exist in the other sentence is the conceptual expression or the framework expression. The natural language sentence relation determination device according to 1.

10. The natural language sentence relation determination device according to claim 6, wherein the relation information storage means stores as the information, as relationships between the conceptual expressions, relationships between concept words, which independently represent semantic concepts, and relationships between conceptual affixes and concept words; and the first determination means is means for determining, as the relationships between the conceptual expressions, in addition to the relationships between concept words, the relationships between conceptual affixes and concept words stored in the relation information storage means.

11. The natural language sentence relation determination device according to claim 1, wherein the relation information storage means is similarity information storage means for storing information representing a degree of similarity as the relationships between the conceptual expressions and between the framework expressions; and the relation determination means is similarity determination means for determining whether the first and second sentences are similar by referring to the degrees of similarity between conceptual expressions and between framework expressions stored in the similarity information storage means.

12. The natural language sentence relation determination device according to claim 11, wherein the similarity information storage means stores in an external storage device, as the information representing the degree of similarity, pairs of the conceptual expressions and pairs of the framework expressions together with similarity numerical data representing the semantic relationship between the paired expressions; and the similarity determination means is means for determining the similarity by using the magnitude of the similarity numerical data.

13. The natural language sentence relation determination device according to claim 12, wherein the similarity information storage means stores the similarity numerical data as a value in the range from 0 to 1 that is closer to 1 as the degree of similarity is higher; and the similarity determination means comprises: means for calculating distances between the conceptual expressions and between the framework expressions from the values of the similarity numerical data; means for specifying the combination for which the sum of the distances is shortest; and means for calculating the sum of the distances in the specified combination as a degree of difference between the first and second sentences.

14. A natural language sentence search apparatus for searching a plurality of search target sentences for a sentence similar to a search key sentence given as a search key, the apparatus comprising: the natural language sentence relation determination device according to claim 11; first sentence specifying means for specifying the search key sentence as the first sentence; second sentence specifying means for sequentially selecting one sentence from the plurality of search target sentences and specifying that sentence as the second sentence; determination execution means for giving the specified first sentence and second sentence to the natural language sentence relation determination device and causing it to perform the similarity determination; and selecting means for storing the determination result of the natural language sentence relation determination device in association with the given second sentence and selecting, from the plurality of search target sentences, the second sentence most similar to the search key sentence given as the first sentence.

15. A phrase relation determination device comprising: a dictionary storing at least information representing relationships between conceptual expressions, which are expressions representing semantic concepts; input means for inputting a first phrase and a second phrase; extraction means for extracting, from the input first and second phrases, the words constituting the phrases; determination means for determining the relationships between the words constituting the extracted first phrase and the words constituting the second phrase by referring to the dictionary; and judgment means for determining the relationship between the first phrase and the second phrase based on the determination result of the determination means, wherein at least one of the first phrase and the second phrase includes a compound expression, which is an expression representing a unified semantic concept through a combination of two or more words; the device further comprises evaluation means for evaluating the relationship between the compound expression and an expression corresponding to the compound expression; and the relationship between the first phrase and the second phrase is determined in consideration of the evaluation result of the evaluation means.

16. The phrase relation determination device according to claim 15, further comprising: storage means for storing information on words in advance; word specifying means for specifying a word that is included in the expression corresponding to the compound expression but does not exist in the compound expression; and type specifying means for specifying the type of the specified word by referring to the storage means, wherein, when the type of the word specified by the type specifying means is a predetermined type, the evaluation means evaluates the relationship between the compound expression and the expression corresponding to the compound expression in the same manner as when a word corresponding to that word exists in the compound expression.

17. A natural language sentence relation determination method that receives, as sentences in a predetermined language each representing content having a certain unity, a first sentence to be evaluated and a second sentence whose relationship with the first sentence is to be determined, and that determines the relationship between the first sentence and the second sentence by using constituent units that constitute sentences in the language and are classified as each having a unified meaning, the method comprising: storing, with respect to conceptual expressions, which are classified among the constituent units as units representing semantic concepts, and framework expressions, which are extracted as constituent units corresponding to expressions supporting the framework of the sentence structure, at least information representing the relationships between the conceptual expressions and between the framework expressions; extracting the constituent units from the first sentence and the second sentence; and determining the relationship between the first sentence and the second sentence by referring to the stored information and determining, in consideration of the correspondence in word order, the relationships between the conceptual expressions and between the framework expressions among the constituent units constituting the extracted first and second sentences.

18. The natural language sentence relation determination method according to claim 17, wherein the determination of the relationship between the first sentence and the second sentence includes: a first process of determining the relationships between the conceptual expressions among the constituent units constituting the first and second sentences by referring to the stored information; a second process of determining the relationships between the framework expressions among the constituent units constituting the first and second sentences by referring to the stored information; and a third process of determining the relationship between the first and second sentences by integrating the determination results for the conceptual expressions and the framework expressions obtained by the first and second processes while taking the correspondence in word order into account.

19. The natural language sentence relation determination method according to claim 18, wherein the third process includes: a dropout value setting process of setting, for each constituent unit extracted from the first or second sentence, a value to be used when no corresponding constituent unit exists in the other sentence, as a dropout value; a relation value assigning process of assigning, while taking the correspondence in word order into account, a value based on the relationship, as a relation value, to those constituent units extracted from the first and second sentences that have a predetermined relationship; and an inter-sentence relation value calculating process of evaluating the assigned relation values and the set dropout values to obtain a relation value between the first and second sentences.

20. The natural language sentence relation determination method according to claim 17, wherein information representing a degree of similarity is stored as the relationships between the conceptual expressions and between the framework expressions; and the determination of the relationship between the sentences determines whether the first and second sentences are similar by referring to the stored degrees of similarity between the conceptual expressions and between the framework expressions.

21. A natural language sentence search method for searching a plurality of search target sentences for a sentence similar to a search key sentence given as a search key, the method comprising: specifying the search key sentence as a first sentence; sequentially selecting one sentence from the plurality of search target sentences and specifying it as a second sentence; executing the natural language sentence relation determination method according to claim 20 using the first sentence and the second sentence; storing the determination result of the natural language sentence relation determination method in association with the given second sentence; and selecting, from the plurality of search target sentences, the second sentence most similar to the search key sentence given as the first sentence.

22. A phrase relation determination method in which a first phrase and a second phrase, each representing content having a certain unity, are input; the words constituting the phrases are extracted from the input first and second phrases; the relationships between the words constituting the extracted first phrase and the words constituting the second phrase are determined by referring to a dictionary storing at least information representing relationships between conceptual expressions, which are expressions representing semantic concepts; and the relationship between the first phrase and the second phrase is determined based on the result of that determination, wherein at least one of the first phrase and the second phrase includes a compound expression, which is an expression representing a unified semantic concept through a combination of two or more noun words; the relationship between the compound expression and an expression corresponding to the compound expression is evaluated; and the relationship between the first phrase and the second phrase is determined in consideration of the result of the evaluation.

23. A computer-readable recording medium on which is recorded a program that receives, as sentences in a predetermined language each representing content having a certain unity, a first sentence to be evaluated and a second sentence whose relationship with the first sentence is to be determined, and that determines the relationship between the first sentence and the second sentence by using constituent units that constitute sentences in the language and are classified as each having a unified meaning, the recording medium recording: a function of storing, with respect to conceptual expressions, which are classified among the constituent units as units representing semantic concepts, and framework expressions, which are extracted as constituent units corresponding to expressions supporting the framework of the sentence structure, at least information representing the relationships between the conceptual expressions and between the framework expressions; a function of extracting the constituent units from the first sentence and the second sentence; and a function of determining the relationship between the first sentence and the second sentence by referring to the stored information and determining, in consideration of the correspondence in word order, the relationships between the conceptual expressions and between the framework expressions among the constituent units constituting the extracted first and second sentences.

24. The recording medium according to claim 23, recording, as the function of determining the relationship between the first sentence and the second sentence: a first function of determining the relationships between the conceptual expressions among the constituent units constituting the first and second sentences by referring to the stored information; a second function of determining the relationships between the framework expressions among the constituent units constituting the first and second sentences by referring to the stored information; and a third function of determining the relationship between the first and second sentences by integrating the determination results for the conceptual expressions and the framework expressions obtained by the first and second functions while taking the correspondence in word order into account.

25. The recording medium according to claim 24, recording, as the third function: a function of setting in advance, for each constituent unit extracted from the first or second sentence, a value to be used when no corresponding constituent unit exists in the other sentence, as a dropout value; a function of assigning, while taking the correspondence in word order into account, a value based on the relationship, as a relation value, to those constituent units extracted from the first and second sentences that have a predetermined relationship; and a function of evaluating the assigned relation values and the set dropout values to obtain a relation value between the first and second sentences.

26. The recording medium according to claim 23, wherein information representing a degree of similarity is recorded as the relationships between the conceptual expressions and between the framework expressions; and the recording medium records, as the function of determining the relationship between the sentences, a function of determining whether the first and second sentences are similar by referring to the stored degrees of similarity between the conceptual expressions and between the framework expressions.

27. A computer-readable recording medium on which is recorded a program for searching a plurality of search target sentences, each being a sentence in a predetermined language representing content having a certain unity, for a sentence similar to a search key sentence given as a search key, the recording medium recording: a function of storing, with respect to conceptual expressions, which are classified as units representing semantic concepts among constituent units that constitute sentences in the language and are classified as each having a unified meaning, and framework expressions, which are extracted as constituent units corresponding to expressions supporting the framework of the sentence structure, at least information representing the relationships between the conceptual expressions and between the framework expressions; a function of specifying the search key sentence as a first sentence; a function of sequentially specifying one sentence from the plurality of search target sentences as a second sentence; a function of extracting the constituent units from the first sentence and the second sentence; a function of determining the relationship between the first sentence and the second sentence by referring to the stored information and determining, in consideration of the correspondence in word order, the relationships between the conceptual expressions and between the framework expressions among the constituent units constituting the extracted first and second sentences; a function of storing the determination result in association with the specified second sentence; and a function of selecting, from the plurality of search target sentences, the second sentence most similar to the search key sentence specified as the first sentence.

28. A computer-readable recording medium on which is recorded a program that receives a first phrase and a second phrase, each representing content having a certain unity, extracts from the input first and second phrases the words constituting the phrases, determines the relationships between the words constituting the extracted first phrase and the words constituting the second phrase by referring to a dictionary storing at least information representing relationships between conceptual expressions, which are expressions representing semantic concepts, and determines the relationship between the first phrase and the second phrase based on the result of that determination, wherein at least one of the first phrase and the second phrase includes a compound expression, which is an expression representing a unified semantic concept through a combination of two or more noun words; and the recording medium records a function of evaluating the relationship between the compound expression and an expression corresponding to the compound expression and determining, in consideration of the result of this evaluation, the relationship between the first phrase and the second phrase.