JPH08278982A

JPH08278982A - Method for retrieving similar word or similar sentence

Info

Publication number: JPH08278982A
Application number: JP7104870A
Authority: JP
Inventors: Tokuji Ota; 徳二太田; Seiji Kawai; 成治川合; Tadashi Maruyama; 忠丸山; Koji Ikematsu; 孝司池松
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 1995-04-05
Filing date: 1995-04-05
Publication date: 1996-10-22

Abstract

PURPOSE: To lighten the load of retrieval and decision processing by making it easy to select a necessary document out of retrieval results and eliminating the need to classify terms. CONSTITUTION: An input document α as a routine document is decomposed into plural constituent terms, which are set as reference terms γ. Words similar to standard words matching the reference terms γ are stored in a similar word storage area β together with their similarities by referring to a term dictionary (f). The similar words in this area are rearranged to obtain words similar to the reference terms γ, and those similar words are set as document retrieval reference terms. For a document database (g) in which similar documents are stored and a retrieval area is set, a document having terms similar to the document retrieval reference terms is stored as a similar document in a similar document storage area ε. The similar documents in this area are rearranged, the similarities between the input document and the similar documents are found and stored in the similar document storage area ε, and a similar document corresponding to the input document α is output.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a similar-word retrieval method that searches a term dictionary for words similar to the constituent terms of an input sentence, and to a similar-sentence retrieval method that, based on those similar words, searches a file storing sentences from past cases and the like for sentences similar to the input sentence.

[0002]

[Prior Art] Conventionally, to retrieve sentences similar to a queried sentence from among the sentences registered in a computer, the common approaches were either to register keywords in advance and treat any sentence containing those keywords as similar, or to classify the sentences of various cases in advance according to some classification scheme and search within it.

[0003]

[Problems to Be Solved by the Invention] Of these, the keyword registration method retrieves and presents every sentence that contains the keyword. When there are many cases, far too many sentences are presented, and it is often difficult to find the sentence actually needed among them. With classification-based retrieval, it is difficult to find the desired sentence unless the searcher knows the classification criteria.

[0004] Furthermore, when preparing a system proposal, a product instruction manual, or the like, the number of authoring man-hours could be reduced if highly similar sentences written in the past could be retrieved at authoring time and reused. However, no retrieval method designed from the viewpoint of retrieving sentences for reuse has yet been realized. In addition, although there are methods for judging sentence similarity that take into account, for example, the number of synonyms, none focuses on the degree of similarity across the terms and the sentence structure as a whole.

[0005] The present invention was made to solve the above problems. Its object is to provide a method for retrieving similar words or similar sentences that makes it easy to select the required sentence even when many similar sentences are retrieved, that eliminates the need to classify terms by judging sentence similarity on the basis of term similarity, and that does not impose an excessive load on retrieval and judgment processing.

[0006]

[Means for Solving the Problems and Operation] First, the terms used in the present invention are defined as follows.

A. Synonymy, similarity, and degree of similarity
Synonymy means a complete match in meaning; similarity means a partial match. If the degree to which meanings match is expressed numerically and called the degree of similarity, then the degree of similarity is, for example, 1.0 for synonyms and a decimal value less than 1.0 for similar terms.

[0007] (Specific examples)
・"Process" and "line", and "plan" and "schedule", are synonyms.
・"Planning work" and "plan creation" or "plan adjustment" are similar words.
・"Production plan for the machining process" and "production schedule for the machining line" are synonymous sentences.
・"Production planning work for the machining process" and "production plan adjustment for the machining factory" are similar sentences.
In the present invention, retrieval is performed by calculating the degree of similarity for such similar words and similar sentences (synonyms included).

[0008] B. Term dictionary
Everyday dictionaries collect synonyms, but when their contents are examined in light of the above distinction between synonymy and similarity, they turn out to collect similar words as well. From this viewpoint, the invention presupposes a term dictionary like the specific example in FIG. 2 and retrieves similar words based on it. Among the similar words shown in FIG. 2, some are also registered as standard terms, so that the similarity relations are consistent, while others are registered only as similar words and are not standard terms. Because it is generally very difficult to create such a term dictionary as a standard and use it universally, it is sufficient to share it among the joint users of the sentences, depending on the field, the function of the target, and so on.

[0009] C. Fixed-form and free-form sentences
In the present invention, a sentence to be searched is regarded as a collection of terms. An ordinary sentence uses terms in an arbitrary structure and is therefore called a free-form sentence; a sentence written according to a fixed format is called a fixed-form sentence. In a fixed-form sentence, the terms that make up the sentence are classified according to the format, so comparison and judgment can be performed efficiently and the result is clear. The present invention is particularly useful as a similarity retrieval method for such fixed-form sentences.

[0010] (Specific example)
・Sentence 1 (fixed-form sentence): "Frequently change the line configuration based on production instructions"
When this sentence is made to conform to the IPO format, which expresses the functions of software and machines (widely known as a computer software technique; I is the input part the sentence refers to, P the processing part, and O the output part), it is decomposed as follows:
I and O: line configuration
P: (frequently) change
Auxiliary part: based on production instructions
Such terms are called the constituent terms of sentence 1, and processing corresponding to this format is executed.
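The IPO decomposition of sentence 1 can be held in a small record type. The following is a minimal sketch in Python; the class name `FixedFormSentence` and its field names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FixedFormSentence:
    """Hypothetical container for a sentence decomposed in IPO form."""

    input_part: str       # I: what the process takes in
    processing_part: str  # P: the action performed
    output_part: str      # O: what the process produces
    auxiliary_part: str   # remaining modifier, e.g. a condition

    def constituent_terms(self) -> List[str]:
        """Return the distinct constituent terms in first-seen order."""
        seen: List[str] = []
        for term in (self.auxiliary_part, self.input_part,
                     self.output_part, self.processing_part):
            if term and term not in seen:
                seen.append(term)
        return seen


# Sentence 1 from the text; I and O are both "line configuration".
sentence_1 = FixedFormSentence(
    input_part="line configuration",
    processing_part="change frequently",
    output_part="line configuration",
    auxiliary_part="based on production instructions",
)
print(sentence_1.constituent_terms())
```

Because I and O coincide for sentence 1, the record yields three distinct constituent terms.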

[0011]
・Sentence 2 (free-form sentence): "Changes to the line configuration are often made frequently"
If this form too is defined as a fixed form, the sentence can be processed as a fixed-form sentence. However, because it is difficult to decompose into constituent terms in conformity with the IPO format, sentence 2 is treated here as a free-form sentence.

[0012] The fixed-form sentences to which the present invention applies need not be limited to the IPO format. However, the present invention is mainly intended for retrieving sentences used in information processing, and the IPO format is considered the most universal in actual information processing, so the specific example above classifies the constituent terms in conformity with the IPO format.

[0013] D. Weights for the similarity calculation of fixed-form sentences
For a fixed-form sentence, the importance of each part of its structure can be taken into account, so this importance is used as a weight when calculating the degree of similarity.
(Specific example) In the IPO format, the part corresponding to P is important, and I and O are often merely instances of P. In that case, assigning a large weight to the P part and small weights to the I and O parts treats the similarity of the P part as most important, so that two sentences can be judged to be highly similar as a whole even if their I and O parts differ. Conversely, for a special function that depends strongly on the I and O parts, increasing the corresponding weights lowers the degree of similarity when the I or O parts differ, so that the sentences can be judged not to be very similar.
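One way to realize this weighting is a weighted average of the per-part similarities. The text does not fix a formula at this point, so the following Python sketch is an assumption; the function name `weighted_similarity` and all weight and similarity values are illustrative only.

```python
from typing import Dict


def weighted_similarity(part_sims: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Weighted average of per-part similarities (assumed formula)."""
    total_weight = sum(weights[part] for part in part_sims)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    weighted_sum = sum(part_sims[p] * weights[p] for p in part_sims)
    return weighted_sum / total_weight


# Illustrative case: the P parts match exactly, I and O differ strongly.
sims = {"I": 0.2, "P": 1.0, "O": 0.2}

p_heavy = {"I": 1.0, "P": 8.0, "O": 1.0}   # P treated as most important
io_heavy = {"I": 4.0, "P": 2.0, "O": 4.0}  # function depends on I and O

print(round(weighted_similarity(sims, p_heavy), 2))   # 0.84
print(round(weighted_similarity(sims, io_heavy), 2))  # 0.36
```

With the P-heavy weights the pair still scores as highly similar, while the I/O-heavy weights pull the same pair down, which is the behavior the paragraph describes.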

[0014] The present invention is realized by the following means.
a. Sentence input means
A well-known means for inputting the sentence for which similar sentences are to be retrieved (the reference sentence). When retrieving sentences similar to a fixed-form sentence, the format is input at the same time.
b. Means for extracting terms from a sentence
A well-known means for extracting the terms in the input sentence. The group of terms extracted by this means serves as the terms referred to during retrieval (reference terms).

[0015] c. Similar-word retrieval means
Searches the term dictionary for terms similar to the input reference term, stores the similar words and their degrees of similarity in a storage area, and displays them.
d. Similar-sentence retrieval means
Decomposes the input sentence into constituent terms, searches the sentence database for sentences that use words similar to these constituent terms, stores them in a storage area as similar sentences, and displays them.

[0016] e. Sentence similarity calculation means
Provided as needed within the similar-word retrieval means, this calculates the degree of similarity between the input sentence and a similar sentence based on the degrees of similarity of the terms constituting each sentence. When retrieving sentences similar to a fixed-form sentence, weighting is applied.
f. Term dictionary
Stores the table shown in FIG. 2.

[0017] g. Sentence database
A database storing the sentences to be searched. A free-form sentence is stored as the character string itself. A fixed-form sentence is stored as the terms decomposed by structural part, the character strings that are not terms, and the weights.

[0018] h. Retrieval instruction and result display means
Issues instructions to retrieve similar words or similar sentences and displays the retrieval results. The same device as the sentence input means can be used.
i. Result printing means
Used when the results are to be printed as needed, in addition to being displayed.

[0019] j. Similar-sentence retrieval device
Executes the entire processing of the present invention from start to finish.
k. Sentence component weight table
A table storing the weights set for the components of a fixed-form sentence. These weights may be input for each run, or they may be fixed in common for the fixed-form sentence format.

[0020] The gist of the present invention realized by the above means is as follows. The similar-word retrieval method according to the first invention, set forth in claim 1, is characterized by: decomposing an input sentence that is a fixed-form sentence into a plurality of constituent terms and setting these constituent terms as reference terms; referring to a term dictionary and, if there is a standard term that matches a reference term, storing its similar words together with their degrees of similarity in a similar-word storage area; if there is no standard term that matches the reference term, storing each standard term that has a matching similar word, together with the degree of similarity of that similar word, in the similar-word storage area; and organizing the similar words in the similar-word storage area to obtain the similar words of the reference term.

[0021] The similar-word retrieval method according to the second invention, set forth in claim 2, is the retrieval method of claim 1, characterized in that the term dictionary stores, in addition to general content such as the explanation and part of speech of each standard term, the similar words corresponding to that standard term and their degrees of similarity expressed as numerical values.

[0022] The similar-sentence retrieval method according to the third invention, set forth in claim 3, is characterized by: executing the retrieval method of claim 1 or 2 to retrieve the similar words corresponding to each of the constituent terms of the input sentence; setting these similar words as sentence-retrieval reference terms; searching a sentence database, in which similar sentences are stored and a search area is set, and storing each sentence having a term similar to the sentence-retrieval reference terms in a similar-sentence storage area as a similar sentence; organizing the similar sentences in the similar-sentence storage area; and then obtaining the degree of similarity between the input sentence and each similar sentence and storing it in the similar-sentence storage area.

[0023] The similar-sentence retrieval method according to the fourth invention, set forth in claim 4, is the retrieval method of claim 3, characterized in that, when a similar word corresponding to a constituent term of the input sentence is among the constituent terms of a similar sentence, the degree of similarity of that constituent term of the similar sentence is set as the degree of similarity of the constituent term of the input sentence, and the degree of similarity between the input sentence and the similar sentence is obtained based on these constituent-term similarities.

[0024] The similar-sentence retrieval method according to the fifth invention, set forth in claim 5, is the retrieval method of claim 3, characterized in that, when no similar word corresponding to a constituent term of the input sentence is among the constituent terms of a similar sentence, a no-correspondence constituent similarity is set as the degree of similarity of that constituent term of the input sentence, and the degree of similarity between the input sentence and the similar sentence is obtained based on these constituent-term similarities.

[0025] The similar-sentence retrieval method according to the sixth invention, set forth in claim 6, is the retrieval method of claim 4 or 5, characterized in that, in obtaining the degree of similarity between the input sentence and a similar sentence based on the constituent-term similarities, the degree of similarity is obtained from the maximum, the minimum, or the product of the degrees of similarity of the plurality of constituent terms of the similar sentence.
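Claims 4 to 6 can be read together as a two-step calculation: assign a degree of similarity to each constituent term of the input sentence, then combine those values. The following Python sketch follows that reading. The function names, the data layout, the default value 0.1 for the no-correspondence similarity, and the choice of taking the highest value when several similar words occur are all assumptions.

```python
import math
from typing import Dict, Iterable, List


def constituent_similarities(
    similar_words: List[Dict[str, float]],
    candidate_terms: Iterable[str],
    no_match_similarity: float = 0.1,  # assumed value
) -> List[float]:
    """Return one similarity per constituent term of the input sentence.

    similar_words[i] maps each similar word of the i-th input term
    (including the term itself at 1.0) to its degree of similarity.
    """
    candidate = set(candidate_terms)
    result = []
    for words in similar_words:
        hits = [sim for word, sim in words.items() if word in candidate]
        # Claim 4: a similar word occurs in the candidate sentence.
        # Claim 5: otherwise use the no-correspondence similarity.
        # Taking max(hits) for multiple matches is an assumption.
        result.append(max(hits) if hits else no_match_similarity)
    return result


def combine(sims: List[float], mode: str = "product") -> float:
    """Claim 6: combine by maximum, minimum, or product."""
    if mode == "max":
        return max(sims)
    if mode == "min":
        return min(sims)
    if mode == "product":
        return math.prod(sims)
    raise ValueError(f"unknown mode: {mode}")


# Input: three constituent terms with hypothetical similar-word lists.
input_similar_words = [
    {"production instructions": 1.0},
    {"line configuration": 1.0},
    {"change": 1.0, "create": 0.8, "determine": 0.72},
]
# Candidate: hypothetical constituent terms of a stored sentence.
candidate = ["production instructions", "machine allocation", "determine"]

sims = constituent_similarities(input_similar_words, candidate)
print(sims)  # [1.0, 0.1, 0.72]
print(round(combine(sims, "product"), 3))  # 0.072
```

The product mode penalizes every mismatched part, whereas the maximum mode reports a high score as soon as any single part matches, so the choice of mode controls how strict the retrieval is.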

[0026] The similar-word retrieval method according to the seventh invention, set forth in claim 7, is the retrieval method of claim 1 or 2, characterized in that, of the retrieved similar words, only those whose degree of similarity is equal to or greater than a standard similarity are treated as similar words.
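The cutoff of claim 7 amounts to a simple filter over the retrieved similar words. A minimal Python sketch follows; the function name and the sample values (in particular 0.6 and the cutoff 0.75) are assumptions chosen for illustration.

```python
from typing import Dict


def filter_by_standard_similarity(
    similar_words: Dict[str, float], standard_similarity: float
) -> Dict[str, float]:
    """Keep only similar words at or above the standard similarity."""
    return {
        word: sim
        for word, sim in similar_words.items()
        if sim >= standard_similarity
    }


# Hypothetical retrieval result for the reference term "change".
retrieved = {"create": 0.8, "determine": 0.72, "be performed": 0.6}
print(filter_by_standard_similarity(retrieved, 0.75))  # {'create': 0.8}
```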

[0027] The similar-sentence retrieval method according to the eighth invention, set forth in claim 8, is the retrieval method of any one of claims 3 to 6, characterized in that, of the retrieved similar words, only those whose degree of similarity is equal to or greater than a standard similarity are treated as similar words.

[0028] The similar-sentence retrieval method according to the ninth invention, set forth in claim 9, is the retrieval method of any one of claims 4 to 6, characterized in that the degrees of similarity of the constituent terms of a similar sentence that is a fixed-form sentence are weighted according to the importance of those constituent terms, and the degree of similarity between the input sentence and the similar sentence is obtained based on the weighted degrees of similarity.

[0029] The similar-sentence retrieval method according to the tenth invention, set forth in claim 10, is characterized by: decomposing an input sentence that is a fixed-form sentence into a plurality of constituent terms and setting these constituent terms as reference terms; referring to a term dictionary and storing the similar words of each standard term that matches a reference term, together with their degrees of similarity, in a similar-word storage area; organizing the similar words in the similar-word storage area to obtain the similar words of the reference terms; setting these similar words as sentence-retrieval reference terms; searching a sentence database, in which similar sentences are stored and a search area is set, and storing each sentence having a term similar to the sentence-retrieval reference terms in a similar-sentence storage area as a similar sentence; organizing the similar sentences in the similar-sentence storage area; then obtaining the degree of similarity between the input sentence and each similar sentence and storing it in the similar-sentence storage area; and outputting the similar sentences corresponding to the input sentence.

[0030] The similar-sentence retrieval method according to the eleventh invention, set forth in claim 11, is characterized in that, when the similar sentences output by the retrieval method of claim 10 are judged to be insufficient: standard terms that have a similar word matching a reference term are retrieved from the term dictionary and stored, together with the degree of similarity of that similar word, in the similar-word storage area; the similar words in the similar-word storage area are organized to obtain the similar words of the reference terms; these similar words are set as sentence-retrieval reference terms; a sentence database, in which similar sentences are stored and a search area is set, is searched and each sentence having a term similar to the sentence-retrieval reference terms is stored in a similar-sentence storage area as a similar sentence; the similar sentences in the similar-sentence storage area are organized; the degree of similarity between the input sentence and each similar sentence is then obtained and stored in the similar-sentence storage area; and the similar sentences corresponding to the input sentence are output.

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the overall configuration of the similar-sentence retrieval device j that implements the present invention, together with its work areas and peripheral input/output means. First, the terms and sentences to be processed are described with reference to this figure.

[0031] As the central example, assume
・Sentence 1: "Frequently change the line configuration based on production instructions"
and take it as the input sentence, that is, the key for which similar sentences are to be retrieved. This sentence is input by the sentence input means a and stored in work area α. As the fixed form, the IPO format is used, as already described. In this case, the components of the sentence are "input part; I", "auxiliary part", "output part; O", and "processing part; P". Sentence 1 in work area α is decomposed and stored in work area β as constituent terms. The term extraction means b performs this extraction of constituent terms.

[0032]
Input part: line configuration
Auxiliary part: based on production instructions
Output part: line configuration
Processing part: change frequently

[0033] Sentences similar to this one are retrieved from the sentence database g. Here, the sentence database g is assumed to store the following sentences.
・Sentence 2: "Changes to the line configuration are made frequently"
・Sentence 3: "Changes to the line configuration are often made frequently"
・Sentence 4: "Determine the allocation of processing machines based on production instructions"
・Sentence 5: "Optimize the production plan so as to reduce waiting time in accordance with the arrival of parts"

[0034] Of these, only sentence 2 is stored as a free-form sentence. Sentences 3 to 5 are stored as fixed-form sentences and are assumed to have been decomposed into constituent terms as shown in Table 1.

[0035]

[Table 1]

[0036] As an example of the term dictionary f needed to explain this embodiment concretely, the one shown in FIG. 3 is used. In FIG. 3, the similarity relations among the three terms "change", "create", and "determine" are multi-stage; building a dictionary in this way amounts to interpreting the degrees of similarity as shown in FIG. 4.

[0037] That is, the degree of similarity between "create" and "determine" is 0.9, and the degree of similarity between "change" and "create" is also no greater than 1 (0.8). The terms are therefore used on the understanding that the degree of similarity between "change" and "determine" (0.72) is smaller than that between "change" and "create" (0.8). Seen the other way round, the degree of similarity (0.8) between "change" and "create", which is closer to "change" than "determine" is, is greater than the degree of similarity (0.72) between "determine" and "change".

[0038] Here, the degree of similarity (0.72) between "change" and "determine" is obtained as the product of the degree of similarity (0.8) between "change" and "create" and the degree of similarity (0.9) between "create" and "determine"; the term dictionary of FIG. 3 is thus built as a product type. Alternatively, a maximum-type term dictionary, which takes the maximum of the degrees of similarity, or a minimum-type term dictionary, which takes the minimum, may be used.
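The three dictionary types differ only in how the degrees of similarity along a chain of terms are combined. The following Python sketch reproduces the 0.8 × 0.9 = 0.72 calculation for the chain "change" → "create" → "determine"; the function name `chained_similarity` is hypothetical.

```python
import math
from typing import Sequence


def chained_similarity(links: Sequence[float], mode: str = "product") -> float:
    """Combine the similarities along a chain of terms."""
    if not links:
        raise ValueError("at least one link is required")
    if mode == "product":
        return math.prod(links)
    if mode == "max":
        return max(links)
    if mode == "min":
        return min(links)
    raise ValueError(f"unknown mode: {mode}")


# "change" -> "create" is 0.8 and "create" -> "determine" is 0.9.
links = [0.8, 0.9]
for mode in ("product", "max", "min"):
    print(mode, round(chained_similarity(links, mode), 2))
# product 0.72
# max 0.9
# min 0.8
```

Only the product type guarantees that a term reached through an intermediate term scores lower than the intermediate term itself, which matches the ordering described for FIG. 4.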

[0039] Next, FIG. 5 is a flowchart showing the process of retrieving and storing the similar words of a reference term using the term dictionary f; it corresponds to an embodiment of the invention set forth in claim 1. The term dictionary f shown in FIG. 3 is the one used in the embodiment of the invention set forth in claim 2.

[0040] The series of steps in FIG. 5 is executed by the similar-word retrieval means c in FIG. 1. Because this process only retrieves similar words in preparation for retrieving similar sentences, "change" is set in work area γ of FIG. 1 as the reference term (101). Also, because it is difficult to standardize the term dictionary f, one suited to the application is specified (102).

[0041] Once the reference term and the term dictionary f have been determined, processing starts by searching the standard terms of the term dictionary f. In the term dictionary f shown in FIG. 3, "change" is registered as a standard term, so it matches at 103. As long as there are similar-word fields (similar word-1, similar word-2, ...) in the entry for the matching standard term "change" (105), their contents continue to be stored in the similar-word storage area δ (104). As a result, "create", "determine", and "be performed" are stored in storage area δ as similar words of "change", together with their respective degrees of similarity (104). Because any given standard term is registered only once in the term dictionary f, this process is executed only once.

[0042] Even where a standard term does not match at 103, similar words are also retrieved from the entries in which "change" is registered as a similar word in the term dictionary f (106). The standard terms "create" and "determine", under which "change" is registered as a similar word, are therefore stored in storage area δ as similar words together with their respective degrees of similarity (107); these two are thus stored in duplicate. Although not shown in FIG. 3, if, for example, "change" were also registered in the similar-word field of the standard term "optimize", that term would likewise be stored in storage area δ as a similar word together with its degree of similarity.

[0043] Processing moves from one standard term to the next, and when the end of the dictionary is reached (108), the duplicated entries for "create" and "determine" are removed so that only one of each remains stored (109). This completes the retrieval of similar words for the reference term "change". The retrieval instruction and the retrieval results are displayed by the retrieval instruction and result display means h in FIG. 1 and, as needed, printed by the result printing means i.
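The flow of FIG. 5 (steps 103 to 109) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: FIG. 3 is not reproduced in the text, so the dictionary contents are a hypothetical reconstruction in which only 0.8, 0.72, and 0.9 come from the description and 0.6 is an assumed value. Keeping the highest similarity when removing duplicates is also an assumption.

```python
from typing import Dict, List, Tuple

TermDictionary = Dict[str, Dict[str, float]]

# Hypothetical dictionary: standard term -> {similar word: similarity}.
# Only 0.8, 0.72 and 0.9 appear in the text; 0.6 is assumed.
TERM_DICTIONARY: TermDictionary = {
    "change": {"create": 0.8, "determine": 0.72, "be performed": 0.6},
    "create": {"change": 0.8, "determine": 0.9},
    "determine": {"change": 0.72, "create": 0.9},
}


def search_similar_words(reference: str,
                         dictionary: TermDictionary) -> Dict[str, float]:
    """Collect the similar words of a reference term with similarities."""
    storage: List[Tuple[str, float]] = []  # similar-word storage area

    # Steps 103-105: the reference term matches a standard term, so
    # store every similar word registered under it.
    if reference in dictionary:
        storage.extend(dictionary[reference].items())

    # Steps 106-108: scan the other standard terms and store those
    # whose similar words include the reference term.
    for standard_term, similar in dictionary.items():
        if standard_term != reference and reference in similar:
            storage.append((standard_term, similar[reference]))

    # Step 109: remove duplicates so each word is stored once.
    # Keeping the highest similarity is an assumption.
    merged: Dict[str, float] = {}
    for word, sim in storage:
        merged[word] = max(sim, merged.get(word, 0.0))
    return merged


print(search_similar_words("change", TERM_DICTIONARY))
# {'create': 0.8, 'determine': 0.72, 'be performed': 0.6}
```

In this run "create" and "determine" are stored twice before step 109, once from the entry for "change" and once from their own entries, and the final pass leaves one copy of each.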

[0044] FIG. 6 is a flowchart showing the process of retrieving similar sentences from the sentence database g of FIG. 1 and storing them; it corresponds to an embodiment of the invention set forth in claim 3. This process is executed by the similar-sentence retrieval means d in FIG. 1. The procedure is described below, taking the input sentence to be sentence 1 above: "Frequently change the line configuration based on production instructions".

[0045] The constituent terms of sentence 1 are "based on the production instruction," "line configuration," and "change frequently," so these three are set as reference terms (201). The similar word search means c described above then runs the similar word search until all three reference terms have been processed (203), storing the retrieved similar words in the similar word storage area δ (202). Along with the similar words, the constituent terms of the input sentence (sentence 1) themselves are stored in the work area β.

[0046] In this process, setting "change frequently" as a reference term requires that the term dictionary f contain standard terms that include modifiers such as "frequently." When it does, processing is fast and reliability improves; on the other hand, building such a dictionary takes considerably more effort.

[0047] Techniques for decomposing "change frequently" into "frequently" and "change" and recognizing "change" as the principal part are already established. This invention therefore treats the case as equivalent to one in which "production instruction," "line configuration," and "change" are the constituent terms, so that similar words such as those shown in FIG. 3 are stored in the similar word storage area δ for "change." Similar words for "production instruction" and "line configuration" that exist in the term dictionary f are likewise retrieved, and all similar words stored in the storage area δ are consolidated in the same way as at 109 in FIG. 5 (204).

[0048] Of course, a modifier such as "frequently" can carry important meaning. If that is to be handled, the term dictionary f can include modifiers together with their similarities, and a fixed form that also accounts for a modifier on each constituent term can be processed. Because the sentence database g is highly varied, a region to be searched is set (205) to bound the processing. In this example, the region of the sentence database g that holds the aforementioned sentences 3, 4, and so on is assumed to have been set.

[0049] The similar words stored in the similar word storage area δ are set one at a time as reference terms for the sentence search (206). The entire region of the sentence database set at 205 is searched (208) for similar sentences containing the reference term, and these are stored in the similar sentence storage area ε (207). For example, sentence 3 matches sentence 1 in "line configuration," and the "determine" of sentence 4 is similar to the "change" of sentence 1, so both sentences are stored in the similar sentence storage area ε (207).
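Steps 205 to 209 amount to scanning a chosen region of the database once per reference term and then removing duplicate hits. The sketch below illustrates this under stated assumptions: the name `search_similar_sentences`, the representation of a sentence as a list of constituent terms, and the term lists given for sentences 3, 4, and 9 are hypothetical and are not taken from the patent's tables.

```python
from typing import Dict, List

# Hypothetical search region (step 205): sentence id -> constituent terms.
# The term lists are illustrative guesses.
SEARCH_REGION: Dict[str, List[str]] = {
    "sentence 3": ["line configuration", "display"],
    "sentence 4": ["production instruction", "processing order", "determine"],
    "sentence 9": ["inventory", "report"],
}


def search_similar_sentences(reference_terms: List[str],
                             region: Dict[str, List[str]]) -> List[str]:
    found: List[str] = []  # plays the role of storage area epsilon
    for term in reference_terms:                   # step 206
        for sentence_id, terms in region.items():  # step 208: whole region
            if term in terms:
                found.append(sentence_id)          # step 207
    # Step 209: drop duplicates, keeping the order of first storage.
    return list(dict.fromkeys(found))


hits = search_similar_sentences(
    ["line configuration", "change", "create", "determine"], SEARCH_REGION)
```

Here sentence 3 is found through the exact term "line configuration" and sentence 4 through "determine," a similar word of "change."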

[0050] Further, as at 204, the stored similar sentences are consolidated for the sake of processing efficiency (209). Consolidating at storage time, at 202 and 207, rather than afterward at 204 and 209 would save storage space, but the difference is not significant, so the handling follows that of 109 in FIG. 5. Sentences 3 and 4, which are similar to the input sentence 1, are thus stored in the similar sentence storage area ε. The similarity between sentence 1 and sentence 3 and that between sentence 1 and sentence 4 are then calculated by the procedure of FIG. 7, described next, and stored in the similar sentence storage area ε (210).

[0051] FIG. 7 is a flowchart of the procedure for calculating the similarity between the input sentence and a retrieved similar sentence; it corresponds to an embodiment of the invention of claim 4. The process is executed by the sentence similarity calculation means e within the similar sentence search means d.

[0052] The constituent terms of the input sentence for which similar sentences are sought are called the input sentence constituent terms, and those of a retrieved similar sentence are called the similar sentence constituent terms. Similarity is judged with the input sentence constituent terms as the basis, and the similarity between the input sentence and the similar sentence is obtained from the similarities set for the terms. Table 2 arranges the concrete example given earlier according to this procedure.

[0053]

[Table 2]

[0054] The procedure of FIG. 7 is described below for the concrete case in which the input sentence, term dictionary, and sentence database are as in Table 2. First, if the input sentence is not in fixed form (301), decomposing it into constituent terms as in Table 2 is meaningless, so a conventional method that calculates the similarity between sentences from, for example, the number of synonyms is followed (311). In that case the aforementioned sentence 2 can also be judged similar to the input sentence (sentence 1), but the result cannot be related to the similarities of individual terms.

[0055] If the input sentence is in fixed form (301), each element of the input sentence constituent terms is set in turn as the component to be processed (302) until none remain (306), and it is determined whether the similar sentence constituent terms contain a similar word corresponding to that element (303). Where one exists, the similarities are set (305): for sentence 3, 1.0 for the input part and the output part; for sentence 4, 0.85 and 0.72 for the output part and the processing part, respectively.

[0056] When this processing is applied to components that have no similar word, the fact that the input part of sentence 4, "based on the production instruction," is not similar to the input part of sentence 1 has no real significance. In sentence 1, the object of "change" was analyzed strictly and "line configuration" was assigned to both the input part and the output part. In sentence 4, the same item is simply not used for two components, and the constituent term that serves as the auxiliary part in sentence 1 is assigned to the input part instead. Doing so makes the role of a further auxiliary part clearer, as in the aforementioned sentence 5, so examples like sentence 4 are considered typical.

[0057] From this it follows that, in a decomposition like that of sentence 1, the input part need not actually be written as a component. A sentence may then be sufficiently similar even if it has no term corresponding to the input part of sentence 1, so in such sentences the input part need not be strictly defined. Indeed, the input part is absent from the original sentence 1 and was added only through strict analysis. The "non-corresponding component similarity" of 304 therefore assigns a similarity value even when no similar word corresponds to the term.

[0058] Setting this non-corresponding component similarity as the similarity of a component of the input sentence and using it in calculating the similarity between the input sentence and a similar sentence corresponds to an embodiment of the invention of claim 5. The value to use may be decided according to the sentences involved. Judging simply from the components of sentence 4, it is natural to set the non-corresponding component similarity for the auxiliary part. Moreover, when a constituent term of the auxiliary part is similar to a constituent term of the input part, output part, or processing part, the sentences as a whole can be judged similar.
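Steps 302 to 306 can be pictured as filling in one similarity per component, falling back to the non-corresponding component similarity where no similar word was found. The sketch below is illustrative only: the four component names follow the text, but the function name `component_similarities`, the use of a single fallback value for every component, and the fallback of 0.9 are assumptions.

```python
from typing import Dict

COMPONENTS = ("input", "auxiliary", "output", "processing")


def component_similarities(matched: Dict[str, float],
                           non_corresponding: float = 0.0) -> Dict[str, float]:
    """Return one similarity per component (steps 302-306).

    `matched` holds the similarities set at step 305; components without a
    corresponding similar word receive `non_corresponding` (step 304).
    """
    return {name: matched.get(name, non_corresponding) for name in COMPONENTS}


# Sentence 4 from the text: output part 0.85, processing part 0.72.
sentence4_plain = component_similarities({"output": 0.85, "processing": 0.72})
# Same sentence with a hypothetical non-corresponding similarity of 0.9.
sentence4_filled = component_similarities({"output": 0.85, "processing": 0.72}, 0.9)
```

A per-component fallback (for example, one value for the auxiliary part only) would be an equally plausible reading of the text; the single value is used here only to keep the example short.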

[0059] How the similarity between sentences is derived from the similarities of the constituent terms depends on how the similarity will be used. Since computer processing is intended, however, the calculation method is simply selected from among a maximum type, a minimum type, and a product type (307); these correspond to embodiments of the invention of claim 6.

[0060] At 308, the similarity is calculated by maximum value, that is, the largest of the component similarities is taken as the similarity of the whole sentence. The similarity of sentence 3 is then 1.0, and that of sentence 4 (assuming no non-corresponding component similarity is set) is 0.85. Taking the maximum component similarity as the similarity of the whole sentence emphasizes the most similar component: if even one component matches or is highly similar, the whole sentence is judged to match or to be highly similar.

[0061] At 309, the similarity is calculated by minimum value, that is, the smallest of the component similarities is taken as the similarity of the whole sentence. Both sentence 3 and sentence 4 (assuming no non-corresponding component similarity is set) then have a similarity of 0.0, and the non-corresponding component similarity described above is useful for preventing this. Taking the minimum component similarity as the similarity of the whole sentence emphasizes the most dissimilar component: if even one component differs or has low similarity, the whole sentence is judged to differ or to have low similarity.

[0062] The maximum type thus rates similarity high and yields many similar sentences, whereas the minimum type rates similarity low and yields few. Both share the drawback that the number of similar or dissimilar constituent terms is hardly reflected in the result. A product type, in which the product of the component similarities is taken as the similarity of the whole sentence, is therefore also provided so that finer differences are reflected in the similarity.

[0063] At 310, the product of the similarities is calculated. Both sentence 3 and sentence 4 (assuming no non-corresponding component similarity is set) then have a similarity of 0.0, but if the non-corresponding component similarity is set to a value other than 1.0 or 0.0, the similarity varies with the number of such components. If there is a sentence similar to sentence 4, such as "The allocation of processing machines is changed based on the production instruction," it can be given a higher similarity.
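The three calculation types selected at 307 (steps 308, 309, and 310) can be compared side by side on the example values from the text. The sketch below is a minimal illustration: the function name `sentence_similarity`, the component order (input, auxiliary, output, processing), and the non-corresponding component similarity of 0.9 used in the last example are assumptions.

```python
import math
from typing import List


def sentence_similarity(similarities: List[float], method: str) -> float:
    """Combine component similarities by maximum, minimum, or product."""
    if method == "max":        # step 308
        return max(similarities)
    if method == "min":        # step 309
        return min(similarities)
    if method == "product":    # step 310
        return math.prod(similarities)
    raise ValueError(f"unknown method: {method}")


# Order: input, auxiliary, output, processing; 0.0 where nothing corresponds.
sentence3 = [1.0, 0.0, 1.0, 0.0]
sentence4 = [0.0, 0.0, 0.85, 0.72]
# Sentence 4 with a hypothetical non-corresponding similarity of 0.9.
sentence4_filled = [0.9, 0.9, 0.85, 0.72]

max3 = sentence_similarity(sentence3, "max")
max4 = sentence_similarity(sentence4, "max")
min4 = sentence_similarity(sentence4, "min")
product4 = sentence_similarity(sentence4, "product")
product4_filled = sentence_similarity(sentence4_filled, "product")
```

Without a non-corresponding component similarity, the minimum and product types collapse to 0.0 for both sentences, as the text notes; with the assumed value of 0.9 the product for sentence 4 becomes 0.9 × 0.9 × 0.85 × 0.72, so each additional unmatched component lowers the result a little further.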

[0064] In the present invention, which decomposes sentences into constituent terms and relates the similarity of terms to the similarity of sentences, discriminating sentence similarity finely is of great importance. The product-type calculation therefore plays an important role and becomes highly effective in combination with the weighting described later.

[0065] The product-type similarity is calculated as follows.

Sentence similarity = input-part similarity × auxiliary-part similarity × output-part similarity × processing-part similarity

Written generally for a sentence divided into n components, this formula becomes Equation 1.

[0066]

[Equation 1]

[0067] Equation 1 takes this form for the following reason. Two sentences match when all of their constituent terms match, and if such a match is regarded as complete similarity, it corresponds to the logical product of the match decisions for the individual components. Similarity is therefore also treated as a logical product (AND), and the multiplication of similarities follows from that.
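The image of Equation 1 is not reproduced in this text. Based on the four-factor product in paragraph 0065 and the logical-product rationale in paragraph 0067, it presumably has the form below. The symbols S (sentence similarity), s_i (similarity of the i-th component), and the range 0 to 1 are notation assumed here, not taken from the original.

```latex
% Presumed general form of Equation 1 (notation assumed)
S = \prod_{i=1}^{n} s_i , \qquad 0 \le s_i \le 1

% The case n = 4 written out as in paragraph 0065
S = s_{\mathrm{input}} \cdot s_{\mathrm{auxiliary}} \cdot s_{\mathrm{output}} \cdot s_{\mathrm{processing}}

% Consistency with the logical product: if every s_i is either 0 or 1,
% then S = 1 exactly when all s_i = 1, and S = 0 otherwise.
S = 1 \iff s_1 = s_2 = \dots = s_n = 1 \qquad (s_i \in \{0, 1\})
```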

[0068] FIG. 8 shows the procedure for discarding similar candidates and corresponds to embodiments of the inventions of claims 7 and 8. Its purpose is to reduce the processing load by restricting a similarity search to similar word candidates (similar candidates) whose similarity is at least a standard similarity given in advance. It is therefore used when similar words are stored at 104 and 107 in FIG. 5 (and hence at 202 in FIG. 6) and at 207 and 210 in FIG. 6. The process is executed by the similar word search means c and the similar sentence search means d. That is, if the similarity retrieved or calculated at any of steps 104, 107, 202, 207, and 210 is at least the preset standard similarity (401), the similar candidate is processed as a similar word (402); if it is below the standard similarity (401), it is processed as not being a similar word (403).
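The cutoff of steps 401 to 403 is a simple threshold test. The following sketch shows it applied to a set of candidates; the function name `keep_candidates`, the candidate "adjust," and all numeric values are hypothetical.

```python
from typing import Dict


def keep_candidates(candidates: Dict[str, float],
                    standard_similarity: float) -> Dict[str, float]:
    """Keep only candidates at or above the standard similarity (steps 401-402).

    Candidates below the threshold are treated as not similar (step 403).
    """
    return {word: similarity
            for word, similarity in candidates.items()
            if similarity >= standard_similarity}


kept = keep_candidates({"create": 0.80, "determine": 0.72, "adjust": 0.40}, 0.70)
```

Because the comparison is "at least," a candidate whose similarity equals the standard similarity is retained.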

[0069] FIG. 9 shows the procedure of the weighting process that takes the importance of sentence components into account. This process corresponds to an embodiment of the invention of claim 9 and is executed by the sentence similarity calculation means e. Specifically, importance is taken into account by applying to the similarity of each retrieved similar word the preset weight stored for each component in the sentence component weight table k shown in FIG. 1.

[0070] That is, when the similarity is calculated at 308, 309, or 310 in FIG. 7, the determination at 501 in FIG. 9 is made. If weights are not to be considered, the similarity from the search result is used as is (503); if weights are to be considered, the similarity is recalculated by Equation 2 (502). The x in Equation 2 is obtained by Equation 3 from "weight-i" stored in the weight table k.

[0071]

[Equation 2]

[0072]

[Equation 3]

[0073] The weighted similarity calculated in this way differs numerically from the similarity set for the term, but it is effective as a weighting of similarity because it has the following properties.
The similarity of a match (1.0) is unchanged.
For a component with a small weight, the similarity moves toward a match (approaches 1.0).
As the weight is increased, the value approaches the similarity of the term.
Similarity values are relative to begin with, so such weights can be used as long as they are applied uniformly throughout.
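Equations 2 and 3 are not reproduced in this text, so their actual form is unknown. Purely as an illustration of the three stated properties, the sketch below uses a power function, similarity ** x with x between 0 and 1. This is an assumed stand-in chosen because it happens to satisfy those properties; it should not be read as the patent's Equation 2, and the name `weighted_similarity` is hypothetical.

```python
def weighted_similarity(similarity: float, x: float) -> float:
    """Illustrative stand-in only (NOT the patent's Equation 2).

    x is an assumed normalized weight in [0, 1]:
    x = 0 ignores the term similarity, x = 1 uses it unchanged.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return similarity ** x


exact_match = weighted_similarity(1.0, 0.3)    # a match stays at 1.0
small_weight = weighted_similarity(0.72, 0.1)  # moves toward 1.0
large_weight = weighted_similarity(0.72, 0.9)  # moves toward 0.72
zero_weight = weighted_similarity(0.72, 0.0)   # term similarity is ignored
```

With this stand-in, a weight of zero yields 1.0 regardless of the term similarity, so the component no longer affects a product-type sentence similarity.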

[0074] In sentences in IPO format, which describe the functions of software and automated machines, the processing part (P) is generally the important one. The aim is to let similar words be searched with emphasis on this part, and it can be achieved by using Equation 2 with a larger weight for the processing part than for the other parts.

[0075] That is, if the weight of the processing part is increased and the weights of the other parts are decreased, the similarity of the processing part approaches the similarity of the term. The weights of the other components become smaller by the amount that the weight of the processing part is larger, so their weighted similarities approach 1.0 and are no longer reflected in the sentence similarity. Accordingly, setting a weight of 0.0 for an unimportant component allows it to be processed independently of the similarity of its term.

[0076] Weights of this kind can be defined without using Equation 3. For example, x itself could be set as the weight of each component, and many other variants are conceivable; the present invention is therefore characterized by Equation 1 and Equation 2. An example of the relationship between the original similarity of a component and the weighted similarity given by Equation 2 is shown in FIG. 11.

[0077] FIG. 10 shows the end-to-end processing of a similar sentence search by the similar sentence search device j of FIG. 1 and corresponds to an embodiment of the invention of claim 10. Since the individual processes have already been described, the overall flow is outlined here using the concrete example above.

[0078] First, sentence 1 is input at 601. At the same time, whether the sentence is to be processed by the present invention is decided according to whether it is in fixed form. At 602, sentence 1 is decomposed into the constituent terms already described. If a standard similarity for the discarding process is entered at 603 to speed up processing, candidates whose similarity is below that value are no longer treated as similar candidates in the processing at 605 and 609. Because special terms, or special usages of terms, are generally found in each field, the term dictionary to be used is specified at 604.

[0079] Processing from 605 onward is the same as in FIG. 6. In practice, similar words are displayed at 606 using the search instruction/result display means h, weights are entered at 607 if needed, and the sentence database is specified at 608. The similar sentence search and storage process is then executed (609), and the search results are output and, as needed, printed or displayed (610). If the search is satisfactory, processing ends (611).

[0080] The standard processing described above cannot retrieve sentence 5. Yet "optimizing" a "production plan" and "changing" a "line configuration" are in essence very similar, and it is unknown whether the information used as the input part or auxiliary part matches, resembles, or differs between sentence 1 and sentence 5. Even so, sentence 5 did not become a similar candidate, because "optimize" is not retrieved directly from the term dictionary.

[0081] To improve on this, when the search returns few sentences or fails to return an expected sentence, the search is judged unsatisfactory at 611 and the process at 612 is executed to continue searching. That is, the contents of the constituent term storage area are replaced, with the similar words of the constituent terms taking the place of the constituent terms of the input sentence, and processing is repeated from 605.

[0082] This reprocessing corresponds to an embodiment of the invention of claim 11. Through it, the constituent term of the processing part of sentence 1 is replaced, "change" giving way to "determine." Because "determine" is registered as a similar word of the standard term "optimize" in the term dictionary example of FIG. 3, "optimize" is retrieved, and sentence 5 is in turn retrieved and stored.
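The second pass of step 612 can be sketched as running the same dictionary lookup again with the first-pass similar words in place of the original term. The dictionary contents, similarity values, and the function name `similar_words` below are assumptions for illustration; only the chain from "change" through "determine" to "optimize" comes from the text.

```python
from typing import Dict, Set

# Hypothetical excerpt of the term dictionary: standard term -> similar words.
TERM_DICTIONARY: Dict[str, Dict[str, float]] = {
    "change": {"create": 0.80, "determine": 0.72},
    "optimize": {"determine": 0.60},
}


def similar_words(reference: str,
                  dictionary: Dict[str, Dict[str, float]]) -> Set[str]:
    """Return similar words of `reference` found by direct or reverse lookup."""
    found: Set[str] = set()
    for standard, similar in dictionary.items():
        if standard == reference:
            found.update(similar)   # direct match on the standard term
        elif reference in similar:
            found.add(standard)     # reference listed as a similar word
    return found


first_pass = similar_words("change", TERM_DICTIONARY)

# Step 612: the first-pass similar words replace the original term,
# and the lookup is repeated from step 605.
second_pass: Set[str] = set()
for term in first_pass:
    second_pass |= similar_words(term, TERM_DICTIONARY)
```

In this sample, "optimize" is absent from the first pass but appears in the second, which is what allows a sentence containing it to be retrieved on the repeated search.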

[0083]

[Effects of the Invention] As described above, the present invention relates the similarity of terms to the similarity of sentences in order ultimately to retrieve similar sentences, and it is extremely useful when, for example, processing sentences that describe software functions. To judge whether software functions match or resemble one another by judging whether sentences do, partial matches and differences in the constituent terms of the sentences must be made explicit; doing so is what makes it possible to identify partial matches and differences in software specifications. The present invention thus makes it possible to search for similar software and for the matching and differing portions of its functions.

[0084] Furthermore, according to the present invention, applying the discarding of similar candidates, weighting, and similar processes as needed narrows the targets of the similarity search and improves processing speed. In addition, because terms need not be classified, a searcher can easily carry out effective and accurate searches without knowing any classification criteria.

[Brief description of drawings]

FIG. 1 is an overall configuration diagram of a similar sentence search device to which an embodiment of the present invention is applied.

FIG. 2 is a diagram showing a specific example of a term dictionary.

FIG. 3 is a diagram showing a specific example of a term dictionary.

FIG. 4 is a diagram explaining the similarity between similar words.

FIG. 5 is a flowchart showing the procedure for retrieving and storing similar words using the term dictionary.

FIG. 6 is a flowchart showing the procedure for retrieving similar sentences from the sentence database and storing them.

FIG. 7 is a flowchart showing the procedure for calculating the similarity between the input sentence and a retrieved similar sentence.

FIG. 8 is a flowchart showing the procedure for discarding similar candidates.

FIG. 9 is a flowchart showing the procedure for weighting sentence components.

FIG. 10 is a flowchart showing the end-to-end procedure of a similar sentence search.

FIG. 11 is a diagram showing an example of the relationship between the original similarity of a component and the weighted similarity.

[Explanation of symbols]

α, β, γ, δ, ε: work areas or storage areas
a: sentence input means
b: term extraction means
c: similar word search means
d: similar sentence search means
e: sentence similarity calculation means
f: term dictionary
g: sentence database
h: search instruction/result display means
i: result printing means
j: similar sentence search device
k: sentence component weight table

Continuation of front page: (72) Inventor: Koji Ikematsu, c/o Fuji Electric Co., Ltd., 1-1 Tanabe-shinden, Kawasaki-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A similar word search method, characterized in that an input sentence in fixed form is decomposed into a plurality of constituent terms and these constituent terms are set as reference terms; a term dictionary is consulted and, if there is a standard term that matches a reference term, its similar words are stored in a similar word storage area together with their similarities; if there is no standard term that matches the reference term, each standard term having a similar word that matches the reference term is stored in the similar word storage area together with the similarity of that similar word; and the similar words in the similar word storage area are consolidated to obtain the similar words of the reference term.

2. The similar word search method according to claim 1, characterized in that the term dictionary stores, in addition to general contents including the explanation and part of speech of each standard term, the similar words corresponding to the standard term and their similarities expressed as numerical values.

3. A similar sentence search method, characterized in that the search method according to claim 1 or 2 is executed to retrieve similar words corresponding to the plurality of constituent terms of an input sentence; these similar words are set as reference terms for sentence search; for a sentence database in which sentences are stored and for which a search region has been set, sentences having a term similar to a reference term for sentence search are stored as similar sentences in a similar sentence storage area; the similar sentences in the similar sentence storage area are consolidated; and the similarity between the input sentence and each similar sentence is then obtained and stored in the similar sentence storage area.

4. The similar sentence search method according to claim 3, characterized in that when a similar word corresponding to a constituent term of the input sentence is included in the constituent terms of the similar sentence, the similarity of that constituent term of the similar sentence is set as the similarity of the constituent term of the input sentence, and the similarity between the input sentence and the similar sentence is obtained based on the similarities of the constituent terms.

5. The similar sentence search method according to claim 3, characterized in that when no similar word corresponding to a constituent term of the input sentence is included in the constituent terms of the similar sentence, a non-corresponding component similarity is set as the similarity of that constituent term of the input sentence, and the similarity between the input sentence and the similar sentence is obtained based on the similarities of the constituent terms.

6. The similar sentence search method according to claim 4 or 5, characterized in that, when the similarity between the input sentence and the similar sentence is obtained based on the similarities of the constituent terms, it is obtained from the maximum value, the minimum value, or the product of the similarities of the plurality of constituent terms of the similar sentence.

7. The similar word search method according to claim 1 or 2, wherein, of the retrieved similar words, only terms whose similarity is equal to or higher than a standard similarity are processed as similar words.

8. The similar sentence search method according to claim 3, wherein, of the retrieved similar words, only terms whose similarity is equal to or higher than a standard similarity are processed as similar words.

9. The similar sentence search method according to claim 4, wherein the similarities of the constituent terms are weighted according to the importance of the constituent terms of the similar sentence, which is a fixed-form sentence, and the similarity between the input sentence and the similar sentence is obtained based on the weighted similarities.

10. A similar sentence search method, characterized by: decomposing an input sentence, which is a fixed-form sentence, into a plurality of constituent terms and setting these constituent terms as reference terms; referring to a term dictionary and storing the similar words of the standard terms that match the reference terms in a similar word storage area together with their similarities; organizing the similar words in the similar word storage area to obtain the similar words of the reference terms; setting these similar words as sentence search reference terms; for a sentence database in which similar sentences are stored and a search area is set, storing sentences having a term similar to the sentence search reference terms as similar sentences in a similar sentence storage area; organizing the similar sentences in this similar sentence storage area; obtaining the similarity between the input sentence and each similar sentence and storing it in the similar sentence storage area; and outputting the similar sentences corresponding to the input sentence.

11. A similar sentence search method, characterized in that, when it is determined that the similar sentences output by the search method according to claim 10 are insufficient as a search result: a standard term having the reference term as a similar word is searched for in the term dictionary; the similar words of that term are stored in the similar word storage area together with their similarities; the similar words in the similar word storage area are organized to obtain similar words of the reference term; for a sentence database in which sentences are stored and a search area is set, sentences having a term similar to the sentence search reference terms are stored as similar sentences in a similar sentence storage area; the similar sentences in this similar sentence storage area are organized; and the similarity between the input sentence and each similar sentence is obtained, stored in the similar sentence storage area, and the similar sentences corresponding to the input sentence are output.
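
The flow recited in claims 3 to 8 and 10 can be sketched informally in Python. This is only an illustration under stated assumptions, not the implementation described in the specification: the dictionary contents, the database contents, the value of NO_MATCH_SIMILARITY, and all function and variable names are hypothetical. The term dictionary is assumed to map each standard term to its similar words with a similarity between 0 and 1, and stored sentences are assumed to be already decomposed into constituent terms. The weighting of claim 9 and the re-search of claim 11 are not covered.

```python
from math import prod

# Hypothetical term dictionary: standard term -> {similar word: similarity}
TERM_DICTIONARY = {
    "motor": {"motor": 1.0, "electric motor": 0.9, "engine": 0.6},
    "overheat": {"overheat": 1.0, "temperature rise": 0.8, "burnout": 0.5},
}

# Hypothetical sentence database: stored sentences, already split into terms
SENTENCE_DATABASE = [
    ["electric motor", "temperature rise"],
    ["engine", "noise"],
    ["pump", "leak"],
]

# Assumed no-correspondence similarity (claim 5); the value is illustrative
NO_MATCH_SIMILARITY = 0.1

# Ways of combining constituent-term similarities (claim 6)
AGGREGATORS = {"max": max, "min": min, "product": prod}


def similar_words(reference_terms, threshold=0.0):
    """For each reference term, keep the similar words whose similarity is
    at or above the threshold (claims 7 and 8), ordered best first."""
    result = {}
    for term in reference_terms:
        candidates = TERM_DICTIONARY.get(term, {})
        kept = {w: s for w, s in candidates.items() if s >= threshold}
        result[term] = dict(
            sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
        )
    return result


def term_similarities(similar, sentence_terms):
    """One similarity per input term: the best matching similar word found
    in the sentence (claim 4), else the no-correspondence value (claim 5)."""
    sims = []
    for words in similar.values():
        matched = [s for w, s in words.items() if w in sentence_terms]
        sims.append(max(matched) if matched else NO_MATCH_SIMILARITY)
    return sims


def search_similar_sentences(input_terms, threshold=0.0, mode="min"):
    """Return (similarity, sentence) pairs, most similar first."""
    similar = similar_words(input_terms, threshold)
    # The retrieved similar words act as the sentence search reference terms
    search_terms = {w for words in similar.values() for w in words}
    results = []
    for sentence in SENTENCE_DATABASE:
        if not search_terms & set(sentence):
            continue  # no term in common: not a candidate similar sentence
        sims = term_similarities(similar, sentence)
        results.append((AGGREGATORS[mode](sims), sentence))
    return sorted(results, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    for score, sentence in search_similar_sentences(
        ["motor", "overheat"], threshold=0.5, mode="min"
    ):
        print(round(score, 2), sentence)
```

Raising the threshold narrows the set of sentence search reference terms, so fewer stored sentences qualify as candidates. Choosing "min" makes the score depend on the weakest constituent term, whereas "max" rewards a single strong match and "product" penalizes every imperfect match.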