JP2001357065A - Method and device for retrieving similar sentence and recording medium having similar sentence retrieval program recorded thereon

Info

Publication number
JP2001357065A
Authority
JP
Japan
Prior art keywords
sentence
similar
sentences
data
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2000178367A
Other languages
Japanese (ja)
Inventor
Takayuki Adachi
貴行 足立
Kura Furuse
蔵 古瀬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2000178367A
Publication of JP2001357065A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a method and a device for retrieving similar sentences with which a grammatically or semantically similar sentence can be retrieved, and a recording medium on which a similar sentence retrieval program is recorded. SOLUTION: First, each similar candidate sentence in an example sentence collection is annotated with the positions and kinds of the parts that can be grammatically or semantically replaced, deleted, or added, and the input sentence is likewise annotated with the parts that can be replaced and the parts that can be matched against an addable part of a similar candidate sentence. Then, when the similarity between the input sentence and each similar candidate sentence is calculated, the parts where the two sentences differ are processed by treating replaceable parts of the same kind as matching, deleting unneeded parts, and adding missing parts, and the similar candidate sentence with the highest similarity is extracted as the similar sentence together with its similarity.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a similar sentence retrieval method and device that retrieve a sentence similar to a natural-language input sentence, and to a recording medium on which a similar sentence retrieval program is recorded. When a translation corresponding to the retrieved similar sentence exists, that translation is extracted. The invention can also be applied as part of an example-based translation method and device that generate a translation of the input sentence from the similar sentence and its translation.

[0002]

[Description of the Related Art] One conventional similar sentence retrieval method is conventional method 1, described in Emmanuel Planas, et al., "Formalizing Translation Memories", MT Summit VII, September 1999. This method works on the units produced by morphological analysis and extends the matching target beyond the surface form of a word to its base form and its part of speech. It retrieves similar sentences by comparing the similarity between two sentences on the following measures, in the order listed: the proportion of words in the input sentence that match by surface form, the proportion of words in the input sentence that match by base form, the proportion of words in the input sentence that match by part of speech, the proportion of words in the candidate sentence that are shared with the input sentence, and the proportion of words in the input sentence that are shared with the candidate sentence.

[0003] Another similar sentence retrieval method is conventional method 2, described in "Natural language translation device", Japanese Unexamined Patent Publication No. H6-290210. This method generates syntactic surface patterns from the input sentence and from the sentences to be searched, compares them, and retrieves similar sentences according to the similarity of the patterns.

[0004] A further similar sentence retrieval method is conventional method 3, described in Eiichiro Sumita and Yutaka Tsutsumi, "A practical retrieval method of similar examples for translation support", IEICE Transactions D-II, Vol. J74-D-II, No. 10, 1991. After morphological analysis, this method searches for a sentence that exactly matches the input sentence. If none matches, it generalizes the parts of speech of the input sentence and searches again for an exactly matching sentence.

[0005]

[Problems to Be Solved by the Invention] In conventional method 1, however, the matching unit is the single morpheme produced by morphological analysis, so the similarity calculation cannot match an expression made up of several morphemes in one sentence against an expression in another sentence.

[0006] Conventional method 2 requires a verb to appear in every pattern and therefore cannot handle sentences in which the verb is omitted.

[0007] Conventional method 3 generalizes only the input sentence and not the sentences being searched, so its range of application is narrow.

[0008] Conventional methods 1 and 3 can be used as part of example-based translation, in which similar sentence retrieval is performed between the input sentence and the same-language sentences of bilingual examples and the translation of the similar sentence is then edited. In that use, however, they do not perform retrieval that allows for adding to the similar sentence a phrase that it lacks compared with the input sentence, and adding the corresponding phrase to its translation, so as to generate an appropriate translation.

[0009] Moreover, conventional methods 1 to 3 retrieve grammatically similar sentences; retrieval of semantically similar sentences is not considered.

[0010] The present invention has been made in view of the above circumstances. Its object is to provide a similar sentence retrieval method and device, and a recording medium on which a similar sentence retrieval program is recorded, that can retrieve grammatically or semantically similar sentences even when the similarity in surface form alone is not high, and that can shorten the time needed for similarity calculation by eliminating in advance the sentences that do not resemble the input sentence.

[0011]

[Means for Solving the Problems] To achieve the above object, the present invention is a similar sentence retrieval method for retrieving a sentence similar to an input sentence from an example sentence collection, characterized in that: the similar candidate sentences of the example sentence collection are annotated in advance with information at the places that can be grammatically or semantically replaced, deleted, or added; the input sentence is likewise annotated at the places that can be replaced and the places that can be matched against an addable place of a similar candidate sentence; when the similarity between the input sentence and a similar candidate sentence is calculated, the places where the sentences differ are processed so as to take into account matching of replaceable places of the same kind, deletion of unneeded places, and addition of missing places; and the similar candidate sentence with the highest similarity is extracted as the similar sentence together with its similarity.

[0012] The present invention is also characterized in that, in the above similar sentence retrieval method, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences are output as similar sentences in descending order of similarity.
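The top-N output described above can be sketched as follows, assuming every candidate has already been scored. The function name `top_similar` and the `(sentence, similarity)` pair layout are illustrative choices, not taken from the patent.

```python
import heapq
from typing import Iterable, List, Tuple

def top_similar(scored: Iterable[Tuple[str, float]], n: int = 1) -> List[Tuple[str, float]]:
    """Return the n (sentence, similarity) pairs with the highest similarity,
    ordered from most to least similar."""
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])
```

With `n=1` this reduces to extracting the single most similar candidate together with its similarity.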

[0013] The present invention is also characterized in that, in the above similar sentence retrieval method, the data on which the replacement, deletion, and addition information is based is created in two parts, data that can be used generally and data that depends on the document field. In automatically creating the field-dependent data, information is attached to the example sentences using the existing general or field-dependent data, and similar sentences are collected from the example sentence collection after the places that are both replaceable and deletable have been removed. For places in those sentences that carry no replacement information, and whose preceding and following places agree in surface form or replacement kind, the set of items that have the same information at that place but different surface forms is created as new replacement-target data. At the same time, new deletion-target data is created by taking into account the new replacement-target data and the surrounding surface forms.

[0014] The present invention is also characterized in that, in the above similar sentence retrieval method, when the example sentence collection contains a large number of sentences, the similar candidate sentences in which the number of phrases identical to phrases of the input sentence is at or above a predetermined threshold are taken as the new similar candidate sentences.
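A minimal sketch of this pre-filtering step, assuming sentences are already split into phrases and that "the number of identical phrases" means the number of distinct shared phrases (the patent text here does not say how repeats are counted). The name `prefilter` is hypothetical.

```python
from typing import List, Sequence

def prefilter(input_phrases: Sequence[str],
              examples: Sequence[Sequence[str]],
              threshold: int) -> List[Sequence[str]]:
    """Keep only the examples that share at least `threshold` distinct
    phrases with the input sentence."""
    wanted = set(input_phrases)
    return [ex for ex in examples if len(wanted & set(ex)) >= threshold]
```

Only the surviving examples need to go through the more expensive similarity calculation, which is where the time saving comes from.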

[0015] The present invention is also characterized in that, in the above similar sentence retrieval method, bilingual examples, each a pair of a sentence of the example sentence collection and its translation, are used to extract the sentence similar to the input sentence together with its translation.

[0016] The similar sentence retrieval device of the present invention is characterized by comprising: an example section that stores a plurality of example sentences; input means for reading an input sentence; example analysis and information attachment means for analyzing, phrase by phrase, the similar candidate sentences obtained from the example sentences of the example section and attaching information to the places that can be grammatically or semantically replaced, deleted, or added; analysis and information attachment means for analyzing, phrase by phrase, the input sentence read by the input means and attaching information to the places that can be grammatically or semantically replaced and the places that can be matched against an addable place of a similar candidate sentence; search means for calculating the similarity between the input sentence and each analyzed similar candidate sentence while taking into account, at the places where the sentences differ, matching of replaceable places of the same kind, deletion of unneeded places, and addition of missing places, and for extracting the similar candidate sentence with the highest similarity as the similar sentence; and output means for outputting the similar sentence extracted by the search means together with its similarity.

[0017] The present invention is also characterized in that, in the above similar sentence retrieval device, the search means extracts as similar sentences, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences in descending order of similarity.

[0018] The present invention is also characterized in that the above similar sentence retrieval device has data creation means. The data on which the attachment of replacement, deletion, and addition information is based is described in two parts, data that can be used generally and data that depends on the document field. In automatically creating the field-dependent data, the data creation means uses the existing general or field-dependent data to collect similar sentences from the sentences of the example sentence collection after the places that are both replaceable and deletable have been removed. For places in those sentences that carry no replacement information, and whose preceding and following places agree in surface form or replacement kind, it creates the set of items that have the same information at that place but different surface forms as new replacement-target data. At the same time, it creates new deletion-target data by taking into account the new replacement-target data and the surrounding surface forms.

[0019] The present invention is also characterized in that, in the above similar sentence retrieval device, the search means takes as its search targets, as similar candidate sentences, the sentences selected in advance in which the number of phrases identical to phrases of the input sentence is at or above a predetermined threshold.

[0020] The present invention is also characterized in that the above similar sentence retrieval device comprises output means that, when bilingual examples in which a translation is associated with each example sentence are used, outputs the similar sentence extracted by the search means and its translation.

[0021] The present invention is also a recording medium on which is recorded a similar sentence retrieval program for retrieving a sentence similar to an input sentence from an example sentence collection. The program causes a computer to execute processing in which: the similar candidate sentences of the example sentence collection are annotated in advance with information at the places that can be grammatically or semantically replaced, deleted, or added; the input sentence is likewise annotated at the places that can be replaced and the places that can be matched against an addable place of a similar candidate sentence; when the similarity between the input sentence and a similar candidate sentence is calculated, the places where the sentences differ are processed so as to take into account matching of replaceable places of the same kind, deletion of unneeded places, and addition of missing places; and the similar candidate sentence with the highest similarity is extracted as the similar sentence together with its similarity.

[0022] In the above recording medium on which the similar sentence retrieval program is recorded, the present invention also causes a computer to execute processing that outputs as similar sentences, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences in descending order of similarity.

[0023] In the above recording medium on which the similar sentence retrieval program is recorded, the present invention also causes a computer to execute the following processing. The data on which the replacement, deletion, and addition information is based is created in two parts, data that can be used generally and data that depends on the document field. In automatically creating the field-dependent data, information is attached to the example sentences using the existing general or field-dependent data, and similar sentences are collected from the example sentence collection after the places that are both replaceable and deletable have been removed. For places in those sentences that carry no replacement information, and whose preceding and following places agree in surface form or replacement kind, the set of items that have the same information at that place but different surface forms is created as new replacement-target data. At the same time, new deletion-target data is created by taking into account the new replacement-target data and the surrounding surface forms.

[0024] In the above recording medium on which the similar sentence retrieval program is recorded, the present invention also causes a computer to execute processing in which, when the example sentence collection contains a large number of sentences, the similar candidate sentences in which the number of phrases identical to phrases of the input sentence is at or above a predetermined threshold are taken as the new similar candidate sentences.

[0025] In the above recording medium on which the similar sentence retrieval program is recorded, the present invention also causes a computer to execute processing that uses bilingual examples, each a pair of a sentence of the example sentence collection and its translation, to extract the sentence similar to the input sentence together with its translation.

[0026] In the present invention, a sentence similar to an input sentence is retrieved from the similar candidate sentences in a bilingual example collection as follows. The input sentence is annotated with information on the places that can be grammatically or semantically replaced and the places that can be matched against an addable place of a similar candidate sentence, and the similar candidate sentences are annotated in advance with information on the places that can be grammatically or semantically replaced, deleted, or added. At places where the expressions of the input sentence and a similar candidate sentence differ, replacement (in both the input sentence and the similar candidate sentence) and deletion and addition (in the similar candidate sentence) are performed, and the similarity is then calculated. The similar candidate sentence with the highest similarity is extracted as the similar sentence together with its similarity, and the translation of the similar sentence is extracted at the same time.

[0027] The device is configured to have: an example section that stores data on bilingual examples; input means for reading an input sentence; example analysis and information attachment means for analyzing, phrase by phrase, the similar candidate sentences from the example section and attaching information on the place and kind wherever grammatical or semantic replacement, deletion, or addition is possible; analysis and information attachment means for analyzing, phrase by phrase, the input sentence from the input means and attaching information to the places that can be grammatically or semantically replaced and the places that can be matched against an addable place of a similar candidate sentence; search means that, at places where the phrases of an analyzed similar candidate sentence and the analyzed input sentence differ, replaces phrases of the input sentence and the similar candidate sentence, deletes phrases from the similar candidate sentence, or adds phrases to the similar candidate sentence, calculates the similarity to the input sentence, and extracts the similar candidate sentence with the highest similarity as the similar sentence together with its translation; and output means for outputting the search result.

[0028] The example section further has data creation means that analyzes the example sentences in advance and, for the analyzed phrases, sets the information at the places that can be replaced, deleted, or added, either automatically or manually.

[0029]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0030] FIG. 1 shows the processing procedure and the configuration of a similar sentence retrieval device according to one embodiment of the present invention. Reference numeral 1 denotes an input unit for inputting a sentence in a first natural language. Reference numeral 2 denotes an analysis and information attachment unit, which breaks the sentence read by input unit 1 into phrases by morphological analysis or the like, as shown in FIG. 2(b), and marks the replaceable places and the places that can be matched against an addable place of a similar candidate sentence, both described later. Reference numeral 3 denotes a search unit that retrieves similar sentences by comparing the analyzed input sentence with the analyzed example sentences described later. Reference numeral 4 denotes an output unit that outputs the similar sentence extracted by search unit 3, its similarity, and its translation. Reference numeral 5 denotes an example section that includes the bilingual example collection and other components described later.

[0031] As shown in FIG. 13, search unit 3 searches the analyzed bilingual example collection 60 for sentences similar to the input sentence, which has been analyzed and annotated by analysis and information attachment unit 2. First, similar candidate sentence extraction unit 301 narrows the example sentences down to similar candidate sentences, namely those that contain at least a threshold number of the phrases in the input sentence. Next, similar candidate sentence and input sentence processing unit 302 uses the analyzed and annotated input sentence and similar candidate sentences to process the sentences so that they become grammatically or semantically similar to each other: it replaces the places where the input sentence and a similar candidate sentence differ with symbols of the same kind, deletes unneeded places that occur only in the similar candidate sentence, and adds phrases at places where the similar candidate sentence is lacking. Finally, similarity calculation unit 303 calculates the similarity between the processed input sentence and each processed similar candidate sentence, and extracts the sentence with the highest similarity together with its similarity.
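The processing and scoring steps of units 302 and 303 can be sketched as below. This is only an illustration under stated assumptions: the `Token` layout, the `[kind]` symbol format, and the helper names are hypothetical, and `difflib.SequenceMatcher.ratio()` is a stand-in for the similarity measure, which this part of the description does not define.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class Token:
    text: str                           # surface form; "" marks an addable slot
    replace_kind: Optional[str] = None  # tokens of the same kind count as equal
    delete_kind: Optional[str] = None   # candidate: deletable; input: addition-match
    slot_kind: Optional[str] = None     # candidate only: kind of phrase addable here

def symbol(tok: Token) -> str:
    """Replace a replaceable token by the symbol of its kind."""
    return f"[{tok.replace_kind}]" if tok.replace_kind else tok.text

def normalize(inp: Sequence[Token], cand: Sequence[Token]) -> Tuple[List[str], List[str]]:
    in_syms = [symbol(t) for t in inp]
    cand_syms = {symbol(t) for t in cand if t.slot_kind is None}
    out: List[str] = []
    for t in cand:
        if t.slot_kind is not None:
            # Addition: borrow an input phrase of the slot's kind that the candidate lacks.
            for src in inp:
                if src.delete_kind == t.slot_kind and symbol(src) not in cand_syms:
                    out.append(symbol(src))
                    break
        elif t.delete_kind is not None and symbol(t) not in in_syms:
            continue  # Deletion: drop a deletable phrase found only in the candidate.
        else:
            out.append(symbol(t))
    return in_syms, out

def similarity(inp: Sequence[Token], cand: Sequence[Token]) -> float:
    a, b = normalize(inp, cand)
    # Assumed measure: ratio of matching symbols over both sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def best_match(inp: Sequence[Token], candidates: Sequence[Sequence[Token]]):
    """Return (candidate, similarity) for the most similar candidate."""
    return max(((c, similarity(inp, c)) for c in candidates), key=lambda p: p[1])
```

In this sketch, "tomorrow he scores" and "today he probably scores" normalize to the same symbol sequence once both time words become `[time]` and the deletable adverb is dropped, so the pair scores as fully similar.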

[0032] Reference numerals 10 to 60 denote the bilingual example collection, its example analysis and information attachment unit, and related components, which are contained in example section 5 and prepared in advance as databases for similar sentence retrieval. Reference numeral 40 denotes a bilingual example collection in which, as illustrated in FIG. 3, a plurality of bilingual examples are stored as data, each pairing a numbered Japanese sentence (first natural language) with the corresponding English sentence (second natural language). FIG. 3 shows Japanese-English translation pairs about soccer; such bilingual examples are prepared as needed for each field and for each pair of languages to be translated.

[0033] Reference numeral 10 denotes example-independent phrase data (a database) that is used in common regardless of the examples of each field; FIG. 4 shows an example. In FIG. 4 the data is divided into a dictionary that lists phrases as they are, bilingual patterns that list several parts of speech and surface forms, and rules that determine the targets described below under certain conditions. Each specifies in advance the replacement targets and addition-match targets in the input sentence and the replacement targets, deletion targets, and addition targets in the similar candidate sentences. These targets are designated in advance, manually or automatically. An addition-match target in the input sentence is one that is checked for a match with an addition target of a similar candidate sentence, and it is the same as a deletion target in the similar candidate sentences.

[0034] Replacement targets in the input sentence and the similar candidate sentences are patterns such as conjunctions, adverbs, numerals, and adnominals, identified from the results of morphological analysis by analysis and information attachment unit 2 and example analysis and information attachment unit 50. In addition, rules designate adjectives and adjectival verbs separately for each inflected form and type.
【0035】入力文における追加一致対象、類似候補文
における削除対象は、修飾する語や独立している語を主
に対象とする。解析・情報付与部2、用例解析・情報付
与部50で形態素解析した結果をもとに、接続詞や副詞
や連体詞、形容詞の連体形および形容動詞の連体
形、“、”、名詞+“に”などを削除対象として指定す
る。
The additional matching target in the input sentence and the deletion target in the similar candidate sentence mainly target a word to be modified or an independent word. Based on the results of the morphological analysis performed by the analysis / information addition unit 2 and the example analysis / information addition unit 50, a conjunction, an adverb, an adverb, an adjective adnominal and an adjective verb adjunct, “,”, noun + “ni” Is specified as a deletion target.
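The designation of deletion targets listed above can be sketched as a simple tagger over morphological analysis output. The token layout `(surface, part of speech, inflected form)` and the English tag names are assumptions made for illustration; a real analyzer would supply its own tag set.

```python
from typing import List, Sequence, Tuple

Morph = Tuple[str, str, str]  # (surface, part of speech, inflected form), assumed layout

DELETABLE_POS = {"conjunction", "adverb", "adnominal"}
ATTRIBUTIVE_POS = {"adjective", "adjectival verb"}

def mark_deletable(tokens: Sequence[Morph]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans that may be deleted from a candidate."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        surface, pos, form = tokens[i]
        # A noun followed by "に" is deletable as one two-token span.
        if pos == "noun" and i + 1 < len(tokens) and tokens[i + 1][0] == "に":
            spans.append((i, i + 2))
            i += 2
            continue
        if (pos in DELETABLE_POS
                or surface == "、"
                or (pos in ATTRIBUTIVE_POS and form == "attributive")):
            spans.append((i, i + 1))
        i += 1
    return spans
```

The same spans would serve as addition-match targets on the input side, since the description treats the two sets as identical.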

[0036] For an addition target in a similar candidate sentence, its kind and the position at which the phrase is to be added are specified. Under the rule in FIG. 4, a pattern consisting of the n words before and the m words after a deletable place is extracted from an example sentence 1, and an example sentence 2 is sought that contains that pattern with the deletable place missing. If the items corresponding to the n preceding and m following words agree in part of speech, type, and inflected form, and example sentence 2 does not contain the element of the deletable place of example sentence 1, then information on an addable place is attached between the n preceding and the m following words of example sentence 2.
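The rule above can be sketched as follows. For brevity a single feature string per token stands in for the part of speech, type, and inflected form that the rule compares; the function names and the token layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Tok = Tuple[str, str]  # (surface, feature); feature stands in for POS, type, inflection

def contains(seq: Sequence[str], sub: Sequence[str]) -> bool:
    """True if `sub` occurs in `seq` as a contiguous run."""
    n = len(sub)
    return n > 0 and any(list(seq[i:i + n]) == list(sub)
                         for i in range(len(seq) - n + 1))

def addable_positions(ex1: Sequence[Tok], del_start: int, del_end: int,
                      ex2: Sequence[Tok], n: int = 1, m: int = 1) -> List[int]:
    """Indices in ex2 where the deletable span ex1[del_start:del_end] could be added."""
    if del_start - n < 0 or del_end + m > len(ex1):
        return []  # not enough context around the deletable place
    before = [t[1] for t in ex1[del_start - n:del_start]]
    after = [t[1] for t in ex1[del_end:del_end + m]]
    deleted = [t[0] for t in ex1[del_start:del_end]]
    if contains([t[0] for t in ex2], deleted):
        return []  # ex2 already contains the deletable element
    pattern = before + after
    feats = [t[1] for t in ex2]
    return [i + n for i in range(len(feats) - len(pattern) + 1)
            if feats[i:i + len(pattern)] == pattern]
```

Each returned index is the position between the n preceding and the m following words of example sentence 2, which is where the addable-place information would be attached.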

[0037] For other replacement, deletion, and addition targets, the minimum necessary dictionaries, patterns, and rules are specified by hand or by using existing general dictionaries. Examples are "[時間]" ([time]) and "[時間]“に”" ([time] followed by "ni") in the pattern column of FIG. 4.

[0038] Reference numeral 20 denotes example-dependent phrase data (a database), which is prepared for each field. It supplements the field-independent replacement, deletion, addition-match, and addition targets with field-specific expressions that they do not cover. FIG. 5(b) shows an example of example-dependent phrase data 20; it has the same structure as FIG. 4.

【0039】 Reference numeral 30 denotes a data creation unit that automatically creates example-dependent data; for example, it can obtain replacement targets. To obtain replacement targets automatically, the example analysis / information addition unit 50 first attaches information to the example sentences of the bilingual example collection (database) 40, applying the example-independent phrase data and then the example-dependent phrase data (FIG. 5(a)) in that order. Next, based on this information, deletable portions whose range coincides exactly with a replaceable portion, and deletable portions in which every independent word (noun, adjective, and so on) is a replaceable portion, are deleted. Among the resulting example sentences, the similarity calculation between an input sentence and a similar candidate sentence described later is applied (considering replacement only), and sentences whose similarity is at least a threshold T are extracted. When L words that are not replacement targets in each example have the same part-of-speech sequence and the K elements before and after them are identical, the L-word phrases of the examples are defined as a new replacement target. FIG. 6 shows an example. The item added as a result is “[置換1]” ([replacement 1]) in FIG. 5(b). In this case “PK” and “フリーキック” (free kick) are both nouns and therefore match grammatically; but because similar sentences are selected from field-dependent example sentences and the match is constrained by the agreement of the surrounding elements, the items obtained are also semantically closer. When the replacement target obtained is a noun, “[置換1]“で”” is made a deletion target. The threshold T is initially set high and is lowered step by step to a predetermined lower limit while replacement targets are extracted.

【0040】 Reference numeral 50 denotes an example analysis / information addition unit. Using the example-independent phrase data 10 and the example-dependent phrase data 20, it analyzes, by morphological analysis or the like as shown in FIG. 2(a), the candidate sentences of the bilingual example collection 40 that may be extracted as similar sentences. Like the analysis / information addition unit 2 for the input sentence (FIG. 2(b)), it identifies the replaceable, deletable, and addable portions and attaches the corresponding information.

【0041】 Reference numeral 60 denotes an analyzed bilingual example collection (database). It holds the output of the example analysis / information addition unit 50 so that the search unit 3 can compare it with the annotated input sentence. Examples are shown in FIGS. 8 and 9.

【0042】 FIG. 8 shows the result of analyzing each sentence of the Japanese examples in the bilingual example collection of FIG. 3 (FIG. 7) and marking the replaceable, deletable, and addable portions according to the example-independent phrase data of FIG. 4 and the example-dependent phrase data of FIG. 5(b). Information is attached to the analysis result using the example-independent data first and the example-dependent data second, and, within each, in the order replacement targets, deletion targets, addition targets. Example-independent and example-dependent phrase data of the same type are treated as a single set regardless of origin (e.g., [時間]). The parts of speech in FIG. 7 and the replacement, deletion, and addition columns in FIG. 8 use the abbreviations listed in FIG. 9. Note that the deletion column of FIG. 8 indicates the range of the deletable phrase. The analyzed bilingual example collection 60 also stores a correspondence table between the words in the sentences and the sentence numbers (FIG. 10), produced by the processing of FIG. 2(a).

【0043】 In the embodiments, the example analysis / information addition unit 50 and the analyzed bilingual example collection 60 are described for the case where analyzed data corresponding to the data 10 and 20 and the bilingual example collection 40 is held in advance. Alternatively, the data may be extracted from the data 10 and 20 and the bilingual example collection 40 and annotated each time processing is performed.

【0044】

【Embodiments】 Embodiments of the present invention are described below with reference to the drawings. In the following embodiments the input is in Japanese and the translation of the retrieved similar sentence is in English, but the invention is not limited to this combination.

【0045】 [Embodiment 1] First, the data in the example section 5 of FIG. 1 is prepared in advance.

【0046】 Taking FIG. 3 as the bilingual examples of the bilingual example collection 40 of FIG. 1, the morphological analysis in the example-sentence analysis of FIG. 2, performed in the example analysis / information addition unit 50 of FIG. 1, produces FIG. 7. In FIG. 7, phrases (bunsetsu) are delimited by “|” and individual words by “/”, and the part of speech, type number, and inflected form are recorded. Using FIG. 7 together with the example-independent phrase data 10 and the example-dependent phrase data 20, the replaceable, deletable, and addable portions of each example sentence are identified and annotated, which produces FIG. 8. At the same time, a correspondence table between the words contained in the example sentences and the sentence numbers (FIG. 10) is created. FIGS. 8 and 10 are stored in the analyzed bilingual example collection 60 of FIG. 1.
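A word-to-sentence-number table of the kind described for FIG. 10 can be pictured as an inverted index. The sketch below is only an illustration: the actual layout of FIG. 10 is not reproduced in this text, `build_word_index` is a hypothetical name, and the two sample sentences are reconstructed from the embodiment with an assumed word order and segmentation.

```python
from collections import defaultdict

def build_word_index(sentences):
    """Map each word to the sorted list of sentence numbers that contain it."""
    index = defaultdict(set)
    for number, words in sentences.items():
        for word in words:
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}

# Assumed segmentation of two example sentences from Embodiment 1.
examples = {
    2: ["中山", "が", "フリーキック", "で", "貴重な", "得点", "を", "あげ", "た", "。"],
    3: ["そして", "、", "中田", "が", "30", "分", "に", "PK", "で",
        "貴重な", "得点", "を", "し", "た", "。"],
}
word_index = build_word_index(examples)
```

Looking up a word such as “得点” then returns every sentence number in which it occurs, which is what the candidate pre-selection step needs.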

【0047】 Part of the example-dependent phrase data 20 of FIG. 1 (FIG. 5(b)) used in creating FIG. 8 is generated automatically. First, to add replacement entries to the example-dependent phrase data, the bilingual example collection 40 is annotated using the example-independent phrase data 10 (FIG. 4) and the example-dependent phrase data 20 (FIG. 5(a)). Next, as illustrated in FIG. 6, deletable portions whose range coincides exactly with a replaceable portion, and deletable portions in which every independent word is a replaceable portion, are deleted. Then the similarity between the resulting example sentences is calculated by the same method as the similarity calculation between the input sentence and a similar candidate sentence described later (considering replacement only), and sentences whose similarity is at least the threshold T are extracted. Then, when L words that are not replacement targets in each example have the same part-of-speech sequence and the K elements before and after them are identical, the L-word phrases of the examples are defined as a replacement target. As a result, “[置換1]” ([replacement 1]) is added to the replacement-target dictionary column of FIG. 5(b), together with “対訳辞書:[置換1]” (bilingual dictionary: [replacement 1]). When the replacement target obtained, [置換1], is a noun, “[置換1]“で”” is added to the deletion-target patterns, as is done for other words. It is also possible to set the threshold T high initially and lower it step by step to a predetermined lower limit while extracting replacement targets.
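The context test for new replacement targets (L words with the same part-of-speech sequence, surrounded by K identical elements) might look like the sketch below. It assumes that the two example sentences have already passed the similarity threshold T, that elements are `(text, part of speech)` pairs, and that bracketed elements such as `[固]` are existing replacement placeholders. The function names and the toy sentences are illustrative only.

```python
def is_placeholder(element):
    """Bracketed elements stand for portions that are already replacement targets."""
    return element.startswith("[") and element.endswith("]")

def new_replacement_targets(a, b, L, K):
    """Pairs of differing L-word phrases with equal POS sequence and equal K-element context."""
    found = []
    for i in range(K, len(a) - L - K + 1):
        for j in range(K, len(b) - L - K + 1):
            span_a, span_b = a[i:i + L], b[j:j + L]
            words_a = [e for e, _ in span_a]
            words_b = [e for e, _ in span_b]
            if words_a == words_b:
                continue                      # identical text: nothing new to learn
            if any(is_placeholder(e) for e in words_a + words_b):
                continue                      # already a replacement target
            if [p for _, p in span_a] != [p for _, p in span_b]:
                continue                      # part-of-speech sequences differ
            same_before = a[i - K:i] == b[j - K:j]
            same_after = a[i + L:i + L + K] == b[j + L:j + L + K]
            if same_before and same_after:
                found.append(("".join(words_a), "".join(words_b)))
    return found

# Toy pair of annotated example sentences (part-of-speech labels are assumptions).
tail = [("で", "助"), ("貴重な", "形動"), ("得点", "名"), ("を", "助"),
        ("あげ", "動"), ("た", "助動"), ("。", "記")]
sent_a = [("[固]", "固"), ("が", "助"), ("PK", "名")] + tail
sent_b = [("[固]", "固"), ("が", "助"), ("フリーキック", "名")] + tail
pairs = new_replacement_targets(sent_a, sent_b, L=1, K=1)
```

For this toy pair the only phrases that differ while sharing both context and part of speech are “PK” and “フリーキック”, which corresponds to the [置換1] entry discussed above.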

【0048】 Next, in actual processing, the input sentence “中田がPKで貴重な得点をあげた。” (“Nakata scored a valuable goal from a PK.”) is entered from the input unit 1 of FIG. 1, and the morphological analysis in the input-sentence analysis of FIG. 2, performed in the analysis / information addition unit 2 of FIG. 1, produces FIG. 11. Identifying and annotating the replaceable portions and the portions that can match an added portion of a similar candidate sentence then produces FIG. 12.

【0049】 Next, the search unit 3 of FIG. 1 searches the example sentences of FIG. 3 (Japanese examples) for sentences similar to the input sentence. The search unit 3 is configured as shown in FIG. 13.

【0050】 To avoid processing example sentences that are not very similar, the similar candidate sentence extraction unit 301 first selects the sentences that contain at least a predetermined proportion of the words of the input sentence. Concretely, the words of the input sentence are looked up in FIG. 10, and the sentences for which the ratio of matching words to the number of words in the input sentence is at least a threshold are selected. Here sentence 1 = 8/10 = 0.8, sentence 2 = 8/10 = 0.8, sentence 3 = 9/10 = 0.9, and sentence 4 = 1/10 = 0.1, so with a threshold of 0.7 sentences 1 to 3 are selected.
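The pre-selection step can be sketched as a lookup against a word-to-sentence-number table. The table below is a hand-written stand-in for FIG. 10 (its contents are assumptions, and sentence 1 is omitted because its text is not given here); `select_candidates` is a hypothetical name.

```python
def select_candidates(input_words, word_index, threshold):
    """Return {sentence number: ratio} for sentences sharing enough words with the input."""
    counts = {}
    for word in input_words:
        for number in word_index.get(word, ()):
            counts[number] = counts.get(number, 0) + 1
    total = len(input_words)
    return {n: c / total for n, c in counts.items() if c / total >= threshold}

# Input sentence of Embodiment 1, segmented into its 10 words.
input_words = ["中田", "が", "PK", "で", "貴重な", "得点", "を", "あげ", "た", "。"]

# Assumed excerpt of the word table: word -> sentence numbers containing it.
word_index = {
    "中田": [3], "が": [2, 3], "PK": [3], "で": [2, 3], "貴重な": [2, 3],
    "得点": [2, 3], "を": [2, 3], "あげ": [2], "た": [2, 3], "。": [2, 3, 4],
}
selected = select_candidates(input_words, word_index, threshold=0.7)
```

With these assumed entries, sentences 2 and 3 score 8/10 and 9/10 and are kept, while sentence 4 scores 1/10 and is dropped, in line with the figures quoted in the paragraph above.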

【0051】 The similar candidate sentence / input sentence processing unit 302 makes the input sentence and each similar candidate sentence more alike: for the similar candidate sentence it generalizes replacement portions, deletes unnecessary phrases, and adds necessary phrases; for the input sentence it generalizes replacement portions. FIG. 14 shows the example of the input sentence and sentence 3.
First, using the analysis results of the input sentence and the similar candidate sentence, the written forms are compared phrase by phrase. In step (1) of FIG. 14, each phrase of the input sentence, starting from the first, is checked for an identical phrase in sentence 3; “中田が”, “PKで”, “貴重な”, and “得点を” match, so 1 is stored as the mark of correspondence.
Next, for the phrases whose written forms do not match, the phrases are compared after phrase replacement has been applied. When replacement is applied, a single phrase may have several possible representations, as for sentence 3 in step (2) of FIG. 14. Representations that retain more of the written form are preferred: they are stored in parallel in descending order of the total number of elements (words and replaceable portions) in the phrase and in ascending order of the number of words covered by replaceable portions, and are compared in that order. For “30分に” in sentence 3 there are “[数]分 に” and “[時間]に”, and the former takes priority. As a result, in step (2) of FIG. 14, “[動_241_用]た。” of the input sentence does not match.
In step (3) of FIG. 14, for phrases that still do not match after phrase replacement, matching is checked using every possible word and replacement portion within the phrase as the unit. Here the candidates are stored in parallel in the priority order of replaceable portions matching several words, written forms of single words, and replaceable portions of single words, and are compared in that order. As a result, in step (3) of FIG. 14, “た” and “。” match, so 1 is stored as the mark of correspondence.
In step (4) of FIG. 14, any portion of the similar candidate sentence that was never matched is deleted if it is a deletable portion, and if a portion of the input sentence that was never matched exists in the similar candidate sentence as an addable portion, it is added to the similar candidate sentence. First, the portions of the input sentence and sentence 3 matched in steps (1) to (3) are identified; then the deletable portions of sentence 3 are examined. Whole phrases that can be deleted are looked for first; failing that, the remaining deletable portions are combined and the combination that deletes the most words is chosen. As a result, “そして、” and “30分に” are deleted. Next, sentence 3 is checked for addable portions, but there are none, so the input sentence is processed into “中田/が/PK/で/貴重な/得点/を/あげ/た/。” and sentence 3 into “中田/が/PK/で/貴重な/得点/を/し/た/。”. Sentences 1 and 2 are processed in the same way, giving FIG. 15.
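The deletion part of the final processing step can be illustrated with a simplified sketch: a deletable span is dropped only if none of its words was matched earlier. This deliberately omits the preference for whole phrases and the search for the combination that deletes the most words. The word order of sentence 3, the span positions, and the function name are assumptions.

```python
def delete_unmatched(tokens, matched, deletable_spans):
    """Drop each deletable span (start, end) in which no token was matched."""
    drop = set()
    for start, end in deletable_spans:
        if not any(matched[start:end]):
            drop.update(range(start, end))
    return [tok for i, tok in enumerate(tokens) if i not in drop]

# Sentence 3 of Embodiment 1 (assumed word order and segmentation).
sentence_3 = ["そして", "、", "中田", "が", "30", "分", "に", "PK", "で",
              "貴重な", "得点", "を", "し", "た", "。"]
# True where the earlier matching steps found a counterpart in the input sentence.
matched = [False, False, True, True, False, False, False, True, True,
           True, True, True, False, True, True]
# Assumed deletable spans: "そして、" and "30分に".
spans = [(0, 2), (4, 7)]
processed = delete_unmatched(sentence_3, matched, spans)
```

The unmatched word “し” is kept because it is not inside a deletable span, so the result has the same ten-element shape as the processed sentence 3 given in the text.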

【0052】 Next, the similarity calculation unit 303 calculates the similarity between the processed input sentence and each processed similar candidate sentence. The formula below is used here, but any other suitable method may be used instead.

【0053】 Similarity = 2 × (number of matching elements) / (number of elements in the input sentence + number of elements in the similar candidate sentence)
As a result, as shown in FIG. 15, sentence 2 is selected as the similar sentence.
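The formula above can be written as a small function. How matching elements are counted is not spelled out here, so this sketch assumes a multiset intersection of the processed element lists; the element lists themselves, including the `[固]` and `[置換1]` placeholders, are assumed renderings of the processed sentences.

```python
from collections import Counter

def similarity(input_elems, candidate_elems):
    """2 x matching elements / (elements in input + elements in candidate)."""
    # Assumption: matches are counted as the multiset intersection of the two lists.
    matches = sum((Counter(input_elems) & Counter(candidate_elems)).values())
    return 2 * matches / (len(input_elems) + len(candidate_elems))

# Processed sentences with replacement portions generalized (assumed rendering).
input_sentence = ["[固]", "が", "[置換1]", "で", "貴重な", "得点", "を", "あげ", "た", "。"]
sentence_2 = ["[固]", "が", "[置換1]", "で", "貴重な", "得点", "を", "あげ", "た", "。"]
sentence_3 = ["[固]", "が", "[置換1]", "で", "貴重な", "得点", "を", "し", "た", "。"]

score_2 = similarity(input_sentence, sentence_2)
score_3 = similarity(input_sentence, sentence_3)
```

Under these assumptions sentence 2 matches all ten elements and sentence 3 nine of ten, so sentence 2 ranks first, consistent with the result stated for FIG. 15.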

【0054】 Finally, the output unit of FIG. 1 outputs the similar sentence and its translation. In this example, sentence 2, “中山がフリーキックで貴重な得点をあげた。”, and “Nakayama added a valuable goal from a free kick.” are output. If several sentences have the same similarity, the following are compared in order, as shown in FIG. 15: the element (word or replaceable portion) matching degree after the deletion step, the element matching degree of the element-level matching step, and the phrase matching degrees of the two phrase-level matching steps. The sentence with the higher value is selected at the first point where a difference appears. Element matching means matching at the level of the words and replacement portions contained in a phrase.
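The tie-breaking described above amounts to a lexicographic comparison, which can be sketched with tuple ordering. The field names are hypothetical, the sample values are invented for illustration, and the relative order of the two phrase-level scores is an assumption because the step labels are not legible in this text.

```python
from typing import NamedTuple

class Scored(NamedTuple):
    sentence_no: int
    similarity: float
    elem_after_deletion: float   # element matching degree after the deletion step
    elem_match: float            # element matching degree of the element-level step
    phrase_match_a: float        # phrase matching degree (order of a/b assumed)
    phrase_match_b: float

def best(candidates):
    """Pick the candidate that wins the first comparison showing a difference."""
    return max(candidates, key=lambda c: c[1:])   # compare every field after the number

# Invented scores: equal similarity, so the later fields decide.
candidates = [
    Scored(1, 0.9, 0.9, 0.8, 0.5, 0.4),
    Scored(2, 0.9, 0.9, 0.9, 0.5, 0.4),
    Scored(3, 0.8, 1.0, 1.0, 1.0, 1.0),
]
winner = best(candidates)
```

Sentence 3 loses on similarity despite its higher secondary scores, and sentences 1 and 2 are separated only at the element-level matching degree.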

【0055】 If “[置換1]” ([replacement 1]) and “対訳辞書:[置換1]” (bilingual dictionary: [replacement 1]) had not been added automatically as replacement targets in FIG. 5(b), sentences 1 to 3 would all have a similarity of 2 × 9 / (10 + 10) = 0.9. If, in addition, only one similar sentence were output and ties were resolved in favor of the lowest sentence number, sentence 1 would be selected even though its overall meaning is less similar to the input sentence than that of the other sentences.

【0056】 This example shows two things. Because data that depends on the field of the documents is created automatically from example sentences of a specific field, the number of grammatical or semantic replacement portions increases, which allows a similarity calculation that takes finer points into account. In addition, removing sentences that do not resemble the input sentence in advance shortens the time needed for the similarity calculation.

【0057】 [Embodiment 2] This embodiment is described in the same manner as Embodiment 1.

【0058】 First, the data in the example section 5 of FIG. 1 is prepared in advance.

【0059】 Taking FIG. 16 as the bilingual examples of the bilingual example collection 40 of FIG. 1, the morphological analysis in the example-sentence analysis of FIG. 2(a), performed in the example analysis / information addition unit 50 of FIG. 1, produces FIG. 17. In FIG. 17, phrases are delimited by “|” and individual words by “/”, and the part of speech, type number, and inflected form are recorded. Using FIG. 17 together with the example-independent phrase data 10 and the example-dependent phrase data 20, the replaceable, deletable, and addable portions of each example sentence are identified and annotated, which produces FIG. 18. At the same time, a correspondence table between the words contained in each sentence of the bilingual examples and the sentence numbers (FIG. 19) is created. FIGS. 18 and 19 are stored in the analyzed bilingual example collection 60 of FIG. 1.

【0060】 Because the bilingual examples in this embodiment are those of FIG. 16, FIG. 5(a) is used as the example-dependent phrase data 20. In addition, based on the replaceable and deletable portions of FIGS. 17 and 18 and the addition-target rule in the example-independent phrase data of FIG. 4 (with m = 2), an addable portion is attached to the beginning of sentence 1 in FIG. 18. (The beginning of sentence 2 is “そして/、/固_190_*/が” and the beginning of sentence 1 is “固_190_*/が”, and “そして/、” is a deletable portion.)

【0061】 Next, in actual processing, the input sentence “そして、中田が貴重な得点をした。” (“And Nakata scored a valuable goal.”) is entered from the input unit 1 of FIG. 1, and the morphological analysis in the input-sentence analysis of FIG. 2(b), performed in the analysis / information addition unit 2 of FIG. 1, produces FIG. 20. Identifying and annotating the replaceable portions and the portions that can match an added portion of a similar candidate sentence then produces FIG. 21.

【0062】 Next, the search unit 3 of FIG. 1 searches the example sentences of FIG. 16 (Japanese examples) for sentences similar to the input sentence. The description follows FIG. 13.

【0063】 To avoid processing example sentences that are not very similar, the similar candidate sentence extraction unit 301 first selects the sentences that contain at least a predetermined proportion of the words of the input sentence. Concretely, the words of the input sentence are looked up in FIG. 19, and the sentences for which the ratio of matching words to the number of words in the input sentence is at least a threshold are selected. Here sentence 1 = 7/10 = 0.7 and sentence 2 = 8/10 = 0.8, so with a threshold of 0.7 both sentence 1 and sentence 2 are selected.

【0064】 The similar candidate sentence / input sentence processing unit 302 makes the input sentence and each similar candidate sentence more alike: for the similar candidate sentence it generalizes replacement portions, deletes unnecessary phrases, and adds necessary phrases; for the input sentence it generalizes replacement portions. FIG. 22 shows the example of the input sentence and sentence 1.
First, when the written forms are compared phrase by phrase using the analysis results of the input sentence and the similar candidate sentence, “貴重な”, “得点 を”, and “し た 。” match in step (1). Next, when the phrases whose written forms do not match are compared after phrase replacement, “[固_190]が” matches in step (2). For phrases that do not match even after phrase replacement, the words and replacement-portion elements possible within the phrase would be compared, but sentence 1 has no such phrase, so step (3) is skipped.
In step (4), any portion of the similar candidate sentence that was never matched is deleted if it is a deletable portion, and if the similar candidate sentence has an addable portion identical to a portion of the input sentence that was never matched, that portion is added to the similar candidate sentence. First, the portions of the input sentence and sentence 1 matched in steps (1) to (3) are identified. Next, the deletable portions of sentence 1 are examined; sentence 1 has none, so nothing is deleted. Then, because the input sentence contains an unmatched portion and sentence 1 has an addable portion, the two are checked for a match. The processing is the same as in steps (1), (2), and (3), except that the still-unmatched portion of the input sentence is compared with the addable portion of sentence 1. As a result, “[接]、” matches, so the input sentence is processed into “(接 、 )/[固_190]/が/貴重な/得点/を/し/た/。” and sentence 1 into “(接 、)/[固_190]/が/貴重な/得点/を/し/た/。”. Sentence 2 is processed in the same way, giving FIG. 23.
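The addition part of the final processing step can be sketched as inserting an addable run into the candidate when the input sentence still has an identical unmatched run. The data layout (a mapping from position to an addable run of elements) and the function name are assumptions, and the element spellings are simplified renderings of the processed sentences.

```python
def add_missing(candidate, addable, unmatched_runs):
    """Insert each addable run that equals a still-unmatched run of the input sentence."""
    out = list(candidate)
    for pos in sorted(addable, reverse=True):   # right to left keeps positions valid
        if addable[pos] in unmatched_runs:
            out[pos:pos] = addable[pos]
    return out

# Sentence 1 of Embodiment 2 after generalization (8 elements, assumed rendering).
sentence_1 = ["[固_190]", "が", "貴重な", "得点", "を", "し", "た", "。"]
# Addable portion attached at the head of sentence 1: a conjunction plus comma.
addable = {0: ("[接]", "、")}
# Run of the input sentence that the earlier steps left unmatched.
unmatched_runs = [("[接]", "、")]

processed_1 = add_missing(sentence_1, addable, unmatched_runs)
```

After the addition the candidate has ten elements, all matching the ten elements of the processed input sentence, whereas without it only 8 of the input's 10 elements would be matched against an 8-element candidate.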

【0065】 Next, the similarity calculation unit 303 calculates the similarity between the processed input sentence and each processed similar candidate sentence.

【0066】 As a result, as shown in FIG. 23, sentence 1 is selected as the similar sentence.

【0067】 Finally, the output unit 4 of FIG. 1 outputs the similar sentence and its translation. In this example, sentence 1, “中山が貴重な得点をした。”, and “Nakayama added a valuable goal.” are output. If several sentences have the same similarity, the following are compared in order, as shown in FIG. 23: the element (word or replaceable portion) matching degree after the deletion step, the element matching degree of the element-level matching step, and the phrase matching degrees of the two phrase-level matching steps. The sentence with the higher value is selected at the first point where a difference appears. Element matching means matching at the level of the words and replacement portions contained in a phrase.

【0068】 If the similar candidate sentence / input sentence processing unit 302 of FIG. 13 could not handle addition targets in similar candidate sentences, “[接、]” could not be added to sentence 1. The similarity of sentence 1 would then be 2 × 8 / (10 + 8) = 0.88, and sentence 2, whose overall meaning is less similar to the input sentence than that of sentence 1, would be selected instead.

【0069】 This example shows that taking additions into account, as well as replacements and deletions, allows a finer-grained similarity calculation.

【0070】 [Embodiment 3] This embodiment is described in the same manner as Embodiments 1 and 2.

【0071】 First, the data in the example section 5 of FIG. 1 is prepared in advance.

【0072】 Taking FIG. 24 as the bilingual examples of the bilingual example collection 40 of FIG. 1, the morphological analysis in the example-sentence analysis of FIG. 2(a), performed in the example analysis / information addition unit 50 of FIG. 1, produces FIG. 25. In FIG. 25, phrases are delimited by “|” and individual words by “/”, and the part of speech, type number, and inflected form are recorded. Using FIG. 25 together with the example-independent phrase data 10 and the example-dependent phrase data 20, the replaceable, deletable, and addable portions of each example are identified and annotated, which produces FIG. 26. At the same time, a correspondence table between the words contained in each sentence of the bilingual examples and the sentence numbers (FIG. 27) is created. FIGS. 26 and 27 are stored in the analyzed bilingual example collection 60 of FIG. 1.

【0073】 Because the bilingual examples in this embodiment are those of FIG. 24, FIG. 5(a) is used as the example-dependent phrase data.

【0074】 Next, in actual processing, the input sentence “中山が30分に貴重な得点をした。” (“Nakayama scored a valuable goal in the 30th minute.”) is entered from the input unit 1 of FIG. 1, and the morphological analysis in the input-sentence analysis of FIG. 2(b), performed in the analysis / information addition unit 2 of FIG. 1, produces FIG. 28. Identifying and annotating the replaceable portions and the portions that can match an added portion of a similar candidate sentence then produces FIG. 29.

【0075】 Next, the search unit 3 of FIG. 1 searches the example sentences of FIG. 24 (Japanese examples) for sentences similar to the input sentence. The description follows FIG. 13.

【0076】 To avoid processing example sentences that are not very similar, the similar candidate sentence extraction unit 301 first selects the sentences that contain at least a predetermined proportion of the words of the input sentence. Concretely, the words of the input sentence are looked up in FIG. 27, and the sentences for which the ratio of matching words to the number of words in the input sentence is at least a threshold are selected. Here sentence 1 = 9/11 = 0.81 and sentence 2 = 10/11 = 0.90, so with a threshold of 0.7 both sentence 1 and sentence 2 are selected.

【0077】 The similar candidate sentence / input sentence processing unit 302 makes each similar candidate sentence and the input sentence more alike: for the similar candidate sentence it generalizes replacement portions, deletes unnecessary phrases, and adds necessary phrases; for the input sentence it generalizes replacement portions. Processing in the same manner as in Embodiments 1 and 2 yields the sentences shown in FIG. 30.

【0078】 The similarity calculation unit 303 calculates the similarity between the processed input sentence and each processed similar candidate sentence.

【0079】 As a result, as shown in FIG. 30, sentence 1 is selected as the similar sentence.

【0080】 Finally, the output unit 4 of FIG. 1 outputs the similar sentence and its translation. In this example, sentence 1, “中山が開始直後貴重な得点をした。”, and “After beginning, Nakayama added a valuable goal.” are output. If several sentences have the same similarity, the following are compared in order, as shown in FIG. 30: the element (word or replaceable portion) matching degree after the deletion step, the element matching degree of the element-level matching step, and the phrase matching degrees of the two phrase-level matching steps. The sentence with the higher value is selected at the first point where a difference appears. Element matching means matching at the level of the words and replacement portions contained in a phrase.

[0081] If the similar candidate sentence / input sentence processing unit 302 in FIG. 13 could not treat a sequence of several morphemes as a single replacement target, the similarity of sentence 1 would be 2 × 9 / (11 + 11) = 0.81, and sentence 2, whose overall meaning is less similar to the input sentence than that of the other sentence, would be selected instead.
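The arithmetic 2 × 9 / (11 + 11) is consistent with a Dice-style coefficient: twice the number of matching elements divided by the total number of elements in the two sentences. The following Python sketch assumes that reading; the function name and the interpretation of 9 as the match count and 11 as each sentence's element count are assumptions, not statements from the source.

```python
def dice_similarity(matched: int, len_input: int, len_candidate: int) -> float:
    """Dice-style similarity: 2 * matches / (size of input + size of candidate)."""
    total = len_input + len_candidate
    if total == 0:
        raise ValueError("both sentences are empty")
    return 2 * matched / total


# Figures from the paragraph above: 18 / 22 is about 0.818,
# which the text reports as 0.81.
score = dice_similarity(matched=9, len_input=11, len_candidate=11)
```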

[0082] This example showed that allowing a replaceable part to span several words enables a similarity calculation that takes finer points into account.

[0083] The search unit 3 in FIG. 1 may also extract as similar sentences, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences in descending order of similarity.
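Returning a predetermined number of candidates in descending order of similarity is a standard top-N selection. A minimal Python sketch, assuming the candidates arrive as (sentence, similarity) pairs; the function name `top_n_similar` and the sample data are hypothetical.

```python
import heapq
from typing import List, Tuple


def top_n_similar(scored: List[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """Return the n highest-scoring (sentence, similarity) pairs, best first."""
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


scored = [("sentence 1", 0.91), ("sentence 2", 0.81), ("sentence 3", 0.40)]
top_two = top_n_similar(scored, 2)
```

With n = 1 this reduces to selecting only the single most similar sentence.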

[0084] The similar sentence search method of the present invention is, specifically, executed by a computer such as a personal computer (PC) on the basis of a similar sentence search program recorded in advance on a predetermined computer-readable recording medium. That is, the computer-readable recording medium records a similar sentence search program for searching an example sentence collection for a sentence similar to an input sentence, and the program causes the computer to execute the following processing. Information is attached in advance to the parts of each similar candidate sentence in the example sentence collection that can be grammatically or semantically replaced, deleted, or added. Information is likewise attached to the parts of the input sentence that can be replaced or that can match an addable part of a similar candidate sentence. When the similarity between the input sentence and a similar candidate sentence is calculated, the differing parts of the two sentences are processed taking into account matches between replacement parts of the same kind, deletion of unnecessary parts, and addition of missing parts. The similar candidate sentence with the highest similarity is then extracted as the similar sentence together with its similarity.

[0085] Further, in the recording medium on which the similar sentence search program is recorded, the present invention causes the computer to execute processing that outputs as similar sentences, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences in descending order of similarity.

[0086] Further, in the recording medium on which the similar sentence search program is recorded, the present invention causes the computer to execute the following processing. The data on which the attachment of replacement, deletion, and addition information is based is created separately as data usable for general purposes and data dependent on the field of the document. In the automatic creation of the field-dependent data, information is attached to example sentences using existing general-purpose or field-dependent data, and similar sentences are collected from the example sentence collection from which parts that are both replaceable and deletable have been removed. Among the positions in those sentences to which no replacement information is attached, items whose preceding and following parts agree in notation or replacement type, and which carry the same information but differ in notation, are gathered into a set that becomes new replacement-target data. At the same time, new deletion-target data is created taking into account the new replacement-target data and the surrounding notation.
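The grouping step in the automatic data creation can be sketched as follows. This is a deliberately simplified illustration, not the procedure of the source: it assumes each token is a (surface form, part of speech) pair, uses the part of speech as the "information of the corresponding part", uses only the immediately neighbouring surface forms as context, and skips the preliminary steps of attaching existing information and collecting similar sentences. The name `propose_replacement_sets` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (surface form, part of speech), an assumed representation


def propose_replacement_sets(sentences: List[List[Token]]) -> List[Set[str]]:
    """Group words that share left context, right context, and part of speech.

    Any group containing more than one distinct surface form is proposed
    as a new set of mutually replaceable expressions.
    """
    groups: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for sentence in sentences:
        # Only interior tokens have both a preceding and a following token.
        for i in range(1, len(sentence) - 1):
            surface, pos = sentence[i]
            key = (sentence[i - 1][0], pos, sentence[i + 1][0])
            groups[key].add(surface)
    return [members for members in groups.values() if len(members) > 1]


sentences = [
    [("I", "pron"), ("eat", "verb"), ("apples", "noun"), (".", "punct")],
    [("I", "pron"), ("eat", "verb"), ("pears", "noun"), (".", "punct")],
]
proposed = propose_replacement_sets(sentences)
```

In this toy input, "apples" and "pears" occur between the same neighbours with the same part of speech, so they form one proposed replacement set.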

[0087] Further, in the recording medium on which the similar sentence search program is recorded, the present invention causes the computer to execute processing in which, when the example sentence collection contains a large number of sentences, the similar candidate sentences that share at least a predetermined threshold number of words with the input sentence are taken as the new similar candidate sentences.
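This pre-filtering step can be sketched with a word-to-sentence-number index of the kind described for FIG. 10. The Python sketch below is an assumption-laden illustration: the index layout, the function name `prefilter_candidates`, and the sample data are hypothetical, and each distinct input word is counted once.

```python
from collections import Counter
from typing import Dict, Iterable, List


def prefilter_candidates(
    input_words: Iterable[str],
    word_to_sentences: Dict[str, List[int]],
    min_shared: int,
) -> List[int]:
    """Return sentence numbers sharing at least min_shared words with the input."""
    shared = Counter()
    for word in set(input_words):  # count each distinct input word once
        for sentence_id in word_to_sentences.get(word, ()):
            shared[sentence_id] += 1
    return sorted(sid for sid, count in shared.items() if count >= min_shared)


# Hypothetical index: word -> numbers of the example sentences containing it.
index = {"goal": [1, 2], "valuable": [1], "start": [2, 3]}
kept = prefilter_candidates(["goal", "valuable"], index, min_shared=2)
```

Only the sentences that survive this cheap count need to go through the more expensive similarity calculation.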

[0088] Further, in the recording medium on which the similar sentence search program is recorded, the present invention causes the computer to execute processing that extracts a sentence similar to the input sentence together with its translation, using bilingual examples, each of which pairs a sentence of the example sentence collection with its translation.

[0089]

[Effects of the Invention] As described above, according to the present invention, a similar sentence is selected for a read input sentence in a first natural language using an example sentence collection that contains either only sentences in the first natural language or pairs of such sentences with corresponding sentences in a second natural language (a bilingual example collection). In comparing the similarity between the analyzed input sentence and the analyzed example sentences, not only exact notation, replacement, and deletion but also addition is taken into account, and the units of replacement, deletion, and addition may be sequences of several words. Grammatically or semantically similar sentences can therefore be retrieved even when the similarity based on notation alone is not high. In addition, part of the data from which replacement, deletion, and addition information is attached can be obtained automatically from bilingual examples in a specific field. Furthermore, removing sentences that do not resemble the input sentence in advance shortens the time required for similarity calculation.

[0090] In particular, as part of a process that translates an input sentence by editing the translation of a bilingual example sentence, the invention can be used to select a bilingual example whose translation is easy to edit into an appropriate translation.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of the processing procedure of a similar sentence search method and the configuration of a similar sentence search device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the analysis processing according to the embodiment of the present invention.

FIG. 3 is an explanatory diagram showing example (1) of a bilingual example collection according to the embodiment of the present invention.

FIG. 4 is an explanatory diagram showing an example of example-independent phrase data according to the embodiment of the present invention.

FIG. 5 is an explanatory diagram showing an example of example-dependent phrase data according to the embodiment of the present invention.

FIG. 6 is an explanatory diagram showing an example of automatic extraction of example-dependent phrase data according to the embodiment of the present invention.

FIG. 7 is an explanatory diagram showing the morphological analysis results for the Japanese examples in example (1) of the bilingual example collection according to the embodiment of the present invention.

FIG. 8 is an explanatory diagram showing the analyzed example collection for the Japanese examples in example (1) of the bilingual example collection according to the embodiment of the present invention.

FIG. 9 is an explanatory diagram of parts of speech and categories according to the embodiment of the present invention.

FIG. 10 is an explanatory diagram showing the correspondence between the words contained in each sentence of the Japanese examples in example (1) of the bilingual example collection and the sentence numbers according to the embodiment of the present invention.

FIG. 11 is an explanatory diagram showing morphological analysis result (1) of an input sentence according to the embodiment of the present invention.

FIG. 12 is an explanatory diagram showing analysis result (1) of an input sentence according to the embodiment of the present invention.

FIG. 13 is an explanatory diagram showing the configuration of the search unit according to the embodiment of the present invention.

FIG. 14 is an explanatory diagram showing example (1) of processing similar candidate example sentences and an input sentence according to the embodiment of the present invention.

FIG. 15 is an explanatory diagram showing example (1) of the sentences used for similarity calculation and their similarities according to the embodiment of the present invention.

FIG. 16 is an explanatory diagram showing example (2) of a bilingual example collection according to the embodiment of the present invention.

FIG. 17 is an explanatory diagram showing the morphological analysis results for the Japanese examples in example (2) of the bilingual example collection according to the embodiment of the present invention.

FIG. 18 is an explanatory diagram showing the analyzed example collection for the Japanese examples in example (2) of the bilingual example collection according to the embodiment of the present invention.

FIG. 19 is an explanatory diagram showing the correspondence between the words contained in each sentence of the Japanese examples in example (2) of the bilingual example collection and the sentence numbers according to the embodiment of the present invention.

FIG. 20 is an explanatory diagram showing morphological analysis result (2) of an input sentence according to the embodiment of the present invention.

FIG. 21 is an explanatory diagram showing analysis result (2) of an input sentence according to the embodiment of the present invention.

FIG. 22 is an explanatory diagram showing example (2) of processing similar candidate example sentences and an input sentence according to the embodiment of the present invention.

FIG. 23 is an explanatory diagram showing example (2) of the sentences used for similarity calculation and their similarities according to the embodiment of the present invention.

FIG. 24 is an explanatory diagram showing example (3) of a bilingual example collection according to the embodiment of the present invention.

FIG. 25 is an explanatory diagram showing the morphological analysis results for the Japanese examples in example (3) of the bilingual example collection according to the embodiment of the present invention.

FIG. 26 is an explanatory diagram showing the analyzed example collection for the Japanese examples in example (3) of the bilingual example collection according to the embodiment of the present invention.

FIG. 27 is an explanatory diagram showing the correspondence between the words contained in each sentence of the Japanese examples in example (3) of the bilingual example collection and the sentence numbers according to the embodiment of the present invention.

FIG. 28 is an explanatory diagram showing morphological analysis result (3) of an input sentence according to the embodiment of the present invention.

FIG. 29 is an explanatory diagram showing analysis result (3) of an input sentence according to the embodiment of the present invention.

FIG. 30 is an explanatory diagram showing example (3) of the sentences used for similarity calculation and their similarities according to the embodiment of the present invention.

[Explanation of symbols]

1 Input unit
2 Analysis / information attachment unit
3 Search unit
4 Output unit
5 Example unit
10 Example-independent phrase data
20 Example-dependent phrase data
30 Data creation unit
40 Bilingual example collection
50 Example analysis / information attachment unit
60 Analyzed bilingual example collection
301 Similar candidate sentence extraction unit
302 Similar candidate sentence / input sentence processing unit
303 Similarity calculation unit

Claims (15)

[Claims]

[Claim 1] A similar sentence search method for searching an example sentence collection for a sentence similar to an input sentence, wherein information is attached in advance to the parts of each similar candidate sentence in the example sentence collection that can be grammatically or semantically replaced, deleted, or added; information is likewise attached to the parts of the input sentence that can be replaced or that can match an addable part of a similar candidate sentence; when the similarity between the input sentence and a similar candidate sentence is calculated, the differing parts of the two sentences are processed taking into account matches between replacement parts of the same kind, deletion of unnecessary parts, and addition of missing parts; and the similar candidate sentence with the highest similarity is extracted as the similar sentence together with its similarity.
[Claim 2] The similar sentence search method according to claim 1, wherein, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences in descending order of similarity are output as similar sentences.
[Claim 3] The similar sentence search method according to claim 1 or 2, wherein the data on which the attachment of replacement, deletion, and addition information is based is created separately as data usable for general purposes and data dependent on the field of the document; and, in the automatic creation of the field-dependent data, information is attached to example sentences using existing general-purpose or field-dependent data, similar sentences are collected from the example sentence collection from which parts that are both replaceable and deletable have been removed, a set of items that occur at positions in those sentences to which no replacement information is attached, whose preceding and following parts agree in notation or replacement type, and which carry the same information but differ in notation is created as new replacement-target data, and at the same time new deletion-target data is created taking into account the new replacement-target data and the surrounding notation.
[Claim 4] The similar sentence search method according to claim 1, 2, or 3, wherein, when the example sentence collection contains a large number of sentences, the similar candidate sentences that share at least a predetermined threshold number of words with the input sentence are taken as the new similar candidate sentences.
[Claim 5] The similar sentence search method according to claim 1, 2, 3, or 4, wherein a sentence similar to the input sentence and its translation are extracted using bilingual examples, each of which pairs a sentence of the example sentence collection with its translation.
[Claim 6] A similar sentence search device comprising: an example unit storing a plurality of example sentences; input means for reading an input sentence; example analysis / information attachment means for analyzing, phrase by phrase, the similar candidate sentences obtained from the example sentences of the example unit and attaching information to the parts that can be grammatically or semantically replaced, deleted, or added; analysis / information attachment means for analyzing, phrase by phrase, the input sentence read by the input means and attaching information to the parts that can be grammatically or semantically replaced or that can match an addable part of a similar candidate sentence; search means for calculating, for the analyzed similar candidate sentences, the similarity between the input sentence and each similar candidate sentence while taking into account, for the differing parts of the two sentences, matches between replacement parts of the same kind, deletion of unnecessary parts, and addition of missing parts, and for extracting the similar candidate sentence with the highest similarity as the similar sentence; and output means for outputting the similar sentence extracted by the search means together with its similarity.
[Claim 7] The similar sentence search device according to claim 1, wherein the search means extracts as similar sentences, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences in descending order of similarity.
[Claim 8] The similar sentence search device according to claim 6 or 7, further comprising data creation means that, for the attachment of replacement, deletion, and addition information, keeps the underlying data described separately as data usable for general purposes and data dependent on the field of the document, and that, in the automatic creation of the field-dependent data, collects similar sentences from the sentences of the example sentence collection from which parts that are both replaceable and deletable have been removed using existing general-purpose or field-dependent data, creates as new replacement-target data a set of items that occur at positions in those sentences to which no replacement information is attached, whose preceding and following parts agree in notation or replacement type, and which carry the same information but differ in notation, and at the same time creates new deletion-target data taking into account the new replacement-target data and the surrounding notation.
[Claim 9] The similar sentence search device according to claim 6, 7, or 8, wherein the search means takes as its search targets, as similar candidate sentences, the sentences determined in advance to share at least a predetermined threshold number of words with the input sentence.
[Claim 10] The similar sentence search device according to claim 6, 7, 8, or 9, further comprising output means for outputting, when bilingual examples in which a translation is associated with each example sentence are used, the similar sentence extracted by the search means and its translation.
[Claim 11] A computer-readable recording medium on which is recorded a similar sentence search program for searching an example sentence collection for a sentence similar to an input sentence, the program causing a computer to execute processing in which information is attached in advance to the parts of each similar candidate sentence in the example sentence collection that can be grammatically or semantically replaced, deleted, or added; information is likewise attached to the parts of the input sentence that can be replaced or that can match an addable part of a similar candidate sentence; when the similarity between the input sentence and a similar candidate sentence is calculated, the differing parts of the two sentences are processed taking into account matches between replacement parts of the same kind, deletion of unnecessary parts, and addition of missing parts; and the similar candidate sentence with the highest similarity is extracted as the similar sentence together with its similarity.
[Claim 12] The computer-readable recording medium according to claim 11, on which is recorded a similar sentence search program that causes the computer to execute processing for outputting as similar sentences, in addition to the similar candidate sentence with the highest similarity, a predetermined number of similar candidate sentences in descending order of similarity.
[Claim 13] The computer-readable recording medium according to claim 11 or 12, on which is recorded a similar sentence search program that causes the computer to execute processing in which the data on which the attachment of replacement, deletion, and addition information is based is created separately as data usable for general purposes and data dependent on the field of the document; and, in the automatic creation of the field-dependent data, information is attached to example sentences using existing general-purpose or field-dependent data, similar sentences are collected from the example sentence collection from which parts that are both replaceable and deletable have been removed, a set of items that occur at positions in those sentences to which no replacement information is attached, whose preceding and following parts agree in notation or replacement type, and which carry the same information but differ in notation is created as new replacement-target data, and at the same time new deletion-target data is created taking into account the new replacement-target data and the surrounding notation.
[Claim 14] The computer-readable recording medium according to claim 11, 12, or 13, on which is recorded a similar sentence search program that causes the computer to execute processing in which, when the example sentence collection contains a large number of sentences, the similar candidate sentences that share at least a predetermined threshold number of words with the input sentence are taken as the new similar candidate sentences.
[Claim 15] The computer-readable recording medium according to claim 11, 12, 13, or 14, on which is recorded a similar sentence search program that causes the computer to execute processing for extracting a sentence similar to the input sentence and its translation, using bilingual examples, each of which pairs a sentence of the example sentence collection with its translation.
JP2000178367A 2000-06-14 2000-06-14 Method and device for retrieving similar sentence and recording medium having similar sentence retrieval program recorded thereon Pending JP2001357065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2000178367A JP2001357065A (en) 2000-06-14 2000-06-14 Method and device for retrieving similar sentence and recording medium having similar sentence retrieval program recorded thereon

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2000178367A JP2001357065A (en) 2000-06-14 2000-06-14 Method and device for retrieving similar sentence and recording medium having similar sentence retrieval program recorded thereon

Publications (1)

Publication Number Publication Date
JP2001357065A true JP2001357065A (en) 2001-12-26

Family

ID=18679810

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2000178367A Pending JP2001357065A (en) 2000-06-14 2000-06-14 Method and device for retrieving similar sentence and recording medium having similar sentence retrieval program recorded thereon

Country Status (1)

Country Link
JP (1) JP2001357065A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004110835A (en) * 2002-09-19 2004-04-08 Microsoft Corp Method and system for retrieving confirmation text
US7974963B2 (en) 2002-09-19 2011-07-05 Joseph R. Kelly Method and system for retrieving confirming sentences
JP2009080777A (en) * 2007-09-27 2009-04-16 Toshiba Corp Machine translation device and machine translation program
JP4528818B2 (en) * 2007-09-27 2010-08-25 株式会社東芝 Machine translation apparatus and machine translation program
CN104951469A (en) * 2014-03-28 2015-09-30 株式会社东芝 Method and device for optimizing corpus
CN104951469B (en) * 2014-03-28 2018-04-06 株式会社东芝 Optimize the method and apparatus of corpus
CN113505593A (en) * 2021-07-23 2021-10-15 北京中科凡语科技有限公司 Similar statement retrieval method and device, electronic equipment and readable storage medium
CN113505593B (en) * 2021-07-23 2024-03-29 北京中科凡语科技有限公司 Similar sentence retrieval method, device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20040824

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20041221